Message167730
Serialization of Unicode strings in the pickle module is suboptimal, especially for long strings.
The attached patch optimizes serialization by taking advantage of new properties of Unicode strings (PEP 393):
* text (protocol 0): avoid any temporary buffer if the string is an ASCII or Latin-1 string containing no "\\" or "\n" character; otherwise use a single small buffer of 64 KB (instead of two buffers)
* binary (protocols 1 and 2): avoid any temporary buffer if the string is an ASCII string or if its UTF-8 encoding is already available
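As a rough model of the two fast paths listed above, the following Python sketch computes the bytes each protocol writes for a string. The helper names `protocol0_payload` and `binunicode_record` are hypothetical; the patch itself works on the C implementation, which this code only imitates. It assumes the protocol 0 escaping of that era (only "\\" and "\n"); newer CPython versions may escape a few more control characters, so the self-check strings avoid them.

```python
import pickle
import struct


def protocol0_payload(s: str) -> bytes:
    """Hypothetical helper: bytes written between the b"V" opcode and the newline."""
    # Fast path: every code point is in the Latin-1 range and nothing needs
    # escaping, so the output is simply the Latin-1 encoding of the string.
    if all(ord(ch) < 0x100 for ch in s) and "\\" not in s and "\n" not in s:
        return s.encode("latin-1")
    # Slow path: escape the backslash and newline (they would break the
    # line-oriented format), then let raw-unicode-escape emit \uXXXX or
    # \UXXXXXXXX for code points above U+00FF.
    escaped = s.replace("\\", "\\u005c").replace("\n", "\\u000a")
    return escaped.encode("raw-unicode-escape")


def binunicode_record(s: str) -> bytes:
    """Hypothetical helper: the BINUNICODE record used by protocols 1 and 2."""
    if s.isascii():
        # ASCII text is already valid UTF-8, so the C code can write the
        # string's own storage without building a temporary buffer.
        data = s.encode("ascii")
    else:
        # Otherwise UTF-8 bytes are needed; the patch reuses them when the
        # string object already caches its UTF-8 representation.
        data = s.encode("utf-8")
    # Opcode b"X", then the byte length as a 4-byte little-endian integer.
    return b"X" + struct.pack("<I", len(data)) + data


if __name__ == "__main__":
    # Compare the sketch against what the real pickler emits.
    for text in ["abc", "caf\xe9", "a\\b\nc", "\u20ac"]:
        assert pickle.dumps(text, protocol=0).startswith(b"V" + protocol0_payload(text) + b"\n")
        assert binunicode_record(text) in pickle.dumps(text, protocol=2)
    print("ok")
```

For "abc" and "caf\xe9" the protocol 0 payload is just the Latin-1 bytes, which is exactly the case where no temporary buffer is needed.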
The current code for protocol 0 uses raw_unicode_escape(), which is quite suboptimal: it writes the escaped string into a first buffer, then copies the result into a second temporary buffer of the right size (instead of simply calling _PyBytes_Resize()).
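The single-buffer strategy described above can be sketched in Python: allocate one buffer sized for the worst case, fill it, then shrink it in place rather than copying into a second, exactly-sized buffer. The function name `raw_unicode_escape_one_buffer` is hypothetical, and `del buf[pos:]` on a `bytearray` is only an analogue of the C-level `_PyBytes_Resize()` (CPython may still reallocate internally).

```python
def raw_unicode_escape_one_buffer(s: str) -> bytearray:
    """Hypothetical sketch: raw-unicode-escape using one buffer shrunk in place."""
    # Worst case: every code point needs the 10-byte form \UXXXXXXXX.
    buf = bytearray(10 * len(s))
    pos = 0
    for ch in s:
        cp = ord(ch)
        if cp < 0x100:
            piece = bytes([cp])        # Latin-1 range: copied through unchanged
        elif cp < 0x10000:
            piece = b"\\u%04x" % cp    # BMP code point: \uXXXX
        else:
            piece = b"\\U%08x" % cp    # non-BMP code point: \UXXXXXXXX
        buf[pos:pos + len(piece)] = piece
        pos += len(piece)
    # Truncate to the bytes actually written instead of copying them into a
    # second buffer; this plays the role of _PyBytes_Resize() in the C code.
    del buf[pos:]
    return buf
```

The result matches the built-in codec, for example `raw_unicode_escape_one_buffer("\u20ac") == "\u20ac".encode("raw-unicode-escape")`.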

Date | User | Action | Args
2012-08-08 22:38:42 | vstinner | set | recipients: + vstinner, pitrou, alexandre.vassalotti
2012-08-08 22:38:42 | vstinner | set | messageid: <1344465522.38.0.375715302831.issue15596@psf.upfronthosting.co.za>
2012-08-08 22:38:41 | vstinner | link | issue15596 messages
2012-08-08 22:38:41 | vstinner | create |