classification
Title: pickle: Faster serialization of Unicode strings
Type: performance
Stage: resolved
Components: Library (Lib)
Versions: Python 3.4
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: alexandre.vassalotti, haypo, jcea, pitrou, python-dev, serhiy.storchaka
Priority: normal
Keywords: patch

Created on 2012-08-08 22:38 by haypo, last changed 2013-04-07 21:34 by haypo. This issue is now closed.

Files
File name             Uploaded
pickle_unicode.patch  haypo, 2012-08-08 22:38
pickleutf8.patch      pitrou, 2013-04-06 16:48
Messages (15)
msg167730 - Author: STINNER Victor (haypo) * (Python committer) Date: 2012-08-08 22:38
Serialization of Unicode strings in the pickle module is suboptimal, especially for long strings.

The attached patch optimizes serialization by taking advantage of new properties of Unicode strings (PEP 393):

 * text (protocol 0): avoid any temporary buffer if the string is an ASCII or Latin-1 string without "\\" or "\n" characters; otherwise, use a single small 64 KB buffer (instead of two buffers)
 * binary (protocols 1, 2): avoid any temporary buffer if the string is ASCII or if its UTF-8 encoding is already available
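To make the two code paths concrete, here is a minimal pure-Python sketch of the bytes a pickler writes for a str under protocols 0 to 3, following the rules listed above: protocol 0 escapes "\\" and "\n" and otherwise uses raw-unicode-escape after a V (UNICODE) opcode, while the binary protocols write an X (BINUNICODE) opcode, a 4-byte little-endian length, then the UTF-8 bytes. The helper name encode_str_opcode is hypothetical, memoization and the PROTO/STOP opcodes are left out, and this is not the C code from _pickle.c. Newer Python versions escape a few more control characters in protocol 0, so the sample strings avoid those.

```python
import pickle
import struct


def encode_str_opcode(s: str, protocol: int) -> bytes:
    """Hypothetical helper: opcode bytes written for str `s` (protocols 0-3).

    Memo opcodes (PUT/BINPUT), PROTO and STOP are deliberately omitted.
    """
    if protocol == 0:
        # Text protocol: raw-unicode-escape does not touch "\\" or "\n",
        # so they are escaped first (backslash first, to avoid re-escaping).
        # A pure ASCII/Latin-1 string without them passes through unchanged.
        escaped = s.replace("\\", "\\u005c").replace("\n", "\\u000a")
        return b"V" + escaped.encode("raw_unicode_escape") + b"\n"
    # Binary protocols: BINUNICODE, 4-byte little-endian length, UTF-8 data.
    # In CPython an ASCII str's internal data is already valid UTF-8, which is
    # what lets the C code skip a temporary buffer; encode() here still copies.
    data = s.encode("utf-8", "surrogatepass")
    return b"X" + struct.pack("<I", len(data)) + data


print(encode_str_opcode("caf\xe9", 0))  # b'Vcaf\xe9\n'
print(encode_str_opcode("caf\xe9", 2))  # b'X\x05\x00\x00\x00caf\xc3\xa9'
```

Comparing the sketch against pickle.dumps() for a few strings is a quick sanity check that the framing matches.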

The current code for protocol 0 uses raw_unicode_escape(), which is quite inefficient: it writes the escaped string into a first buffer, then allocates a second temporary buffer of the right size and copies the result into it (instead of simply calling _PyBytes_Resize).
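The single-buffer alternative (write once, then shrink in place, which is what _PyBytes_Resize allows in C) can be sketched in Python with a bytearray: allocate for the worst case up front (10 bytes per character, the length of a \UXXXXXXXX escape), write the raw-unicode-escape output into it, and truncate it at the end. This only illustrates the allocation pattern under that assumption; the function name is hypothetical, it is not the CPython implementation, and the extra escaping of "\\" and "\n" that protocol 0 needs is not included.

```python
def raw_unicode_escape_one_buffer(s: str) -> bytearray:
    """Hypothetical sketch: raw-unicode-escape encode `s` with one buffer.

    The buffer is over-allocated for the worst case and truncated in place,
    rather than copied into a second, correctly sized buffer.
    """
    buf = bytearray(10 * len(s))  # worst case: every char becomes \UXXXXXXXX
    pos = 0
    for ch in s:
        code = ord(ch)
        if code < 0x100:
            # Latin-1 range: copied through as a single byte.
            buf[pos] = code
            pos += 1
        elif code < 0x10000:
            # BMP character: 6-byte \uXXXX escape with lowercase hex digits.
            buf[pos:pos + 6] = b"\\u%04x" % code
            pos += 6
        else:
            # Non-BMP character: 10-byte \UXXXXXXXX escape.
            buf[pos:pos + 10] = b"\\U%08x" % code
            pos += 10
    del buf[pos:]  # truncate in place, the Python analogue of _PyBytes_Resize
    return buf
```

The result should match the built-in codec, e.g. bytes(raw_unicode_escape_one_buffer("\u20ac1")) == "\u20ac1".encode("raw_unicode_escape").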
msg167731 - Author: STINNER Victor (haypo) * (Python committer) Date: 2012-08-08 22:41
Oh, I forgot to explain that I initially wrote the patch to fix the following failure on our "bigmem" buildbot.

http://buildbot.python.org/all/builders/AMD64%20Ubuntu%20LTS%20bigmem%203.x/builds/165/steps/test/logs/stdio

======================================================================
ERROR: test_huge_str_32b (test.test_pickle.InMemoryPickleTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/python-bigmem/3.x.langa-bigmem/build/Lib/test/support.py", line 1281, in wrapper
    return f(self, maxsize)
  File "/opt/python-bigmem/3.x.langa-bigmem/build/Lib/test/pickletester.py", line 1267, in test_huge_str_32b
    pickled = self.dumps(data, protocol=proto)
  File "/opt/python-bigmem/3.x.langa-bigmem/build/Lib/test/test_pickle.py", line 49, in dumps
    return pickle.dumps(arg, protocol)
MemoryError
msg167796 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-08-09 17:10
Looks interesting. Can you post benchmark numbers?
(You can use the pickle tests from http://hg.python.org/benchmarks.)
msg167839 - Author: STINNER Victor (haypo) * (Python committer) Date: 2012-08-09 21:49
Here is a benchmark comparing Python 3.3 without and with my patch:

ned$ python3 perf.py -b fastpickle,pickle_dict,pickle_list,slowpickle ../default/python ../fasterpickle/python
Running fastpickle...
INFO:root:Running ../fasterpickle/python performance/bm_pickle.py -n 50 --use_cpickle pickle
INFO:root:Running ../default/python performance/bm_pickle.py -n 50 --use_cpickle pickle
Running pickle_dict...
INFO:root:Running ../fasterpickle/python performance/bm_pickle.py -n 50 --use_cpickle pickle_dict
INFO:root:Running ../default/python performance/bm_pickle.py -n 50 --use_cpickle pickle_dict
Running pickle_list...
INFO:root:Running ../fasterpickle/python performance/bm_pickle.py -n 50 --use_cpickle pickle_list
INFO:root:Running ../default/python performance/bm_pickle.py -n 50 --use_cpickle pickle_list
Running slowpickle...
INFO:root:Running ../fasterpickle/python performance/bm_pickle.py -n 50 pickle
INFO:root:Running ../default/python performance/bm_pickle.py -n 50 pickle

Report on Linux ned 3.4.4-4.fc16.x86_64 #1 SMP Thu Jul 5 20:01:38 UTC 2012 x86_64 x86_64
Total CPU cores: 8

### fastpickle ###
Min: 0.530622 -> 0.332841: 1.59x faster
Avg: 0.539450 -> 0.336833: 1.60x faster
Significant (t=232.04)
Stddev: 0.00552 -> 0.00276: 2.0032x smaller
Timeline: b'http://tinyurl.com/dyu3vap'

The following not significant results are hidden, use -v to show them:
pickle_dict, pickle_list, slowpickle.
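The "Nx faster" and "Nx slower" figures in these reports appear to be the ratio of the larger time to the smaller one, rounded to two decimals; for example 0.530622 / 0.332841 ≈ 1.594. The helper below reproduces that arithmetic so the reported ratios can be checked against the raw Min/Avg times. It is a reconstruction inferred from the numbers, not code taken from perf.py, and the function name is hypothetical.

```python
def describe_change(old: float, new: float) -> str:
    """Hypothetical helper: format a timing change as in the reports above.

    Assumes the ratio is always larger-time / smaller-time.
    """
    if new < old:
        return "%.2fx faster" % (old / new)
    return "%.2fx slower" % (new / old)


# Min times from the fastpickle report (unpatched 3.3 -> patched 3.3).
print(describe_change(0.530622, 0.332841))  # 1.59x faster
# Min times from the later fastpickle report (3.2 -> unpatched 3.3).
print(describe_change(0.455842, 0.542103))  # 1.19x slower
```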
msg167842 - Author: STINNER Victor (haypo) * (Python committer) Date: 2012-08-09 22:03
For your information, here are the benchmark results comparing Python 3.2 to 3.3:

ned$ python3 perf.py -b fastpickle,pickle_dict,pickle_list,slowpickle ../3.2/python ../default/python 
Running fastpickle...
INFO:root:Running ../default/python performance/bm_pickle.py -n 50 --use_cpickle pickle
INFO:root:Running ../3.2/python performance/bm_pickle.py -n 50 --use_cpickle pickle
Running pickle_dict...
INFO:root:Running ../default/python performance/bm_pickle.py -n 50 --use_cpickle pickle_dict
INFO:root:Running ../3.2/python performance/bm_pickle.py -n 50 --use_cpickle pickle_dict
Running pickle_list...
INFO:root:Running ../default/python performance/bm_pickle.py -n 50 --use_cpickle pickle_list
INFO:root:Running ../3.2/python performance/bm_pickle.py -n 50 --use_cpickle pickle_list
Running slowpickle...
INFO:root:Running ../default/python performance/bm_pickle.py -n 50 pickle
INFO:root:Running ../3.2/python performance/bm_pickle.py -n 50 pickle

Report on Linux ned 3.4.4-4.fc16.x86_64 #1 SMP Thu Jul 5 20:01:38 UTC 2012 x86_64 x86_64
Total CPU cores: 8

### fastpickle ###
Min: 0.455842 -> 0.542103: 1.19x slower
Avg: 0.462334 -> 0.547271: 1.18x slower
Significant (t=-101.15)
Stddev: 0.00362 -> 0.00471: 1.3028x larger
Timeline: b'http://tinyurl.com/btr644x'

### pickle_dict ###
Min: 0.360125 -> 0.345850: 1.04x faster
Avg: 0.364019 -> 0.348431: 1.04x faster
Significant (t=30.84)
Stddev: 0.00308 -> 0.00181: 1.6973x smaller
Timeline: b'http://tinyurl.com/cd3ashu'

### pickle_list ###
Min: 0.803941 -> 0.584800: 1.37x faster
Avg: 0.811115 -> 0.589200: 1.38x faster
Significant (t=455.00)
Stddev: 0.00261 -> 0.00225: 1.1612x smaller
Timeline: b'http://tinyurl.com/8u4m2wf'

### slowpickle ###
Min: 0.409008 -> 0.461257: 1.13x slower
Avg: 0.413668 -> 0.466201: 1.13x slower
Significant (t=-115.31)
Stddev: 0.00236 -> 0.00219: 1.0772x smaller
Timeline: b'http://tinyurl.com/czrg5kf'
msg167847 - Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2012-08-09 22:08
Amazing! Though it would probably be a good idea to benchmark non-ASCII strings as well.
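A quick way to act on that suggestion is a timeit micro-benchmark that pickles strings of each PEP 393 storage kind (1-byte ASCII, 1-byte Latin-1, 2-byte BMP, 4-byte non-BMP). This is a sketch, not the perf.py suite from hg.python.org/benchmarks; the sample sizes, repeat counts and names are arbitrary assumptions.

```python
import functools
import pickle
import timeit

# Hypothetical samples: same length in characters, different storage kinds.
SAMPLES = {
    "ascii": "a" * 100_000,
    "latin1": "\xe9" * 100_000,
    "bmp": "\u20ac" * 100_000,
    "astral": "\U0001f600" * 100_000,
}


def bench_dumps(protocol: int, number: int = 200) -> dict:
    """Return the best-of-3 time in seconds to pickle each sample `number` times."""
    results = {}
    for name, text in SAMPLES.items():
        # Bind the current sample explicitly so each timer measures one string.
        call = functools.partial(pickle.dumps, text, protocol=protocol)
        results[name] = min(timeit.Timer(call).repeat(repeat=3, number=number))
    return results


if __name__ == "__main__":
    for name, seconds in bench_dumps(protocol=pickle.HIGHEST_PROTOCOL).items():
        print(f"{name:>7}: {seconds:.4f} s")
```

Running it under both an unpatched and a patched interpreter gives a rough per-kind comparison; the ASCII case is the one the fast paths above target most directly.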
msg167848 - Author: STINNER Victor (haypo) * (Python committer) Date: 2012-08-09 22:08
Last one: Python 3.2 vs patched Python 3.3.

ned$ python3 perf.py -b fastpickle,pickle_dict,pickle_list,slowpickle ../3.2/python ../fasterpickle/python
Running fastpickle...
INFO:root:Running ../fasterpickle/python performance/bm_pickle.py -n 50 --use_cpickle pickle
INFO:root:Running ../3.2/python performance/bm_pickle.py -n 50 --use_cpickle pickle
Running pickle_dict...
INFO:root:Running ../fasterpickle/python performance/bm_pickle.py -n 50 --use_cpickle pickle_dict
INFO:root:Running ../3.2/python performance/bm_pickle.py -n 50 --use_cpickle pickle_dict
Running pickle_list...
INFO:root:Running ../fasterpickle/python performance/bm_pickle.py -n 50 --use_cpickle pickle_list
INFO:root:Running ../3.2/python performance/bm_pickle.py -n 50 --use_cpickle pickle_list
Running slowpickle...
INFO:root:Running ../fasterpickle/python performance/bm_pickle.py -n 50 pickle
INFO:root:Running ../3.2/python performance/bm_pickle.py -n 50 pickle

Report on Linux ned 3.4.4-4.fc16.x86_64 #1 SMP Thu Jul 5 20:01:38 UTC 2012 x86_64 x86_64
Total CPU cores: 8

### fastpickle ###
Min: 0.470211 -> 0.322453: 1.46x faster
Avg: 0.475718 -> 0.328496: 1.45x faster
Significant (t=205.65)
Stddev: 0.00317 -> 0.00395: 1.2456x larger
Timeline: b'http://tinyurl.com/9qpphzp'

### pickle_dict ###
Min: 0.353965 -> 0.347959: 1.02x faster
Avg: 0.358980 -> 0.350596: 1.02x faster
Significant (t=10.44)
Stddev: 0.00545 -> 0.00160: 3.3956x smaller
Timeline: b'http://tinyurl.com/9pfeqf9'

### pickle_list ###
Min: 0.838222 -> 0.593497: 1.41x faster
Avg: 0.844636 -> 0.599491: 1.41x faster
Significant (t=296.53)
Stddev: 0.00520 -> 0.00267: 1.9521x smaller
Timeline: b'http://tinyurl.com/9rynvnv'

### slowpickle ###
Min: 0.408205 -> 0.458309: 1.12x slower
Avg: 0.413738 -> 0.463916: 1.12x slower
Significant (t=-53.85)
Stddev: 0.00263 -> 0.00604: 2.3019x larger
Timeline: b'http://tinyurl.com/coffkbg'
msg178872 - Author: STINNER Victor (haypo) * (Python committer) Date: 2013-01-03 01:14
serhiy: I'm not really motivated to finish the work on this issue (especially "... it would probably be good idea to benchmarks non-ASCII strings as well."). Would you like to work on this?
msg178934 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-03 11:04
Well, I'll take care of this. I have my own patch optimizing raw_unicode_escape(), but microbenchmarks don't show any speedup. Maybe your approach will be better.
msg186115 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-04-05 23:27
Ping?
msg186126 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-04-06 14:39
Since protocol 0 is essentially dead in Python 3, I would like to propose something simpler and safer: optimize only the binary protocols. If no one beats me to it, I'll adapt Victor's patch for that.
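One way to see why protocol 0 matters little in practice is to inspect which opcode stores a str at each protocol with pickletools: protocol 0 uses the escaped-text UNICODE opcode, protocols 1 to 3 use BINUNICODE (length-prefixed UTF-8), and protocol 4 and later use SHORT_BINUNICODE for short strings. Python 3's default protocol is a binary one. The helper name below is hypothetical; pickletools.genops is the standard-library opcode iterator.

```python
import pickle
import pickletools


def str_opcode(text: str, protocol: int) -> str:
    """Hypothetical helper: name of the opcode that stores `text` at `protocol`."""
    data = pickle.dumps(text, protocol=protocol)
    for opcode, arg, _pos in pickletools.genops(data):
        # The string opcode is the one whose decoded argument is the text itself.
        if isinstance(arg, str) and arg == text:
            return opcode.name
    raise ValueError("no string opcode found")


for proto in range(pickle.HIGHEST_PROTOCOL + 1):
    print(proto, str_opcode("abc", proto))
print("default protocol:", pickle.DEFAULT_PROTOCOL)
```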
msg186139 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-04-06 16:48
Here is a new patch. Benchmark:

### fastpickle ###
Min: 0.631457 -> 0.399104: 1.58x faster
Avg: 0.631868 -> 0.399519: 1.58x faster
Significant (t=701.85)
Stddev: 0.00037 -> 0.00064: 1.7604x larger
Timeline: http://tinyurl.com/c6n8h5g
msg186218 - Author: Roundup Robot (python-dev) Date: 2013-04-07 15:41
New changeset 09a84091ae96 by Antoine Pitrou in branch 'default':
Issue #15596: Faster pickling of unicode strings.
http://hg.python.org/cpython/rev/09a84091ae96
msg186219 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-04-07 15:50
I've applied the review comments and committed the patch. Thank you!
msg186247 - Author: STINNER Victor (haypo) * (Python committer) Date: 2013-04-07 21:34
Hi Antoine, I prefer your patch. Great job!

History
Date                 User                  Action  Args
2013-04-07 21:34:52  haypo                 set     messages: + msg186247
2013-04-07 15:50:10  pitrou                set     status: open -> closed
                                                   resolution: fixed
                                                   messages: + msg186219
                                                   stage: patch review -> resolved
2013-04-07 15:41:01  python-dev            set     nosy: + python-dev
                                                   messages: + msg186218
2013-04-06 16:49:43  pitrou                set     stage: patch review
2013-04-06 16:48:13  pitrou                set     files: + pickleutf8.patch
                                                   messages: + msg186139
2013-04-06 14:39:27  pitrou                set     messages: + msg186126
2013-04-05 23:27:45  pitrou                set     messages: + msg186115
2013-01-03 11:04:52  serhiy.storchaka      set     messages: + msg178934
2013-01-03 01:14:16  haypo                 set     nosy: + serhiy.storchaka
                                                   messages: + msg178872
2012-08-11 02:16:47  jcea                  set     nosy: + jcea
2012-08-09 22:08:26  haypo                 set     messages: + msg167848
2012-08-09 22:08:24  alexandre.vassalotti  set     messages: + msg167847
2012-08-09 22:03:37  haypo                 set     messages: + msg167842
2012-08-09 21:49:42  haypo                 set     messages: + msg167839
2012-08-09 17:10:27  pitrou                set     messages: + msg167796
2012-08-08 22:41:32  haypo                 set     messages: + msg167731
2012-08-08 22:38:41  haypo                 create