classification
Title: Speed up cPickle's pickling generally
Type: performance Stage: resolved
Components: Extension Modules Versions: Python 3.2, Python 2.7
process
Status: closed Resolution: duplicate
Dependencies: Superseder: Add Unladen Swallow's optimizations to Python 3's pickle.
View: 9410
Assigned To: Nosy List: alexandre.vassalotti, collinwinter, jcsalterego, pitrou
Priority: normal Keywords: needs review, patch

Created on 2009-04-04 00:02 by collinwinter, last changed 2010-08-05 08:17 by alexandre.vassalotti. This issue is now closed.

Files
File name Uploaded Description Edit
cpickle_writes.patch collinwinter, 2009-04-04 00:02 Patch against trunk, r71111
cpickle_writes-r73872.patch jcsalterego, 2009-07-07 06:13 Patch against python27 trunk, r73872
cPickle.-r77393.patch skip.montanaro, 2010-01-10 00:46
Messages (18)
msg85349 - (view) Author: Collin Winter (collinwinter) * (Python committer) Date: 2009-04-04 00:02
This patch simplifies cPickle's complicated internal buffering system.
The new version uses a single buffer closer to the Pickler object,
flushing to a file object only when necessary. This avoids the overhead
of several indirection layers for what are frequently very small writes.

Benchmarked against virgin trunk r71058 using "perf.py -r -b
pickle,pickle_list,pickle_dict":

pickle:
Min: 2.225 -> 1.962: 13.37% faster
Avg: 2.254 -> 1.994: 13.03% faster
Significant (t=70.763434, a=0.95)

pickle_dict:
Min: 2.097 -> 1.418: 47.83% faster
Avg: 2.136 -> 1.446: 47.75% faster
Significant (t=214.900146, a=0.95)

pickle_list:
Min: 1.128 -> 0.777: 45.22% faster
Avg: 1.152 -> 0.807: 42.65% faster
Significant (t=169.789433, a=0.95)

A similar patch for unpickling will follow. As issue 5670 and 5671 are
committed, I'll update this patch to support them.

This patch passes all the tests added in issue 5665. I would recommend
reviewing that patch first. I'll port to py3k once this is reviewed for
trunk. This patch is available on Rietveld at
http://codereview.appspot.com/33070.
msg90185 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-07-06 18:01
Any updates?
msg90212 - (view) Author: Jerry Chen (jcsalterego) Date: 2009-07-07 06:13
Applied collinwinter's patch and svn up'd to r73872; replaced additional
instances of self->write_func with _Pickler_Write.

Passed test_xpickle.py as referenced by issue 5665.
msg95521 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-11-20 01:27
Are you still willing to work on this?
msg95576 - (view) Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2009-11-21 05:39
Last august, I worked on integrating Collin's optimization work into
py3k in a local Mercurial branch. So, I can champion these changes into
py3k, if Collin is unavailable.

And if Collin allows me, I would like to merge the other pickle
optimizations currently in Unladen Swallow.
msg97484 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2010-01-10 00:46
Updated the patch against the latest version of cPickle.c (r77393).  All tests pass on my Mac.
msg97514 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-01-10 15:22
There were a couple of comments on the Rietveld code review above.
msg97515 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2010-01-10 15:38
Antoine> There were a couple of comments on the Rietveld code review
    Antoine> above.

Indeed there are.  Given that the Unladen Swallow folks were focusing on the
2.6 branch and their goal was to improve performance I don't see any reason
to not accept what they've done, then tweak it for 2.7/3.1 assuming the
changes you and Alexandre suggested further improve performance.

Skip
msg97516 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-01-10 15:52
> Indeed there are.  Given that the Unladen Swallow folks were focusing on the
> 2.6 branch and their goal was to improve performance I don't see any reason
> to not accept what they've done, then tweak it for 2.7/3.1 assuming the
> changes you and Alexandre suggested further improve performance.

The main thing I'm worried about is the potentially unbounded buffering,
since it could reduce performance (or even thrash the machine) instead
of improving it.
msg97517 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2010-01-10 15:57
Antoine> The main thing I'm worried about is the potentially unbounded
    Antoine> buffering, since it could reduce performance (or even thrash
    Antoine> the machine) instead of improving it.

Got a test case in mind?  If so, I'll code it up and compare 2.6 and Unladen
Swallow as well as offer it up for inclusion in the U-S test suite.

S
msg97518 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-01-10 16:02
> Got a test case in mind?  If so, I'll code it up and compare 2.6 and Unladen
> Swallow as well as offer it up for inclusion in the U-S test suite.

Trying to pickle a structure that's larger than half the RAM should do
the trick. Something like a list of large strings.
msg97520 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-01-10 16:32
Quick test on a 3GB machine:

Without patch ("top" shows the process reaches 1.2GB RAM max):

$ time ./python -c "import cPickle;l=['a'*1024 for i in xrange(1000000)];cPickle.dump(l, open('/dev/null', 'wb'))"
10.67user 1.47system 0:12.92elapsed 93%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+319042minor)pagefaults 0swaps

With the patch, the same command quickly swaps hopelessly and after 5 minutes of elapsed time I finally manage to kill the process.
msg97561 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2010-01-10 23:42
Antoine> With the patch, the same command quickly swaps hopelessly and
    Antoine> after 5 minutes of elapsed time I finally manage to kill the
    Antoine> process.

Verified with an Unladen Swallow test case.  I'll see if I can fix it.

S
msg97704 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2010-01-13 10:07
You can fix it if you are dumping to a file, however if you are calling dumps() you are kind of screwed if dumping large objects.  There's no place to flush the buffer.

I have a fix to Unladen Swallow's cPickle module.  I'm run it by them before submitting it here.
msg97705 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-01-13 10:13
Do we need an intermediate buffer at all when called from dumps()? How about allocating the buffer as a PyStringObject, so that it can be used directly for the result in that case?
(IIRC there's a handy _PyString_Resize function)
msg97711 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2010-01-13 12:48
Perhaps.  Let's take it one step at a time though.  If I change your
large pickle example to use dumps() instead of dump() in an unsullied
Python2.5 I get a MemoryError.  In general, I think you have to be
careful using dumps().  Any attempt to solve that problem will likely
be independent of the Unladen Swallow patches.
msg97712 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2010-01-13 12:52
Oh, BTW, the proposed fix is in Rietveld: http://codereview.appspot.com/189051
msg112957 - (view) Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2010-08-05 08:17
Too late to make this change in 2.x. And the patch in issue 9410 includes the optimization for 3.x.
History
Date User Action Args
2010-08-05 08:17:01alexandre.vassalottisetstatus: open -> closed
resolution: duplicate
messages: + msg112957

superseder: Add Unladen Swallow's optimizations to Python 3's pickle.
stage: patch review -> resolved
2010-05-20 20:35:34skip.montanarosetnosy: - skip.montanaro
2010-01-13 12:52:16skip.montanarosetmessages: + msg97712
2010-01-13 12:48:39skip.montanarosetmessages: + msg97711
2010-01-13 10:13:14pitrousetmessages: + msg97705
2010-01-13 10:07:58skip.montanarosetmessages: + msg97704
2010-01-10 23:42:15skip.montanarosetmessages: + msg97561
2010-01-10 16:32:04pitrousetmessages: + msg97520
2010-01-10 16:02:55pitrousetmessages: + msg97518
2010-01-10 15:57:46skip.montanarosetmessages: + msg97517
2010-01-10 15:52:23pitrousetmessages: + msg97516
2010-01-10 15:38:36skip.montanarosetmessages: + msg97515
2010-01-10 15:22:57pitrousetmessages: + msg97514
2010-01-10 00:46:18skip.montanarosetfiles: + cPickle.-r77393.patch
nosy: + skip.montanaro
messages: + msg97484

2009-11-21 05:39:07alexandre.vassalottisetmessages: + msg95576
2009-11-20 01:27:08pitrousetmessages: + msg95521
versions: + Python 3.2, - Python 3.1
2009-07-07 06:14:00jcsalteregosetfiles: + cpickle_writes-r73872.patch
nosy: + jcsalterego
messages: + msg90212

2009-07-06 18:01:04pitrousetmessages: + msg90185
2009-04-04 01:21:15pitrousetnosy: + pitrou
2009-04-04 00:02:17collinwintercreate