msg85349 - (view) |
Author: Collin Winter (collinwinter) * |
Date: 2009-04-04 00:02 |
This patch simplifies cPickle's complicated internal buffering system.
The new version uses a single buffer closer to the Pickler object,
flushing to a file object only when necessary. This avoids the overhead
of several indirection layers for what are frequently very small writes.
Benchmarked against virgin trunk r71058 using "perf.py -r -b
pickle,pickle_list,pickle_dict":
pickle:
Min: 2.225 -> 1.962: 13.37% faster
Avg: 2.254 -> 1.994: 13.03% faster
Significant (t=70.763434, a=0.95)
pickle_dict:
Min: 2.097 -> 1.418: 47.83% faster
Avg: 2.136 -> 1.446: 47.75% faster
Significant (t=214.900146, a=0.95)
pickle_list:
Min: 1.128 -> 0.777: 45.22% faster
Avg: 1.152 -> 0.807: 42.65% faster
Significant (t=169.789433, a=0.95)
A similar patch for unpickling will follow. As issue 5670 and 5671 are
committed, I'll update this patch to support them.
This patch passes all the tests added in issue 5665. I would recommend
reviewing that patch first. I'll port to py3k once this is reviewed for
trunk. This patch is available on Rietveld at
http://codereview.appspot.com/33070.
|
msg90185 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2009-07-06 18:01 |
Any updates?
|
msg90212 - (view) |
Author: Jerry Chen (jcsalterego) |
Date: 2009-07-07 06:13 |
Applied collinwinter's patch and svn up'd to r73872; replaced additional
instances of self->write_func with _Pickler_Write.
Passed test_xpickle.py as referenced by issue 5665.
|
msg95521 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2009-11-20 01:27 |
Are you still willing to work on this?
|
msg95576 - (view) |
Author: Alexandre Vassalotti (alexandre.vassalotti) * |
Date: 2009-11-21 05:39 |
Last august, I worked on integrating Collin's optimization work into
py3k in a local Mercurial branch. So, I can champion these changes into
py3k, if Collin is unavailable.
And if Collin allows me, I would like to merge the other pickle
optimizations currently in Unladen Swallow.
|
msg97484 - (view) |
Author: Skip Montanaro (skip.montanaro) * |
Date: 2010-01-10 00:46 |
Updated the patch against the latest version of cPickle.c (r77393). All tests pass on my Mac.
|
msg97514 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2010-01-10 15:22 |
There were a couple of comments on the Rietveld code review above.
|
msg97515 - (view) |
Author: Skip Montanaro (skip.montanaro) * |
Date: 2010-01-10 15:38 |
Antoine> There were a couple of comments on the Rietveld code review
Antoine> above.
Indeed there are. Given that the Unladen Swallow folks were focusing on the
2.6 branch and their goal was to improve performance I don't see any reason
to not accept what they've done, then tweak it for 2.7/3.1 assuming the
changes you and Alexandre suggested further improve performance.
Skip
|
msg97516 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2010-01-10 15:52 |
> Indeed there are. Given that the Unladen Swallow folks were focusing on the
> 2.6 branch and their goal was to improve performance I don't see any reason
> to not accept what they've done, then tweak it for 2.7/3.1 assuming the
> changes you and Alexandre suggested further improve performance.
The main thing I'm worried about is the potentially unbounded buffering,
since it could reduce performance (or even thrash the machine) instead
of improving it.
|
msg97517 - (view) |
Author: Skip Montanaro (skip.montanaro) * |
Date: 2010-01-10 15:57 |
Antoine> The main thing I'm worried about is the potentially unbounded
Antoine> buffering, since it could reduce performance (or even thrash
Antoine> the machine) instead of improving it.
Got a test case in mind? If so, I'll code it up and compare 2.6 and Unladen
Swallow as well as offer it up for inclusion in the U-S test suite.
S
|
msg97518 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2010-01-10 16:02 |
> Got a test case in mind? If so, I'll code it up and compare 2.6 and Unladen
> Swallow as well as offer it up for inclusion in the U-S test suite.
Trying to pickle a structure that's larger than half the RAM should do
the trick. Something like a list of large strings.
|
msg97520 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2010-01-10 16:32 |
Quick test on a 3GB machine:
Without patch ("top" shows the process reaches 1.2GB RAM max):
$ time ./python -c "import cPickle;l=['a'*1024 for i in xrange(1000000)];cPickle.dump(l, open('/dev/null', 'wb'))"
10.67user 1.47system 0:12.92elapsed 93%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+319042minor)pagefaults 0swaps
With the patch, the same command quickly swaps hopelessly and after 5 minutes of elapsed time I finally manage to kill the process.
|
msg97561 - (view) |
Author: Skip Montanaro (skip.montanaro) * |
Date: 2010-01-10 23:42 |
Antoine> With the patch, the same command quickly swaps hopelessly and
Antoine> after 5 minutes of elapsed time I finally manage to kill the
Antoine> process.
Verified with an Unladen Swallow test case. I'll see if I can fix it.
S
|
msg97704 - (view) |
Author: Skip Montanaro (skip.montanaro) * |
Date: 2010-01-13 10:07 |
You can fix it if you are dumping to a file, however if you are calling dumps() you are kind of screwed if dumping large objects. There's no place to flush the buffer.
I have a fix to Unladen Swallow's cPickle module. I'm run it by them before submitting it here.
|
msg97705 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2010-01-13 10:13 |
Do we need an intermediate buffer at all when called from dumps()? How about allocating the buffer as a PyStringObject, so that it can be used directly for the result in that case?
(IIRC there's a handy _PyString_Resize function)
|
msg97711 - (view) |
Author: Skip Montanaro (skip.montanaro) * |
Date: 2010-01-13 12:48 |
Perhaps. Let's take it one step at a time though. If I change your
large pickle example to use dumps() instead of dump() in an unsullied
Python2.5 I get a MemoryError. In general, I think you have to be
careful using dumps(). Any attempt to solve that problem will likely
be independent of the Unladen Swallow patches.
|
msg97712 - (view) |
Author: Skip Montanaro (skip.montanaro) * |
Date: 2010-01-13 12:52 |
Oh, BTW, the proposed fix is in Rietveld: http://codereview.appspot.com/189051
|
msg112957 - (view) |
Author: Alexandre Vassalotti (alexandre.vassalotti) * |
Date: 2010-08-05 08:17 |
Too late to make this change in 2.x. And the patch in issue 9410 includes the optimization for 3.x.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:56:47 | admin | set | github: 49933 |
2010-08-05 08:17:01 | alexandre.vassalotti | set | status: open -> closed resolution: duplicate messages:
+ msg112957
superseder: Add Unladen Swallow's optimizations to Python 3's pickle. stage: patch review -> resolved |
2010-05-20 20:35:34 | skip.montanaro | set | nosy:
- skip.montanaro
|
2010-01-13 12:52:16 | skip.montanaro | set | messages:
+ msg97712 |
2010-01-13 12:48:39 | skip.montanaro | set | messages:
+ msg97711 |
2010-01-13 10:13:14 | pitrou | set | messages:
+ msg97705 |
2010-01-13 10:07:58 | skip.montanaro | set | messages:
+ msg97704 |
2010-01-10 23:42:15 | skip.montanaro | set | messages:
+ msg97561 |
2010-01-10 16:32:04 | pitrou | set | messages:
+ msg97520 |
2010-01-10 16:02:55 | pitrou | set | messages:
+ msg97518 |
2010-01-10 15:57:46 | skip.montanaro | set | messages:
+ msg97517 |
2010-01-10 15:52:23 | pitrou | set | messages:
+ msg97516 |
2010-01-10 15:38:36 | skip.montanaro | set | messages:
+ msg97515 |
2010-01-10 15:22:57 | pitrou | set | messages:
+ msg97514 |
2010-01-10 00:46:18 | skip.montanaro | set | files:
+ cPickle.-r77393.patch nosy:
+ skip.montanaro messages:
+ msg97484
|
2009-11-21 05:39:07 | alexandre.vassalotti | set | messages:
+ msg95576 |
2009-11-20 01:27:08 | pitrou | set | messages:
+ msg95521 versions:
+ Python 3.2, - Python 3.1 |
2009-07-07 06:14:00 | jcsalterego | set | files:
+ cpickle_writes-r73872.patch nosy:
+ jcsalterego messages:
+ msg90212
|
2009-07-06 18:01:04 | pitrou | set | messages:
+ msg90185 |
2009-04-04 01:21:15 | pitrou | set | nosy:
+ pitrou
|
2009-04-04 00:02:17 | collinwinter | create | |