classification
Title: Speed up pickling of lists in cPickle
Type: performance
Stage: resolved
Components: Extension Modules
Versions: Python 3.1, Python 2.7
process
Status: closed
Resolution: duplicate
Dependencies:
Superseder: Add Unladen Swallow's optimizations to Python 3's pickle. (issue 9410)
Assigned To: alexandre.vassalotti
Nosy List: Garen, alexandre.vassalotti, collinwinter, jafo, pitrou, rhettinger
Priority: normal
Keywords: needs review, patch

Created on 2009-04-02 19:28 by collinwinter, last changed 2010-08-05 07:28 by alexandre.vassalotti. This issue is now closed.

Files
File name Uploaded Description
cpickle_list.patch collinwinter, 2009-04-02 19:28 Patch against trunk, r71058
pickle_batch_list_exact_py3k.diff alexandre.vassalotti, 2009-04-03 05:46
Messages (9)
msg85250 - (view) Author: Collin Winter (collinwinter) * (Python committer) Date: 2009-04-02 19:28
The attached patch adds another version of cPickle.c's batch_list(),
batch_list_exact(), which is specialized for "type(x) is list". This
provides a nice performance boost when pickling objects that use
lists. This is similar to the approach taken in issue 5670, though the
performance boost on our benchmark is smaller:

Pickle:
Min: 2.231 -> 2.200: 1.39% faster
Avg: 2.266 -> 2.227: 1.72% faster
Significant (t=10.994064, a=0.95)


Benchmark is at
http://code.google.com/p/unladen-swallow/source/browse/tests/performance/macro_pickle.py
(driver is ../perf.py; perf.py was run with "--rigorous -b pickle").
Workloads involving more lists will benefit more.

This patch passes all the tests added in issue 5665. I would recommend
reviewing that patch first. I'll port to py3k once this is reviewed for
trunk.
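
For readers unfamiliar with the technique, the idea of a fast path specialized for "type(x) is list" can be sketched in pure Python. This is only an illustration under stated assumptions, not the C code in the patch: the function names and the BATCH_SIZE value of 1000 are hypothetical and chosen for this sketch.

```python
from itertools import islice

BATCH_SIZE = 1000  # assumed batch size, for illustration only


def batches_generic(obj, batch_size=BATCH_SIZE):
    """Generic path: works for any iterable via the iterator protocol."""
    it = iter(obj)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk


def batches_exact_list(lst, batch_size=BATCH_SIZE):
    """Fast path: only valid when type(lst) is list, so slicing is safe."""
    for start in range(0, len(lst), batch_size):
        yield lst[start:start + batch_size]


def batches(obj, batch_size=BATCH_SIZE):
    # Exact type check rather than isinstance(): a list subclass may
    # override __iter__ or __getitem__, so it must use the generic path.
    if type(obj) is list:
        return batches_exact_list(obj, batch_size)
    return batches_generic(obj, batch_size)
```

Both paths produce the same batches for a plain list; the specialized one simply avoids creating and advancing an iterator, which is presumably where the saving comes from in the C version as well.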
msg85263 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-04-02 20:29
Out of curiosity, are you also benchmarking against marshal and json?
ISTM that there is always one of them that will be the fastest and that
the others should mimic that approach.
msg85265 - (view) Author: Collin Winter (collinwinter) * (Python committer) Date: 2009-04-02 20:35
No, we haven't started looking at other serialization formats yet.
Marshal will probably be my next target, with json being a lower
priority. There were enough instances of low-hanging fruit in cPickle
that I didn't go looking at the other implementations for ideas.
msg85295 - (view) Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2009-04-03 05:46
Here's a patch for py3k. I also added a special-case for 1-item lists.
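
One way to see what a 1-item special case looks like in the pickle stream is to list the opcodes with the standard pickletools module. This is a sketch run against the current pickle module, not against the patch itself; the helper name opcode_names is made up for this example, and protocol 2 is chosen only to keep the opcode stream simple.

```python
import pickle
import pickletools


def opcode_names(obj, protocol=2):
    """Return the names of the pickle opcodes emitted for obj."""
    data = pickle.dumps(obj, protocol=protocol)
    return [op.name for op, arg, pos in pickletools.genops(data)]


one_item = opcode_names([1])
two_items = opcode_names([1, 2])

# A single item is written with APPEND; several items are written as
# MARK ... APPENDS, which needs two extra opcodes for the batch.
print(one_item)
print(two_items)
```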
msg85301 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-04-03 09:32
A micro-benchmark of Collin's patch:

python -m timeit -s "import cPickle; l=range(150)" "cPickle.dumps(l, protocol=-1)"

* before: 12.1 usec per loop
* after: 10.1 usec per loop

=> 15% faster on a favorable case
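
The command above is for Python 2. A rough Python 3 equivalent, as a sketch: cPickle no longer exists there (the pickle module uses its C accelerator when available), and range() must be wrapped in list() to get a real list. The loop count below is an arbitrary choice for this example, and the timings it prints will not match the figures quoted above.

```python
import pickle
import timeit

data = list(range(150))  # in Python 2, range(150) was already a list
number = 20_000          # arbitrary loop count for this sketch

# Time repeated pickling of the list with the highest available protocol.
total = timeit.timeit(lambda: pickle.dumps(data, protocol=-1), number=number)
usec_per_loop = total / number * 1e6
print(f"{usec_per_loop:.2f} usec per loop")
```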
msg85312 - (view) Author: Collin Winter (collinwinter) * (Python committer) Date: 2009-04-03 17:56
I've added a microbenchmark to perf.py called pickle_list. Running that
on this change (perf.py -r -b pickle_list):

pickle_list:
Min: 1.126 -> 0.888: 26.86% faster
Avg: 1.154 -> 0.906: 27.43% faster
Significant (t=115.404547, a=0.95)

That's probably the upper bound on the performance improvement to be
realized from this patch.
msg97492 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2010-01-10 02:36
Still applies cleanly (with a little fuzz) to the trunk after applying
the issue 5683 patch.  Tests all still pass (including xpickle w/ 2.4,
2.5, 2.6 available).
msg101461 - (view) Author: Sean Reifschneider (jafo) * (Python committer) Date: 2010-03-21 21:31
pickle_batch_list_exact_py3k.diff applies cleanly on current py3k trunk and passes tests.  cpickle_list.patch applies cleanly against 2.x trunk and passes "make test".

I don't see any objections brought up about this set of patches, so can we get these applied?
msg112953 - (view) Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2010-08-05 07:28
It is too late now for the 2.x version, and the huge patch in issue 9410 includes an updated version of this patch for 3.x.
History
Date  User  Action  Args
2010-08-05 07:28:00  alexandre.vassalotti  set  status: open -> closed
    resolution: duplicate
    messages: + msg112953
    superseder: Add Unladen Swallow's optimizations to Python 3's pickle.
    stage: resolved
2010-05-20 20:37:58  skip.montanaro  set  nosy: - skip.montanaro
2010-04-15 22:34:48  Garen  set  nosy: + Garen
2010-03-21 21:31:17  jafo  set  priority: normal
    nosy: + jafo
    messages: + msg101461
    assignee: alexandre.vassalotti
2010-01-10 02:36:11  skip.montanaro  set  nosy: + skip.montanaro
    messages: + msg97492
2009-04-03 17:56:08  collinwinter  set  messages: + msg85312
2009-04-03 09:32:03  pitrou  set  nosy: + pitrou
    messages: + msg85301
2009-04-03 05:47:18  alexandre.vassalotti  set  versions: + Python 3.1
2009-04-03 05:46:59  alexandre.vassalotti  set  files: + pickle_batch_list_exact_py3k.diff
    nosy: + alexandre.vassalotti
    messages: + msg85295
2009-04-02 20:35:30  collinwinter  set  messages: + msg85265
2009-04-02 20:29:27  rhettinger  set  nosy: + rhettinger
    messages: + msg85263
2009-04-02 19:28:16  collinwinter  create