
Unpickling is really slow #48123

Closed
hagen mannequin opened this issue Sep 15, 2008 · 38 comments
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@hagen
Mannequin

hagen mannequin commented Sep 15, 2008

BPO 3873
Nosy @jcea, @amauryfa, @abalkin, @pitrou, @vstinner, @avassalotti
Files
  • pickletst.py
  • pickle_read_buffer.patch
  • pickletest.py
  • unpickleprefetch.patch
  • unpickleprefetch2.patch
  • bench_pickle.py
  • unpickleprefetch3.patch
  Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2010-10-12.20:52:02.056>
    created_at = <Date 2008-09-15.16:37:16.466>
    labels = ['library', 'performance']
    title = 'Unpickling is really slow'
    updated_at = <Date 2011-03-19.03:00:12.650>
    user = 'https://bugs.python.org/hagen'

    bugs.python.org fields:

    activity = <Date 2011-03-19.03:00:12.650>
    actor = 'jcea'
    assignee = 'none'
    closed = True
    closed_date = <Date 2010-10-12.20:52:02.056>
    closer = 'pitrou'
    components = ['Library (Lib)']
    creation = <Date 2008-09-15.16:37:16.466>
    creator = 'hagen'
    dependencies = []
    files = ['12785', '13632', '13633', '18984', '19000', '19033', '19034']
    hgrepos = []
    issue_num = 3873
    keywords = ['patch']
    message_count = 38.0
    messages = ['73267', '73269', '73271', '73273', '73275', '74487', '76988', '80093', '80097', '80101', '80103', '80104', '85458', '85459', '85462', '85636', '85637', '85638', '85639', '85640', '85780', '86047', '110687', '111867', '111870', '111871', '111892', '111964', '117239', '117241', '117320', '117323', '117471', '117472', '117480', '117481', '117483', '118472']
    nosy_count = 9.0
    nosy_names = ['collinwinter', 'jcea', 'amaury.forgeotdarc', 'belopolsky', 'pitrou', 'vstinner', 'alexandre.vassalotti', 'hagen', 'BreamoreBoy']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue3873'
    versions = ['Python 3.2']

    @hagen
    Mannequin Author

    hagen mannequin commented Sep 15, 2008

    Unpickling e.g. a large list seems to be really slow in Python 3.0.

    The attached test script gives the following results for pickling and
    unpickling a list of 1M zeros, showing that although the C
    implementation seems to be used in Python 3.0, unpickling is even slower
    than with the "pickle" module of Python 2.6!

    [hagenf@cluster-06 hagenf]$ python pickletst.py
    2.71067500114
    2.77484893799
    [hagenf@cluster-06 hagenf]$ python3.0 pickletst.py
    0.0925059318542
    5.76121616364
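
    A minimal reconstruction of the attached pickletst.py, which is not reproduced in the migrated issue (the timing approach and the "tst" file name are assumptions based on the description and later comments; on 2.6 one would import cPickle instead):

    ```python
    import pickle
    import time

    data = [0] * 1000000          # a list of 1M zeros, as described above

    start = time.time()
    with open("tst", "wb") as f:
        pickle.dump(data, f)      # time the dump...
    print(time.time() - start)

    start = time.time()
    with open("tst", "rb") as f:
        pickle.load(f)            # ...and the load
    print(time.time() - start)
    ```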

    @hagen hagen mannequin added stdlib Python modules in the Lib dir performance Performance or resource usage labels Sep 15, 2008
    @pitrou
    Member

    pitrou commented Sep 15, 2008

    Do the numbers vary if you read the whole file at once and then unpickle
    the resulting bytes string?
    Large parts of the IO library are written in Python in 3.0, which might
    explain the discrepancy.
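
    Spelled out, the suggested experiment looks like this (the "tst" file name follows the test script):

    ```python
    import pickle

    # read the whole file into memory first, then unpickle from the bytes
    # object; this bypasses the partly-Python IO stack of 3.0
    with open("tst", "rb") as f:
        data = f.read()
    obj = pickle.loads(data)
    ```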

    @hagen
    Mannequin Author

    hagen mannequin commented Sep 15, 2008

    Yes, it gets much better, but even so (first reading file and timing
    only "loads") unpickling takes four times as long in Python 3.0 as with
    the old cPickle module:

    [hagenf@cluster-06 hagenf]$ python pickletst2.py
    0.0744678974152
    0.0514161586761
    [hagenf@cluster-06 hagenf]$ python3.0 pickletst3.py
    0.0911619663239
    0.208593845367

    But I guess this can still be blamed on the BytesIO implementation...

    @amauryfa
    Member

    Indeed. If I replace the file with
    f = io.BytesIO(open("tst", "rb").read())
    timings are divided by 20...

    After quick profiling, it seems that PyLong_New would benefit from a
    free list. len(bytearray) is called very often.
    To stay simple, it would be enough to only store longs of length 1 (<2**15).
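
    For illustration, the free-list pattern being proposed looks roughly like this, sketched in Python (the real change would be C code in Objects/longobject.c; the names and the cap below are hypothetical):

    ```python
    # recycle freed objects from a small pool instead of hitting the
    # allocator on every PyLong_New; CPython already does this for floats
    _free_list = []
    _MAX_FREE = 100                # hypothetical cap on the pool size

    def alloc_small_long():
        # reuse a pooled object when available, allocate otherwise
        return _free_list.pop() if _free_list else bytearray(2)

    def free_small_long(obj):
        if len(_free_list) < _MAX_FREE:
            _free_list.append(obj)   # keep it around for reuse
    ```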

    @pitrou
    Member

    pitrou commented Sep 15, 2008

    Gregory had patches for a freelist of long objects in bpo-2013.

    @avassalotti
    Member

    The solution is to add a read buffer to Unpickler (Pickler already has
    a write buffer, so that's why it is unaffected). I believe this would
    mitigate much of the (quite large) Python function call overhead.

    cPickle has a performance hack that makes it use cStringIO and PyFile
    directly (via C function calls). In Python 3, the hack was removed
    since cStringIO.h and fileobject.h aren't public anymore. This explains
    the 0.02s deviation you are getting for dump speed.

    @hagen
    Mannequin Author

    hagen mannequin commented Dec 5, 2008

    I think a read buffer is not possible without being able to unread bytes
    from the stream. pickle shouldn't consume bytes after the end of a
    pickled object!
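
    The constraint is easy to demonstrate: several pickles can be stored back-to-back in one stream, and each load() must stop exactly at its own STOP opcode:

    ```python
    import io
    import pickle

    buf = io.BytesIO()
    pickle.dump([1, 2, 3], buf)    # first pickle
    pickle.dump("second", buf)     # second pickle, immediately after
    buf.seek(0)

    first = pickle.load(buf)       # must leave the position right after pickle #1
    second = pickle.load(buf)      # would break if load() had consumed extra bytes
    ```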

    @pitrou
    Member

    pitrou commented Jan 18, 2009

    Making this a duplicate of bpo-4565 ("Rewrite the IO stack in C").
    If anyone disagrees, please reopen!

    @pitrou pitrou closed this as completed Jan 18, 2009
    @hagen
    Mannequin Author

    hagen mannequin commented Jan 18, 2009

    With the io-c branch I see much better unpickling performance than
    before. But it still seems to be around 2 or 3 times slower than with
    cPickle in 2.6.

    Is this expected at this point of io-c development? Otherwise perhaps
    this issue should stay open until it can be verified that nothing else
    can be done to get closer to the old cPickle performance.

    @pitrou
    Member

    pitrou commented Jan 18, 2009

    Hello,

    With the io-c branch I see much better unpickling performance than
    before. But it still seems to be around 2 or 3 times slower than with
    cPickle in 2.6.

    It's much closer here.

    With 2.7 (trunk) and cPickle:
    0.439934968948
    0.669679880142
    With the io-c branch:
    0.136367082596
    0.798221111298

    Is this expected at this point of io-c development?

    20-30% slower is expected for binary I/O (which is what I get above),
    not 2x or 3x slower.
    What is your system?

    @hagen
    Mannequin Author

    hagen mannequin commented Jan 18, 2009

    I uploaded a new pickletst.py which specifies protocol 2, otherwise
    we're comparing apples with oranges. With this I get:

    0.211881160736
    0.322369813919

    for Python 2.6 and

    0.158488035202
    1.21621990204

    on the io-c branch. Can you confirm this?

    @pitrou
    Member

    pitrou commented Jan 18, 2009

    Nice catch! I can confirm your figures with protocol=2 (and protocol=-1
    as well).

    @pitrou pitrou reopened this Jan 18, 2009
    @vstinner
    Member

    vstinner commented Apr 5, 2009

    alexandre.vassalotti wrote:

    The solution is to add a read buffer to Unpickler (...)
    would mitigate much of the (quite large) Python function
    call overhead. (...) cPickle has a performance hack to make it
    uses cStringIO and PyFile directly (via C function calls). In
    Python 3, the hack was removed (...)

    Yes, unpickler_read() calls Buffered_read() through
    PyObject_Call+PyCFunction_Call, which is expensive. And the unpickler's
    main loop starts with unpickler_read(self, &s, 1): it reads just *one*
    byte (the "opcode"). Since each Buffered_read() call is expensive, a
    solution is to avoid calling it (e.g. read more bytes at once and...
    seek backward at the end?).
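
    Schematically, the main loop in question (a sketch, not the actual _pickle.c code):

    ```python
    # each iteration pays a full read() call for a single opcode byte
    def load(read, dispatch):
        while True:
            opcode = read(1)       # unpickler_read(self, &s, 1)
            if opcode == b".":     # STOP opcode: the pickle is complete
                break
            dispatch[opcode]()     # run the handler for this opcode
    ```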

    @vstinner
    Member

    vstinner commented Apr 5, 2009

    Unladen Swallow has a project to optimize pickle. Currently, it uses 3
    benchmarks:

    • pickle — use the cPickle module to pickle a variety of datasets.
    • pickle_dict — microbenchmark; use the cPickle module to pickle a lot of dicts.
    • pickle_list — microbenchmark; use the cPickle module to pickle a lot of lists.

    @vstinner
    Member

    vstinner commented Apr 5, 2009

    gprof (--enable-profiler) results:

    Each sample counts as 0.01 seconds.

      %   cumulative   self              self     total
     time   seconds   seconds     calls  ms/call  ms/call  name
    18.18      0.16      0.16   2011055     0.00     0.00  vgetargs1
     7.95      0.23      0.07   2017033     0.00     0.00  PyObject_GenericGetAttr
     6.82      0.29      0.06   2002006     0.00     0.00  _BufferedReader_read_unlocked
     5.68      0.34      0.05   2012123     0.00     0.00  convertsimple
     4.55      0.38      0.04   2013284     0.00     0.00  PyCFunction_Call
     4.55      0.42      0.04   2004543     0.00     0.00  PyLong_AsSsize_t
     3.41      0.45      0.03   2004565     0.00     0.00  PyNumber_Index
     3.41      0.48      0.03   2002177     0.00     0.00  float_argument_error
     3.41      0.51      0.03   1000497     0.00     0.00  PyLong_AsLongAndOverflow
     2.27      0.53      0.02   2012123     0.00     0.00  convertitem

    It looks like PyArg_ParseTuple() should be optimized ;-)

    @vstinner
    Member

    vstinner commented Apr 6, 2009

    Create a read buffer (4096 bytes) in the unpickler class. Using
    [0]*10**7 or [1000]*10**7, load() is 6 to 8 times faster.

    I removed last_string attribute because it wasn't used.

    If there are "tail" bytes, seek backward.
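
    A pure-Python sketch of that strategy (the class and method names are made up for illustration): prefetch in 4096-byte chunks, then seek backward over whatever was prefetched but never consumed:

    ```python
    import os

    class PrefetchingReader:
        def __init__(self, file, chunk=4096):
            self.file = file
            self.chunk = chunk
            self.buf = b""
            self.pos = 0

        def read(self, n):
            # refill the buffer in large chunks until n bytes are available
            while len(self.buf) - self.pos < n:
                data = self.file.read(self.chunk)
                if not data:
                    break
                self.buf += data
            result = self.buf[self.pos:self.pos + n]
            self.pos += len(result)
            return result

        def rewind_tail(self):
            # give back the "tail": prefetched bytes that were never consumed
            self.file.seek(self.pos - len(self.buf), os.SEEK_CUR)
    ```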

    @vstinner
    Member

    vstinner commented Apr 6, 2009

    I don't know why, but python-trunk is *much* slower than py3k (e.g.
    with dump: 1000 ms vs 24 ms for py3k, or with load: 1500 ms vs 186 ms).

    @vstinner
    Member

    vstinner commented Apr 6, 2009

    My version of pickletest.py:

    • make sure that the file position is correct after load()
    • some benchmarks; the most interesting numbers:

    Without the patch:

    version | data       | dump ms | load ms
    py3k    | 0, 10^6    |     230 |    1500
    py3k    | 1000, 10^6 |     255 |    1780
    py3k    | 0, 10^7    |    2360 |   16044
    py3k    | 1000, 10^7 |    2615 |   19380

    With the patch:

    version | data       | dump ms | load ms | speed-up
    py3k    | 0, 10^6    |     237 |     183 | x8
    py3k    | 1000, 10^6 |     241 |     248 | x7
    py3k    | 0, 10^7    |    2420 |    1860 | x8
    py3k    | 1000, 10^7 |    2850 |    3100 | x6

    (data: 0, 10^6 means [0]*10^6)

    @vstinner
    Member

    vstinner commented Apr 6, 2009

    Note about my patch: the buffer should be truncated after
    PyBytes_Concat(&self->buffer.pybytes, data) to avoid an ever-growing
    buffer. Something like: self->buffer.pybytes += data;
    self->buffer.pybytes = self->buffer.pybytes[index:];
    self->buffer.index = 0;

    @pitrou
    Member

    pitrou commented Apr 6, 2009

    I don't know why, but python-trunk is *much* slower than py3k (e.g.
    with dump: 1000 ms vs 24 ms for py3k, or with load: 1500 ms vs 186 ms).

    Perhaps you tried with the pure Python version (pickle) rather than the
    C one (cPickle)?

    @avassalotti
    Member

    Victor, Unpickler shouldn't raise an error if the given file object
    does not support seek(); it should gracefully fall back to using only
    read() and readline(). Also, I think you could get a greater
    performance improvement by using peek() and avoiding backward seeks.
    As far as I know, a backward seek may invalidate the internal buffer of
    the file object, thus forcing a read from disk.
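
    For reference, BufferedReader already exposes this: peek() returns upcoming bytes without moving the position, so no backward seek is needed, and a getattr() check allows a graceful fallback:

    ```python
    f = open("tst", "rb")              # binary open() returns a BufferedReader

    peek = getattr(f, "peek", None)    # not every file-like object has peek()
    if peek is not None:
        head = peek(16)                # may return more or fewer bytes than asked
        assert f.tell() == 0           # the position has not moved
    else:
        pass                           # fall back to plain read()/readline()
    ```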

    Incidentally, I think it would be nice to add to Python 3 some kind of
    file protocol as this would provide a way to mitigate the function call
    overhead for I/O operations.

    @pitrou
    Member

    pitrou commented Apr 16, 2009

    By the way, the patch won't work with unseekable files, which is
    probably bad.

    @BreamoreBoy
    Mannequin

    BreamoreBoy mannequin commented Jul 18, 2010

    Has this slipped under the radar? One way or the other, I believe any performance issue should be resolved if at all possible.

    @vstinner
    Member

    New version of my patch:

    • add a "used" attribute to the UnpicklerBuffer structure: disable the read buffer for non-seekable files and for protocol 0 (at the first call to unpickle_readline)
    • check whether PyObject_GetAttrString(file, "seek") is NULL or not
    • unpickle_readline() also flushes the buffer
    • add a new test specific to the read buffer: ensure that the unpickler doesn't eat data at the end of the file

    test_pickle passes without any error.

    The read buffer is disabled at the first call to unpickle_readline(), because unpickle_readline() has to flush the buffer. It will be very difficult to optimize protocol 0, but I hope that nobody uses it nowadays.
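
    A rough Python equivalent of the seekability check (hypothetical name; the real code is C in _pickle.c):

    ```python
    def read_buffer_allowed(file):
        # buffering needs seek() so the unused tail can be given back;
        # without it, fall back to unbuffered read()/readline()
        return getattr(file, "seek", None) is not None
    ```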

    ===========

    Benchmark with [0]*10**6, Python compiled with pydebug.

    Without the patch
    -----------------

    Protocol 0:

    • dump: 598.0 ms
    • load (seekable=False): 3337.3 ms
    • load (seekable=True): 3309.6 ms

    Protocol 1:

    • dump: 217.8 ms
    • load (seekable=False): 864.2 ms
    • load (seekable=True): 873.3 ms

    Protocol 2:

    • dump: 226.5 ms
    • load (seekable=False): 867.8 ms
    • load (seekable=True): 854.6 ms

    With the patch
    --------------

    Protocol 0

    • dump: 615.5 ms
    • load (seekable=False): 3201.3 ms
    • load (seekable=True): 3223.4 ms

    Protocol 1

    • dump: 219.8 ms
    • load (seekable=False): 942.1 ms
    • load (seekable=True): 175.2 ms

    Protocol 2

    • dump: 221.1 ms
    • load (seekable=False): 943.9 ms
    • load (seekable=True): 175.5 ms

    @vstinner
    Member

    Same benchmark with Python 2.6.5+, so without the patch, but compiled with maximum compiler optimization (whereas pydebug means no optimization):

    Protocol 0

    • dump: 517.3 ms
    • load: 876.6 ms <= because of the new I/O library, python3 is much slower

    Protocol 1

    • dump: 141.8 ms
    • load: 255.2 ms <= with my patch, python3 is 1.5x faster

    Protocol 2

    • dump: 142.7 ms
    • load: 262.1 ms <= with my patch, python3 is 1.5x faster

    It would be better to redo all tests with the same compiler options, but the most important point is that Python3 is *faster* than Python2 with my patch ;-)

    @vstinner
    Member

    bench_pickle.py: script used to produce last benchmarks.

    @avassalotti
    Member

    Victor, have you tried using peek() instead of seek()? I mentioned this previously in msg85780.

    @vstinner
    Member

    Victor, have you tried using peek() instead of seek()?
    I mentioned this previously in msg85780.

    In a file encoded with protocol 0, backward seeks are needed for each
    call to unpickler_readline()... and this function is called to read a
    number, a boolean, etc. (i.e. most, or all, data values). I chose to
    disable the read buffer there because it's slower with it.

    For protocols 1 and 2, there is only *one* seek at the end (but a lot
    of reads: file size / 4096). So I don't think it really matters whether
    we use peek() or seek().

    seek() is more natural (for me) and more common than peek(), so I
    prefer to keep seek().
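
    The protocol difference is visible in the opcode streams: protocol 0 is line-oriented (hence one readline() per value), while protocol 2 is binary:

    ```python
    import pickle

    pickle.dumps(42, protocol=0)      # b'I42\n.'  -- newline-terminated text
    pickle.dumps(3.14, protocol=0)    # b'F3.14\n.'
    pickle.dumps(42, protocol=2)      # b'\x80\x02K*.'  -- fixed-size binary
    ```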

    @pitrou
    Member

    pitrou commented Sep 23, 2010

    Here is a fixed version of Victor's bench (didn't work on 2.x).

    @pitrou
    Member

    pitrou commented Sep 23, 2010

    And here is a new performance patch (Victor's patch was outdated because of heavy changes incorporated from Unladen Swallow). Results of bench_pickle.py are as follows:

    Python 2.7 (cPickle):

    Protocol 0

    • dump: 189.8 ms
    • load (seekable=False): 318.1 ms
    • load (seekable=True): 321.2 ms

    Protocol 1

    • dump: 58.2 ms
    • load (seekable=False): 89.5 ms
    • load (seekable=True): 89.4 ms

    Protocol 2

    • dump: 58.0 ms
    • load (seekable=False): 89.1 ms
    • load (seekable=True): 89.1 ms

    Python 3.2 (unpatched):

    Protocol 0

    • dump: 144.5 ms
    • load (seekable=False): 530.7 ms
    • load (seekable=True): 530.5 ms

    Protocol 1

    • dump: 27.8 ms
    • load (seekable=False): 280.4 ms
    • load (seekable=True): 279.9 ms

    Protocol 2

    • dump: 27.8 ms
    • load (seekable=False): 280.6 ms
    • load (seekable=True): 281.9 ms

    Python 3.2 + patch:

    Protocol 0

    • dump: 146.4 ms
    • load (seekable=False): 611.1 ms
    • load (seekable=True): 194.2 ms

    Protocol 1

    • dump: 27.3 ms
    • load (seekable=False): 254.7 ms
    • load (seekable=True): 31.2 ms

    Protocol 2

    • dump: 27.3 ms
    • load (seekable=False): 254.6 ms
    • load (seekable=True): 31.2 ms

    @avassalotti
    Member

    I get this error with the patch:

    python: /home/alex/src/python.org/py3k/Modules/_pickle.c:908: _Unpickler_ReadFromFile: Assertion `self->next_read_idx == 0' failed.
    Aborted

    @pitrou
    Member

    pitrou commented Sep 24, 2010

    Ah, thank you. I hadn't tested in debug mode and there was a wrong assert from the previous code.
    Here is a patch with the assert removed.

    @pitrou
    Member

    pitrou commented Sep 27, 2010

    One problem with the seek() approach is that some file-like objects have expensive seeks. One example is GzipFile, where seek(n) is O(n) (it first rewinds to the start of file, then reads n decompressed bytes). In the end, unpickling from a GzipFile becomes O(n**2).

    I will try to use peek() instead.
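
    The pathological case, for concreteness (the file name is hypothetical):

    ```python
    import gzip
    import pickle

    # GzipFile.seek() to an earlier offset rewinds to the start of the file
    # and re-decompresses up to the target, so each backward seek is O(n);
    # an unpickler that seeks back after every prefetch degrades to O(n**2)
    with gzip.open("data.pkl.gz", "rb") as f:
        obj = pickle.load(f)
    ```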

    @avassalotti
    Member

    Didn't Victor say that only one seek at the end is necessary per
    pickle? If this is the case, I don't think expensive seeks will be an
    issue.

    @pitrou
    Member

    pitrou commented Sep 27, 2010

    Didn't Victor say that only one seek at the end is necessary per
    pickle? If this is the case, I don't think expensive seeks will be an
    issue.

    If you are unpickling from a multi-megabyte gzip file and the seek at
    the end makes you uncompress all the gzip file again, it can certainly
    be a problem.

    Another issue with seeking only at the end is that it would make
    readline() quite a bit more complicated. And, apparently, readline() is
    not only used by protocol 0 but also by the binary protocols.

    @pitrou
    Member

    pitrou commented Sep 27, 2010

    Here is an updated bench_pickle which also makes the file unpeekable.

    @pitrou
    Member

    pitrou commented Sep 27, 2010

    Here is a patch using peek() rather than seek(). There are some inefficiencies around (such as using read() to skip the consumed prefetched bytes), but the benchmark results are still as good as with seek():

    Protocol 0

    • dump: 142.5 ms
    • load (seekable=False): 598.8 ms
    • load (seekable=True): 240.2 ms

    Protocol 1

    • dump: 28.9 ms
    • load (seekable=False): 253.2 ms
    • load (seekable=True): 25.7 ms

    Protocol 2

    • dump: 28.9 ms
    • load (seekable=False): 253.3 ms
    • load (seekable=True): 25.7 ms

    @pitrou
    Member

    pitrou commented Oct 12, 2010

    Patch committed in r85384.

    @pitrou pitrou closed this as completed Oct 12, 2010
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022