classification
Title: pickle.py treats 32bit lengths as signed, but _pickle.c as unsigned
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.4, Python 3.2, Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: alexandre.vassalotti, loewis, pitrou, python-dev, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2011-08-27 12:59 by pitrou, last changed 2012-11-24 19:45 by pitrou. This issue is now closed.

Files
File name Uploaded Description Edit
pickle_portable_size.patch serhiy.storchaka, 2012-11-16 21:23 review
pickle_nonportable_size.patch serhiy.storchaka, 2012-11-17 15:59 review
pickle_nonportable_size_2.patch serhiy.storchaka, 2012-11-24 19:13 review
Messages (16)
msg143065 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-08-27 12:59
In several opcodes (BINBYTES, BINUNICODE... what else?), _pickle.c happily accepts 32-bit lengths of more than 2**31, while pickle.py uses marshal's "i" typecode which means "signed"... and therefore fails reading the data.
Apparently, pickle.py uses marshal for speed reasons, but marshal doesn't support unsigned types.

(seen from http://bugs.python.org/issue11564)
msg175010 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-06 20:53
What if just add " & 0xFFFFFFFF"?
msg175011 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-06 20:59
Ah, for unpacking 32-bit unsigned big-endian bytes you can use len = int.from_bytes(self.read(4), 'big').
msg175012 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-06 21:06
Or you can use len = struct.unpack('>I', self.read(4)).
msg175022 - (view) Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2012-11-06 22:46
pickle.py is the buggy one here. Its use of the marshal module is really a hack. Plus, it is slower than both struct and int.from_bytes.

14:40:57 [~/cpython]$ ./python -m timeit "int.from_bytes(b'\xff\xff\xff\xff', 'big')"
1000000 loops, best of 3: 0.209 usec per loop
14:38:03 [~/cpython]$ ./python -m timeit -s "import struct" "struct.unpack('>I', b'\xff\xff\xff\xff')"
10000000 loops, best of 3: 0.147 usec per loop
14:37:44 [~/cpython]$ ./python -m timeit -s "import marshal" "marshal.loads(b'i'+b'\xff\xff\xff\xff')"
1000000 loops, best of 3: 0.236 usec per loop
msg175653 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-15 22:17
The C implementation writes and reads BINBYTES and BINUNICODE up to 4G (on 64-bit platform). The Python implementation writes and reads BINBYTES and BINUNICODE up to 2G. What should be compatible fix? Allow the Python implementation to write and read up to 4G? Then Python can pickle a large data which can't be unpickled on non-patched Python (and on 2.7). Limit size to 2G? Then non-patched Python (including 3.1) can pickle a data which can't be unpickled on patched Python.

Also there is an unpleasant fact that 64-bit Python can pickle data which can't unpickle 32-bit Python.
msg175715 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-16 21:23
Here is a patch for 3.x. It unify behavior of Python and C implementations and unify behavior on 32- and 64-bit platforms.  For backward compatibility Pickler can pickle up to 2G data, but Unpickler can unpickle up to 4G on 64-bit.
msg175746 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-11-17 14:55
> Here is a patch for 3.x. It unify behavior of Python and C
> implementations and unify behavior on 32- and 64-bit platforms.  For 
> backward compatibility Pickler can pickle up to 2G data, but Unpickler 
> can unpickle up to 4G on 64-bit.

I agree the right tradeoff is not easy to find, but I don't think we should introduce a regression in _pickle.c just for the sake of making it more consistent with pickle.py's bugs. So _pickle.c behaviour should probably be preserved, and pickle.py should be improved to accept unpickling 4G pickles.
msg175747 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-11-17 14:57
I'd like to add that anyone wanting to serialize large data will certainly be using _pickle (or its ancestor cPickle), since using pickle.py is probably excruciatingly slow. Meaning we should favour preserving _pickle/cPickle's behaviour over preserving pickle.py's behaviour here.
msg175754 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-17 15:26
The issue is not only in difference between Python and C implementations, but also between 32-bit and 64-bit.

pickle.py on 32-bit accepts data up to 2G.
pickle.py on 64-bit accepts data up to 2G.
_pickle.c on 32-bit accepts data up to 2G.
_pickle.c on 64-bit accepts data up to 4G.

3:1 for 2G.  Current _pickle.c behavior is just not portable.

Of course, I can rewrite the patch, expanding the limit to 4G on 64-bit if you insist.  But I doubt that this is the best variant.
msg175756 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-11-17 15:41
IMO, the right solution is to finish PEP 3154, and support large strings in the format.

For the time being, I'd claim that signed length in the existing implementations are just a bug, and that unsigned lengths are the intended semantics of these opcodes. I can't see anything that is gained by allowing negative lengths.

OTOH, I also think that it won't matter much in practive: if you try to unpickle a string with more than 2GiB on a 32-bit system, chances are really high that you run out of memory. So whether any bug fix needs to be backported, I don't know.
msg175761 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-17 15:59
Here is a patch for 3.x which extends supported size to 4G on 64-bit.
msg175780 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-11-17 17:14
> OTOH, I also think that it won't matter much in practive: if you try to 
> unpickle a string with more than 2GiB on a 32-bit system, chances are
> really high that you run out of memory.

Agreed. I think this issue is mostly about 64-bit systems, even though we may want to fix to apply to 32-bit systems as well, if it doesn't make things more complicated.

And, yes, PEP 3154 should be finished, but it is currently stalled in issue 15642.
msg176306 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-24 19:13
Patch updated (comment for load_binstring added).
msg176310 - (view) Author: Roundup Robot (python-dev) Date: 2012-11-24 19:44
New changeset 55fe4b57dd9c by Antoine Pitrou in branch '3.2':
Issue #12848: The pure Python pickle implementation now treats object lengths as unsigned 32-bit integers, like the C implementation does.
http://hg.python.org/cpython/rev/55fe4b57dd9c

New changeset c9d205e2dd02 by Antoine Pitrou in branch '3.3':
Issue #12848: The pure Python pickle implementation now treats object lengths as unsigned 32-bit integers, like the C implementation does.
http://hg.python.org/cpython/rev/c9d205e2dd02

New changeset aac6b313ef5f by Antoine Pitrou in branch 'default':
Issue #12848: The pure Python pickle implementation now treats object lengths as unsigned 32-bit integers, like the C implementation does.
http://hg.python.org/cpython/rev/aac6b313ef5f
msg176311 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-11-24 19:45
I've committed the latest patch (pickle_nonportable_size_2.patch). Thank you for working on this!
History
Date User Action Args
2012-11-24 19:45:48pitrousetstatus: open -> closed
resolution: fixed
messages: + msg176311

stage: patch review -> resolved
2012-11-24 19:44:42python-devsetnosy: + python-dev
messages: + msg176310
2012-11-24 19:13:19serhiy.storchakasetfiles: + pickle_nonportable_size_2.patch

messages: + msg176306
2012-11-17 17:14:33pitrousetmessages: + msg175780
2012-11-17 15:59:35serhiy.storchakasetfiles: + pickle_nonportable_size.patch

messages: + msg175761
2012-11-17 15:41:26loewissetmessages: + msg175756
2012-11-17 15:26:57serhiy.storchakasetnosy: + loewis
messages: + msg175754
2012-11-17 14:57:41pitrousetmessages: + msg175747
2012-11-17 14:55:25pitrousetmessages: + msg175746
2012-11-16 21:23:59serhiy.storchakasetfiles: + pickle_portable_size.patch
keywords: + patch
messages: + msg175715

stage: patch review
2012-11-15 22:17:16serhiy.storchakasetmessages: + msg175653
2012-11-06 22:46:27alexandre.vassalottisetmessages: + msg175022
2012-11-06 21:06:39serhiy.storchakasetmessages: + msg175012
2012-11-06 20:59:44serhiy.storchakasetmessages: + msg175011
2012-11-06 20:53:48serhiy.storchakasetnosy: + serhiy.storchaka

messages: + msg175010
versions: + Python 3.4
2011-08-27 12:59:50pitroucreate