msg143065 - (view) |
Author: Antoine Pitrou (pitrou) * ![Python committer (Python committer)](@@file/committer.png) |
Date: 2011-08-27 12:59 |
In several opcodes (BINBYTES, BINUNICODE... what else?), _pickle.c happily accepts 32-bit lengths of more than 2**31, while pickle.py uses marshal's "i" typecode which means "signed"... and therefore fails reading the data.
Apparently, pickle.py uses marshal for speed reasons, but marshal doesn't support unsigned types.
(seen from http://bugs.python.org/issue11564)
|
msg175010 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * ![Python committer (Python committer)](@@file/committer.png) |
Date: 2012-11-06 20:53 |
What if just add " & 0xFFFFFFFF"?
|
msg175011 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * ![Python committer (Python committer)](@@file/committer.png) |
Date: 2012-11-06 20:59 |
Ah, for unpacking 32-bit unsigned big-endian bytes you can use len = int.from_bytes(self.read(4), 'big').
|
msg175012 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * ![Python committer (Python committer)](@@file/committer.png) |
Date: 2012-11-06 21:06 |
Or you can use len = struct.unpack('>I', self.read(4)).
|
msg175022 - (view) |
Author: Alexandre Vassalotti (alexandre.vassalotti) * ![Python committer (Python committer)](@@file/committer.png) |
Date: 2012-11-06 22:46 |
pickle.py is the buggy one here. Its use of the marshal module is really a hack. Plus, it is slower than both struct and int.from_bytes.
14:40:57 [~/cpython]$ ./python -m timeit "int.from_bytes(b'\xff\xff\xff\xff', 'big')"
1000000 loops, best of 3: 0.209 usec per loop
14:38:03 [~/cpython]$ ./python -m timeit -s "import struct" "struct.unpack('>I', b'\xff\xff\xff\xff')"
10000000 loops, best of 3: 0.147 usec per loop
14:37:44 [~/cpython]$ ./python -m timeit -s "import marshal" "marshal.loads(b'i'+b'\xff\xff\xff\xff')"
1000000 loops, best of 3: 0.236 usec per loop
|
msg175653 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * ![Python committer (Python committer)](@@file/committer.png) |
Date: 2012-11-15 22:17 |
The C implementation writes and reads BINBYTES and BINUNICODE up to 4G (on 64-bit platform). The Python implementation writes and reads BINBYTES and BINUNICODE up to 2G. What should be compatible fix? Allow the Python implementation to write and read up to 4G? Then Python can pickle a large data which can't be unpickled on non-patched Python (and on 2.7). Limit size to 2G? Then non-patched Python (including 3.1) can pickle a data which can't be unpickled on patched Python.
Also there is an unpleasant fact that 64-bit Python can pickle data which can't unpickle 32-bit Python.
|
msg175715 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * ![Python committer (Python committer)](@@file/committer.png) |
Date: 2012-11-16 21:23 |
Here is a patch for 3.x. It unify behavior of Python and C implementations and unify behavior on 32- and 64-bit platforms. For backward compatibility Pickler can pickle up to 2G data, but Unpickler can unpickle up to 4G on 64-bit.
|
msg175746 - (view) |
Author: Antoine Pitrou (pitrou) * ![Python committer (Python committer)](@@file/committer.png) |
Date: 2012-11-17 14:55 |
> Here is a patch for 3.x. It unify behavior of Python and C
> implementations and unify behavior on 32- and 64-bit platforms. For
> backward compatibility Pickler can pickle up to 2G data, but Unpickler
> can unpickle up to 4G on 64-bit.
I agree the right tradeoff is not easy to find, but I don't think we should introduce a regression in _pickle.c just for the sake of making it more consistent with pickle.py's bugs. So _pickle.c behaviour should probably be preserved, and pickle.py should be improved to accept unpickling 4G pickles.
|
msg175747 - (view) |
Author: Antoine Pitrou (pitrou) * ![Python committer (Python committer)](@@file/committer.png) |
Date: 2012-11-17 14:57 |
I'd like to add that anyone wanting to serialize large data will certainly be using _pickle (or its ancestor cPickle), since using pickle.py is probably excruciatingly slow. Meaning we should favour preserving _pickle/cPickle's behaviour over preserving pickle.py's behaviour here.
|
msg175754 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * ![Python committer (Python committer)](@@file/committer.png) |
Date: 2012-11-17 15:26 |
The issue is not only in difference between Python and C implementations, but also between 32-bit and 64-bit.
pickle.py on 32-bit accepts data up to 2G.
pickle.py on 64-bit accepts data up to 2G.
_pickle.c on 32-bit accepts data up to 2G.
_pickle.c on 64-bit accepts data up to 4G.
3:1 for 2G. Current _pickle.c behavior is just not portable.
Of course, I can rewrite the patch, expanding the limit to 4G on 64-bit if you insist. But I doubt that this is the best variant.
|
msg175756 - (view) |
Author: Martin v. Löwis (loewis) * ![Python committer (Python committer)](@@file/committer.png) |
Date: 2012-11-17 15:41 |
IMO, the right solution is to finish PEP 3154, and support large strings in the format.
For the time being, I'd claim that signed length in the existing implementations are just a bug, and that unsigned lengths are the intended semantics of these opcodes. I can't see anything that is gained by allowing negative lengths.
OTOH, I also think that it won't matter much in practive: if you try to unpickle a string with more than 2GiB on a 32-bit system, chances are really high that you run out of memory. So whether any bug fix needs to be backported, I don't know.
|
msg175761 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * ![Python committer (Python committer)](@@file/committer.png) |
Date: 2012-11-17 15:59 |
Here is a patch for 3.x which extends supported size to 4G on 64-bit.
|
msg175780 - (view) |
Author: Antoine Pitrou (pitrou) * ![Python committer (Python committer)](@@file/committer.png) |
Date: 2012-11-17 17:14 |
> OTOH, I also think that it won't matter much in practive: if you try to
> unpickle a string with more than 2GiB on a 32-bit system, chances are
> really high that you run out of memory.
Agreed. I think this issue is mostly about 64-bit systems, even though we may want to fix to apply to 32-bit systems as well, if it doesn't make things more complicated.
And, yes, PEP 3154 should be finished, but it is currently stalled in issue 15642.
|
msg176306 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * ![Python committer (Python committer)](@@file/committer.png) |
Date: 2012-11-24 19:13 |
Patch updated (comment for load_binstring added).
|
msg176310 - (view) |
Author: Roundup Robot (python-dev) ![Python triager (Python triager)](@@file/triager.png) |
Date: 2012-11-24 19:44 |
New changeset 55fe4b57dd9c by Antoine Pitrou in branch '3.2':
Issue #12848: The pure Python pickle implementation now treats object lengths as unsigned 32-bit integers, like the C implementation does.
http://hg.python.org/cpython/rev/55fe4b57dd9c
New changeset c9d205e2dd02 by Antoine Pitrou in branch '3.3':
Issue #12848: The pure Python pickle implementation now treats object lengths as unsigned 32-bit integers, like the C implementation does.
http://hg.python.org/cpython/rev/c9d205e2dd02
New changeset aac6b313ef5f by Antoine Pitrou in branch 'default':
Issue #12848: The pure Python pickle implementation now treats object lengths as unsigned 32-bit integers, like the C implementation does.
http://hg.python.org/cpython/rev/aac6b313ef5f
|
msg176311 - (view) |
Author: Antoine Pitrou (pitrou) * ![Python committer (Python committer)](@@file/committer.png) |
Date: 2012-11-24 19:45 |
I've committed the latest patch (pickle_nonportable_size_2.patch). Thank you for working on this!
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:21 | admin | set | github: 57057 |
2012-11-24 19:45:48 | pitrou | set | status: open -> closed resolution: fixed messages:
+ msg176311
stage: patch review -> resolved |
2012-11-24 19:44:42 | python-dev | set | nosy:
+ python-dev messages:
+ msg176310
|
2012-11-24 19:13:19 | serhiy.storchaka | set | files:
+ pickle_nonportable_size_2.patch
messages:
+ msg176306 |
2012-11-17 17:14:33 | pitrou | set | messages:
+ msg175780 |
2012-11-17 15:59:35 | serhiy.storchaka | set | files:
+ pickle_nonportable_size.patch
messages:
+ msg175761 |
2012-11-17 15:41:26 | loewis | set | messages:
+ msg175756 |
2012-11-17 15:26:57 | serhiy.storchaka | set | nosy:
+ loewis messages:
+ msg175754
|
2012-11-17 14:57:41 | pitrou | set | messages:
+ msg175747 |
2012-11-17 14:55:25 | pitrou | set | messages:
+ msg175746 |
2012-11-16 21:23:59 | serhiy.storchaka | set | files:
+ pickle_portable_size.patch keywords:
+ patch messages:
+ msg175715
stage: patch review |
2012-11-15 22:17:16 | serhiy.storchaka | set | messages:
+ msg175653 |
2012-11-06 22:46:27 | alexandre.vassalotti | set | messages:
+ msg175022 |
2012-11-06 21:06:39 | serhiy.storchaka | set | messages:
+ msg175012 |
2012-11-06 20:59:44 | serhiy.storchaka | set | messages:
+ msg175011 |
2012-11-06 20:53:48 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka
messages:
+ msg175010 versions:
+ Python 3.4 |
2011-08-27 12:59:50 | pitrou | create | |