msg149092 - (view) |
Author: Richard Oudkerk (sbt) * |
Date: 2011-12-09 13:25 |
If you pickle an array object on python 3 the typecode is encoded as a unicode string rather than as a byte string. This makes python 2 reject the pickle.
#########################################
Python 3.3.0a0 (default, Dec 8 2011, 17:56:13) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle, array
>>> pickle.dumps(array.array('i', [1,2,3]), 2)
b'\x80\x02carray\narray\nq\x00X\x01\x00\x00\x00iq\x01]q\x02(K\x01K\x02K\x03e\x86q\x03Rq\x04.'
#########################################
Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> pickle.loads(b'\x80\x02carray\narray\nq\x00X\x01\x00\x00\x00iq\x01]q\x02(K\x01K\x02K\x03e\x86q\x03Rq\x04.')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\Python27\lib\pickle.py", line 1382, in loads
    return Unpickler(file).load()
  File "c:\Python27\lib\pickle.py", line 858, in load
    dispatch[key](self)
  File "c:\Python27\lib\pickle.py", line 1133, in load_reduce
    value = func(*args)
TypeError: must be char, not unicode
|
msg149101 - (view) |
Author: Ramchandra Apte (Ramchandra Apte) * |
Date: 2011-12-09 14:23 |
The problem is that pickle is calling array.array(u'i', [1,2,3]), and array.array in Python 2 doesn't allow unicode strings as a typecode (the typecode is the first argument).
The docs for Python 2 and Py3k don't specify the type of the typecode argument of array.array.
In Python 2 it seems that typecode has to be a byte string.
In Python 3 it seems that typecode has to be a unicode string.
I suggest that array.array be changed in Python 2 to allow unicode strings as a typecode or that pickle detects array.array being called and fixes the call.
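The Python 3 half of this observation can be checked directly. The following is a minimal sketch, meant to be run on Python 3 only; the variable names are illustrative and not part of the report.

```python
import array

# A str typecode is accepted on Python 3.
a = array.array('i', [1, 2, 3])

# A bytes typecode is rejected on Python 3, mirroring the way
# Python 2 rejects a unicode typecode.
try:
    array.array(b'i', [1, 2, 3])
except TypeError as exc:
    bytes_typecode_error = exc
else:
    bytes_typecode_error = None

print(a.typecode)
print(type(bytes_typecode_error).__name__)
```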
|
msg149104 - (view) |
Author: Richard Oudkerk (sbt) * |
Date: 2011-12-09 15:27 |
> I suggest that array.array be changed in Python 2 to allow unicode strings
> as a typecode or that pickle detects array.array being called and fixes
> the call.
Interestingly, py3 does understand arrays pickled by py2. This appears to be because py2 pickles str using BINSTRING or SHORT_BINSTRING which will unpickle as str on py2 and py3. py3 pickles str using BINUNICODE which will unpickle as unicode on py2 and str on py3.
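The opcode difference can be inspected with pickletools. This is a sketch for Python 3 only: the second pickle is a hand-written stand-in for a Python 2 style pickle (the byte string from the original report with the BINUNICODE typecode swapped for SHORT_BINSTRING), not output captured from a real Python 2 interpreter, and opcode_names is a helper introduced here for illustration.

```python
import array
import pickle
import pickletools


def opcode_names(data):
    """Return the names of the opcodes in a pickle, in order."""
    return [op.name for op, arg, pos in pickletools.genops(data)]


# Pickle produced by Python 3 with protocol 2: the typecode is a str,
# so it is written with BINUNICODE.
py3_pickle = pickle.dumps(array.array('i', [1, 2, 3]), 2)
py3_names = opcode_names(py3_pickle)

# Hand-written stand-in for a Python 2 style pickle (an assumption, see above):
# the typecode is written with SHORT_BINSTRING ("U", one length byte, payload).
py2_style_pickle = (b'\x80\x02carray\narray\nq\x00'
                    b'U\x01i'
                    b'q\x01]q\x02(K\x01K\x02K\x03e\x86q\x03Rq\x04.')
py2_style_names = opcode_names(py2_style_pickle)

# Python 3 decodes the SHORT_BINSTRING payload to str, so the array loads.
restored = pickle.loads(py2_style_pickle)
print(restored)
```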
I think it would be better to fix this in py3 if possible, but that does not look easy: modifying array.__reduce_ex__ alone would not be enough.
The only thing I can think of is for py3 to grow a "_binstr" type which only supports ascii strings and is special-cased by pickle to be pickled using BINSTRING. Then array.__reduce_ex__ could be something like:
    def __reduce_ex__(self, protocol):
        if protocol <= 2:
            return array.array, (_binstr(self.typecode), list(self))
        else:
            ...
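For experimentation, the idea can be approximated today without changing the interpreter. This is only a rough sketch under several assumptions: it relies on pickle._Pickler, the private pure-Python pickler in CPython, which is an implementation detail; _binstr, CompatPickler, Py2FriendlyArray and dumps_compat are hypothetical names that exist nowhere in the standard library; and the result is verified here only by loading it back on Python 3.

```python
import array
import io
import pickle
import pickletools


class _binstr(str):
    """Hypothetical marker type: ASCII-only text to be written as SHORT_BINSTRING."""


class CompatPickler(pickle._Pickler):
    # Copy the dispatch table so the stock pure-Python pickler is left untouched.
    dispatch = dict(pickle._Pickler.dispatch)

    def save_binstr(self, obj):
        data = obj.encode("ascii")  # raises UnicodeEncodeError for non-ASCII text
        if len(data) > 255:
            raise ValueError("this sketch only handles strings shorter than 256 bytes")
        self.write(pickle.SHORT_BINSTRING + bytes([len(data)]) + data)
        self.memoize(obj)

    dispatch[_binstr] = save_binstr


class Py2FriendlyArray:
    """Hypothetical wrapper whose reduce value mirrors the proposal above."""

    def __init__(self, arr):
        self.arr = arr

    def __reduce_ex__(self, protocol):
        return array.array, (_binstr(self.arr.typecode), list(self.arr))


def dumps_compat(arr, protocol=2):
    """Pickle an array so that its typecode is emitted as SHORT_BINSTRING."""
    buf = io.BytesIO()
    CompatPickler(buf, protocol).dump(Py2FriendlyArray(arr))
    return buf.getvalue()


data = dumps_compat(array.array('i', [1, 2, 3]))
opcodes = [op.name for op, arg, pos in pickletools.genops(data)]
print(opcodes)
print(pickle.loads(data))
```

A wrapper class is used here because array.array itself cannot be patched from Python code; a real fix along these lines would live in array.__reduce_ex__ and in the pickler.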
|
msg205445 - (view) |
Author: Alexandre Vassalotti (alexandre.vassalotti) * |
Date: 2013-12-07 10:17 |
Adding a special type is not a bad idea. We have to keep the code for loading BINSTRING opcodes anyway, so we might as well use it. It could be helpful for unit-testing our Python 2 compatibility support for pickle.
We should still fix array in 2.7 to accept unicode object for the typecode though.
|
msg206504 - (view) |
Author: Vajrasky Kok (vajrasky) * |
Date: 2013-12-18 09:51 |
Alexandre Vassalotti said: "We should still fix array in 2.7 to accept unicode object for the typecode though."
I created issue #20014 (with the patch) for this feature.
|
msg206532 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-12-18 16:02 |
See issue20015 for a more general approach.
|
msg206536 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2013-12-18 16:08 |
> If you pickle an array object on python 3 the typecode is encoded as a unicode string rather than as a byte string. This makes python 2 reject the pickle.
Are pickle files produced by Python 3 supposed to be compatible with Python 2?
It looks very tricky to produce pickle files compatible with both versions.
|
msg242949 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2015-05-12 09:17 |
The proposed patch pickles all ASCII strings with compatible opcodes (STRING, BINSTRING and SHORT_BINSTRING) when the protocol is < 3 and fix_imports=True. Strings pickled this way are unpickled as str in both Python 2 and Python 3 (unless encoding="bytes" is passed in Python 3).
As a side effect, short ASCII strings (length < 256) are pickled more compactly with protocols < 3.
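The unpickling behaviour and the size difference can be illustrated on Python 3 with two tiny hand-assembled pickles. These byte strings are built by hand for illustration and are not output of the proposed patch.

```python
import pickle

payload = b"abc"

# SHORT_BINSTRING: opcode "U", 1-byte length, payload, then STOP (".").
short_binstring = b"U" + bytes([len(payload)]) + payload + b"."

# BINUNICODE: opcode "X", 4-byte little-endian length, UTF-8 payload, then STOP.
binunicode = b"X" + len(payload).to_bytes(4, "little") + payload + b"."

as_text = pickle.loads(short_binstring)                     # decoded with the default encoding
as_bytes = pickle.loads(short_binstring, encoding="bytes")  # kept as bytes
same_text = pickle.loads(binunicode)

# Bytes saved per short string by the 1-byte length field.
saved = len(binunicode) - len(short_binstring)
print(as_text, as_bytes, same_text, saved)
```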
|
msg245048 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2015-06-09 08:05 |
Alexandre, Antoine, what are your thoughts?
|
msg245049 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2015-06-09 08:23 |
Won't that fail if a Python 2 API accepts only unicode strings?
|
msg245050 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2015-06-09 08:32 |
Does such API even exist?
|
msg245051 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2015-06-09 08:38 |
I wouldn't be very surprised if third-party libraries enforce such typing, yes. If your library has a clear text/bytes separation, it makes sense to enforce it at the API level, to avoid mistakes by users.
|
msg245056 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2015-06-09 10:07 |
Such libraries already have a problem. Both str and unicode pickled in Python 2 are unpickled as str in Python 3.
|
msg245057 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2015-06-09 10:08 |
It's not a problem, since str *is* unicode in Python 3.
|
msg288158 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2017-02-19 19:28 |
This is a problem when pickling data in Python 3 for unpickling in Python 2.
|
msg289119 - (view) |
Author: Josh Rosenberg (josh.r) * |
Date: 2017-03-06 16:02 |
Right, but Antoine's objection is that suddenly strs pickled in Py3 can end up as strs in Py2, rather than unicode. If the library enforces a Py3-like type separation on Py2 (text arguments are unicode only, binary data is str only), then you have the problem where pickling on Py3 produces a pickle that will unpickle as str on Py2, and suddenly the library explodes because the argument, that should be unicode on Py2 and str on Py3, is suddenly str on both.
This means that, to fix a problem with non-forward compatible libraries (that accept text only as Py2 str), a Py2 library that's (very) forward thinking would have problems.
Admittedly, I wouldn't expect there to be very many such libraries, and many of them would have their own custom pickle formats, but stuff like numpy is quite sensitive to argument type; numpy.array(u'123') and numpy.array(b'123') are different. In numpy's case, each of those produces a derived datatype that is explicitly pickled and (I believe) would prevent the error, but some other more heuristic library might not do so.
|
|
Date | User | Action | Args
2022-04-11 14:57:24 | admin | set | github: 57775
2017-03-06 16:02:42 | josh.r | set | nosy: + josh.r; messages: + msg289119
2017-03-06 07:41:33 | serhiy.storchaka | set | files: + pickle-old-strings-2.patch
2017-02-19 19:28:26 | serhiy.storchaka | set | messages: + msg288158; versions: + Python 3.7, - Python 3.5
2015-06-09 10:08:12 | pitrou | set | messages: + msg245057
2015-06-09 10:07:45 | serhiy.storchaka | set | messages: + msg245056
2015-06-09 08:38:57 | pitrou | set | messages: + msg245051
2015-06-09 08:32:04 | serhiy.storchaka | set | messages: + msg245050
2015-06-09 08:23:06 | pitrou | set | messages: + msg245049
2015-06-09 08:05:52 | serhiy.storchaka | set | messages: + msg245048
2015-05-12 09:17:43 | serhiy.storchaka | set | files: + pickle_old_strings.patch; title: Array objects pickled in 3.x with protocol <=2 are unpickled incorrectly in 2.x -> Increase pickle compatibility; keywords: + patch; type: behavior -> enhancement; versions: + Python 3.5, - Python 2.7, Python 3.3, Python 3.4; messages: + msg242949; stage: needs patch -> patch review
2015-05-11 07:54:27 | serhiy.storchaka | set | assignee: serhiy.storchaka
2013-12-21 07:30:06 | serhiy.storchaka | set | nosy: + terry.reedy
2013-12-18 16:08:33 | vstinner | set | nosy: + vstinner; messages: + msg206536
2013-12-18 16:02:15 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg206532
2013-12-18 09:51:31 | vajrasky | set | nosy: + vajrasky; messages: + msg206504
2013-12-07 10:17:59 | alexandre.vassalotti | set | stage: needs patch; messages: + msg205445; versions: + Python 2.7, Python 3.4, - Python 3.2
2011-12-09 15:27:03 | sbt | set | messages: + msg149104
2011-12-09 14:23:58 | Ramchandra Apte | set | nosy: + Ramchandra Apte; messages: + msg149101
2011-12-09 13:26:30 | pitrou | set | nosy: + pitrou, alexandre.vassalotti
2011-12-09 13:25:03 | sbt | create |