classification
Title: Increase pickle compatibility
Type: enhancement
Stage: patch review
Components: Library (Lib)
Versions: Python 3.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: serhiy.storchaka
Nosy List: Ramchandra Apte, alexandre.vassalotti, josh.r, pitrou, sbt, serhiy.storchaka, terry.reedy, vajrasky, vstinner
Priority: normal
Keywords: patch

Created on 2011-12-09 13:25 by sbt, last changed 2017-03-06 16:02 by josh.r.

Files
File name Uploaded Description
pickle_old_strings.patch serhiy.storchaka, 2015-05-12 09:17
pickle-old-strings-2.patch serhiy.storchaka, 2017-03-06 07:41
Messages (16)
msg149092 - Author: Richard Oudkerk (sbt) * (Python committer) Date: 2011-12-09 13:25
If you pickle an array object on python 3 the typecode is encoded as a unicode string rather than as a byte string.  This makes python 2 reject the pickle.

#########################################

Python 3.3.0a0 (default, Dec  8 2011, 17:56:13) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle, array
>>> pickle.dumps(array.array('i', [1,2,3]), 2)
b'\x80\x02carray\narray\nq\x00X\x01\x00\x00\x00iq\x01]q\x02(K\x01K\x02K\x03e\x86q\x03Rq\x04.'

#########################################

Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> pickle.loads(b'\x80\x02carray\narray\nq\x00X\x01\x00\x00\x00iq\x01]q\x02(K\x01K\x02K\x03e\x86q\x03Rq\x04.')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\Python27\lib\pickle.py", line 1382, in loads
    return Unpickler(file).load()
  File "c:\Python27\lib\pickle.py", line 858, in load
    dispatch[key](self)
  File "c:\Python27\lib\pickle.py", line 1133, in load_reduce
    value = func(*args)
TypeError: must be char, not unicode
msg149101 - Author: Ramchandra Apte (Ramchandra Apte) * Date: 2011-12-09 14:23
The problem is that pickle is calling array.array(u'i', [1,2,3]), and array.array in Python 2 doesn't allow unicode strings as a typecode (the typecode is the first argument).

The docs in Python 2 and Py3k don't specify the type of the typecode argument of array.array.
In Python 2 it seems that typecode has to be a bytes string.
In Python 3 it seems that typecode has to be a unicode string.

I suggest that array.array be changed in Python 2 to allow unicode strings as a typecode or that pickle detects array.array being called and fixes the call.
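
The diagnosis above can be checked by decoding the pickle from the original report. What follows is a minimal sketch for Python 3 using only the standard library; the variable names are illustrative and not part of any patch. It lists the opcodes with pickletools.genops and picks out the one that carries the typecode.

```python
import array
import pickle
import pickletools

# Pickle bytes copied from the original report (Python 3, protocol 2).
data = (b'\x80\x02carray\narray\nq\x00X\x01\x00\x00\x00iq\x01]q\x02'
        b'(K\x01K\x02K\x03e\x86q\x03Rq\x04.')

# genops yields (opcode, argument, position) for each opcode in the stream.
ops = [(op.name, arg) for op, arg, _pos in pickletools.genops(data)]
for name, arg in ops:
    print(name, repr(arg))

# Opcodes whose argument is the typecode 'i'. It is carried by BINUNICODE,
# which Python 2 decodes as u'i' before REDUCE calls array.array.
typecode_ops = [name for name, arg in ops if arg == 'i']

# Python 3 itself loads the pickle without trouble.
restored = pickle.loads(data)
```

The REDUCE opcode near the end of the stream is what calls array.array with the arguments collected before it, so the type of the first argument is decided entirely by the string opcode used for 'i'.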
msg149104 - Author: Richard Oudkerk (sbt) * (Python committer) Date: 2011-12-09 15:27
> I suggest that array.array be changed in Python 2 to allow unicode strings 
> as a typecode or that pickle detects array.array being called and fixes 
> the call.

Interestingly, py3 does understand arrays pickled by py2.  This appears to be because py2 pickles str using BINSTRING or SHORT_BINSTRING which will unpickle as str on py2 and py3.  py3 pickles str using BINUNICODE which will unpickle as unicode on py2 and str on py3.

I think it would be better to fix this in py3 if possible, but that does not look easy: modifying array.__reduce_ex__ alone would not be enough.

The only thing I can think of is for py3 to grow a "_binstr" type which only supports ascii strings and is special-cased by pickle to be pickled using BINSTRING.  Then array.__reduce_ex__ could be something like:

  def __reduce_ex__(self, protocol):
    if protocol <= 2:
      return array.array, (_binstr(self.typecode), list(self))
    else:
      ...
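
One way the "_binstr" idea might look in practice is sketched below. This is an illustration under stated assumptions, not the proposed implementation: BinStr, CompatPickler and dumps_compat are hypothetical names, and the sketch hooks into the pure-Python pickle._Pickler class and its dispatch table, which are implementation details of CPython's pickle module rather than documented API.

```python
import io
import pickle
import pickletools
import struct


class BinStr(str):
    """Hypothetical ASCII-only str marker (the thread calls it _binstr)."""

    def __new__(cls, value):
        value.encode("ascii")  # raises UnicodeEncodeError for non-ASCII text
        return super().__new__(cls, value)


def _save_binstr(pickler, obj):
    # Emit the Python 2 string opcodes instead of BINUNICODE.
    data = obj.encode("ascii")
    if len(data) < 256:
        pickler.write(pickle.SHORT_BINSTRING + struct.pack("<B", len(data)) + data)
    else:
        pickler.write(pickle.BINSTRING + struct.pack("<i", len(data)) + data)
    pickler.memoize(obj)


class CompatPickler(pickle._Pickler):
    # Assumption: relies on the private pure-Python pickler and its dispatch dict.
    dispatch = dict(pickle._Pickler.dispatch)
    dispatch[BinStr] = _save_binstr


def dumps_compat(obj, protocol=2):
    buf = io.BytesIO()
    CompatPickler(buf, protocol).dump(obj)
    return buf.getvalue()


data = dumps_compat((BinStr("i"), [1, 2, 3]))
opcode_names = [op.name for op, _arg, _pos in pickletools.genops(data)]
restored = pickle.loads(data)
```

With such a type in place, a __reduce_ex__ like the one above could return the typecode wrapped in the marker class for protocols <= 2, and Python 2 would see a byte string where it expects one.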
msg205445 - Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2013-12-07 10:17
Adding a special type is not a bad idea. We have to keep the code for loading BINSTRING opcodes anyway, so we might as well use it. It could be helpful for unit-testing our Python 2 compatibility support for pickle.

We should still fix array in 2.7 to accept a unicode object for the typecode though.
msg206504 - Author: Vajrasky Kok (vajrasky) * Date: 2013-12-18 09:51
Alexandre Vassalotti said: "We should still fix array in 2.7 to accept unicode object for the typecode though."

I created issue #20014 (with the patch) for this feature.
msg206532 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-12-18 16:02
See issue20015 for a more general approach.
msg206536 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-12-18 16:08
> If you pickle an array object on python 3 the typecode is encoded as a unicode string rather than as a byte string.  This makes python 2 reject the pickle.

Are pickle files produced by Python 3 supposed to be compatible with Python 2?

It looks very tricky to produce pickle files compatible with both versions.
msg242949 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-12 09:17
The proposed patch pickles all ASCII strings with protocols < 3 and fix_imports=True using compatible opcodes (STRING, BINSTRING and SHORT_BINSTRING). Strings pickled this way are unpickled as str in both Python 2 and Python 3 (unless encoding="bytes").

As a side effect, short ASCII strings (length < 256) are pickled more compactly with protocols < 3.
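
The loading behaviour described here is already present in Python 3 and can be observed without the patch. The sketch below builds the two opcode encodings of the one-character string 'i' by hand (the variable names are illustrative) and loads them, which also shows where the size difference comes from: SHORT_BINSTRING uses a 1-byte length field, BINUNICODE a 4-byte one.

```python
import pickle
import struct

# Hand-built opcode encodings of the one-character string 'i'.
short_binstring = b'U' + struct.pack('<B', 1) + b'i'  # SHORT_BINSTRING, 1-byte length
binunicode = b'X' + struct.pack('<I', 1) + b'i'       # BINUNICODE, 4-byte length

STOP = b'.'  # every pickle stream ends with the STOP opcode

# Python 3 decodes the old string opcodes as text by default ...
as_text = pickle.loads(short_binstring + STOP)
# ... and as bytes when asked to with encoding="bytes".
as_bytes = pickle.loads(short_binstring + STOP, encoding='bytes')
# The opcode Python 3 writes today for str.
from_unicode = pickle.loads(binunicode + STOP)

# Bytes saved per short string by the more compact opcode.
saved = len(binunicode) - len(short_binstring)
```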
msg245048 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-06-09 08:05
Alexandre, Antoine, what are your thoughts?
msg245049 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-06-09 08:23
Won't that fail if a Python 2 API accepts only unicode strings?
msg245050 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-06-09 08:32
Does such an API even exist?
msg245051 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-06-09 08:38
I wouldn't be very surprised if third-party libraries enforce such typing, yes. If your library has a clear text/bytes separation, it makes sense to enforce it at the API level, to avoid mistakes by users.
msg245056 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-06-09 10:07
Such libraries already have a problem. Both str and unicode pickled in Python 2 are unpickled as str in Python 3.
msg245057 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-06-09 10:08
It's not a problem, since str *is* unicode in Python 3.
msg288158 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-02-19 19:28
This is a problem when pickling data in Python 3 for unpickling in Python 2.
msg289119 - Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2017-03-06 16:02
Right, but Antoine's objection is that suddenly strs pickled in Py3 can end up as strs in Py2, rather than unicode. If the library enforces a Py3-like type separation on Py2 (text arguments are unicode only, binary data is str only), then you have the problem where pickling on Py3 produces a pickle that will unpickle as str on Py2, and suddenly the library explodes because the argument, that should be unicode on Py2 and str on Py3, is suddenly str on both.

This means that, to fix a problem with non-forward compatible libraries (that accept text only as Py2 str), a Py2 library that's (very) forward thinking would have problems.

Admittedly, I wouldn't expect there to be very many such libraries, and many of them would have their own custom pickle formats, but stuff like numpy is quite sensitive to argument type; numpy.array(u'123') and numpy.array(b'123') are different. In numpy's case, each of those produces a derived datatype that is explicitly pickled and (I believe) would prevent the error, but some other more heuristic library might not do so.
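
The failure mode described above can be approximated on Python 3. The sketch below is only an analogy: set_label is a hypothetical text-only library call, and encoding="bytes" stands in for Python 2, where the old string opcodes yield a byte string (Python 2's str) rather than unicode.

```python
import pickle


def set_label(text):
    """Hypothetical library call that accepts text only."""
    if not isinstance(text, str):
        raise TypeError("label must be text, got %s" % type(text).__name__)
    return text


# The word 'label' encoded with SHORT_BINSTRING, one of the opcodes the
# proposed patch would emit for ASCII strings, followed by STOP.
payload = b'U\x05label.'

# Default decoding gives text, which the strict API accepts.
ok = set_label(pickle.loads(payload))

# Decoding as bytes mimics what a Python 2 reader would receive.
try:
    set_label(pickle.loads(payload, encoding='bytes'))
    rejected = False
except TypeError:
    rejected = True
```

A library written this way would accept today's Python 3 pickles on Python 2, because BINUNICODE arrives as unicode, and would start rejecting them once the same strings arrive as byte strings.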
History
Date User Action Args
2017-03-06 16:02:42 josh.r set nosy: + josh.r
    messages: + msg289119
2017-03-06 07:41:33 serhiy.storchaka set files: + pickle-old-strings-2.patch
2017-02-19 19:28:26 serhiy.storchaka set messages: + msg288158
    versions: + Python 3.7, - Python 3.5
2015-06-09 10:08:12 pitrou set messages: + msg245057
2015-06-09 10:07:45 serhiy.storchaka set messages: + msg245056
2015-06-09 08:38:57 pitrou set messages: + msg245051
2015-06-09 08:32:04 serhiy.storchaka set messages: + msg245050
2015-06-09 08:23:06 pitrou set messages: + msg245049
2015-06-09 08:05:52 serhiy.storchaka set messages: + msg245048
2015-05-12 09:17:43 serhiy.storchaka set files: + pickle_old_strings.patch
    title: Array objects pickled in 3.x with protocol <=2 are unpickled incorrectly in 2.x -> Increase pickle compatibility
    keywords: + patch
    type: behavior -> enhancement
    versions: + Python 3.5, - Python 2.7, Python 3.3, Python 3.4
    messages: + msg242949
    stage: needs patch -> patch review
2015-05-11 07:54:27 serhiy.storchaka set assignee: serhiy.storchaka
2013-12-21 07:30:06 serhiy.storchaka set nosy: + terry.reedy
2013-12-18 16:08:33 vstinner set nosy: + vstinner
    messages: + msg206536
2013-12-18 16:02:15 serhiy.storchaka set nosy: + serhiy.storchaka
    messages: + msg206532
2013-12-18 09:51:31 vajrasky set nosy: + vajrasky
    messages: + msg206504
2013-12-07 10:17:59 alexandre.vassalotti set stage: needs patch
    messages: + msg205445
    versions: + Python 2.7, Python 3.4, - Python 3.2
2011-12-09 15:27:03 sbt set messages: + msg149104
2011-12-09 14:23:58 Ramchandra Apte set nosy: + Ramchandra Apte
    messages: + msg149101
2011-12-09 13:26:30 pitrou set nosy: + pitrou, alexandre.vassalotti
2011-12-09 13:25:03 sbt create