classification
Title: Make pickle generated by Python 3.x compatible with 2.x and vice-versa.
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.1
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: alexandre.vassalotti, gvanrossum, hagen, jcea, mkiever, pitrou, rhettinger
Priority: critical Keywords: patch

Created on 2009-05-28 08:52 by mkiever, last changed 2011-02-21 23:39 by jcea. This issue is now closed.

Files
File name Uploaded Description Edit
compat_pickle.diff alexandre.vassalotti, 2009-05-31 00:09
compat_pickle2.diff pitrou, 2009-06-03 21:32
compat_pickle5.diff pitrou, 2009-06-03 22:49
compat_pickle6.diff alexandre.vassalotti, 2009-06-04 01:18
compat_pickle7.diff alexandre.vassalotti, 2009-06-04 20:32
Messages (24)
msg88470 - (view) Author: Matthias Kievernagel (mkiever) Date: 2009-05-28 08:52
Hello,

while porting something to Python 3.1a1
I found out that Python 3 cannot load most Python 2 pickles
of any protocol because copy_reg has been renamed to copyreg.

Found this comment by Skip Montanaro in related issue:
  http://bugs.python.org/issue3799#msg76196

Could not find an issue opened for this though.
So I'm opening one.

Regards,
Matthias Kievernagel
msg88487 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-05-28 22:58
#3675 is a similar issue, too bad nothing could be done to solve it...
msg88589 - (view) Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2009-05-31 00:09
If I understood correctly, #3675 is about making pickle data generated
by Python 3 readable by Python 2. However, this issue is about
compatibility in the other direction—i.e., making Python 2 pickles
readable by Python 3, which is easier.

I have a patch that make Unpickler translates the old names to the new
ones at runtime. The only thing missing in the patch is the unit tests
since I am not sure this should be tested. I am thinking using the
approach Collin Winter used for his compatibility tests in #5665.
However, his approach requires bidirectional pickle compatibility
between Python 2 and 3, which we don't have.
msg88602 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-05-31 18:28
> If I understood correctly, #3675 is about making pickle data generated
> by Python 3 readable by Python 2.

Only if a protocol <= 2 is specified. Therefore it seems it's "only" a
matter of translating module names.
msg88640 - (view) Author: Matthias Kievernagel (mkiever) Date: 2009-06-01 12:36
Applied the patch
  http://bugs.python.org/file14124/compat_pickle.diff
to rev. 73106.
Patch applies fine, 'make test' passes
and it solves my problem.
(which is far from a complete test case though
 - only 5 small pickles)

Thanks,
Matthias Kievernagel
msg88684 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-06-01 23:11
Some comments on the patch:
- I don't understand why you create a static "twotuple" object rather
than simply using Py_BuildValue("(OO)", ...). Mutating tuples is bad.
- I don't think you need to call PyDict_Contains() before
PyDict_GetItem(). The latter will simply return NULL if the key isn't found.
- You should check whether the item fetched from the dictionary has the
right datatype (tuple for name_mapping, str for import_mapping). Anyone
can change (monkeypatch) _pickle_compat and make the interpreter crash.
msg88826 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-06-03 20:54
Unpickling e.g. StringIO objects doesn't seem to work:

>>> s = pickle.load(open("stringio.pickle", "rb"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/antoine/py3k/picklecompat-6137/Lib/pickle.py", line 1351,
in load
    encoding=encoding, errors=errors).load()
AttributeError: '_io.BytesIO' object has no attribute '__dict__'
>>>
msg88831 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-06-03 21:32
An improved patch with tests.
It has no tests for fix_imports=False, though.
msg88836 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-06-03 22:32
A new patch which also includes reverse mappings, so that protocol <= 2
pickles created with 3.x can also work under 2.x.
(that is, it also solves #3675)
msg88838 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-06-03 22:39
Updated patch with a couple of documentation and function prototype fixes.
msg88841 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-06-03 22:49
Sorry, last patch had a couple of minor issues
msg88861 - (view) Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2009-06-04 01:18
Here is an updated patch based on Antoine's latest patch.

Summary of changes:
 * Updated docstrings of Pickler and Unpickler in the pickle module.
 * Fixed pickle._Pickler to consider fix_imports only for protocol < 3
 * Made module name remapping in _pickle more robust:
    - added PyUnicode_Check on global_name and module_name;
    - used PyDict_GetItemWithError instead of PyDict_GetItem
 * Changed Py_BuildValue("(OO)", ...) to its faster equivalent
   PyTuple_Pack(2, ...).

I don't really the idea of remapping names generated by Pickler, since
it breaks the identity guarantee in save_global(). However, I agree this
is an example where practicality beats purity. So, I do not oppose to
the change.
msg88863 - (view) Author: Hagen Fürstenau (hagen) Date: 2009-06-04 06:09
The latest patch is missing the file "Lib/_compat.pickle.py". (Seems as
if you forgot to svn add it after patching.)
msg88901 - (view) Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2009-06-04 20:32
Oops. Here is a new patch with _compat_pickle.py.
msg88902 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-06-04 20:33
Committed in r73236 in the hope that it gets a bit more testing before
rc2/final.
msg88921 - (view) Author: Hagen Fürstenau (hagen) Date: 2009-06-05 05:29
I think it is worth noting that now some Python 3.1 protocol 2 pickles
can't be read by Python 3.0. We probably don't have to do anything about
that, but perhaps it should be mentioned somewhere? Docs, release notes?
msg89004 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-06-06 14:22
I've added an entry in the what's new file in r73254.
msg89153 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-06-09 17:56
Guido, Raymond suggested we ask your input on this one, although it's
obviously a bit late (the patch has been committed).
msg89154 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2009-06-09 18:03
I'm not on IRC. What exactly is your question for me? What is Raymond's 
view?
msg89156 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-06-09 18:16
I vaguely remember you rejecting a proposal along these lines when Brett
was doing the library renaming.  

The patch (as applied) turns on the renaming automatically when used
with protocol 2 (i.e. all object names are stored with their 2.x names,
not their new names).   Hagen points out that 3.1 proto 2 pickles can't
be read by 3.0.  I would think that the back-translation should be
disabled by default or else we're going to live with the 2.x names for a
very long time.

I don't care much one way or the other.  Just thought you should see it
before it got released and set in stone.  It's not too late for a one
line change, setting the fix_imports default from True to False.

FWIW, the argument for leaving the default as True is the theory that
anyone using protocol 2 is likely doing so to exchange data with 2.x.
msg89157 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2009-06-09 19:02
Ah. How about only doing back-translation when protocol=2 (or lower)
is explicitly selected?

I don't much like that 3.0 will be to read pickles written by 3.1 with
the default protocol (i.e. 3), but I don't mind breaking protocol 2,
since that's most likely (as you say) intended for Python 2. So the
default for fix_imports would be None, and the __init__ would check if
it's None, and then set it to (protocol <= 2).
msg89158 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-06-09 19:16
> Ah. How about only doing back-translation when protocol=2 (or lower)
> is explicitly selected?

Well, this is exactly what is implemented!

> I don't much like that 3.0 will be to read pickles written by 3.1 with
> the default protocol (i.e. 3),

I suppose you meant to say "will be unable", but it isn't, since
protocol 3 pickles are not affected at all by this patch.
msg89159 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2009-06-09 19:26
On Tue, Jun 9, 2009 at 12:16 PM, Antoine Pitrou<report@bugs.python.org> wrote:
>
> Antoine Pitrou <pitrou@free.fr> added the comment:
>
>> Ah. How about only doing back-translation when protocol=2 (or lower)
>> is explicitly selected?
>
> Well, this is exactly what is implemented!

Ah, I missed that. Fine then!

>> I don't much like that 3.0 will be to read pickles written by 3.1 with
>> the default protocol (i.e. 3),
>
> I suppose you meant to say "will be unable",

Indeed.

> but it isn't, since
> protocol 3 pickles are not affected at all by this patch.

Good.

So I'm okay with the status quo -- too bad 3.0 can't read those
pickles, but that's the deal with abandoning 3.0...
msg89160 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-06-09 19:32
Thanks for taking a look and opining.
History
Date User Action Args
2011-02-21 23:39:35jceasetnosy: + jcea
2009-06-09 19:32:46rhettingersetmessages: + msg89160
2009-06-09 19:26:39gvanrossumsetmessages: + msg89159
2009-06-09 19:16:27pitrousetmessages: + msg89158
2009-06-09 19:02:59gvanrossumsetmessages: + msg89157
2009-06-09 18:16:40rhettingersetnosy: + rhettinger
messages: + msg89156
2009-06-09 18:03:19gvanrossumsetmessages: + msg89154
2009-06-09 17:56:54pitrousetnosy: + gvanrossum
messages: + msg89153
2009-06-06 14:22:12pitrousetstatus: open -> closed

messages: + msg89004
2009-06-05 05:30:20hagensetstatus: pending -> open

messages: + msg88921
2009-06-04 20:33:01pitrousetstatus: open -> pending
messages: + msg88902

assignee: alexandre.vassalotti ->
resolution: fixed
stage: patch review -> resolved
2009-06-04 20:32:21alexandre.vassalottisetfiles: + compat_pickle7.diff

messages: + msg88901
2009-06-04 06:10:16hagensetnosy: + hagen
messages: + msg88863
2009-06-04 01:21:36alexandre.vassalottilinkissue3675 superseder
2009-06-04 01:18:57alexandre.vassalottisetfiles: + compat_pickle6.diff
assignee: alexandre.vassalotti
messages: + msg88861

title: Pickle migration: Should pickle map "copy_reg" to "copyreg"? -> Make pickle generated by Python 3.x compatible with 2.x and vice-versa.
2009-06-03 22:50:18pitrousetfiles: - compat_pickle4.diff
2009-06-03 22:50:14pitrousetfiles: - compat_pickle3.diff
2009-06-03 22:49:47pitrousetfiles: + compat_pickle5.diff

messages: + msg88841
2009-06-03 22:39:37pitrousetfiles: + compat_pickle4.diff

dependencies: - Python 2.6 can't read sets pickled with Python 3.0
messages: + msg88838
2009-06-03 22:32:48pitrousetfiles: + compat_pickle3.diff

dependencies: + Python 2.6 can't read sets pickled with Python 3.0
messages: + msg88836
2009-06-03 21:32:38pitrousetfiles: + compat_pickle2.diff

messages: + msg88831
2009-06-03 20:54:42pitrousetmessages: + msg88826
2009-06-01 23:13:14pitrousetpriority: critical
nosy: pitrou, mkiever, alexandre.vassalotti
type: behavior
components: + Library (Lib), - None
2009-06-01 23:11:52pitrousetmessages: + msg88684
2009-06-01 12:36:11mkieversetmessages: + msg88640
2009-05-31 18:28:31pitrousetmessages: + msg88602
2009-05-31 00:09:43alexandre.vassalottisetfiles: + compat_pickle.diff
keywords: + patch
messages: + msg88589

stage: patch review
2009-05-28 22:58:29pitrousetnosy: + alexandre.vassalotti, pitrou
messages: + msg88487
2009-05-28 08:52:35mkievercreate