Title: Decide what to do with bytes/str when transferring pickles between 2.6 and 3.0
Components: Library (Lib) Versions: Python 3.0
Assigned To: gvanrossum Nosy List: alexandre.vassalotti, gvanrossum
Author: Guido van Rossum (gvanrossum) Date: 2008-03-16 21:16
A pickled str instance written by 2.6 currently unpickles under 3.0 as a
bytes instance. That would be correct if the intended use is binary
data, but it's wrong if the intended use is text. My hunch is that
there's more pickled text than binary data. (E.g. a dict containing
instance variables uses (8-bit) str instances for the keys; these *must*
be unpacked as (Unicode) str instances in 3.0.)

The inverse issue also exists for pickles written by 3.0 and read by 2.6.

We need to DECIDE this before starting to code (coding is probably
minimal).  I'm assigning the task to DECIDE (after discussion on the
list) to myself.
Author: Guido van Rossum (gvanrossum) Date: 2008-03-17 17:15
We have a proposed solution for 2.x -> 3.x.  In 3.x, an (8-bit) str
instance received from 2.x will be decoded into a (Unicode) str
instance.  The encoding defaults to ASCII; you can specify a different
encoding and also an error value on the load() or loads() call.

This of course doesn't solve all problems; str instances representing
binary data will also be unpickled as (Unicode) str instances.  The app
will have to deal with this.
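
A minimal sketch of the decoding behaviour described above, using the
encoding and errors arguments of pickle.loads() as they exist in Python 3.
The byte string is hand-written and assumed to match what a 2.x
pickle.dumps('caf\xc3\xa9', 2) would produce; the helper name load_py2_text
is made up for illustration.

```python
import pickle

# Assumed 2.x protocol-2 pickle of the 8-bit str 'caf\xc3\xa9' (UTF-8 bytes):
# PROTO 2, SHORT_BINSTRING with length 5, BINPUT 0, STOP.
PY2_PICKLE = b"\x80\x02U\x05caf\xc3\xa9q\x00."


def load_py2_text(data, encoding="ASCII", errors="strict"):
    """Unpickle 2.x data, decoding 8-bit str instances with the given codec."""
    return pickle.loads(data, encoding=encoding, errors=errors)


if __name__ == "__main__":
    # The default (ASCII, strict) rejects the non-ASCII bytes.
    try:
        load_py2_text(PY2_PICKLE)
    except UnicodeDecodeError as exc:
        print("default encoding failed:", exc)

    # Naming the encoding the 2.x application actually used recovers the text.
    print(ascii(load_py2_text(PY2_PICKLE, encoding="utf-8")))

    # A lenient error handler keeps the load from failing, at the cost of data.
    print(ascii(load_py2_text(PY2_PICKLE, errors="replace")))
```

Passing encoding="latin-1" never fails, since every byte maps to a code
point, which is one way an app can recover binary data afterwards by
re-encoding the resulting str with the same codec.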

By default 3.x will *write* pickles using a new version number which is
incompatible with 2.x.  I'm not sure yet if we should allow writing
pickles in 3.x that can be read in 2.x; we need use cases for that.
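
For illustration only, here is how the protocol number shows up in a
pickle. This sketch reflects the pickle module in current Python 3
releases, which may not match what was under discussion at the time; the
helper protocol_of is a made-up name.

```python
import pickle


def protocol_of(data):
    """Return the version from a leading PROTO opcode, or None for protocols 0/1."""
    if data[:1] == b"\x80":  # PROTO opcode, present for protocol >= 2
        return data[1]
    return None


if __name__ == "__main__":
    default_pickle = pickle.dumps("text")
    compat_pickle = pickle.dumps("text", protocol=2)

    # The default protocol is 3 or higher, which 2.x cannot read.
    print(protocol_of(default_pickle), pickle.DEFAULT_PROTOCOL)

    # An explicit protocol <= 2 can be requested when writing.
    print(protocol_of(compat_pickle))
```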
Author: Guido van Rossum (gvanrossum) Date: 2008-03-17 22:58
Checked in as r61467.

When pickling a bytes instance in a protocol < 3, it is pickled as a
user-defined type (essentially faking a __reduce__ operation) which can
be read back correctly in 3.0 but probably not in 2.x.
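
A small sketch that makes the "faked __reduce__" visible by listing the
opcodes in the pickle. It assumes a current Python 3; the exact callable
stored in the pickle has changed between releases, so the code only checks
for a GLOBAL plus REDUCE pair rather than a specific name. The helper
opcode_names is hypothetical.

```python
import pickle
import pickletools


def opcode_names(data):
    """Return the names of the opcodes in a pickle, in order."""
    return [op.name for op, arg, pos in pickletools.genops(data)]


if __name__ == "__main__":
    payload = b"\x00\xffabc"

    old = pickle.dumps(payload, protocol=2)
    new = pickle.dumps(payload, protocol=3)

    # Protocol < 3 has no bytes opcode, so the value is rebuilt by calling
    # a global callable (GLOBAL ... REDUCE), like a user-defined type.
    print(opcode_names(old))

    # Protocol 3 stores the bytes directly with a dedicated opcode.
    print(opcode_names(new))

    # Both forms read back as the same bytes instance in 3.x.
    print(pickle.loads(old) == payload, pickle.loads(new) == payload)
```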
