Author tanzer@swing.co.at
Recipients alexandre.vassalotti, belopolsky, eddygeek, pitrou, serhiy.storchaka, tanzer@swing.co.at, tim.peters
Date 2015-10-15.11:55:43
Message-id <1444910144.17.0.311597298326.issue22005@psf.upfronthosting.co.za>
In-reply-to
Content
IMNSHO, the problem lies in Python 3's pickle.py, and it is **not** restricted to datetime instances
(for a detailed rambling see http://c-tanzer.at/en/breaking_py2_pickles_in_py3.html).

In Python 2, 8-bit strings are used for text and for binary data. Well-designed applications will use unicode for all text, but Python 2 itself forces some text to be 8-bit strings, e.g., names of attributes, classes, and functions. In other words, **any 8-bit strings explicitly created by such an application will contain binary data.**

In Python 2, pickle.dump uses BINSTRING (and SHORT_BINSTRING) for 8-bit strings when a binary protocol is selected; Python 3 uses BINBYTES (and SHORT_BINBYTES) for `bytes` instead.
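
To make the opcode difference visible, here is a small sketch using `pickletools`. The byte string `py2_style` is hand-written for illustration (a real Python 2 pickle would also carry memo opcodes), and `opcode_names` is a hypothetical helper, not part of the standard library.

```python
import pickle
import pickletools


def opcode_names(data):
    """Return the names of the opcodes found in a pickle byte string."""
    return [opcode.name for opcode, arg, pos in pickletools.genops(data)]


# Hand-written stand-in for a Python 2 pickle of the 8-bit string 'abc':
# SHORT_BINSTRING ('U'), one length byte, the payload, then STOP ('.').
py2_style = b"U\x03abc."

# What Python 3 writes for a bytes object at protocol 3.
py3_bytes = pickle.dumps(b"abc", protocol=3)

print(opcode_names(py2_style))
print(opcode_names(py3_bytes))
```

The first list should contain SHORT_BINSTRING and the second SHORT_BINBYTES, so a Python 3 reader can always tell which kind of value it is looking at.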

In Python 3, pickle.load should handle BINSTRING (and SHORT_BINSTRING) like this:

* convert ASCII values to `str`

* convert non-ASCII values to `bytes`

`bytes` is Python 3's equivalent to Python 2's 8-bit string! 

It is only because Python 2 uses 8-bit strings for names that the mapping to `str` is necessary, but all such names are guaranteed to be ASCII!
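
For comparison, this is roughly how Python 3's `pickle.loads` treats SHORT_BINSTRING today. The two pickles are hand-written for illustration, and `try_load` is a hypothetical helper that reports the exception type instead of raising.

```python
import pickle

# Hand-written SHORT_BINSTRING pickles: 'U', length byte, payload, STOP.
ascii_pickle = b"U\x03abc."        # ASCII payload, e.g. an attribute name
binary_pickle = b"U\x02\xff\xfe."  # non-ASCII payload, i.e. binary data


def try_load(data, **kwargs):
    """Load a pickle; return the value, or the exception class on failure."""
    try:
        return pickle.loads(data, **kwargs)
    except Exception as exc:
        return type(exc)


# Default encoding="ASCII": names work, binary data fails to decode.
print(try_load(ascii_pickle))
print(try_load(binary_pickle))

# encoding="bytes": binary data works, but names come back as bytes too.
print(try_load(ascii_pickle, encoding="bytes"))
print(try_load(binary_pickle, encoding="bytes"))
```

Neither setting gives `str` for the ASCII value and `bytes` for the binary one, which is the combination argued for above.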

I would propose to change `load_binstring` and `load_short_binstring` to call a function like::

    def _decode_binstring(self, value):
        # Used to allow strings from Python 2 to be decoded either as
        # bytes or Unicode strings.  This should be used only with the
        # BINSTRING and SHORT_BINSTRING opcodes.
        if self.encoding != "bytes":
            try:
                return value.decode("ASCII")
            except UnicodeDecodeError:
                pass
        return value

instead of the currently called `_decode_string`.
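
A rough way to try the idea without patching pickle.py is to subclass the pure-Python unpickler. This is only a sketch under some assumptions: it relies on the private CPython class `pickle._Unpickler` and its `_decode_string` hook, which may change between versions, and because the STRING opcode goes through the same hook it is affected as well, which is broader than the proposal. The names `BinstringFallbackUnpickler` and `loads_with_fallback` are made up for this example.

```python
import io
import pickle


class BinstringFallbackUnpickler(pickle._Unpickler):
    """Pure-Python unpickler that falls back to bytes for non-ASCII strings."""

    def _decode_string(self, value):
        # ASCII payloads (names and the like) become str; anything that
        # does not decode as ASCII is kept as bytes.
        if self.encoding != "bytes":
            try:
                return value.decode("ascii")
            except UnicodeDecodeError:
                pass
        return value


def loads_with_fallback(data, **kwargs):
    """Hypothetical counterpart of pickle.loads using the class above."""
    return BinstringFallbackUnpickler(io.BytesIO(data), **kwargs).load()


# Hand-written pickle of a list holding one ASCII and one binary
# SHORT_BINSTRING: MARK, two strings, LIST, STOP.
mixed = b"(U\x03abcU\x02\xff\xfel."
print(loads_with_fallback(mixed))
```

With this fallback the ASCII element should load as `str` and the binary element as `bytes`, within a single pickle.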