Message253042
IMNSHO, the problem lies in Python 3's pickle.py, and it is **not** restricted to datetime instances
(for a detailed rambling see http://c-tanzer.at/en/breaking_py2_pickles_in_py3.html).
In Python 2, 8-bit strings are used for both text and binary data. Well-designed applications will use unicode for all text, but Python 2 itself forces some text to be 8-bit strings, e.g., names of attributes, classes, and functions. In other words, **any 8-bit strings explicitly created by such an application will contain binary data.**
In Python 2, pickle.dump (with a binary protocol) uses BINSTRING (and SHORT_BINSTRING) for 8-bit strings; Python 3 uses BINBYTES (and SHORT_BINBYTES) instead.
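To make the opcode difference visible, here is a small sketch run under Python 3. The Python 2 pickle is built by hand (the `U` opcode byte, a one-byte length, the payload, then `.` for STOP) because Python 3 never emits SHORT_BINSTRING itself; `opcode_names` is just a helper name I made up for illustration.

```python
import pickle
import pickletools


def opcode_names(data):
    """Return the names of the pickle opcodes found in `data`."""
    return [op.name for op, arg, pos in pickletools.genops(data)]


# What Python 3 writes for a bytes object with protocol 3.
py3_names = opcode_names(pickle.dumps(b"\xff\xfe", protocol=3))

# Hand-built equivalent of what Python 2 writes for the 8-bit string
# '\xff\xfe' with a binary protocol: SHORT_BINSTRING, length 2, data, STOP.
py2_pickle = b"U\x02\xff\xfe."
py2_names = opcode_names(py2_pickle)

print(py3_names)
print(py2_names)
```

So the same two bytes of binary data travel under different opcodes depending on which Python wrote the pickle.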
In Python 3, pickle.load should handle BINSTRING (and SHORT_BINSTRING) like this:
* convert ASCII values to `str`
* convert non-ASCII values to `bytes`
`bytes` is Python 3's equivalent to Python 2's 8-bit string!
It is only because of the use of 8-bit strings for Python 2 names that the mapping to `str` is necessary, but all such names are guaranteed to be ASCII!
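For comparison, this is roughly how the existing `encoding` options of `pickle.loads` treat hand-built Python 2 pickles. It is only a sketch: the two byte strings are constructed by hand (SHORT_BINSTRING, length, payload, STOP), and `try_load` is a hypothetical helper that returns the exception instead of raising it.

```python
import pickle

# Hand-built Python 2 style pickles: 'U' = SHORT_BINSTRING, '.' = STOP.
PY2_ASCII = b"U\x03abc."        # an ASCII name such as 'abc'
PY2_BINARY = b"U\x02\xff\xfe."  # binary data '\xff\xfe'


def try_load(data, **kwargs):
    """Unpickle `data`; return the UnicodeDecodeError instead of raising."""
    try:
        return pickle.loads(data, **kwargs)
    except UnicodeDecodeError as exc:
        return exc


default_ascii = try_load(PY2_ASCII)                      # str
default_binary = try_load(PY2_BINARY)                    # decoding fails
bytes_ascii = try_load(PY2_ASCII, encoding="bytes")      # bytes, not str
bytes_binary = try_load(PY2_BINARY, encoding="bytes")    # bytes
latin1_binary = try_load(PY2_BINARY, encoding="latin1")  # str, not bytes
```

None of the three settings gives `str` for the ASCII value and `bytes` for the binary one at the same time, which is the combination asked for above.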
I would propose to change `load_binstring` and `load_short_binstring` to call a function like::
    def _decode_binstring(self, value):
        # Used to allow strings from Python 2 to be decoded either as
        # bytes or Unicode strings. This should be used only with the
        # BINSTRING and SHORT_BINSTRING opcodes.
        if self.encoding != "bytes":
            try:
                return value.decode("ASCII")
            except UnicodeDecodeError:
                pass
        return value
instead of the currently called `_decode_string`.
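Until something like that lands, the behaviour can be approximated from the outside. The sketch below is an assumption-laden workaround, not the proposed patch: it subclasses `pickle._Unpickler`, the private pure-Python unpickler in CPython, and overrides its `_decode_string` hook. `Py2CompatUnpickler` is a made-up name, and because `_decode_string` is also used for the protocol 0 STRING opcode, the override is broader than the BINSTRING-only change suggested above.

```python
import io
import pickle


class Py2CompatUnpickler(pickle._Unpickler):
    """Pure-Python unpickler that falls back to bytes for non-ASCII strings.

    Relies on private CPython internals (`_Unpickler`, `_decode_string`),
    so it may break in other versions or implementations.
    """

    def _decode_string(self, value):
        if self.encoding != "bytes":
            try:
                # ASCII payloads (e.g. Python 2 names) become str.
                return value.decode("ascii")
            except UnicodeDecodeError:
                pass
        # Everything else is treated as binary data.
        return value


def loads_py2(data):
    """Unpickle `data` with the ASCII-or-bytes rule."""
    return Py2CompatUnpickler(io.BytesIO(data)).load()


# Hand-built Python 2 style pickles: 'U' = SHORT_BINSTRING, '.' = STOP.
name = loads_py2(b"U\x03abc.")
blob = loads_py2(b"U\x02\xff\xfe.")
```

With this, `name` comes back as `str` and `blob` as `bytes` from the same unpickler, at the cost of using the slower pure-Python implementation.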
Date | User | Action | Args
2015-10-15 11:55:44 | tanzer@swing.co.at | set | recipients: + tanzer@swing.co.at, tim.peters, belopolsky, pitrou, alexandre.vassalotti, serhiy.storchaka, eddygeek
2015-10-15 11:55:44 | tanzer@swing.co.at | set | messageid: <1444910144.17.0.311597298326.issue22005@psf.upfronthosting.co.za>
2015-10-15 11:55:44 | tanzer@swing.co.at | link | issue22005 messages
2015-10-15 11:55:43 | tanzer@swing.co.at | create |