Author tanzer@swing.co.at
Recipients alexandre.vassalotti, belopolsky, eddygeek, pitrou, tanzer@swing.co.at, tim.peters
Date 2015-10-12.14:32:10
Message-id <1444660330.27.0.342577452812.issue22005@psf.upfronthosting.co.za>
In-reply-to
Content
> The code works when using encoding='bytes'. Thanks Tim for the suggestion.

> So this is not a bug, but is there any sense in having encoding='ASCII' by default in pickle ?

It is most definitely a bug. And it adds another roadblock to moving Python applications from 2.7 to 3.x!

encoding='bytes' has serious side effects and isn't useful in the general case. For instance, dict keys that were str in Python 2 are unpickled as bytes instead of str, after which hilarity ensues.
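To make the dict-key side effect concrete, here is a minimal sketch for Python 3. The pickle bytes are assembled by hand to mimic a protocol 2 pickle of a Python 2 dict with one 8-bit str key; they are an assumption for illustration, not byte-for-byte output of Python 2.7.

```python
import pickle

# Hand-assembled protocol 2 pickle of the Python 2 dict {'ascii': 'abc'}
# (illustrative; not captured from a real Python 2.7 run).
PY2_PICKLE = (
    b"\x80\x02"            # PROTO 2
    + b"}"                 # EMPTY_DICT
    + b"U\x05" + b"ascii"  # SHORT_BINSTRING: the Python 2 str key
    + b"U\x03" + b"abc"    # SHORT_BINSTRING: the Python 2 str value
    + b"s"                 # SETITEM
    + b"."                 # STOP
)

as_str = pickle.loads(PY2_PICKLE)                      # default encoding='ASCII'
as_bytes = pickle.loads(PY2_PICKLE, encoding="bytes")  # keys and values stay bytes

print(as_str)
print(as_bytes)
print("ascii" in as_bytes)  # lookup by str key no longer finds the entry
```

With encoding='bytes', every lookup written against str keys silently misses, so the application would need to convert keys throughout the unpickled structure.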

I got the exception

  UnicodeDecodeError: 'ascii' codec can't decode byte 0xdf in position 1: ordinal not in range(128)

when testing an application for compatibility in Python 3.5 on a pickle created by Python 2.7. The pickled data is a nested data structure and it took me quite a while to determine that the single datetime instance was the culprit.

Here is a small test case that reproduces the problem::

# -*- coding: utf-8 -*-
# pickle_dump.py
import datetime
import pickle

dti = datetime.datetime(2015, 10, 12, 13, 17, 42, 123456)
data = {"ascii": "abc", "text": u"äbc", "int": 42, "date-time": dti}
# Protocol 2 is the highest protocol Python 2.7 supports.
with open("/tmp/pickle.test", "wb") as file:
    pickle.dump(data, file, protocol=2)

# pickle_load.py
# -*- coding: utf-8 -*-
import pickle
with open("/tmp/pickle.test", "rb") as file :
    data = pickle.load(file)
print(data)

$ python2.7 pickle_dump.py
$ python2.7 pickle_load.py 
{'ascii': 'abc', 'text': u'\xe4bc', 'int': 42, 'date-time': datetime.datetime(2015, 10, 12, 13, 17, 42, 123456)}
$ python3.5 pickle_load.py 
Traceback (most recent call last):
  File "pickle_load.py", line 6, in <module>
    data = pickle.load(file)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xdf in position 1: ordinal not in range(128)

That error message is spectacularly useless: it gives no hint that the datetime instance is what failed to unpickle.
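The failure can also be shown on Python 3 alone, without a Python 2 interpreter. The sketch below hand-assembles a protocol 2 pickle equivalent to what Python 2 writes for the datetime above (an assumption for illustration; real Python 2.7 output also contains memo opcodes). The 10-byte state string starts with the year 2015 as 0x07 0xdf, which is where the byte 0xdf at position 1 comes from.

```python
import datetime
import pickle

# 10-byte datetime state: year (2 bytes), month, day, hour, minute,
# second, microsecond (3 bytes).
year, usec = 2015, 123456
STATE = bytes([year >> 8, year & 0xFF, 10, 12, 13, 17, 42,
               usec >> 16, (usec >> 8) & 0xFF, usec & 0xFF])

# Hand-assembled stand-in for a Python 2 protocol 2 pickle (illustrative).
PY2_PICKLE = (
    b"\x80\x02"                          # PROTO 2
    + b"cdatetime\ndatetime\n"           # GLOBAL datetime.datetime
    + b"U" + bytes([len(STATE)]) + STATE # SHORT_BINSTRING: Python 2 str state
    + b"\x85"                            # TUPLE1: wrap state as (state,)
    + b"R"                               # REDUCE: call datetime.datetime(state)
    + b"."                               # STOP
)

# Default encoding='ASCII' tries to decode the state string as text.
try:
    pickle.loads(PY2_PICKLE)
    default_error = None
except UnicodeDecodeError as exc:
    default_error = exc
print(default_error)

# encoding='bytes' leaves the state as bytes, which datetime accepts.
restored = pickle.loads(PY2_PICKLE, encoding="bytes")
print(restored)
```

The exception names only a codec and a byte offset within one string, so in a nested structure nothing points at the datetime instance.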