Author serhiy.storchaka
Recipients alexandre.vassalotti, benjamin.peterson, pitrou, serhiy.storchaka
Date 2017-10-20.17:41:29
Message-id <1508521289.37.0.213398074469.issue31829@psf.upfronthosting.co.za>
Content
After reading numerous pickle-related issues on GitHub, I have found that the most common problem with pickle in Python 2 is using it with files opened in text mode.

    with open(file_name, "w") as f:
        pickle.dump(data, f)

Initially pickle was a text protocol, but since more efficient binary opcodes were added it has been a binary protocol. Even the default protocol 0 is not completely text-safe. If you save and load data containing Unicode strings with the "text" protocol 0 using different text/binary modes, or using text mode on different platforms, you can get an error or incorrect data.
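The failure mode can be reproduced without a Windows machine. The following is a minimal sketch in Python 3 that assumes text-mode writing on Windows amounts to replacing every LF byte with CRLF; the helper names windows_text_write and survives_text_mode are made up for this illustration and are not part of pickle.

```python
import pickle


def windows_text_write(blob):
    """Mimic a Windows text-mode write: every LF byte becomes CRLF."""
    return blob.replace(b"\n", b"\r\n")


def survives_text_mode(obj, protocol):
    """Return True if obj still round-trips after the simulated translation."""
    damaged = windows_text_write(pickle.dumps(obj, protocol))
    try:
        return pickle.loads(damaged) == obj
    except Exception:
        # Any load error counts as a failed round trip.
        return False


if __name__ == "__main__":
    for protocol in (0, 2):
        for value in ("abc", "line1\nline2"):
            print(protocol, repr(value), survives_text_mode(value, protocol))
```

With protocol 0 even a plain string is affected, because the opcode argument is terminated by a newline and the inserted \r ends up as part of the loaded string. With a binary protocol the damage appears as soon as a 0x0A byte occurs anywhere in the stream.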

I propose to add more defensive checks for pickle.

1. When saving a pickle with protocol 0 (the default) to a file opened in text mode (the default), emit a Py3k warning.

2. When saving a pickle with a binary protocol (which must be specified explicitly) to a file opened in text mode, raise a ValueError on Windows and Mac Classic (the resulting data is likely corrupted) and emit a warning on Unix and Linux. Which type of warning is most appropriate: DeprecationWarning, DeprecationWarning in py3k mode, RuntimeWarning, or UnicodeWarning?

3. Escape \r and \x1a (end-of-file in MS-DOS) when pickling Unicode strings with protocol 0.

4. Detect the most common errors (e.g. a module name ending with \r when loading on Linux a pickle that was saved in text mode on Windows) and raise a more informative error message.

5. Emit a warning when loading a Unicode string ending with \r. This is likely an error (if the pickle was saved in text mode on Windows), but it can be correct data if the saved Unicode string actually did end with \r. This is the most dubious proposition. On the one hand, it is better to warn than to silently return an incorrect result. On the other hand, a correct result shouldn't provoke a warning.
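To make proposals 2 and 4 more concrete, here is a rough sketch written as Python 3 wrappers around pickle rather than as a patch to Python 2. The names checked_dump and checked_loads are hypothetical, and the text-mode test relies on io.TextIOBase; a Python 2 implementation would instead have to inspect the file object's mode attribute, which is an assumption about how the check could be done, not a description of an existing patch.

```python
import io
import pickle


def checked_dump(obj, fileobj, protocol=0):
    """Hypothetical wrapper: refuse to pickle into a text-mode file."""
    if isinstance(fileobj, io.TextIOBase):
        raise ValueError(
            "pickle data must be written to a file opened in binary mode")
    pickle.dump(obj, fileobj, protocol)


def checked_loads(blob):
    """Hypothetical wrapper: explain the 'module name ends with \\r' failure."""
    try:
        return pickle.loads(blob)
    except ImportError as exc:
        name = exc.name or ""
        if name.endswith("\r"):
            # Typical symptom of a pickle written in text mode on Windows
            # and then read as raw bytes elsewhere.
            raise pickle.UnpicklingError(
                "module name %r ends with a carriage return; the pickle was "
                "probably saved in text mode on Windows" % name) from exc
        raise
```

The dump-side check raises unconditionally instead of choosing between a warning and an error per platform, which keeps the sketch short. The load-side check only rewrites the one error described in proposal 4 and lets every other exception through unchanged.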
History
Date User Action Args
2017-10-20 17:41:29  serhiy.storchaka  set  recipients: + serhiy.storchaka, pitrou, alexandre.vassalotti, benjamin.peterson
2017-10-20 17:41:29  serhiy.storchaka  set  messageid: <1508521289.37.0.213398074469.issue31829@psf.upfronthosting.co.za>
2017-10-20 17:41:29  serhiy.storchaka  link  issue31829 messages
2017-10-20 17:41:29  serhiy.storchaka  create