Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Portability issues with pickle #76010

Closed
serhiy-storchaka opened this issue Oct 20, 2017 · 11 comments
Closed

Portability issues with pickle #76010

serhiy-storchaka opened this issue Oct 20, 2017 · 11 comments
Assignees
Labels
3.7 (EOL) end of life 3.8 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@serhiy-storchaka
Copy link
Member

BPO 31829
Nosy @pitrou, @avassalotti, @benjaminp, @serhiy-storchaka, @csabella, @miss-islington
PRs
  • [2.7] bpo-31829: Resolve some portability issues with pickle. #4067
  • bpo-31829: Make protocol 0 pickles be loadable in text mode in Python 2. #11859
  • [3.7] bpo-31829: Make protocol 0 pickles be loadable in text mode in Python 2. (GH-11859) #13693
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2020-01-10.12:11:17.614>
    created_at = <Date 2017-10-20.17:41:29.327>
    labels = ['3.7', '3.8', 'type-feature', 'library']
    title = 'Portability issues with pickle'
    updated_at = <Date 2020-01-10.12:11:17.613>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2020-01-10.12:11:17.613>
    actor = 'cheryl.sabella'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2020-01-10.12:11:17.614>
    closer = 'cheryl.sabella'
    components = ['Library (Lib)']
    creation = <Date 2017-10-20.17:41:29.327>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 31829
    keywords = ['patch']
    message_count = 11.0
    messages = ['304667', '304700', '306091', '306104', '335586', '335588', '335834', '335835', '343890', '344035', '345324']
    nosy_count = 6.0
    nosy_names = ['pitrou', 'alexandre.vassalotti', 'benjamin.peterson', 'serhiy.storchaka', 'cheryl.sabella', 'miss-islington']
    pr_nums = ['4067', '11859', '13693']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue31829'
    versions = ['Python 2.7', 'Python 3.7', 'Python 3.8']

    @serhiy-storchaka
    Copy link
    Member Author

    After reading numerous pickle-related issues on GitHab, I have found that the most common issue with pickle in Python 2 is using it with files opened with text mode.

        with open(file_name, "w") as f:
            pickle.dump(data, f)

    Initially pickle was a text protocol. But since implementing more efficient binary opcodes it is a binary protocol. Even the default protocol 0 is not completely text-safe. If save and load data containing Unicode strings with "text" protocol 0 using different text/binary modes or using text mode on different platforms, you can get an error or incorrect data.

    I propose to add more defensive checks for pickle.

    1. When save a pickle with protocol 0 (default) to a file opened in text mode (default) emit a Py3k warning.

    2. When save a pickle with binary protocols (must be specified explicitly) to a file opened in text mode raise a ValueError on Windows and Mac Classic (resulting data is likely corrupted) and emit a warning on Unix and Linux. What the type of of warnings is more appropriate? DeprecationWarning, DeprecationWarning in py3k mode, RuntimeWarning, or UnicodeWarning?

    3. Escape \r and \x1a (end-of-file in MS DOS) when pickle Unicode strings with protocol 0.

    4. Detect the most common errors (e.g. module name ending with \r when load on Linux a pickle saved with text mode on Windows) and raise more informative error message.

    5. Emit a warning when load an Unicode string ending with \r. This is likely an error (if the pickle was saved with text mode on Windows), but this can a correct data if the saved Unicode string actually did end with \r. This is the most dubious proposition. On one hand, it is better to warn than silently return an incorrect result. On other hand, the correct result shouldn't provoke a warning.

    @serhiy-storchaka serhiy-storchaka added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Oct 20, 2017
    @serhiy-storchaka
    Copy link
    Member Author

    PR 4067 fixes following issues when unpickle on Unix or in binary mode files written with protocol 0 in text mode on Windows:

    • ints were unpickled as longs by cPickle.
    • bools were unpickled as longs by cPickle and as ints by pickle.
    • floats couldn't be unpickled by cPickle.
    • strings couldn't be unpickled by pickle.
    • instances and globals couldn't be unpickled. And error messages were confusing due to invisible \r.
    • pickles with protocol 0 containing Unicode string with \x1a couldn't be unpickled on Windows in text mode.

    @serhiy-storchaka
    Copy link
    Member Author

    It is possible to resolve issue with Unicode strings ending with \r. We can add a special mark in the stream (a combination of opcodes which is no-op) before writing the first Unicode strings ending with \r. If this mark is encountered in an input stream, therefore it was saved with new Python version, and ending \r can be removed from loaded Unicode strings.

    @serhiy-storchaka
    Copy link
    Member Author

    Updated PR correctly loads Unicode strings saved in text mode. As a mark used some corrected opcodes followed by newline. If any of previous newlines is \r\n, thus the file was written in text mode and is read in binary mode. If no opcodes with newlines was saved before the UNICODE opcode, the special no-op sequence STRING + "''\n" + POP is saved. This minimize overhead in common case.

    I'm going to merge this PR and port some changes to Python 3. Could anybody please make a review of the documentation changes?

    @serhiy-storchaka serhiy-storchaka added the 3.7 (EOL) end of life label Nov 12, 2017
    @serhiy-storchaka serhiy-storchaka self-assigned this Nov 12, 2017
    @benjaminp
    Copy link
    Contributor

    The proposed PR looks big. Are these actual bug fixes or features? "Portability improvements" sounds like a feature.

    @serhiy-storchaka
    Copy link
    Member Author

    I have simplified the PR. Removed the complex code for detecting pickles written to files in text mode on Windows and for adding optional marks for correct detecting. Currently it does only two things:

    • Escapes \r, \0 and \x1a (end-of-file on Windows) when pickle unicode. This allows pickles dumped to a file in binary mode (or on non-Windows, or in Python 3) be correctly loaded from a file opened from a file in text mode on Windows.

    Currently, dumping to a file in text mode works most time, except on Windows, when the unicode string ends with \r or contains \x1a (not sure about \0, it was added just for the case). Since the data is only corrupted in special cases, this likely is not tested, and the user code can open files in text (default) mode without noticing a bug, until once a malicious user will provide a bad Unicode string.

    • Produces a deprecation warning or even a ValueError in cases when writing to a file in text mode can cause problems. This will help to notice the problem earlier.

    @serhiy-storchaka serhiy-storchaka added the 3.8 only security fixes label Feb 15, 2019
    @pitrou
    Copy link
    Member

    pitrou commented Feb 18, 2019

    I am not particularly interested in making Python 2 or ancient pickle protocols easier to use, sorry ;-)

    @serhiy-storchaka
    Copy link
    Member Author

    This can help with migrating to Python 3.

    Python 2 programs often open files in text (default) mode for pickling and unpickling. With these changes you will get a warning when run the interpreter with the -3 option. You can also make the producer opening a file in binary mode for compatibility with Python 3, and be sure that the Python 2 consumer will read it correctly even from a file opened in text mode on Windows.

    @csabella
    Copy link
    Contributor

    Bumping this discussion in case the should be merged for 3.8b1. Thanks!

    @serhiy-storchaka
    Copy link
    Member Author

    New changeset 38ab7d4 by Serhiy Storchaka in branch 'master':
    bpo-31829: Make protocol 0 pickles be loadable in text mode in Python 2. (GH-11859)
    38ab7d4

    @miss-islington
    Copy link
    Contributor

    New changeset d561f84 by Miss Islington (bot) in branch '3.7':
    bpo-31829: Make protocol 0 pickles be loadable in text mode in Python 2. (GH-11859)
    d561f84

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life 3.8 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants