pickle is unable to encode unicode surrogates #52630

Closed
vstinner opened this issue Apr 13, 2010 · 6 comments
Labels
stdlib Python modules in the Lib dir

Comments

@vstinner
Member

BPO 8383
Nosy @malemburg, @loewis, @vstinner
Files
  • pickle_surrogates.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2010-04-13.11:10:22.602>
    created_at = <Date 2010-04-13.00:39:55.090>
    labels = ['library']
    title = 'pickle is unable to encode unicode surrogates'
    updated_at = <Date 2010-04-13.11:10:22.601>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2010-04-13.11:10:22.601>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2010-04-13.11:10:22.602>
    closer = 'vstinner'
    components = ['Library (Lib)']
    creation = <Date 2010-04-13.00:39:55.090>
    creator = 'vstinner'
    dependencies = []
    files = ['16904']
    hgrepos = []
    issue_num = 8383
    keywords = ['patch']
    message_count = 6.0
    messages = ['102996', '102997', '103022', '103029', '103030', '103034']
    nosy_count = 3.0
    nosy_names = ['lemburg', 'loewis', 'vstinner']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue8383'
    versions = ['Python 3.1', 'Python 3.2']

    @vstinner
    Member Author

    Python 3 uses lone Unicode surrogates to store undecodable filenames. E.g. the filename b"abc\xff.py" is decoded as "abc\udcff.py" if the file system encoding is ASCII. Pickle is unable to store them:

    ./python -c 'import pickle; pickle.dumps("abc\udcff")'
    (...)
    UnicodeEncodeError: 'utf-8' codec can't encode character '\udcff' in position 20: surrogates not allowed

    This is a limitation of pickle (in binary mode): a Python str can hold any Unicode code point, including lone surrogates, but pickle cannot serialize them.

    Using the "surrogatepass" error handler should be enough to fix this issue.

    Related issue: bpo-3672 (Reject surrogates in utf-8 codec); r72208 introduced the "surrogatepass" error handler.
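
A minimal sketch of the behaviour described above, assuming an interpreter that already includes the fix. The variable names are illustrative only. It shows how the lone surrogate arises, why the strict UTF-8 codec rejects it, and how "surrogatepass" lets it through.

```python
import pickle

# Undecodable bytes become lone surrogates under the "surrogateescape" handler.
name = b"abc\xff.py".decode("ascii", "surrogateescape")

# The strict UTF-8 codec refuses to encode a lone surrogate.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# "surrogatepass" encodes the surrogate like any other code point.
raw = name.encode("utf-8", "surrogatepass")

# On a fixed interpreter, pickle round-trips the string unchanged.
restored = pickle.loads(pickle.dumps(name))
```

On an interpreter without the fix, the last line is expected to raise the UnicodeEncodeError quoted in the report instead.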

    @vstinner added the stdlib (Python modules in the Lib dir) label on Apr 13, 2010
    @vstinner
    Member Author

    I found this bug indirectly: test_logging failed on a SocketHandler when LogRecord.pathname contained a surrogate character. SocketHandler uses pickle to serialize the record.
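
A rough sketch of that code path, assuming a fixed interpreter. The record contents, host and port are made-up placeholders, and nothing is sent over the network because SocketHandler only connects when a record is emitted.

```python
import logging
import logging.handlers
import pickle
import struct

# Hypothetical record whose pathname carries a lone surrogate.
record = logging.LogRecord(
    name="demo", level=logging.INFO, pathname="abc\udcff.py",
    lineno=1, msg="hello", args=(), exc_info=None,
)

# Placeholder address; the constructor does not open a connection.
handler = logging.handlers.SocketHandler("localhost", 9020)
payload = handler.makePickle(record)
handler.close()

# makePickle prefixes the pickled attribute dict with a 4-byte big-endian length.
(length,) = struct.unpack(">L", payload[:4])
attrs = pickle.loads(payload[4:])
```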

    @malemburg
    Member

    Both pickle and marshal will need to use the new error handler in order to stay compatible with Python 3.0 (and 2.x) and also to enable creating Unicode literals that include lone surrogates.

    @vstinner
    Member Author

    > Both pickle and marshal will need to use the new error handler
    > in order to stay compatible with Python 3.0 (and 2.x)
    > and also to enable creating Unicode literals that include
    > lone surrogates.

    Attached patch fixes pickle. Marshal does already use surrogatepass since Martin's commit r72208 (Issue bpo-3672).
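
For comparison, a small check that marshal round-trips a lone surrogate, as stated above. This is only a sketch and the variable names are illustrative.

```python
import marshal

text = "abc\udcff"

# marshal serializes str objects containing lone surrogates without error.
data = marshal.dumps(text)
restored = marshal.loads(data)
```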

    @malemburg
    Member

    STINNER Victor wrote:

    > STINNER Victor <victor.stinner@haypocalc.com> added the comment:
    >
    > > Both pickle and marshal will need to use the new error handler
    > > in order to stay compatible with Python 3.0 (and 2.x)
    > > and also to enable creating Unicode literals that include
    > > lone surrogates.
    >
    > Attached patch fixes pickle. Marshal does already use surrogatepass since Martin's commit r72208 (Issue bpo-3672).

    Looks good !

    Thanks.

    @vstinner
    Member Author

    Committed: r80031 (py3k) and r80032 (3.1); the fix also covers pickletools.
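
A short sketch of the pickletools side, assuming a fixed interpreter. Protocol 2 is chosen here only so that the string is stored with a UTF-8 based opcode; the names are illustrative.

```python
import pickle
import pickletools

# Binary protocols store str as UTF-8, so this exercises the surrogate handling.
data = pickle.dumps("abc\udcff", protocol=2)

# genops yields (opcode, argument, position) triples; collect the decoded arguments.
decoded_args = [arg for opcode, arg, pos in pickletools.genops(data)]
```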

    @ezio-melotti transferred this issue from another repository on Apr 10, 2022