pickle is unable to encode unicode surrogates #52630
Python 3 uses Unicode surrogates to store undecodable filenames. E.g., if the file system encoding is ASCII, the filename b"abc\xff.py" is decoded to "abc\udcff.py". Pickle is unable to store such strings:

./python -c 'import pickle; pickle.dumps("abc\udcff")'

This is a limitation of pickle (in the binary mode): a Python str can hold any Unicode code point, including lone surrogates, but pickle cannot serialize them. Using the "surrogatepass" error handler should be enough to fix this issue. Related issue: bpo-3672 (Reject surrogates in utf-8 codec); r72208 created the "surrogatepass" error handler.
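A minimal sketch of the codec behaviour behind this report, calling the error handlers directly and assuming, as in the report, an ASCII file system encoding (the variable names are illustrative). It shows how the byte 0xff becomes U+DCFF, why a strict UTF-8 encode rejects it, and how "surrogatepass" round-trips it:

```python
# How an undecodable byte becomes a lone surrogate, and why the error
# handler matters when that string is later re-encoded as UTF-8.
raw_name = b"abc\xff.py"

# "surrogateescape" maps the undecodable byte 0xff to U+DCFF
# (assumption: ASCII file system encoding, as in the report).
name = raw_name.decode("ascii", "surrogateescape")
assert name == "abc\udcff.py"

# The default "strict" handler refuses to encode a lone surrogate.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# "surrogatepass" writes the surrogate as a 3-byte UTF-8-style sequence...
passed = name.encode("utf-8", "surrogatepass")

# ...and decoding with the same handler restores the original string.
restored = passed.decode("utf-8", "surrogatepass")

# "surrogateescape" in the reverse direction recovers the original bytes.
original_bytes = name.encode("ascii", "surrogateescape")

print(strict_ok)                   # False
print(passed)                      # b'abc\xed\xb3\xbf.py'
print(restored == name)            # True
print(original_bytes == raw_name)  # True
```

"surrogatepass" is the handler that matters for pickle: it lets the str be written as UTF-8 bytes and read back unchanged, whereas "surrogateescape" is what produces the surrogate from the filename in the first place.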
I found this bug indirectly: test_logging failed with SocketHandler when LogRecord.pathname contained a surrogate character, because SocketHandler uses pickle to serialize the record.
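The failing path can be reproduced without a network connection by calling the standard library's `SocketHandler.makePickle()` directly. This is a sketch assuming a current Python 3 (which includes the fix); the record contents, host and port are hypothetical placeholders, and nothing is sent over a socket:

```python
import logging
import logging.handlers
import pickle
import struct

# Hypothetical record whose pathname contains a surrogate-escaped byte.
record = logging.LogRecord(
    name="demo",
    level=logging.INFO,
    pathname="abc\udcff.py",
    lineno=1,
    msg="hello",
    args=None,
    exc_info=None,
)

# SocketHandler does not connect in its constructor; host and port are
# placeholders and makePickle() below never touches the network.
handler = logging.handlers.SocketHandler("localhost", 9999)
try:
    # makePickle() returns a 4-byte big-endian length prefix followed by
    # the pickled record dict; this is the step that failed in the report.
    payload = handler.makePickle(record)
finally:
    handler.close()

(length,) = struct.unpack(">L", payload[:4])
received = pickle.loads(payload[4:])

# ascii() avoids printing a raw lone surrogate to the terminal.
print(ascii(received["pathname"]))
```

The receiving side of a SocketHandler does the same `struct.unpack` plus `pickle.loads`, so a pathname that survives this round trip also survives the real socket transport.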
Both pickle and marshal will need to use the new error handler in order to stay compatible with Python 3.0 (and 2.x), and also to enable creating Unicode literals that include lone surrogates.
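To illustrate the "Unicode literals that include lone surrogates" case, here is a sketch assuming a current Python 3: compile a literal whose value is a lone surrogate and round-trip the resulting code object through `marshal`, the serialization format used for .pyc files. The names `source`, `code` and `namespace` are illustrative:

```python
import marshal

# The source text is plain ASCII; the *value* of the literal is U+DCFF.
source = 's = "\\udcff"'
code = compile(source, "<demo>", "exec")

# Compiled constants, including this string, must survive a
# marshal dumps()/loads() round trip for .pyc caching to work.
data = marshal.dumps(code)
namespace = {}
exec(marshal.loads(data), namespace)

print(ascii(namespace["s"]))  # '\udcff'
```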
The attached patch fixes pickle. Marshal has already used surrogatepass since Martin's commit r72208 (issue bpo-3672).
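On an interpreter that includes the fix (any current Python 3 release is assumed here), a lone surrogate round-trips through every pickle protocol. In the binary protocols the str is stored as UTF-8 bytes written with "surrogatepass", so U+DCFF shows up in the stream as the bytes ED B3 BF. A short check, with illustrative variable names:

```python
import pickle

value = "abc\udcff"

# Round-trip the string through every protocol this interpreter supports.
roundtrips = {
    protocol: pickle.loads(pickle.dumps(value, protocol=protocol)) == value
    for protocol in range(pickle.HIGHEST_PROTOCOL + 1)
}
print(roundtrips)

# Protocol 2 stores the string with the BINUNICODE opcode ("X"), a 4-byte
# little-endian length, then the "surrogatepass"-encoded UTF-8 bytes.
data = pickle.dumps(value, protocol=2)
print(data)
```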
STINNER Victor wrote:
Looks good! Thanks.
Committed: r80031 (py3k) and r80032 (3.1); the commit also fixes pickletools.
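`pickletools` has its own decoders for each opcode argument, which is why it needed the same fix. A sketch, assuming a current Python 3, that walks the opcodes of a pickled surrogate-containing string with `pickletools.genops()` and prints a listing with `pickletools.dis()`:

```python
import pickle
import pickletools

data = pickle.dumps("abc\udcff", protocol=2)

# genops() yields (opcode, argument, position) triples; the BINUNICODE
# argument decoder must accept the surrogate-carrying UTF-8 bytes.
strings = [arg for opcode, arg, pos in pickletools.genops(data)
           if opcode.name == "BINUNICODE"]
print(ascii(strings))  # ['abc\udcff']

# dis() prints a human-readable listing of the same opcodes.
pickletools.dis(data)
```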