
classification
Title: Persistent id in pickle with protocol version 0
Type: behavior Stage: resolved
Components: Extension Modules, Library (Lib) Versions: Python 3.6, Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: alexandre.vassalotti Nosy List: alexandre.vassalotti, python-dev, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2013-04-13 11:41 by serhiy.storchaka, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description
fix_bad_persid.patch alexandre.vassalotti, 2013-04-14 04:01
fix_bad_persid_2.patch serhiy.storchaka, 2015-02-13 08:32
Messages (9)
msg186705 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-04-13 11:41
Python 2 allows pickling and unpickling non-ASCII persistent ids. In Python 3, the C implementation of pickle saves persistent ids with protocol version 0 as UTF-8-encoded strings but loads them as bytes.

>>> import pickle, io
>>> class MyPickler(pickle.Pickler):
...     def persistent_id(self, obj):
...         if isinstance(obj, str):
...             return obj
...         return None
... 
>>> class MyUnpickler(pickle.Unpickler):
...     def persistent_load(self, pid):
...         return pid
... 
>>> f = io.BytesIO(); MyPickler(f).dump('\u20ac'); data = f.getvalue()
>>> MyUnpickler(io.BytesIO(data)).load()
'€'
>>> f = io.BytesIO(); MyPickler(f, 0).dump('\u20ac'); data = f.getvalue()
>>> MyUnpickler(io.BytesIO(data)).load()
b'\xe2\x82\xac'
>>> f = io.BytesIO(); MyPickler(f, 0).dump('a'); data = f.getvalue()
>>> MyUnpickler(io.BytesIO(data)).load()
b'a'

The pure-Python implementation in Python 3 doesn't work with non-ASCII persistent ids at all.
msg186789 - (view) Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2013-04-13 18:35
In protocol 0, the persistent ID is restricted to alphanumeric strings because of the problems that arise when the persistent ID contains newline characters. _pickle likely should be changed to use the ASCII decoder. And perhaps we should check for embedded newline characters too.
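
The restriction described above can be sketched in Python. This is only an illustration, not the code from either patch: `encode_persid_protocol0` is a hypothetical helper name, and the newline check is the optional extra suggested in this message. The sketch assumes that a protocol 0 persistent ID is written as a newline-terminated text line, which is why an embedded newline would corrupt the stream.

```python
def encode_persid_protocol0(pid):
    """Hypothetical helper: encode a persistent ID for a protocol 0 stream.

    Rejects IDs that are not pure ASCII or that contain a newline,
    since the ID is terminated by a newline in the text protocol.
    """
    text = str(pid)
    if "\n" in text:
        raise ValueError("persistent ID must not contain a newline in protocol 0")
    try:
        # The ASCII codec fails loudly on any code point above 127.
        return text.encode("ascii") + b"\n"
    except UnicodeEncodeError:
        raise ValueError("persistent ID must be ASCII in protocol 0") from None
```

For example, `encode_persid_protocol0("abc123")` yields the bytes `b"abc123\n"`, while `"\u20ac"` or `"a\nb"` is rejected with `ValueError`.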
msg186816 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-04-13 20:07
Even for alphanumeric strings, Python 3 has a bug: it saves strings and loads bytes objects.
msg186881 - (view) Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2013-04-14 04:01
Here's a patch that fixes the bug.
msg186894 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-04-14 08:33
I think a string with character codes < 256 would be better for test_protocol0_is_ascii_only(). It can be Latin-1 encoded (Python 2 allows any 8-bit strings).

PyUnicode_AsASCIIString() can be slower than _PyUnicode_AsStringAndSize() (actually PyUnicode_AsUTF8AndSize()) because the latter can use a cached value. You can check whether the persistent id contains only ASCII characters by checking PyUnicode_GET_LENGTH(pid_str) == size.
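
The length comparison suggested here relies on the fact that UTF-8 encodes every ASCII character as one byte and every other code point as two or more bytes. A rough Python analogue of the C check is shown below. `is_ascii_by_length` is a hypothetical name used only for illustration, and the sketch does not reproduce the caching behaviour of the C API.

```python
def is_ascii_by_length(text):
    """Return True if text is pure ASCII, judged by its UTF-8 size.

    The UTF-8 byte count equals the code point count only when every
    character fits in a single byte, i.e. is ASCII.
    """
    return len(text.encode("utf-8")) == len(text)
```

On Python 3.7 and later, `str.isascii()` answers the same question directly.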

And what are you going to do about the fact that in Python 2 you can pickle non-ASCII persistent ids, which Python 3 will not be able to unpickle?
msg235881 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-02-13 08:32
The patch is updated to the current sources. It also optimizes writing ASCII strings and fixes the tests.
msg268851 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-06-19 12:03
Ping.
msg269874 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-07-06 09:31
Ping again.
msg270619 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-07-17 08:36
New changeset f6a41552a312 by Serhiy Storchaka in branch '3.5':
Issue #17711: Fixed unpickling by the persistent ID with protocol 0.
https://hg.python.org/cpython/rev/f6a41552a312

New changeset df8857c6f3eb by Serhiy Storchaka in branch 'default':
Issue #17711: Fixed unpickling by the persistent ID with protocol 0.
https://hg.python.org/cpython/rev/df8857c6f3eb
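
For reference, the session from the original report can be replayed as a script to see the behaviour on an interpreter that includes these changesets. This is a sketch based on an assumption about the fix rather than a quote from the patch: it expects protocol 0 to return the persistent ID as `str` and to reject non-ASCII IDs with `pickle.PicklingError`. The `roundtrip` helper is a name introduced here for illustration.

```python
import io
import pickle


class MyPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Treat every str as an externally stored object.
        return obj if isinstance(obj, str) else None


class MyUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # Hand the persistent ID back unchanged so it can be inspected.
        return pid


def roundtrip(obj, protocol):
    """Pickle obj with the given protocol and unpickle it again."""
    buf = io.BytesIO()
    MyPickler(buf, protocol).dump(obj)
    return MyUnpickler(io.BytesIO(buf.getvalue())).load()


# ASCII persistent ID with protocol 0: expected to come back as str.
ascii_result = roundtrip("a", 0)

# Non-ASCII persistent ID with protocol 0: expected to be rejected.
try:
    roundtrip("\u20ac", 0)
    non_ascii_rejected = False
except pickle.PicklingError:
    non_ascii_rejected = True

# Binary protocols were not affected by the bug.
binary_result = roundtrip("\u20ac", 2)
```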
History
Date User Action Args
2022-04-11 14:57:44 admin set github: 61911
2016-10-25 18:41:54 serhiy.storchaka set status: open -> closed
resolution: fixed
stage: patch review -> resolved
2016-07-17 08:36:13 python-dev set nosy: + python-dev
messages: + msg270619
2016-07-07 10:46:23 pitrou set nosy: - pitrou
2016-07-06 09:31:09 serhiy.storchaka set messages: + msg269874
versions: + Python 3.6, - Python 3.4
2016-06-19 12:03:07 serhiy.storchaka set messages: + msg268851
2015-02-13 08:32:15 serhiy.storchaka set files: + fix_bad_persid_2.patch
messages: + msg235881
versions: + Python 3.5, - Python 3.3
2013-04-14 08:33:29 serhiy.storchaka set messages: + msg186894
2013-04-14 04:01:45 alexandre.vassalotti set files: + fix_bad_persid.patch
messages: + msg186881
assignee: alexandre.vassalotti
keywords: + patch
stage: needs patch -> patch review
2013-04-13 20:07:20 serhiy.storchaka set messages: + msg186816
2013-04-13 18:35:18 alexandre.vassalotti set messages: + msg186789
2013-04-13 11:41:49 serhiy.storchaka create