Title: Python 3.2 fails to load protocol 0 pickle
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.2, Python 3.3
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: alexandre.vassalotti, axwalk, pitrou, python-dev, vinay.sajip
Priority: normal Keywords: patch

Created on 2011-08-03 09:55 by vinay.sajip, last changed 2011-08-11 19:21 by pitrou. This issue is now closed.

File name Uploaded Description
test.bin vinay.sajip, 2011-08-03 09:55 Test pickle data which can't be loaded by Python 3.2
pickle-0-reading.diff vinay.sajip, 2011-08-10 11:13 Fixes problem with reading protocol 0 pickles.
add-error-check.diff vinay.sajip, 2011-08-10 16:11 Added some error checking.
Messages (9)
msg141602 - Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2011-08-03 09:55
The attached 2.x-written protocol 0 pickle file cannot be loaded by Python 3.2 or 3.3, though it loads successfully in 2.x.

Code used to load:

import pickle
data = pickle.load(open('test.bin', 'rb'))


Traceback (most recent call last):
  File "", line 4, in <module>
    data = pickle.load(open(sys.argv[1], 'rb'))
ValueError: invalid literal for int() with base 10: "273\n(g8\nS'uint64_t'\np274\ntp275\nsS'Module'\np276\n(g45\n(g39\nS'objc_module'\np277\nNtp278\ntp279\nsS'mach_msg_trailer_size_t'\np280\n(g4\ng190\ntp281\nsS'uint_fast16_t'\np282\n(g8\nS'uint16_t'\np283\ntp284\nsS'pthread_m"

The failure occurs on Ubuntu Natty. This does not appear to be the same issue as #6137.

AFAIK the data contains no classes: just dictionaries, tuples, lists, strings and numbers.
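A minimal sketch of the kind of data described: only dicts, tuples, lists, strings and numbers, written as a protocol 0 pickle and read back through a binary file object. The data is invented for illustration and is not the contents of test.bin. io.BytesIO stands in for open('test.bin', 'rb'), and make_sample is a hypothetical helper. It generates enough entries that the memo indices climb into the hundreds, like the p274-style entries in the traceback. This is a round-trip check, not a guaranteed reproducer of the original buffering bug. On an interpreter that includes the fix, it should succeed.

```python
import io
import pickle

def make_sample(n=300):
    # Hypothetical stand-in for test.bin: plain containers and scalars only.
    types = {}
    for i in range(n):
        # Loosely mimics the traceback: a name mapped to a tuple holding
        # a type-name string, a number and a small list.
        types['type_%d' % i] = ('uint%d_t' % (8 * (i % 8 + 1)), i, [i, -i])
    return types

sample = make_sample()

# Protocol 0 is the text-based format; each memo entry is written as "p<index>\n".
payload = pickle.dumps(sample, protocol=0)

# Load through a binary file object, as in the report.
with io.BytesIO(payload) as f:
    restored = pickle.load(f)

assert restored == sample
```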
msg141603 - Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2011-08-03 10:01
I also noticed that the file contains numerous earlier integer values in a similar context, which were parsed without error. For example, if you open test.bin in an editor, the failure occurs while parsing line 515; notice the similar constructs at lines 510, 507, 504, etc.
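One way to inspect such "similar constructs" is to list where each PUT opcode (the opcode handled by load_put) falls in a protocol 0 stream, using line numbers as an editor would count them. The sketch below uses pickletools.genops on a small invented object, not on test.bin, and the helper put_lines is hypothetical. In protocol 0, each PUT is written as 'p', the decimal memo index, and a newline. The index is therefore parsed from a text line.

```python
import pickle
import pickletools

def put_lines(data):
    """Yield (line_number, memo_index) for every PUT opcode in a pickle.

    Line numbers are 1-based and counted by newlines, as an editor shows them.
    """
    for opcode, arg, pos in pickletools.genops(data):
        if opcode.name == 'PUT':
            yield data.count(b'\n', 0, pos) + 1, arg

# Invented sample resembling fragments of the traceback.
obj = {'uint64_t': ('uint64_t', 8), 'Module': ('objc_module', None)}
data = pickle.dumps(obj, protocol=0)

for line, index in put_lines(data):
    print('line %d: PUT %d' % (line, index))
```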
msg141704 - Author: Andrew Wilkins (axwalk) Date: 2011-08-06 03:26
In _pickle.c, the load_put function calls _Unpickler_Readline, which may prefetch data and leave it in the buffer immediately after the line that was read. load_put then passes the returned string to PyLong_FromString, which rejects the trailing data after the '\n'.

Maybe just use PyOS_strtol instead? Alternatively, replace the newline with a null byte.
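The following Python model shows the failure and the two workarounds suggested above. It does not reproduce _pickle.c. It only shows why passing the whole prefetched buffer to an integer parser fails, while two alternatives succeed: cutting the string at the newline (the Python analogue of overwriting it with a null byte), or a strtol-style parse that stops at the first non-digit. The buffer contents are copied from the start of the traceback message, and the function names are hypothetical.

```python
import re

# Bytes following the "p" of a PUT opcode: the memo index line plus data
# prefetched beyond it (copied from the start of the traceback message).
buffer = b"273\n(g8\nS'uint64_t'\np274\n"

def parse_whole_buffer(buf):
    # Models the bug: the integer parser also sees the trailing prefetched data.
    return int(buf)

def parse_until_newline(buf):
    # Workaround 1: end the string at the first newline
    # (the Python analogue of overwriting it with a null byte).
    line, _, _ = buf.partition(b"\n")
    return int(line)

def parse_strtol_style(buf):
    # Workaround 2 (simplified strtol): optional whitespace and sign, then
    # leading digits; everything after the last digit is ignored.
    match = re.match(rb"\s*[+-]?\d+", buf)
    if match is None:
        raise ValueError("no leading integer in %r" % (buf,))
    return int(match.group())

try:
    parse_whole_buffer(buffer)
except ValueError as exc:
    print("whole buffer fails:", exc)

print(parse_until_newline(buffer))
print(parse_strtol_style(buffer))
```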
msg141884 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-08-10 20:32
First, the patch calls PyOS_strtol, which returns a long, whereas the memo index is a Py_ssize_t and should be decodable as one. However, the dump phase (in memo_put) coerces the memo size to long as well, so this shouldn't be a problem in real life.
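To check the width concern on a given platform, you can compare C long with the pointer-sized signed type that ctypes exposes as c_ssize_t (assumed here to match Py_ssize_t, as it does on mainstream platforms). On most 64-bit Unix systems both are 8 bytes wide. On 64-bit Windows, long is 4 bytes, so a long-based parse caps the memo index at LONG_MAX. The helper fits_in_c_long is hypothetical and only illustrates the range argument.

```python
import ctypes

# Widths of C long and of the pointer-sized signed type ctypes exposes as
# c_ssize_t (assumed here to be the same width as Py_ssize_t).
LONG_BITS = 8 * ctypes.sizeof(ctypes.c_long)
SSIZE_BITS = 8 * ctypes.sizeof(ctypes.c_ssize_t)
LONG_MAX = 2 ** (LONG_BITS - 1) - 1
SSIZE_MAX = 2 ** (SSIZE_BITS - 1) - 1

def fits_in_c_long(index):
    """Hypothetical helper: can a non-negative memo index pass through a C long?"""
    return 0 <= index <= LONG_MAX

print("long: %d bits, ssize_t: %d bits" % (LONG_BITS, SSIZE_BITS))
print(fits_in_c_long(274))        # indices like the p274 in the report always fit
print(fits_in_c_long(SSIZE_MAX))  # False only where long is narrower than ssize_t
```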

Second, the patch needs a test.

Also, please click on the "resolution" link for the meaning of the various possible values. "Accepted" is only to be used when something has been positively reviewed.
msg141885 - Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2011-08-10 23:16
I can add a test using the data attached to the ticket, but, as in the marshal case we discussed before, it might be several KB of data, which I would incorporate into the tests using an approach similar to the one I used for marshal. (This data has been shrunk from a much larger data set, but I can't easily make it any smaller.)

I've no idea why I changed the resolution; I don't normally do this. Probably a case of brain-fade :-(
msg141895 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-08-11 06:59
Ok, the patch is not correct. The core issue is that _Unpickler_Readline should always return a \0-terminated string, but sometimes it doesn't; this issue should be fixed instead of working around it in some other function.
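The invariant described here can be modelled in Python. A buffered reader may prefetch as much as it likes, but each readline call must return exactly one line and keep the remaining bytes for the next call. In C, this means a NUL byte right after the newline, so string-parsing functions stop there. The LineReader class below is a hypothetical sketch of that contract. It is not the code of _Unpickler_Readline or of the committed fix.

```python
import io

class LineReader:
    """Hypothetical buffered reader that prefetches in fixed-size chunks."""

    def __init__(self, stream, chunk_size=8):
        self.stream = stream
        self.chunk_size = chunk_size
        self.buffer = b""

    def readline(self):
        # Prefetch until the buffer holds a complete line or the stream ends.
        while b"\n" not in self.buffer:
            chunk = self.stream.read(self.chunk_size)
            if not chunk:
                break
            self.buffer += chunk
        # Return only up to and including the newline; keep the rest buffered.
        newline = self.buffer.find(b"\n")
        end = newline + 1 if newline != -1 else len(self.buffer)
        line, self.buffer = self.buffer[:end], self.buffer[end:]
        return line

reader = LineReader(io.BytesIO(b"273\n(g8\nS'uint64_t'\np274\n"))
print(int(reader.readline()))  # the caller sees only the "273\n" line
print(reader.readline())       # prefetched bytes remain for the next call
```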
msg141898 - Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2011-08-11 09:40
I confess I'm not familiar enough with the pickle module internals to be sure of putting in the right fix quickly. I will take a look at _Unpickler_Readline when I get a chance, if someone doesn't beat me to it :-)
msg141919 - Author: Roundup Robot (python-dev) Date: 2011-08-11 19:17
New changeset c47bc1349e61 by Antoine Pitrou in branch '3.2':
Issue #12687: Fix a possible buffering bug when unpickling text mode (protocol 0, mostly) pickles.

New changeset 6aa822071f4e by Antoine Pitrou in branch 'default':
Issue #12687: Fix a possible buffering bug when unpickling text mode (protocol 0, mostly) pickles.
msg141921 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-08-11 19:21
Fixed with a test.
Date                 User         Action  Args
2011-08-11 19:21:32  pitrou       set     status: open -> closed
                                          resolution: fixed
                                          messages: + msg141921
                                          stage: patch review -> resolved
2011-08-11 19:17:55  python-dev   set     nosy: + python-dev
                                          messages: + msg141919
2011-08-11 09:40:23  vinay.sajip  set     messages: + msg141898
2011-08-11 06:59:57  pitrou       set     messages: + msg141895
2011-08-10 23:16:19  vinay.sajip  set     messages: + msg141885
2011-08-10 20:32:43  pitrou       set     resolution: accepted -> (no value)
                                          messages: + msg141884
2011-08-10 16:11:34  vinay.sajip  set     files: + add-error-check.diff
2011-08-10 11:13:46  vinay.sajip  set     files: + pickle-0-reading.diff
                                          keywords: + patch
2011-08-10 11:13:13  vinay.sajip  set     hgrepos: + hgrepo58
                                          resolution: accepted
                                          stage: patch review
2011-08-06 03:26:03  axwalk       set     nosy: + axwalk
                                          messages: + msg141704
2011-08-03 10:01:18  vinay.sajip  set     messages: + msg141603
2011-08-03 09:55:15  vinay.sajip  create