New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
3.1 cannot unpickle 2.7-created pickle #49430
Comments
The pickle was generated by pybench. You can try to display it by running: ./python Tools/pybench/pybench.py --debug -s 27.bench It fails with the following traceback: Traceback (most recent call last):
File "Tools/pybench/pybench.py", line 954, in <module>
PyBenchCmdline()
File "/home/antoine/py3k/__svn__/Tools/pybench/CommandLine.py", line
349, in __init__
rc = self.main()
File "Tools/pybench/pybench.py", line 888, in main
bench = pickle.load(f)
File "/home/antoine/py3k/__svn__/Lib/pickle.py", line 1335, in load
return Unpickler(file, encoding=encoding, errors=errors).load()
_pickle.UnpicklingError: bad pickle data |
Here is the pickle file. |
If I disable _pickle, I get a more meaningful trace: File "Tools/pybench/pybench.py", line 954, in <module> Apparently, pybench/Unicode.py has never been converted to py3k. This is likely not the only problem: >>> from pybench import *
>>> import pickle
>>> pickle.load(open('27.bench'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/sasha/Work/python-svn/py3k-commit/Lib/pickle.py", line 1321, in load
encoding=encoding, errors=errors).load()
File "/Users/sasha/Work/python-svn/py3k-commit/Lib/pickle.py", line 829, in load
assert isinstance(key, bytes_types)
AssertionError This looks like the infamous bytes vs string problem ... |
The bytes/string issu was a red herring: with pickle.load(open('27.bench', 'b')), I get the same stack trace as from command line pybench invocation. >>> pickle.load(open('27.bench', 'rb'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/sasha/Work/python-svn/py3k-commit/Lib/pickle.py", line 1321, in load
encoding=encoding, errors=errors).load()
File "/Users/sasha/Work/python-svn/py3k-commit/Lib/pickle.py", line 830, in load
dispatch[key[0]](self)
File "/Users/sasha/Work/python-svn/py3k-commit/Lib/pickle.py", line 1055, in load_inst
klass = self.find_class(module, name)
File "/Users/sasha/Work/python-svn/py3k-commit/Lib/pickle.py", line 1115, in find_class
__import__(module, level=0)
File "/Users/sasha/Work/python-svn/py3k-commit/Tools/pybench/Unicode.py", line 17
s = unicode(u''.join(map(str,range(100))))
^
SyntaxError: invalid syntax |
Hmm. It looks like another pickle vs. _pickle issue. Attached patch is a result of 2to3 applied to Tools/pybench (and a minor manual fix for pickle import) with _pickle import disabled. With the patch applied, $ ./python Tools/pybench/pybench.py --debug -s 27.bench runs fine and produces results identical to those for a similar run from the trunk. Looks like it is time to fire up gdb. |
Please keep the Python3 version of pybench compatible to Python 2.6 and 2.7. |
I have found bpo-1634499 which deals with running pybench. I suggest that we reopen that issue and move discussion of bpo-5180.diff there. I really like the idea to keep single source for 2.x and 3.x pybench, but we need to add some machinery to run 2.x pybench under python3.x transparently. Marc-Andre, are you taking over the pybench or the _pickle issue? |
It looks like I was able to get to the root of the problem. I am attaching two files that demonstrate the issue: ==> pickle-bug.py <== ==> unpickle-bug.py <== $ python2 pickle-bug.py /tmp/bug.pkl
$ python3 unpickle-bug.py /tmp/bug.pkl
<__main__.Bug object at 0x1006b6f40>
Traceback (most recent call last):
File "unpickle-bug.py", line 7, in <module>
bug = pickle.Unpickler(open(sys.argv[1], 'rb')).load() # doesn't
_pickle.UnpicklingError: bad pickle data The problematic pickle is really small, so I hope I'll report that I have a fix soon.
15: p PUT 0 |
As promised, here is the fix (bpo-5180-pickle.diff) for the "unpickle-bug.py" issue. Unfortunately, it looks like more bugs remain. 27.bench is still not loadable. |
The remaining bug is a bit harder. If you try to run unpickle-bug-2.py on the same pickle, you get $ python3 unpickle-bug-2.py /tmp/bug.pkl
<__main__.Bug object at 0x1006a8f40>
Traceback (most recent call last):
File "unpickle-bug-2.py", line 9, in <module>
bug = pickle.Unpickler(open(sys.argv[1], 'rb')).load() # doesn't
TypeError: ('__init__() takes exactly 2 arguments (1 given)', <class '__main__.Bug'>, ()) The problem is acknowledged in the source code of instantiate function in _pickle.c:
Clearly, this is wrong because pickle.py way succeeds where _pickle does not. PS: I did not mean to unassign MAL, but I think this is more of alexandre.vassalotti's issue. |
I am attaching a patch which makes python3 read 27.bench without errors. I think this should be applied while a complete solution for unpickling old style class instances from text mode (protocol = 0) pickles is found. |
Alexander Belopolsky wrote:
Please use a different strategy for "porting" the Unicode tests to Create a new test modules Bytes (which reuses the Strings Finally, let the two test module Strings and Unicode fail with |
Alexander Belopolsky wrote:
>
> Alexander Belopolsky <belopolsky@users.sourceforge.net> added the comment:
>
> The bytes/string issu was a red herring: with pickle.load(open('27.bench', 'b')), I get the same stack trace as from command line pybench invocation.
>
>
>>>> pickle.load(open('27.bench', 'rb'))
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "/Users/sasha/Work/python-svn/py3k-commit/Lib/pickle.py", line 1321, in load
> encoding=encoding, errors=errors).load()
> File "/Users/sasha/Work/python-svn/py3k-commit/Lib/pickle.py", line 830, in load
> dispatch[key[0]](self)
> File "/Users/sasha/Work/python-svn/py3k-commit/Lib/pickle.py", line 1055, in load_inst
> klass = self.find_class(module, name)
> File "/Users/sasha/Work/python-svn/py3k-commit/Lib/pickle.py", line 1115, in find_class
> __import__(module, level=0)
> File "/Users/sasha/Work/python-svn/py3k-commit/Tools/pybench/Unicode.py", line 17
> s = unicode(u''.join(map(str,range(100))))
> ^
> SyntaxError: invalid syntax Note that pybench is written in a way that it can handle ImportErrors In the above case, it may be a good idea to redirect pickle to The Unicode and Strings modules should not be loaded in Python3. |
I am attaching a patch which focuses on fixing _pickle behavior. I opened a separate bpo-9102 to deal with pybench specific problems. Marc-Andre, I am reassigning this issue to myself and assigning bpo-9102 to you. I hope you don't mind. My patch attempts to emulate 2.x PyInstance_NewRaw with a call to tp_alloc. I considered using tp_new, but although __new__ is not special for classic classes, there is nothing to prevent users to define __new__ in their classic classes or add __new__ when they port these classes to 3.x. (BTW, I wonder what 2to3 does if classic class has __new__ defined.) This is slightly unorthodox use of tp_alloc, so I will follow up with a question on python-dev. |
Thank you for the nice investigative work! I will try my best to review this patch by next week. |
This is certainly the wrong thing to do. You could at least try PyBaseObject_Type.tp_new (see object_new() in typeobject.c), but even this is not necessarily right (for example if the class is derived from an extension type defining its own tp_new). So, IMO, the right thing to do would be to choose the first base type that isn't a Python-defined class and use its tp_new: staticbase = subtype;
while (staticbase && (staticbase->tp_flags & Py_TPFLAGS_HEAPTYPE))
staticbase = staticbase->tp_base; (these two lines are from tp_new_wrapper() in typeobject.c) That way you choose the right constructor function, yet don't call the Python-defined __new__ function. That's the only reasonable way of doing it I can imagine. It also follows the following behaviour: >>> class C(int):
... def __new__(cls, *a):
... print "__new__", a
... return int.__new__(cls, *a)
...
>>> C(5)
__new__ (5,)
5
>>> s = pickle.dumps(C(5))
__new__ (5,)
>>> x = pickle.loads(s)
>>> x
5 As you can see, int.__new__ has been called on unpickling but not C.__new__ (no print). |
(and to follow on my example, I'd point that the unpickled instance is still an instance of the right class: >>> type(x)
<class '__main__.C'> ) |
On Mon, Jun 28, 2010 at 7:32 PM, Antoine Pitrou <report@bugs.python.org> wrote:
Well, this is not so obviously wrong as you make it sound. If you look closer at object_new(), you will see that in sane situations it reduces to type->tp_alloc(type, 0) call. Remember, this is called in very special circumstances: when unpickling old style class instances from a 2.x generated pickle. No arguments are available for tp_new call, so excess_args check does not apply. The other check is for an ABC, but the situation where an old-style class becomes an ABC in 2.x to 3.x transition is hardly sane. If such sanity check is necessary, I would rather check the flag explicitly in _pickle.c rather than having to generate dummy args and kwds to satisfy tp_new signature.
This looks like overengineering to me. I think 3.x should simply refuse to unpickle old style class instances into anything that is not subclassed in python directly from object. It would be helpful at this point if you could provide a test case where tp_alloc logic does not work. Regardless of what algorithm we ultimately choose, I would like to include such test in regrtest. PS: Note that in my patch I've eliminated one control path that was calling instantiate(..) erroneously, but I have not reviewed the code carefully enough to eliminate the possibility that instantiate(..) would be called to instantiate new-style class instances. PPS: What is you opinion on the pickle.py trick of monkey patching __class__ on an instance of an empty class? Do you think this can be fooled? |
Today, yes. But tomorrow it may entail additional operations.
I find it better to "generate dummy args and kwds to satisfy tp_new
Right, but it would lead to almost the same code... Since you have to
I'll try to find one.
I don't think so. |
Antoine, I think I have found a better solution. Since we are dealing with classic classes, they should not define __new__. If in porting to 3.x, users introduce __new__ that requires arguments, they will not be able to unpickle 2.x pickles no matter what we do. Therefore, I propose to replace _EmptyClass trick in pickle.py with a call to klass.__new__:
+ value = klass.__new__(klass) and do the same in _pickle.c's instantiate: r = PyObject_CallMethod(cls, "__new__", "O", cls); Note that I am not even calling tp_new here directly, so all the hoops in tp_new_wrapper get jumped through. The advantage of this scheme is that python and C code will work exactly the same and users that have corner cases for which the scheme does not work will be able to figure out what is going on by looking at the python code. Attaching issue5180a.diff |
Antoine said on IRC that he is ok with the latest approach. Does anyone want to review the patch before it goes in? |
I have fixed some style nits in your patch. It would be nice to have tests for the different control paths in instantiate(). But, I realize that would be a bit annoying. Apart from that, the patch looks good. |
Alexandre, I am not sure your change form PyObject_Size(args) to Py_SIZE(args) is correct. As far as I can tell, args come from pickle machine stack without any type checks. The equivalent code in 2.x cPickle uses PyObject_Size and checks for errors. I'll try to come up with a test case for the error situation, but it is a bit tricky because pickle should be produced in 2.x in order for this function to be used. |
The args argument is always a tuple created with Pdata_poptuple(). You can add an explicit type check. If this check fails a RuntimeError should be raised, because this would indicate a programming error in pickle. |
Committed issue5180b.diff with minor additions in r82937 (r82938 for 3.1):
The explanation of INST and OBJ opcodes in pickletools is now out of date, so I am adding bpo-9267 as a superseder. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: