pickle should support methods #53522
pickle doesn't support methods:
>>> class x:
...     def y(self):
...         pass
...
>>> import pickle
>>> pickle.dumps(x.y)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/exarkun/Projects/python/branches/py3k/Lib/pickle.py", line 1314, in dumps
    Pickler(f, protocol, fix_imports=fix_imports).dump(obj)
_pickle.PicklingError: Can't pickle <class 'function'>: attribute lookup builtins.function failed
It would be easy to fix this, though. Here's a link to some code that implements it: http://twistedmatrix.com/trac/browser/trunk/twisted/persisted/styles.py?rev=1 |
Not a proposed solution, but food for thought. Methods do have a __reduce_ex__ method, which works with protocol 3:
>>> class X:
...     def f(self):
...         pass
>>> X.f.__reduce_ex__(3)
(<function __newobj__ at 0x100579288>, (<class 'function'>,), {}, None, None)
This result is useless for several reasons:
>>> import builtins, types
>>> builtins.function = types.FunctionType
>>> pickle.dumps(X.f)
b'\x80\x03cbuiltins\nfunction\nq\x00)\x81q\x01}q\x02b.'
but the result is useless:
>>> pickle.loads(_)
Traceback (most recent call last):
  ..
  File "Lib/pickle.py", line 1317, in loads
    encoding=encoding, errors=errors).load()
TypeError: Required argument 'code' (pos 1) not found
I think the approach of pickling the name of the function, as is done in the Twisted link above, is the only reasonable one and is consistent with the way module-level functions are pickled. |
Note that pickle deliberately does not support serializing code objects. This is a security feature and should not be broken ! If you need to pickle such objects, you can easily register handlers that take care of this. |
Can you explain this? I don't think I agree, since an attacker can always serialize whatever they feel like. It's the person doing the deserialization that has to be careful. |
Jean-Paul Calderone wrote:
The marshal protocol which is used for storing PYC files has support The support on pickles, which are meant for data serialization, was not added By adding default support for unpickling code objects, you can trick |
This doesn't sound correct to me. You can *already* trick unpickling code into executing serialized code. You don't need this feature in order to be able to do it. |
Jean-Paul Calderone wrote:
How ? |
For example:
exarkun@boson:~$ python
Python 2.6.4 (r264:75706, Dec 7 2009, 18:45:15)
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> class x(object):
...     def __reduce__(self):
...         import os
...         return os.system, ('echo "Hello from sploitland"',)
...
>>> import pickle
>>> pickle.loads(pickle.dumps(x()))
Hello from sploitland
0
>>> |
On Mon, Aug 2, 2010 at 9:25 AM, Marc-Andre Lemburg
<report@bugs.python.org> wrote:
..
>> You can *already* trick unpickling code into executing serialized code. You don't need
> this feature in order to be able to do it.
>
> How ?
>
>>> from pickle import *
>>> class evil:
...     def __reduce__(self):
...         return (exec, ("print('pwned!')",))
...
>>> s = dumps(evil())
>>> loads(s)
pwned!
See also http://bugs.python.org/issue9120#msg109004 . AFAICT, the reason functions and classes are pickled by name has """Similarly, when class instances are pickled, their class’s code and |
Jean-Paul Calderone wrote:
>
> Jean-Paul Calderone <exarkun@twistedmatrix.com> added the comment:
>
> For example:
>
> exarkun@boson:~$ python
> Python 2.6.4 (r264:75706, Dec 7 2009, 18:45:15)
> [GCC 4.4.1] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
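> >>> class x(object):
> ...     def __reduce__(self):
> ...         import os
> ...         return os.system, ('echo "Hello from sploitland"',)
> ...
> >>> import pickle
> >>> pickle.loads(pickle.dumps(x()))
> Hello from sploitland
> 0
But here you are not transferring malicious code in the pickle string, you are just triggering the execution of such code that you already have (and are in control of).
Without the definition of class x on the receiving side, there would be no exploit.
By adding support for pickling code objects, you'd make it possible to place the definition of class x into the pickle string and you would no longer be in control of that code. |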
On Mon, Aug 2, 2010 at 10:05 AM, Marc-Andre Lemburg
You are mistaken. Try adding del x (or del evil in my example) |
I think methods should be picklable just like global functions are, that is, by pickling a tuple of the fully-qualified class name ("mymodule.myclass"), method name ("mymethod"), and self. |
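A minimal sketch of the by-name approach suggested above, assuming a pickle.Pickler subclass with its own dispatch_table. The names reduce_method, MethodPickler and dumps_with_methods are hypothetical, and fractions.Fraction only stands in for "mymodule.myclass". The reducer stores the instance plus the method name and lets getattr rebuild the bound method on load, so no code object is serialized.

```python
import copyreg
import io
import pickle
import types
from fractions import Fraction


def reduce_method(method):
    """Reduce a bound method to (getattr, (instance, method name))."""
    return getattr, (method.__self__, method.__func__.__name__)


class MethodPickler(pickle.Pickler):
    # Private dispatch table: the global copyreg table is left untouched.
    dispatch_table = copyreg.dispatch_table.copy()
    dispatch_table[types.MethodType] = reduce_method


def dumps_with_methods(obj):
    """Hypothetical helper: pickle obj with MethodPickler into bytes."""
    buffer = io.BytesIO()
    MethodPickler(buffer).dump(obj)
    return buffer.getvalue()


bound = Fraction(3, 4).limit_denominator
payload = dumps_with_methods(bound)

# A plain pickle.loads is enough: the payload only refers to getattr,
# the Fraction class, and the method name.
restored = pickle.loads(payload)
print(restored.__self__, restored.__name__)
```

The instance itself still has to be picklable, and the method is looked up again by name at load time, so a renamed or removed method makes the pickle unloadable. That is the same trade-off module-level functions already have.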
M.-A. Lemburg wrote:
> Jean-Paul Calderone wrote:
>>
>> Jean-Paul Calderone <exarkun@twistedmatrix.com> added the comment:
>>
>> For example:
>>
>> exarkun@boson:~$ python
>> Python 2.6.4 (r264:75706, Dec 7 2009, 18:45:15)
>> [GCC 4.4.1] on linux2
>> Type "help", "copyright", "credits" or "license" for more information.
>> >>> class x(object):
>> ...     def __reduce__(self):
>> ...         import os
>> ...         return os.system, ('echo "Hello from sploitland"',)
>> ...
>> >>> import pickle
>> >>> pickle.loads(pickle.dumps(x()))
>> Hello from sploitland
>> 0
>
> But here you are not transferring malicious code in the pickle
> string, you are just triggering the execution of such code that
> you already have (and are in control of).
>
> Without the definition of class x on the receiving side, there
> would be no exploit.
>
> By adding support for pickling code objects, you'd make it possible
> to place the definition of class x into the pickle string and
> you would no longer be in control of that code.
Hmm, I just tried the code and it seems that you're right: The pickle string does not contain a reference to class x,
...     def __reduce__(self):
...         import os
...         return os.system, ('echo "Bingo"',)
...
>>> import pickle
>>> pickle.dumps(C())
'cposix\nsystem\np0\n(S\'echo "Bingo"\'\np1\ntp2\nRp3\n.'
>>> C = None
>>> s = 'cposix\nsystem\np0\n(S\'echo "Bingo"\'\np1\ntp2\nRp3\n.'
>>> pickle.loads(s)
Bingo
0 |
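One way to confirm this without running anything is to walk the opcodes of the pickle string instead of loading it. The sketch below uses pickletools.genops on the payload shown above, written as bytes; the helper name referenced_globals is hypothetical. It only looks at the GLOBAL opcode used by older protocols. Protocol 4 pickles use STACK_GLOBAL instead, which this sketch does not handle.

```python
import pickletools

# The payload from the session above, as bytes. It is never loaded here.
payload = b'cposix\nsystem\np0\n(S\'echo "Bingo"\'\np1\ntp2\nRp3\n.'


def referenced_globals(data):
    """Return the 'module name' strings named by GLOBAL opcodes in data."""
    return [
        arg
        for opcode, arg, _pos in pickletools.genops(data)
        if opcode.name == "GLOBAL"
    ]


globals_found = referenced_globals(payload)
opcode_names = [opcode.name for opcode, _arg, _pos in pickletools.genops(payload)]

print(globals_found)
print("REDUCE" in opcode_names)
```

The only global named in the stream is posix.system, followed by a REDUCE that calls it. The class whose __reduce__ produced the payload does not appear at all.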
On Mon, Aug 2, 2010 at 10:11 AM, Marc-Andre Lemburg
That's why we have a big red """ in the docs. |
Alexander Belopolsky wrote:
Good :-) I've never used .__reduce__() and wasn't aware of the I also like Antoine's idea of pickling the function/method name This is in line with PEP-307 (http://www.python.org/dev/peps/pep-0307/) |
I like it too. That's why I suggested it in the first comment on the ticket (read the linked code). I guess Alexander likes it too, since he basically said as much in the second comment. ;) |
There's already bpo-558238 on the same topic. |
On Mon, Aug 2, 2010 at 10:32 AM, Jean-Paul Calderone
Yes, I think we have a consensus on this point. Note, however that |
I suppose only bound methods should be pickleable:
>>> class C:
...     def m(self): pass
...
>>> c = C()
>>> c.m
<bound method C.m of <__main__.C object at 0x7fa81299b150>>
>>> c.m.__self__.__module__
'__main__'
And perhaps class methods too:
>>> class C:
...     @classmethod
...     def cm(self): pass
...
>>> C.cm
<bound method type.cm of <class '__main__.C'>>
>>> C.cm.__self__
<class '__main__.C'>
>>> C.cm.__self__.__module__
'__main__'
As we want, but they needn't be. |
This is a rather sad loss of functionality. |
The security issue mentioned previously has been known for years, and it is easy to protect against. See http://docs.python.org/py3k/library/pickle.html#restricting-globals
Also, I am against adding pickling support to code objects. Unlike pickles, code objects have no backward-compatibility constraint.
Antoine is right that we should use a method's fully-qualified name to pickle it. However, the problem with this approach is that a method doesn't always have a fully-qualified name (see bpo-3657). ForkingPickler in Lib/multiprocessing/forking.py uses this approach to add pickling support to methods. |
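For reference, the protection mentioned above amounts to overriding Unpickler.find_class. The sketch below follows the pattern from the linked documentation section. The allow-list contents are an assumption chosen for illustration, and the hostile payload is written by hand so that nothing is executed while building it.

```python
import builtins
import io
import pickle

# Assumed allow-list for this sketch; a real application picks its own.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Resolve only allow-listed builtins; refuse every other global.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name)
        )


def restricted_loads(data):
    """Analogous to pickle.loads(), but with restricted globals."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Ordinary data still round-trips.
harmless = restricted_loads(pickle.dumps([1, 2, range(15)]))

# A payload that would call posix.system is rejected before any call happens.
hostile = b"cposix\nsystem\n(S'echo hello'\ntR."
try:
    restricted_loads(hostile)
    blocked = False
except pickle.UnpicklingError:
    blocked = True

print(harmless, blocked)
```

This restricts what the receiving side will resolve. It does not change what a sender can put into a pickle.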
I also miss being able to pickle unbound methods on Python 3. I don't think there's an interest in pickling the actual code objects. In my opinion, unbound methods should be pickled exactly like all the other Python definitions, such as bound methods, top-level functions, and classes: They should be pickled by name. IIUC, the challenge is how to figure out on which class an unbound method is defined. I'm using the term "unbound method" colloquially, I know it's implemented as a function. So perhaps Python needs to be changed to give unbound methods some attribute that will tell on which class they're defined? |
Okay, as an initial suggestion, how about we give every function a |
Won't work. The same function can be used in multiple classes. The function object is independent of the class. This is conceptually no different than the unremarkable fact that any object can be stored in multiple dictionaries and the object is not responsible for knowing which dictionaries it is stored in.
def f(self): ... # not even defined inside a class
A.f = f # stored in class A
B.f = f # also stored in class B
dir(f) # f doesn't know where it is stored |
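The point can be checked directly on Python 3. In this small sketch (the classes A and B are placeholders, as in the snippet above) the same function object sits in two class dictionaries, and nothing on the function records either class.

```python
def f(self):
    # Not defined inside any class.
    return type(self).__name__


class A:
    pass


class B:
    pass


A.f = f  # stored in class A
B.f = f  # also stored in class B

# On Python 3, class attribute access returns the plain function itself.
same_object = A.f is f and B.f is f

# Finding the owners means searching the classes; f cannot be asked.
owners = [cls.__name__ for cls in (A, B) if cls.__dict__.get("f") is f]

print(same_object, owners, f.__qualname__, hasattr(f, "__self__"))
```

Calling A().f() and B().f() works for both classes, which is why a single "defining class" attribute on the function would be ambiguous in this case.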
Raymond: I don't think this matters. We don't need a canonical |
This isn't worth introducing poorly thought out hacks. |
Being able to pickle unbound methods is important. In my project I have objects that refer to unbound methods. Now these objects are unpickleable. I can't save them to disk and I can't use the multiprocessing module on them. That's a big problem. |
Judging from all that has been said on this issue, I think the best you can do now is try to whip up a patch and upload it if it ends up not too hackish. |
To have a non-hackish patch we need a non-hackish idea. The I mean, a function's |
Not much, agreed. You might want to call it But by "hackish" I meant the possible implementation of it, not the idea |
The multiprocessing module *can* pickle bound and unbound methods (see below), but only with the multiprocessing.Process class. It does not work with Pool.map(), for example. The reason is that Process uses the special ForkingPickler, which has special code to handle methods. Pool.map could be fixed IMO. Is "ForkingPickler" enough for your needs?
==== mod.py ============
class C:
    def foo(self):
        print("CALLED")
==== main.py ===========
from mod import C
if __name__ == '__main__':
    from multiprocessing import Process
    p = Process(target=C().foo)
    p.start(); p.join()
    p = Process(target=C.foo, args=(C(),))
    p.start(); p.join() |
Amaury: I don't think ForkingPickler works for unbound methods defined in user code, which are implemented as functions. I think it only works for method-descriptors and wrapper-descriptors. |
Did you see my example above? It passes methods defined in user code. |
Amaury: Your example succeeds on Linux but fails on Windows:
$ python3.2 main.py
CALLED
Traceback (most recent call last):
File "C:\Python32\Lib\pickle.py", line 679, in save_global
klass = getattr(mod, name)
AttributeError: 'module' object has no attribute 'foo'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "main.py", line 7, in <module>
p.start(); p.join()
File "C:\Python32\Lib\multiprocessing\process.py", line 130, in start
self._popen = Popen(self)
File "C:\Python32\Lib\multiprocessing\forking.py", line 267, in __init__
dump(process_obj, to_child, HIGHEST_PROTOCOL)
File "C:\Python32\Lib\multiprocessing\forking.py", line 190, in dump
ForkingPickler(file, protocol).dump(obj)
File "C:\Python32\Lib\pickle.py", line 237, in dump
self.save(obj)
File "C:\Python32\Lib\pickle.py", line 344, in save
self.save_reduce(obj=obj, *rv)
File "C:\Python32\Lib\pickle.py", line 432, in save_reduce
save(state)
File "C:\Python32\Lib\pickle.py", line 299, in save
f(self, obj) # Call unbound method with explicit self
File "C:\Python32\Lib\pickle.py", line 623, in save_dict
self._batch_setitems(obj.items())
File "C:\Python32\Lib\pickle.py", line 656, in _batch_setitems
save(v)
File "C:\Python32\Lib\pickle.py", line 299, in save
f(self, obj) # Call unbound method with explicit self
File "C:\Python32\Lib\pickle.py", line 683, in save_global
(obj, module, name))
_pickle.PicklingError: Can't pickle <function foo at 0x00C4EBB8>: it's not found as mod.foo
User@TURING ~/Desktop/temp
$ Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Python32\Lib\multiprocessing\forking.py", line 370, in main
self = load(from_parent)
EOFError |
I think the difference has to do with Python 3 vs. Python 2. |
OK, let's go back to the "__namespace__" idea, then. A long time ago I had the idea that the ast compiler could remember the list of "named blocks" (classes, functions) with their line numbers; |
I don't have the time and the ability to write the patch that implements this. I'll be happy to write tests if you think this will help. |
As part of the implementation of PEP-3154 (Pickle protocol 4), I've introduced support for pickling methods for all pickle protocols (and not just for the new protocol 4). This was implemented by adding the appropriate __reduce__ method on built-in functions and methods. In addition, pickling methods in nested classes is now supported by protocol 4 through the use of the __qualname__ attribute. |
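A short check of the behaviour described above, assuming Python 3.4 or later. fractions.Fraction is used only because it is an importable pure-Python class; any importable class would do. The sketch pickles a method taken from the class, which is resolved through __qualname__ under protocol 4, and a bound method, which is reduced to a getattr call on the instance.

```python
import pickle
from fractions import Fraction

# A method looked up on the class is a plain function on Python 3.
# Its __qualname__ records the path "Fraction.limit_denominator".
func = Fraction.limit_denominator
restored_func = pickle.loads(pickle.dumps(func, protocol=4))

# A bound method is reduced to (getattr, (instance, method name)).
bound = Fraction(1, 2).limit_denominator
reduced = bound.__reduce__()
restored_bound = pickle.loads(pickle.dumps(bound, protocol=4))

print(func.__qualname__)
print(restored_func is func)
print(reduced[0] is getattr, reduced[1][1])
print(restored_bound(10))
```

In both cases only names travel in the pickle, so the receiving side needs the same class definition available for import.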