This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: pickling uses __class__ so you can't pickle proxy/mock objects that pretend to be other objects
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.3
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: alexandre.vassalotti, michael.foord, ncoghlan, pitrou
Priority: normal Keywords:

Created on 2012-04-14 11:16 by michael.foord, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (6)
msg158254 - (view) Author: Michael Foord (michael.foord) * (Python committer) Date: 2012-04-14 11:16
Pickling uses __class__ instead of type(obj) to determine the type to pickle. This means that objects which pretend to be other objects (like proxy and mock objects) can't be pickled correctly:

>>> class Foo(object):
...  __class__ = property(lambda *a, **k: int)
...  
... 
>>> from pickle import dumps
>>> dumps(Foo())
b'\x80\x03cbuiltins\nint\nq\x00)\x81q\x01}q\x02b.'

Here Foo() is pickled as an int. In Python 2 using type(obj) wouldn't work for old style classes, but I don't see a reason not to use type(obj) in Python 3.

Note also that the pickle protocol methods (__getstate__ etc.) are looked up on the object instance (using getattr) rather than on the object's type. This means that __getattr__ is invoked to find them, requiring special casing in objects that implement __getattr__: they must raise AttributeError for these methods if they don't actually provide them. This affects three object types in the unittest.mock namespace (_Call, sentinel and the Mock variants).
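A minimal sketch of the special casing described above (the `Proxy` class is illustrative, not from the stdlib or the issue): a proxy's __getattr__ must refuse the pickle protocol names it doesn't really implement, otherwise pickle's instance-level lookup will be forwarded to the target:

```python
class Proxy:
    """Forwards attribute access to a wrapped target object."""

    def __init__(self, target):
        # Store the target directly in __dict__ to avoid recursion
        # through __getattr__.
        self.__dict__["_target"] = target

    def __getattr__(self, name):
        # Pickle looks up __getstate__/__reduce_ex__ etc. on the
        # *instance*, so this hook sees those lookups.  Refuse dunder
        # names we don't implement, otherwise pickle would receive the
        # target's versions and serialise the wrong thing.
        if name.startswith("__") and name.endswith("__"):
            raise AttributeError(name)
        return getattr(self._target, name)

p = Proxy([1, 2, 2, 3])
print(p.count(2))  # forwarded to the wrapped list -> 2
```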
msg158255 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-14 11:20
Have you tried making the change and seeing if any tests fail? This is a behaviour change, and I wonder whether the original behaviour is by design or by accident.
msg158257 - (view) Author: Michael Foord (michael.foord) * (Python committer) Date: 2012-04-14 11:28
So, changing copyreg.py to use type(self) instead of self.__class__ isn't sufficient; it seems _pickle accesses __class__ as well. However, I'm running all tests with this change in place to see if it breaks intended behaviour:


Python 3.3.0a1+ (default:51016ff7f8c9, Mar 26 2012, 13:15:33) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle as p
[70240 refs]
>>> class Foo(object): 
...  __class__ = property(lambda s: int)
... 
[70290 refs]
>>> Foo().__class__
<class 'int'>
[70294 refs]
>>> p.dumps(Foo())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
_pickle.PicklingError: args[0] from __newobj__ args has the wrong class
msg158258 - (view) Author: Michael Foord (michael.foord) * (Python committer) Date: 2012-04-14 11:40
test_pickle still passes with only copyreg.py modified. 

With one additional change in pickle.py (line 405 changed to use type(obj) instead of obj.__class__), pickling works as I would hope (I would need assistance to fix _pickle):

>>> import sys
[65446 refs]
>>> sys.modules['_pickle'] = None
[65448 refs]
>>> import pickle as p
[70090 refs]
>>> class Foo:    
...  __class__ = property(lambda s: int)
... 
[70140 refs]
>>> p.dumps(Foo())
b'\x80\x03c__main__\nFoo\nq\x00)\x81q\x01.'
[70156 refs]
>>> d = p.dumps(Foo())
[70158 refs]
>>> p.loads(d)
<__main__.Foo object at 0x101410a00>
[70216 refs]
>>> f = p.loads(d)
[70219 refs]
>>> f.__class__
<class 'int'>


However, as you suspected, Antoine, it is apparently deliberate that proxies should be pickled as their original objects:

======================================================================
ERROR: test_newobj_proxies (test.test_pickle.DumpPickle_CLoadPickle)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/compile/py3k-cpython/Lib/test/pickletester.py", line 903, in test_newobj_proxies
    s = self.dumps(p, proto)
  File "/compile/py3k-cpython/Lib/test/test_pickle.py", line 33, in dumps
    p.dump(arg)
  File "/compile/py3k-cpython/Lib/pickle.py", line 235, in dump
    self.save(obj)
  File "/compile/py3k-cpython/Lib/pickle.py", line 342, in save
    self.save_reduce(obj=obj, *rv)
  File "/compile/py3k-cpython/Lib/pickle.py", line 405, in save_reduce
    "args[0] from __newobj__ args has the wrong class")
_pickle.PicklingError: args[0] from __newobj__ args has the wrong class

----------------------------------------------------------------------


Line 891 from pickletester.py: 


    def test_newobj_proxies(self):
        # NEWOBJ should use the __class__ rather than the raw type

I wonder what the use case for that is? If you serialize a proxy object, why would the deserialization code not want a proxy back too?

I guess I can look at implementing copyreg functions for my objects instead.
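The copyreg route mentioned above could look roughly like this (the `ProxyInt` class and `_reduce_proxyint` helper are illustrative, not from the issue). `copyreg.pickle` keys the dispatch table on the real `type(obj)`, so it sidesteps the __class__ lookup entirely:

```python
import copyreg
import pickle

class ProxyInt:
    """Spoofs __class__ the way a mock/proxy might."""
    __class__ = property(lambda self: int)

    def __init__(self, value=0):
        self.value = value

def _reduce_proxyint(obj):
    # Return (callable, args): on load, pickle calls ProxyInt(value),
    # rebuilding the proxy rather than an int.
    return ProxyInt, (obj.value,)

copyreg.pickle(ProxyInt, _reduce_proxyint)

restored = pickle.loads(pickle.dumps(ProxyInt(42)))
print(type(restored).__name__, restored.value)  # ProxyInt 42
```

This routes pickling through the REDUCE opcode rather than NEWOBJ, so the "args[0] from __newobj__ args has the wrong class" check never fires.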
msg158259 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-04-14 11:48
Some additional thoughts for anyone else who comes across this issue.

Consider the case of a weakref proxy (the only proxy type in the stdlib): for that, you never want to serialise the proxy, you want to serialise the original object.

To correctly serialise a proxy object, you have to somehow ensure:
- the proxy gets serialised
- the target gets serialised
- the two get hooked up again at the far end

Now consider what happens if you have *two* proxies both pointing at the same target: how do you ensure that, when deserialised, both proxies still share a target?

In the general case, you can't - so pickle doesn't even try. Instead, serialising a proxy serialises the original object - if you want to do something else, you need to decide for yourself how to recreate the cross-references correctly on deserialisation.
msg158260 - (view) Author: Michael Foord (michael.foord) * (Python committer) Date: 2012-04-14 11:52
Nick - in general proxy objects have a *reference* to their target (weakref being somewhat of a special case), and pickle can already handle multiple references to the same target when deserializing an object graph. So I don't see that argument holding water for the general case.
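Pickle's memo does preserve shared references within a single dump, which is the behaviour being relied on here; a quick demonstration:

```python
import pickle

shared = {"payload": 1}
graph = {"a": shared, "b": shared}  # two references to one target

restored = pickle.loads(pickle.dumps(graph))
# The pickler's memo ensures both keys still point at the *same* object
# after the round trip, not at two equal copies.
print(restored["a"] is restored["b"])  # True
```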

I also challenge the assertion that for a weakref proxy "you never want to serialise the proxy, you want to serialise the original object". Why? If the weakref proxy could be serialized and deserialized along with its target, why would you *not* want a weakref proxy back? On its own that isn't much use, but as part of an object graph, having serialization silently turn a weak reference into a strong one sounds like an anti-feature.

However, it looks like implementing __reduce__ is sufficient for my use case, although the special-method-lookup-on-instance behaviour is still a nuisance.
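For reference, the __reduce__ approach that turned out to be sufficient could look something like this (the `FakeInt` class is hypothetical, mirroring the issue's `Foo`):

```python
import pickle

class FakeInt:
    """Spoofs __class__ like a mock object."""
    __class__ = property(lambda self: int)

    def __init__(self, value=0):
        self.value = value

    def __reduce__(self):
        # Returning (callable, args) routes pickling through REDUCE,
        # bypassing the NEWOBJ path that checks __class__.
        return FakeInt, (self.value,)

f = pickle.loads(pickle.dumps(FakeInt(7)))
print(type(f).__name__, f.value)  # FakeInt 7
```

Because object.__reduce_ex__ defers to an overridden __reduce__, no copyreg registration is needed; the class carries its own pickling recipe.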
History
Date User Action Args
2022-04-11 14:57:29  admin          set  github: 58782
2012-04-14 11:52:58  michael.foord  set  messages: + msg158260
2012-04-14 11:48:58  ncoghlan       set  nosy: + ncoghlan
                                         messages: + msg158259
2012-04-14 11:42:57  michael.foord  set  status: open -> closed
                                         resolution: wont fix
                                         stage: resolved
2012-04-14 11:40:06  michael.foord  set  messages: + msg158258
2012-04-14 11:28:27  michael.foord  set  messages: + msg158257
2012-04-14 11:20:25  pitrou         set  versions: + Python 3.3
                                         nosy: + alexandre.vassalotti, pitrou
                                         messages: + msg158255
                                         components: + Library (Lib)
                                         type: enhancement
2012-04-14 11:16:53  michael.foord  create