Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pickling uses __class__ so you can't pickle proxy/mock objects that pretend to be other objects #58782

Closed
voidspace opened this issue Apr 14, 2012 · 6 comments
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@voidspace
Copy link
Contributor

BPO 14577
Nosy @ncoghlan, @pitrou, @avassalotti, @voidspace

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2012-04-14.11:42:57.782>
created_at = <Date 2012-04-14.11:16:53.672>
labels = ['type-feature', 'library']
title = "pickling uses __class__ so you can't pickle proxy/mock objects that pretend to be other objects"
updated_at = <Date 2012-04-14.11:52:58.418>
user = 'https://github.com/voidspace'

bugs.python.org fields:

activity = <Date 2012-04-14.11:52:58.418>
actor = 'michael.foord'
assignee = 'none'
closed = True
closed_date = <Date 2012-04-14.11:42:57.782>
closer = 'michael.foord'
components = ['Library (Lib)']
creation = <Date 2012-04-14.11:16:53.672>
creator = 'michael.foord'
dependencies = []
files = []
hgrepos = []
issue_num = 14577
keywords = []
message_count = 6.0
messages = ['158254', '158255', '158257', '158258', '158259', '158260']
nosy_count = 4.0
nosy_names = ['ncoghlan', 'pitrou', 'alexandre.vassalotti', 'michael.foord']
pr_nums = []
priority = 'normal'
resolution = 'wont fix'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue14577'
versions = ['Python 3.3']

@voidspace
Copy link
Contributor Author

Pickling uses __class__ instead of type(obj) to determine the type to pickle. This means that objects which pretend to be other objects (like proxy and mock objects) can't be pickled correctly:

>>> class Foo(object):
...  __class__ = property(lambda *a, **k: int)
...  
... 
>>> from pickle import dumps
>>> dumps(Foo())
b'\x80\x03cbuiltins\nint\nq\x00)\x81q\x01}q\x02b.'

Here Foo() is pickled as an int. In Python 2 using type(obj) wouldn't work for old style classes, but I don't see a reason not to use type(obj) in Python 3.

Note that also, the pickle protocol methods like __getstate__ etc are looked up on the object instance (using getattr) rather than on the object type. This means that __getattr__ is invoked to look them up - requiring special casing in objects that provide __getattr__ to avoid them (raise an AttributeError if they don't provide these methods). This affects three object types in the unittest.mock namespace (_Call, sentinel and the Mock variants).

@pitrou
Copy link
Member

pitrou commented Apr 14, 2012

Have you tried making a change and see if any tests fail? This is a behaviour change and I wonder if the original behaviour is by design or accident.

@pitrou pitrou added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Apr 14, 2012
@voidspace
Copy link
Contributor Author

So, changing copyreg.py to use type(self) instead of self.__class__ isn't sufficient. _pickle accesses __class__ as well it seems. However I'm running all tests with this change in place to see if it breaks intended behaviour:

Python 3.3.0a1+ (default:51016ff7f8c9, Mar 26 2012, 13:15:33) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle as p
[70240 refs]
>>> class Foo(object): 
...  __class__ = property(lambda s: int)
... 
[70290 refs]
>>> Foo().__class__
<class 'int'>
[70294 refs]
>>> p.dumps(Foo())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
_pickle.PicklingError: args[0] from __newobj__ args has the wrong class

@voidspace
Copy link
Contributor Author

test_pickle still passes with only copyreg.py modified.

With one additional change in pickle.py (line 405 to use type(obj) instead of obj.__class__) the pickling works as I would hope (I would need assistance to fix _pickle):

>>> import sys
[65446 refs]
>>> sys.modules['_pickle'] = None
[65448 refs]
>>> import pickle as p
[70090 refs]
>>> class Foo:    
...  __class__ = property(lambda s: int)
... 
[70140 refs]
>>> p.dumps(Foo())
b'\x80\x03c__main__\nFoo\nq\x00)\x81q\x01.'
[70156 refs]
>>> d = p.dumps(Foo())
[70158 refs]
>>> p.loads(d)
<__main__.Foo object at 0x101410a00>
[70216 refs]
>>> f = p.loads(d)
[70219 refs]
>>> f.__class__
<class 'int'>

However, as you suspect Antoine, it is apparently deliberate that proxies should be pickled as the original:

======================================================================
ERROR: test_newobj_proxies (test.test_pickle.DumpPickle_CLoadPickle)
----------------------------------------------------------------------

Traceback (most recent call last):
  File "/compile/py3k-cpython/Lib/test/pickletester.py", line 903, in test_newobj_proxies
    s = self.dumps(p, proto)
  File "/compile/py3k-cpython/Lib/test/test_pickle.py", line 33, in dumps
    p.dump(arg)
  File "/compile/py3k-cpython/Lib/pickle.py", line 235, in dump
    self.save(obj)
  File "/compile/py3k-cpython/Lib/pickle.py", line 342, in save
    self.save_reduce(obj=obj, *rv)
  File "/compile/py3k-cpython/Lib/pickle.py", line 405, in save_reduce
    "args[0] from __newobj__ args has the wrong class")
_pickle.PicklingError: args[0] from __newobj__ args has the wrong class

Line 891 from pickletester.py:

    def test_newobj_proxies(self):
        # NEWOBJ should use the __class__ rather than the raw type

I wonder what the use case for that is? If you serialize a proxy object, why would the deserialization code not want a proxy back too?

I guess I can look at implementing copyreg functions for my objects instead.

@ncoghlan
Copy link
Contributor

Some additional thoughts for anyone else that comes across this issue.

Consider the case of a weakref proxy (the only proxy type in the stdlib): for that, you never want to serialise the proxy, you want to serialise the original object.

To correctly serialise a proxy object, you have to somehow ensure:

  • the proxy gets serialised
  • the target gets serialised
  • the two get hooked up again at the far end

Now consider what happens if you have *two* proxies both pointing at the same target: how do you ensure that, when deserialised, both proxies still share a target?

In the general case, you can't - so pickle doesn't even try. Instead, serialising a proxy serialises the original object - if you want to do something else, you need to decide for yourself how to recreate the cross-references correctly on deserialisation.

@voidspace
Copy link
Contributor Author

Nick - in general proxy objects have a *reference* to their target (weakref being somewhat of a special case), and pickle can already handle multiple references to the same target when deserializing an object graph. So I don't see that argument holding water for the general case.

I also challenge the assertion that for a weakref proxy "you never want to serialise the proxy, you want to serialise the original object". Why? If the weakref proxy could be serialized and deserialized including the target, why would you *not* want a weakref proxy back (on its own there isn't much use for that but as part of an object graph having serialization turn a weakref into a strong reference sounds like an anti-feature).

However, it looks like implementing __reduce__ is sufficient for my use case. Although the special-method-lookup-on-instance is still a nuisance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

3 participants