pickle special methods are looked up on the instance rather than the type #60455
In Objects/typeobject.c, reduce_2() makes _PyObject_GetAttrId() calls to pull some methods for the object in question. This is a problem when the object's type defines __getattr__() or __getattribute__() and returns something like None when the attribute is not found:

```python
from copy import deepcopy

class Spam(dict):
    def __getattr__(self, name):
        # defaults to None
        return self.get(name)

deepcopy(Spam(ham=5))
```

While we could do a check on the result of _PyObject_GetAttrId() to make sure it is callable, the better solution would be to look up the methods on the object's type rather than on the object. Doing so falls in line with the specified pattern for special method lookup [1]. I'm guessing there are a few other related places in typeobject.c that have the same problem which could be fixed in the same way. I'm marking this as a bug rather than a feature since it is a deviation from the specification for special methods.

[1] http://docs.python.org/dev/reference/datamodel.html#special-method-lookup |
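For contrast, here is a minimal standalone sketch of the lookup rule from [1]: implicit invocation of a special method goes through the type, so shadowing the method on the instance has no effect on the builtin that invokes it. (This example is illustrative, not code from the issue.)

```python
class C:
    def __len__(self):
        return 1

c = C()
c.__len__ = lambda: 2   # shadow the special method on the instance

print(len(c))           # 1 -- len() looks up __len__ on type(c), not on c
print(c.__len__())      # 2 -- explicit attribute access finds the instance one
```

pickle's reduce_2() instead does the equivalent of the explicit attribute access, which is why a permissive `__getattr__` can feed it a non-callable value.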
pickle and _pickle also look on the instance for __reduce__ and __reduce_ex__. |
It may still break some existing code, though (especially for people using proxies). |
I agree this is a significant risk. |
Agreed. |
I've posted to python-dev about this: http://mail.python.org/pipermail/python-dev/2013-February/124114.html At the very least the pickle docs deserve some mention of the behavior. That way people won't need to spend as much time trying to figure out why things aren't working the way they expect. However, my preference is to fix pickle, though I readily concede that might not be possible. And just to reiterate, I agree we shouldn't do anything here without a clear picture of the impact. |
A backward-compatible solution would be to do the lookup on the class after trying the instance (when the instance lookup returned None or something that isn't callable). |
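A rough sketch of that fallback idea (the helper name `lookup_special` is made up here, not code from any patch):

```python
class Spam(dict):
    def __getattr__(self, name):
        return self.get(name)   # permissive: missing attributes become None

def lookup_special(obj, name):
    # Try the instance first, for backward compatibility.
    value = getattr(obj, name, None)
    if callable(value):
        return value
    # Fall back to the type when the instance lookup produced None or
    # something non-callable (e.g. because of a permissive __getattr__).
    value = getattr(type(obj), name, None)
    if callable(value):
        return value.__get__(obj, type(obj))   # bind to the instance
    return None

spam = Spam(ham=5)
print(lookup_special(spam, "__deepcopy__"))    # None, instead of a bogus value
```

With this, pickle/copy would see "method absent" rather than trying to call the None that `__getattr__` returned.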
I think you are overreacting. The rule you are talking about (special methods are only looked up on the class) is really known to experts, not so much to ordinary Python programmers. I'd argue that pickle doesn't break normal users' expectations here. |
Yeah, maybe so. :) I just found the problem I ran into to be really hard to diagnose due to how pickle does lookup, and figured it would be worth addressing to save someone else the same headache later. |
I ran into this recently while porting nose to 3.4. I duped http://bugs.python.org/issue20261 to this issue, but note that in http://bugs.python.org/issue20261#msg208109 I notice that the presence or absence of __getnewargs__() regresses the specific behavior in Python 3.4. |
Genshi is affected by the 3.4 regression too (it has a class that defines __getnewargs__ and __getattr__). |
I have an ugly work-around for those affected:

```python
def __getattr__(self, name):
    # work around for pickle bug in Python 3.4
    # see http://bugs.python.org/issue16251
    if name == "__getnewargs_ex__":
        raise AttributeError("%r has no attribute %r" % (type(self), name))
    ...
```
|
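Filled out into a runnable class (the trailing `...` above stood for the rest of the method; the `self.get` fallback here mirrors the Spam example from the opening comment and is an assumption, not part of the original snippet):

```python
class Spam(dict):
    def __getattr__(self, name):
        # work around for pickle bug in Python 3.4
        # (protocol 2+ probes __getnewargs_ex__ via getattr on the instance)
        # see http://bugs.python.org/issue16251
        if name == "__getnewargs_ex__":
            raise AttributeError("%r has no attribute %r" % (type(self), name))
        return self.get(name)

s = Spam(ham=5)
print(s.ham)   # 5 -- ordinary missing attributes still fall back to dict lookup
```

The guard makes the probe for `__getnewargs_ex__` raise AttributeError, which is what pickle's lookup machinery expects for an absent method.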
New changeset b328f8ccbccf by Benjamin Peterson in branch 'default': |
New changeset 2514a577c7cb by Benjamin Peterson in branch '3.4': |
Is this issue fixed? |
I don't think so. On Tue, Sep 22, 2015, at 11:52, Serhiy Storchaka wrote:
|
What problem does PR 3508 try to solve? Swallowing all exceptions looks like an antipattern to me. It sweeps the problem under the carpet. Rather than failing and allowing the programmer to fix the pickleability of their class, this can silently produce incorrect pickle data. By swallowing MemoryError and RecursionError this changes the behavior of objects in an environment with limited resources: low memory or a deep call stack. This adds heisenbugs. An infinite recursion spends CPU time, and in unlucky cases can cause a crash. It is better to detect it early than to just swallow RecursionError. |
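As an illustration of the heisenbug concern (a made-up buggy class, not code from the PR): a typo in `__getattr__` that recurses forever surfaces as RecursionError, and swallowing that exception during copy would hide the typo instead of reporting it.

```python
class BrokenProxy:
    def __init__(self, proxied_object):
        self.proxied = proxied_object

    def __getattr__(self, name):
        # typo: should delegate to self.proxied, so this recurses forever
        return getattr(self, name)

try:
    BrokenProxy(object()).missing
    outcome = "no error"
except RecursionError:
    outcome = "RecursionError"

print(outcome)   # RecursionError -- the bug is reported, not masked
```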
The two test cases added demonstrate what was impossible to pickle before and is possible now.
This is the only thing that we can do faced with custom
Can you give me an example where this would lead to incorrect pickle data?
This is what |
These two classes obviously are not pickleable. Pickling or copying them is a programming error. I believe the proper way of making them pickleable and copyable is implementing corresponding special methods.
Any class that needs a non-trivial __getstate__(). If pickling fails now, it means that the class is not pickleable. With your patch pickling can succeed, but there is no guarantee that the result is correct. For example, modifying one of your examples:

```python
class Proxy:
    def __init__(self, proxied_object):
        self.proxied_object = proxied_object
        self.id = id(proxied_object)

    def __getattr__(self, name):
        return getattr(self.proxied_object, name)
```

hasattr() was broken in Python 2. It was fixed in Python 3. Your patch reintroduces a similar bug in copy. |
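The hasattr() analogy in code (the Python 2 vs Python 3 difference is documented behavior; the class here is made up for illustration):

```python
class Loud:
    @property
    def attr(self):
        raise ValueError("boom")

# Python 2's hasattr() swallowed every exception and returned False here,
# masking the bug. Python 3's hasattr() only swallows AttributeError and
# lets anything else propagate to the caller.
try:
    hasattr(Loud(), "attr")
    outcome = "swallowed"
except ValueError:
    outcome = "propagated"

print(outcome)   # propagated
```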
The recursion case deep-copied successfully under Python 2. When migrating the code, the sudden recursion error is pretty hard to debug.
Right, but during the migration from Python 2 to Python 3 the error message comes nowhere close to suggesting what the correct behavior is.
My patch doesn't change anything here. It simply causes special methods to be reported as not present instead of raising exceptions from within __getattr__.
Yes, I agree.
No, my patch introduces a getattr() equivalent for deepcopy which is robust against random exceptions raised from __getattr__.

------------

Let's take a step back. My goal with this patch is two-fold:
Clearly the current workaround I wrote doesn't go in the direction you'd like to see. Do you have a suggestion of an approach that would be better? I can implement it. |
Python 3 is not compatible with Python 2 because it fixes many bugs that can't

The problem with copying is not the only problem with that class. It also has
I think a traceback should help to identify the problem. At least it looks
And this is a regressive change. Instead of not producing incorrect data, the

One more drawback is a performance regression. Additional checks slow down a
The code shouldn't be tolerant of random exceptions raised from __getattr__.
Raising an exception is a feature of Python 3, not a bug. It helps to catch
The current behavior LGTM. I don't think it needs to be improved. |
FWIW I withdrew my PR. |