msg356382 - (view) |
Author: Anthony Sottile (Anthony Sottile) * |
Date: 2019-11-11 19:40 |
We recently found a bug in one of our codebases that looked ~roughly like this:
class C:
...
def __hash__(self):
return hash((v for k, v in sorted(self.__dict__.items())))
which resulted in a production bug
The *intention* was to hash a tuple of those elements but the author had forgotten the `tuple(...)` call (or incorrectly assumed a parenthesized generator was a tuple comprehension) -- either way it seems wrong that generators are currently hashable as they are mutable
Thoughts on `__hash__ = None` for generators?
|
msg356383 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2019-11-11 20:21 |
> the author had forgotten the `tuple(...)` call (or incorrectly
> assumed a parenthesized generator was a tuple comprehension)
Presumably that will bite the author in many ways, not just hashability. There are many other places where substituting a generator for a tuple would result in a downstream error. ISTM, this user error would have been caught with even minimal testing or code review.
> it seems wrong that generators are currently hashable as they are mutable
It is perfectly reasonable given that generators compare based on object identity.
> Thoughts on `__hash__ = None` for generators?
That would break currently working code that depends on hashability. I have seen such code in production more than once (using distinct generator instances as dictionary keys for example).
|
msg356387 - (view) |
Author: Anthony Sottile (Anthony Sottile) * |
Date: 2019-11-11 21:06 |
:shrugs: it did go through a round of code review but was simply missed -- the testing bit is fair
there are other such mutable objects in python which do define __hash__ = None such as list and dict
|
msg356389 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2019-11-11 21:21 |
> there are other such mutable objects in python which do define
> __hash__ = None such as list and dict
Those objects compare on their contents, not on object identity.
Take a look at how hashing works in dataclasses. You can have a mix of mutable and immutable fields as long as both __eq__ and __hash__ depend on only the immutable fields.
Also, take a look at database cursor objects. Both __eq__ and __hash__ use object identity even though fetchone() advances the internal state (just like an iterator).
Anyway, the breaking of existing working code makes this a non-starter.
|
msg356392 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2019-11-11 21:50 |
I agree with Raymond. By default all objects are compared by identity and are hashable. User classes are hashable, files are hashable, iterators are hashable. Generator objects are just not special enough to be non-hashable.
We can reconsider this. But it a part of much larger Python language change. It should be discussed on Python-Dev. We should estimate the benefit and the amount of possible breakage. I think there is a chance of changing this, but we need more information.
|
msg356393 - (view) |
Author: Tim Peters (tim.peters) *  |
Date: 2019-11-11 22:02 |
Function objects can be used as dict keys too. Given that, it would be _very_ surprising if generator-iterators weren't. I don't really see a good point to leaving this open. Live with it - it's a feature ;-)
|
msg356394 - (view) |
Author: Anthony Sottile (Anthony Sottile) * |
Date: 2019-11-11 22:14 |
function objects are immutable though
|
msg356395 - (view) |
Author: Tim Peters (tim.peters) *  |
Date: 2019-11-11 22:18 |
What makes you think generator-iterator objects are mutable?
|
msg356396 - (view) |
Author: Anthony Sottile (Anthony Sottile) * |
Date: 2019-11-11 22:24 |
I think I'm missing what you're saying, apologies I'm probably confused :(
|
msg356397 - (view) |
Author: Steven D'Aprano (steven.daprano) *  |
Date: 2019-11-11 22:35 |
Anthony, can you please explain what you mean when you describe generators as "mutable"? I don't understand what you mean.
To *me*, the value of a generator, in so far as comparisons goes, is its identity, not its invisible internal state. You can test that by noting that two generators with the same state compare unequal:
py> def gen():
... yield 1
...
py> a = gen()
py> b = gen()
py> a == b
False
Furthermore, the hash of the generator doesn't change when its internal state changes.
py> hash(a)
192456114
py> next(a)
1
py> hash(a)
192456114
And as for functions being "immutable", you can attach arbitrary attributes to them, so their object state can change. Does that make them mutable? Either way, it doesn't matter, since functions are also compared by identity. Their value is their identity, not their state. (Two functions with the same internal state are compared as unequal.)
|
msg356399 - (view) |
Author: Tim Peters (tim.peters) *  |
Date: 2019-11-11 22:46 |
Yes, what Steven said. All kinds of functions (including, but not limited to, generator-iterators) are compared by object identity, nothing at all about internal state. The contract of hash() is that if a == b, then we must have that hash(a) == hash(b) too. That's got nothing to do with internal state either, _only_ with how __eq__ is implemented.
There is no sense in which any kind of function object is "mutable" that has any effect on its object identity, so also no sense in which it can affect __eq__ results, so no reason at all for __hash__ to care. It's all working as designed and as intended.
|
msg356400 - (view) |
Author: Ammar Askar (ammar2) *  |
Date: 2019-11-11 22:47 |
An easier way to understand might be to just write out the generator expression "manually":
def generator_expression():
for k, v in sorted(self.__dict__.items()):
yield k
return hash(my_generator_expression())
Would you say that `generator_expression` here is mutable or immutable? The code that underpins how it generates values is clearly immutable just like functions but the values it closes over can be changed.
Another minimal example:
>>> def f():
... yield from some_list
...
>>> some_list = [1,2,3]
>>> list(f())
[1, 2, 3]
>>> some_list = [1,2,3,4,5]
>>> list(f())
[1, 2, 3, 4, 5]
Would you say that `f` is immutable or mutable in this case?
|
msg356401 - (view) |
Author: Steven D'Aprano (steven.daprano) *  |
Date: 2019-11-11 22:58 |
I can see that making generators unhashable would have found your bug earlier, but it is otherwise unjustified: generators are just objects that are compared by identity. It would break other people's code, which is a very invasive change to make just so that a small subset of "forgot to call tuple" bugs might sometimes be found earlier.
Personally, I think that Serhiy's comment that there is "a chance" that this will be accepted is over-optimistic, but either way this would need to have a PEP.
I'm going to close this as "Not a bug" but that doesn't mean discussion has to stop. If you feel strongly enough about this issue to write a PEP proposing this change, please take the discussion to the Python-Dev mailing list with a summary and see if you can get somebody to sponsor the PEP. If you do, feel free to re-open as an Enhancement.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:59:23 | admin | set | github: 82950 |
2019-11-11 22:58:17 | steven.daprano | set | status: open -> closed resolution: not a bug messages:
+ msg356401
stage: resolved |
2019-11-11 22:47:47 | ammar2 | set | nosy:
+ ammar2 messages:
+ msg356400
|
2019-11-11 22:46:12 | tim.peters | set | messages:
+ msg356399 |
2019-11-11 22:35:33 | steven.daprano | set | nosy:
+ steven.daprano messages:
+ msg356397
|
2019-11-11 22:24:46 | Anthony Sottile | set | messages:
+ msg356396 |
2019-11-11 22:18:30 | tim.peters | set | messages:
+ msg356395 |
2019-11-11 22:14:19 | Anthony Sottile | set | messages:
+ msg356394 |
2019-11-11 22:02:49 | tim.peters | set | nosy:
+ tim.peters messages:
+ msg356393
|
2019-11-11 21:50:53 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages:
+ msg356392
|
2019-11-11 21:21:22 | rhettinger | set | messages:
+ msg356389 |
2019-11-11 21:06:37 | Anthony Sottile | set | messages:
+ msg356387 |
2019-11-11 20:21:39 | rhettinger | set | messages:
+ msg356383 |
2019-11-11 19:44:22 | ammar2 | set | nosy:
+ rhettinger
|
2019-11-11 19:40:25 | Anthony Sottile | create | |