This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: generators are currently hashable
Type: Stage: resolved
Components: Interpreter Core Versions: Python 3.9
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: Anthony Sottile, ammar2, rhettinger, serhiy.storchaka, steven.daprano, tim.peters
Priority: normal Keywords:

Created on 2019-11-11 19:40 by Anthony Sottile, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (13)
msg356382 - (view) Author: Anthony Sottile (Anthony Sottile) * Date: 2019-11-11 19:40
We recently found a bug in one of our codebases that looked ~roughly like this:

class C:
    ...

    def __hash__(self):
        return hash((v for k, v in sorted(self.__dict__.items())))

which resulted in a production bug

The *intention* was to hash a tuple of those elements but the author had forgotten the `tuple(...)` call (or incorrectly assumed a parenthesized generator was a tuple comprehension) -- either way it seems wrong that generators are currently hashable as they are mutable

Thoughts on `__hash__ = None` for generators?
msg356383 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-11-11 20:21
> the author had forgotten the `tuple(...)` call (or incorrectly 
> assumed a parenthesized generator was a tuple comprehension)

Presumably that will bite the author in many ways, not just hashability.  There are many other places where substituting a generator for a tuple would result in a downstream error.  ISTM, this user error would have been caught with even minimal testing or code review.


> it seems wrong that generators are currently hashable as they are mutable

It is perfectly reasonable given that generators compare based on object identity.


> Thoughts on `__hash__ = None` for generators?

That would break currently working code that depends on hashability.  I have seen such code in production more than once (using distinct generator instances as dictionary keys for example).
msg356387 - (view) Author: Anthony Sottile (Anthony Sottile) * Date: 2019-11-11 21:06
:shrugs: it did go through a round of code review but was simply missed -- the testing bit is fair

there are other such mutable objects in python which do define __hash__ = None such as list and dict
msg356389 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-11-11 21:21
> there are other such mutable objects in python which do define
> __hash__ = None such as list and dict

Those objects compare on their contents, not on object identity.

Take a look at how hashing works in dataclasses.  You can have a mix of mutable and immutable fields as long as both __eq__ and __hash__ depend on only the immutable fields.

Also, take a look at database cursor objects.  Both __eq__ and __hash__ use object identity even though fetchone() advances the internal state (just like an iterator).

Anyway, the breaking of existing working code makes this a non-starter.
msg356392 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-11-11 21:50
I agree with Raymond. By default all objects are compared by identity and are hashable. User classes are hashable, files are hashable, iterators are hashable. Generator objects are just not special enough to be non-hashable.

We can reconsider this. But it a part of much larger Python language change. It should be discussed on Python-Dev. We should estimate the benefit and the amount of possible breakage. I think there is a chance of changing this, but we need more information.
msg356393 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2019-11-11 22:02
Function objects can be used as dict keys too.  Given that, it would be _very_ surprising if generator-iterators weren't.  I don't really see a good point to leaving this open.  Live with it - it's a feature ;-)
msg356394 - (view) Author: Anthony Sottile (Anthony Sottile) * Date: 2019-11-11 22:14
function objects are immutable though
msg356395 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2019-11-11 22:18
What makes you think generator-iterator objects are mutable?
msg356396 - (view) Author: Anthony Sottile (Anthony Sottile) * Date: 2019-11-11 22:24
I think I'm missing what you're saying, apologies I'm probably confused :(
msg356397 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2019-11-11 22:35
Anthony, can you please explain what you mean when you describe generators as "mutable"? I don't understand what you mean.

To *me*, the value of a generator, in so far as comparisons goes, is its identity, not its invisible internal state. You can test that by noting that two generators with the same state compare unequal:

    py> def gen():
    ...     yield 1
    ...
    py> a = gen()
    py> b = gen()
    py> a == b
    False

Furthermore, the hash of the generator doesn't change when its internal state changes.

    py> hash(a)
    192456114
    py> next(a)
    1
    py> hash(a)
    192456114


And as for functions being "immutable", you can attach arbitrary attributes to them, so their object state can change. Does that make them mutable? Either way, it doesn't matter, since functions are also compared by identity. Their value is their identity, not their state. (Two functions with the same internal state are compared as unequal.)
msg356399 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2019-11-11 22:46
Yes, what Steven said.  All kinds of functions (including, but not limited to, generator-iterators) are compared by object identity, nothing at all about internal state.  The contract of hash() is that if a == b, then we must have that hash(a) == hash(b) too.  That's got nothing to do with internal state either, _only_ with how __eq__ is implemented.

There is no sense in which any kind of function object is "mutable" that has any effect on its object identity, so also no sense in which it can affect __eq__ results, so no reason at all for __hash__ to care.  It's all working as designed and as intended.
msg356400 - (view) Author: Ammar Askar (ammar2) * (Python committer) Date: 2019-11-11 22:47
An easier way to understand might be to just write out the generator expression "manually":

  def generator_expression():
    for k, v in sorted(self.__dict__.items()):
      yield k
  return hash(my_generator_expression())

Would you say that `generator_expression` here is mutable or immutable? The code that underpins how it generates values is clearly immutable just like functions but the values it closes over can be changed.

Another minimal example:

>>> def f():
...   yield from some_list
...
>>> some_list = [1,2,3]
>>> list(f())
[1, 2, 3]
>>> some_list = [1,2,3,4,5]
>>> list(f())
[1, 2, 3, 4, 5]

Would you say that `f` is immutable or mutable in this case?
msg356401 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2019-11-11 22:58
I can see that making generators unhashable would have found your bug earlier, but it is otherwise unjustified: generators are just objects that are compared by identity. It would break other people's code, which is a very invasive change to make just so that a small subset of "forgot to call tuple" bugs might sometimes be found earlier.

Personally, I think that Serhiy's comment that there is "a chance" that this will be accepted is over-optimistic, but either way this would need to have a PEP.

I'm going to close this as "Not a bug" but that doesn't mean discussion has to stop. If you feel strongly enough about this issue to write a PEP proposing this change, please take the discussion to the Python-Dev mailing list with a summary and see if you can get somebody to sponsor the PEP. If you do, feel free to re-open as an Enhancement.
History
Date User Action Args
2022-04-11 14:59:23adminsetgithub: 82950
2019-11-11 22:58:17steven.dapranosetstatus: open -> closed
resolution: not a bug
messages: + msg356401

stage: resolved
2019-11-11 22:47:47ammar2setnosy: + ammar2
messages: + msg356400
2019-11-11 22:46:12tim.peterssetmessages: + msg356399
2019-11-11 22:35:33steven.dapranosetnosy: + steven.daprano
messages: + msg356397
2019-11-11 22:24:46Anthony Sottilesetmessages: + msg356396
2019-11-11 22:18:30tim.peterssetmessages: + msg356395
2019-11-11 22:14:19Anthony Sottilesetmessages: + msg356394
2019-11-11 22:02:49tim.peterssetnosy: + tim.peters
messages: + msg356393
2019-11-11 21:50:53serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg356392
2019-11-11 21:21:22rhettingersetmessages: + msg356389
2019-11-11 21:06:37Anthony Sottilesetmessages: + msg356387
2019-11-11 20:21:39rhettingersetmessages: + msg356383
2019-11-11 19:44:22ammar2setnosy: + rhettinger
2019-11-11 19:40:25Anthony Sottilecreate