Make hash(None) consistent among processes #63423
Comments
Hashes of integers, strings, and bools are all consistent across processes of the same interpreter. However, hash(None) differs:
$ python -c "print(hash(None))"
272931276
$ python -c "print(hash(None))"
277161420
It's weird, and it makes things difficult for distributed systems that partition data according to the hash of keys if the system wants keys to support None. This patch makes hash(None) always return 0 to resolve that problem. It is used in DPark (a Python clone of Spark, a MapReduce-like framework in Python, https://github.com/douban/dpark) to speed up portable hash (see line https://github.com/douban/dpark/blob/65a3ba857f11285667c61e2e134dacda44c13a2c/dpark/util.py#L47). davies.liu@gmail.com is the original author of this patch. All credit goes to him. |
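The partitioning use case described above can be sketched without relying on the built-in hash() at all. This is a minimal illustration, not DPark's actual portable hash: the names `portable_hash`, `partition_for`, and `NONE_HASH` are hypothetical, and the choice of CRC-32 for strings and bytes is an assumption made only to get a value that is stable across processes.

```python
import zlib

# Hypothetical fixed stand-in for hash(None); any constant integer would do.
NONE_HASH = 0


def portable_hash(value):
    """Return a 32-bit hash that does not depend on the process or hash seed."""
    if value is None:
        return NONE_HASH
    if isinstance(value, bool):
        return int(value)
    if isinstance(value, int):
        return value & 0xFFFFFFFF
    if isinstance(value, str):
        # CRC-32 is an assumption here: stable, but not a strong hash.
        return zlib.crc32(value.encode("utf-8"))
    if isinstance(value, bytes):
        return zlib.crc32(value)
    if isinstance(value, tuple):
        # Combine element hashes in order, keeping the result within 32 bits.
        h = 0x345678
        for item in value:
            h = ((h * 1000003) ^ portable_hash(item)) & 0xFFFFFFFF
        return h
    raise TypeError(f"no portable hash for {type(value).__name__}")


def partition_for(key, num_partitions):
    """Pick a partition index for key that is stable across processes."""
    return portable_hash(key) % num_partitions


print(partition_for(None, 8))         # None always lands in partition 0
print(partition_for(("user", 42), 8)) # same index in every process
```

Only the listed types are supported; anything else raises TypeError rather than silently falling back to an address-based hash.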
Instead of 0, pick some large random number that is less likely to collide with other hashes such as hash(0). |
How about
>>> (78 << 24) + (111 << 16) + (110 << 8) + 101
1315925605
The output of hash() is not guaranteed to be consistent between processes. The outcome depends on the hash randomization key, architecture, platform, Python version and perhaps other flags. 32-bit builds of Python generate different hash() values than 64-bit builds. The value might depend on endianness, too. (Not sure about that.) |
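The constant suggested above is not arbitrary: 78, 111, 110 and 101 are the ASCII codes of the letters N, o, n, e, so the shifts pack the bytes of "None" into a big-endian integer. A short check (the variable names are illustrative only):

```python
# The four shifted values are the ASCII codes of "N", "o", "n", "e".
shifted = (78 << 24) + (111 << 16) + (110 << 8) + 101

# Packing the bytes of b"None" as a big-endian integer gives the same number.
packed = int.from_bytes(b"None", "big")

print(bytes([78, 111, 110, 101]))  # b'None'
print(shifted, packed)             # 1315925605 1315925605
```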
The patch returns 1315925605 now :) |
Is this something we actually want to support officially? Many other types have non-repeatable hashes, e.g.:
$ PYTHONHASHSEED=1 python3 -c "print(hash((lambda: 0)))"
8771754605115
$ PYTHONHASHSEED=1 python3 -c "print(hash((lambda: 0)))"
8791504743739
$ PYTHONHASHSEED=1 python3 -c "print(hash((lambda: 0)))"
8788491320379
$ PYTHONHASHSEED=1 python3 -c "print(hash((lambda: 0)))"
8792628055611 |
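For contrast with the lambda example above, the following sketch checks which hashes PYTHONHASHSEED does pin down. It starts fresh interpreters through `sys.executable`; the helper name `hash_in_child` is hypothetical. With a fixed seed, str hashes repeat from run to run, and int hashes do not depend on the seed at all, whereas address-based hashes (such as the lambda's) stay unpredictable.

```python
import os
import subprocess
import sys


def hash_in_child(expr, seed):
    """Evaluate hash(<expr>) in a fresh interpreter with PYTHONHASHSEED=seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({expr}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# Same seed, three separate processes: the str hash is identical each time.
same_seed = {hash_in_child("'spam'", 1) for _ in range(3)}

# Different seeds: small int hashes are unaffected by the seed.
int_hashes = {hash_in_child("12345", seed) for seed in (1, 2, 3)}

# Different seeds: str hashes are expected to differ (printed, not asserted).
str_hashes = {hash_in_child("'spam'", seed) for seed in (1, 2, 3)}

print(same_seed)
print(int_hashes)
print(str_hashes)
```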
With the same Python version, hash(None) always gives me the same value. I cannot reproduce your issue on Linux; I tested Python 2.7, 3.3 and 3.4.
$ python2.7 -c "print(hash(None))"
17171842026
$ python2.7 -c "print(hash(None))"
17171842026
$ python2.7 -c "print(hash(None))"
17171842026
$ python3.3 -c "print(hash(None))"
17171873465
$ python3.3 -c "print(hash(None))"
17171873465
$ python3.3 -c "print(hash(None))"
17171873465
$ python3.4 -c "print(hash(None))"
588812
$ python3.4 -c "print(hash(None))"
588812
$ python3.4 -c "print(hash(None))"
588812 |
"It's weird, and it makes things difficult for distributed systems that partition data according to the hash of keys if the system wants keys to support None." How do you handle the randomization of hash(str) (python2.7 -R, enabled by default in Python 3.3)? |
-0. Since hash(None) is currently based on None's memory address, I appreciate that it's not reliable (e.g., use different releases of the same compiler to build Python, and hash(None) may be different between them). The docs guarantee little about hash() results, so applications relying on cross-machine - or even same-machine cross-run - consistency are broken. It's trivial code bloat to special-case None, but it leaves a world of other hash() behaviors as-is (essentially "undefined"). The Since CPython will never promise to make all of those consistent across platforms and releases, I'd rather not even start down that road. Making the promise for |
There seems to be a pretty good consensus that this is something we don't want to support. |
Tim has convinced me, too. |