classification
Title: Make hash(None) consistent among processes
Type: enhancement Stage: needs patch
Components: Versions: Python 3.3, Python 3.4
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: christian.heimes, hongqn, pitrou, rhettinger, tim.peters, vstinner
Priority: normal Keywords: patch

Created on 2013-10-11 04:03 by hongqn, last changed 2013-10-12 17:18 by christian.heimes. This issue is now closed.

Files
File name                          Uploaded                   Description
hash_of_none.patch (deprecated)    hongqn, 2013-10-11 04:03   make hash(None) always return 0
hash_of_none.patch                 hongqn, 2013-10-11 05:54   make hash(None) always return 1315925605
Messages (10)
msg199439 - Author: Qiangning Hong (hongqn) Date: 2013-10-11 04:03
The hashes of integers, strings, and bools are all consistent across processes of the same interpreter. However, hash(None) differs:

$ python -c "print(hash(None))"
272931276
$ python -c "print(hash(None))"
277161420

It's weird, and it makes things difficult for distributed systems that partition data by the hash of their keys, if the system wants keys to support None.

This patch makes hash(None) always return 0 to resolve that problem. It is used in DPark (a Python clone of Spark, a MapReduce-like framework in Python, https://github.com/douban/dpark) to speed up its portable hash (see https://github.com/douban/dpark/blob/65a3ba857f11285667c61e2e134dacda44c13a2c/dpark/util.py#L47).

davies.liu@gmail.com is the original author of this patch.  All credit goes to him.
msg199440 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-10-11 04:51
Instead of 0, pick some large random number that is less likely to collide with other hashes such as hash(0).
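To see why 0 is a poor choice: in CPython, numbers that compare equal must hash equal, and small non-negative ints hash to themselves, so several common values already hash to 0. Collisions do not break dicts or sets, which fall back to equality checks, but they cost extra comparisons. A minimal check of CPython's current behavior (the list name is illustrative):

```python
# Values that already hash to 0 in CPython; a hash(None) of 0
# would land in the same buckets as all of them.
zero_hash_values = [0, 0.0, False, 0j]
print([hash(v) for v in zero_hash_values])  # [0, 0, 0, 0]
```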
msg199441 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2013-10-11 05:06
How about

>>> (78 << 24) + (111 << 16) + (110 << 8) + 101
1315925605
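The suggested constant spells "None" in ASCII (N=78, o=111, n=110, e=101) packed into a big-endian 32-bit integer. A quick sketch confirming that reading, and that the value also fits in the signed 32-bit hash type used by 32-bit builds:

```python
# Interpret the four ASCII bytes of "None" as one big-endian integer.
value = int.from_bytes(b"None", "big")
print(value)  # 1315925605
print(value == (78 << 24) + (111 << 16) + (110 << 8) + 101)  # True
# Below 2**31, so it is representable on 32-bit builds as well.
print(value < 2**31)  # True
```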

The output of hash() is not guaranteed to be consistent between processes. The outcome depends on the hash randomization key, architecture, platform, Python version, and perhaps other flags. 32-bit builds of Python generate different hash() values than 64-bit builds. The value might depend on endianness, too (not sure about that).
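The 32-bit versus 64-bit difference is easy to see with integers: CPython reduces a non-negative integer's hash modulo a build-dependent prime, exposed as sys.hash_info.modulus (2**61 - 1 on 64-bit builds, 2**31 - 1 on 32-bit builds). A sketch, CPython-specific; the value for the other word size is computed here, not observed:

```python
import sys

info = sys.hash_info
print(info.width, info.modulus, info.algorithm)

# Non-negative ints hash to their value modulo sys.hash_info.modulus,
# so the same integer hashes differently depending on the build.
n = 2**40
print(hash(n) == n % info.modulus)  # True
print(n % (2**61 - 1))  # what a 64-bit build returns: 1099511627776
print(n % (2**31 - 1))  # what a 32-bit build returns: 512
```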
msg199442 - Author: Qiangning Hong (hongqn) Date: 2013-10-11 05:54
Return 1315925605 now :)
msg199448 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-10-11 09:09
Is this something we actually want to support officially? Many other types have non-repeatable hashes, e.g.:

$ PYTHONHASHSEED=1 python3 -c "print(hash((lambda: 0)))"
8771754605115
$ PYTHONHASHSEED=1 python3 -c "print(hash((lambda: 0)))"
8791504743739
$ PYTHONHASHSEED=1 python3 -c "print(hash((lambda: 0)))"
8788491320379
$ PYTHONHASHSEED=1 python3 -c "print(hash((lambda: 0)))"
8792628055611
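The lambdas above vary even with a fixed PYTHONHASHSEED because functions do not define their own hash; CPython falls back to the default identity-based hash derived from the object's address, which the seed does not touch. A minimal illustration of that CPython behavior:

```python
# Two functions with identical source are still distinct objects.
f = lambda: 0
g = lambda: 0
print(f == g)              # False: function equality is identity
print(hash(f) == hash(g))  # False: different addresses, different hashes
print(hash(f) == hash(f))  # True: stable within a single process
```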
msg199452 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-10-11 09:19
In the same Python version, hash(None) always gives me the same value. I cannot reproduce your issue on Linux; I tested Python 2.7, 3.3 and 3.4.

$ python2.7 -c "print(hash(None))"
17171842026
$ python2.7 -c "print(hash(None))"
17171842026
$ python2.7 -c "print(hash(None))"
17171842026

$ python3.3 -c "print(hash(None))"
17171873465
$ python3.3 -c "print(hash(None))"
17171873465
$ python3.3 -c "print(hash(None))"
17171873465

$ python3.4 -c "print(hash(None))"
588812
$ python3.4 -c "print(hash(None))"
588812
$ python3.4 -c "print(hash(None))"
588812
msg199453 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-10-11 09:20
"It's wired and make difficulty for distributed systems partitioning data according hash of keys if the system wants the keys support None."

How do you handle the randomization of hash(str)? (python2.7 -R; enabled by default in Python 3.3.)
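One common answer to the str randomization question is to pin PYTHONHASHSEED for every worker process. A sketch, assuming all workers run the same interpreter build; child_hash is a hypothetical helper. This makes str hashes repeatable across processes, but it does nothing for identity-based hashes such as those of functions.

```python
import os
import subprocess
import sys


def child_hash(expr: str, seed: str) -> int:
    """Evaluate hash(<expr>) in a fresh interpreter with a fixed PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({expr}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


# With the same seed, str hashes repeat across processes...
a = child_hash("'spam'", "123")
b = child_hash("'spam'", "123")
print(a == b)  # True
# ...and a seed of 0 disables hash randomization entirely.
print(child_hash("'spam'", "0") == child_hash("'spam'", "0"))  # True
```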
msg199538 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2013-10-12 04:51
-0.

Since hash(None) is currently based on None's memory address, I appreciate that it's not reliable (e.g., use different releases of the same compiler to build Python, and hash(None) may be different between them).

The docs guarantee little about hash() results, so applications relying on cross-machine - or even same-machine cross-run - consistency are broken.

It's trivial code bloat to special-case None, but it leaves a world of other hash() behaviors as-is (essentially "undefined").  The `portable_hash()` function in the DPark source is a start at what needs to be done if an application wants reliable hashes.  But it's just a start (e.g., it's apparently relying on cross-platform consistency for `hash(integer)` and `hash(string)`, etc).
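What Tim describes amounts to an application defining its own hash over a serialization it controls, instead of relying on hash(). A minimal sketch of that approach, not DPark's portable_hash: stable_hash and partition_for are hypothetical names, only None, bool, int, str, bytes and tuples are supported, and BLAKE2b from hashlib is an arbitrary choice of digest.

```python
import hashlib


def _encode(obj) -> bytes:
    """Type-tagged, platform-independent byte encoding of a key (hypothetical)."""
    if obj is None:
        return b"N"
    if isinstance(obj, bool):  # check bool before int: bool subclasses int
        return b"B1" if obj else b"B0"
    if isinstance(obj, int):
        return b"I" + str(obj).encode("ascii")
    if isinstance(obj, str):
        return b"S" + obj.encode("utf-8")
    if isinstance(obj, bytes):
        return b"Y" + obj
    if isinstance(obj, tuple):
        parts = [_encode(item) for item in obj]
        # Length-prefix each element so ("ab", "c") and ("a", "bc") differ.
        return b"T" + b"".join(str(len(p)).encode("ascii") + b":" + p for p in parts)
    raise TypeError(f"no stable encoding for {type(obj).__name__}")


def stable_hash(obj) -> int:
    """64-bit hash that depends only on the key's encoded bytes."""
    digest = hashlib.blake2b(_encode(obj), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def partition_for(key, num_partitions: int) -> int:
    """Map a key to a partition index in [0, num_partitions)."""
    return stable_hash(key) % num_partitions
```

Because the result depends only on the encoded bytes, every process and platform maps a given key to the same partition, and None needs no special treatment from the interpreter. Unsupported types fail loudly rather than silently falling back to hash().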

Since CPython will never promise to make _all_ of those consistent across platforms and releases, I'd rather not even start down that road.   Making the promise for `hash(None)` would be an attractive nuisance.
msg199603 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-10-12 17:11
There seems to be a pretty good consensus that this is something we don't want to support.
msg199606 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2013-10-12 17:18
Tim has convinced me, too.
History
Date                 User              Action  Args
2013-10-12 17:18:48  christian.heimes  set     messages: + msg199606
2013-10-12 17:11:43  rhettinger        set     status: open -> closed; resolution: rejected; messages: + msg199603
2013-10-12 04:51:34  tim.peters        set     messages: + msg199538
2013-10-11 09:20:56  vstinner          set     messages: + msg199453
2013-10-11 09:19:15  vstinner          set     nosy: + vstinner; messages: + msg199452
2013-10-11 09:09:36  pitrou            set     nosy: + pitrou, tim.peters; messages: + msg199448
2013-10-11 05:54:35  hongqn            set     files: + hash_of_none.patch; messages: + msg199442
2013-10-11 05:06:31  christian.heimes  set     versions: + Python 3.3, Python 3.4; nosy: + christian.heimes; messages: + msg199441; type: enhancement; stage: needs patch
2013-10-11 04:51:49  rhettinger        set     nosy: + rhettinger; messages: + msg199440
2013-10-11 04:03:59  hongqn            set     files: + hash_of_none.patch (deprecated); keywords: + patch
2013-10-11 04:03:40  hongqn            create