Make hash(None) consistent among processes #63423

hongqn · 2013-10-11T04:03:41Z

BPO	19224
Nosy	@tim-one, @rhettinger, @pitrou, @vstinner, @tiran
Files	hash_of_none.patch (deprecated): make hash(None) always return 0
	hash_of_none.patch: make hash(None) always return 1315925605

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2013-10-12.17:11:43.666>
created_at = <Date 2013-10-11.04:03:40.946>
labels = ['type-feature']
title = 'Make hash(None) consistent among processes'
updated_at = <Date 2013-10-12.17:18:48.989>
user = 'https://bugs.python.org/hongqn'

bugs.python.org fields:

activity = <Date 2013-10-12.17:18:48.989>
actor = 'christian.heimes'
assignee = 'none'
closed = True
closed_date = <Date 2013-10-12.17:11:43.666>
closer = 'rhettinger'
components = []
creation = <Date 2013-10-11.04:03:40.946>
creator = 'hongqn'
dependencies = []
files = ['32043', '32044']
hgrepos = []
issue_num = 19224
keywords = ['patch']
message_count = 10.0
messages = ['199439', '199440', '199441', '199442', '199448', '199452', '199453', '199538', '199603', '199606']
nosy_count = 6.0
nosy_names = ['tim.peters', 'rhettinger', 'pitrou', 'vstinner', 'christian.heimes', 'hongqn']
pr_nums = []
priority = 'normal'
resolution = 'rejected'
stage = 'needs patch'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue19224'
versions = ['Python 3.3', 'Python 3.4']

hongqn · 2013-10-11T04:03:40Z

The hashes of integers, strings, and bools are all consistent across processes of the same interpreter. However, hash(None) differs:

$ python -c "print(hash(None))"
272931276
$ python -c "print(hash(None))"
277161420

It's weird, and it makes things difficult for distributed systems that partition data according to the hash of keys, if the system wants keys to support None.

This patch makes hash(None) always return 0 to resolve that problem. It is used in DPark (a Python clone of Spark, a MapReduce-like framework in Python, https://github.com/douban/dpark) to speed up its portable hash (see https://github.com/douban/dpark/blob/65a3ba857f11285667c61e2e134dacda44c13a2c/dpark/util.py#L47).
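Until something like the patch lands, an application can get a similar effect on its own. The sketch below is a hypothetical stand-in, not DPark's actual code: the name `none_safe_hash` and the xor/multiply combining constants are illustrative assumptions. What it shows is that special-casing a bare `None` is not enough, because CPython derives a tuple's hash from its elements' hashes, so a key like `("a", None)` inherits the per-process variation unless tuples are walked recursively.

```python
# Hypothetical application-level stand-in for the patch: hash None to a fixed
# value and recurse into tuples, whose CPython hash is derived from element hashes.
MASK = (1 << 64) - 1  # keep combined values in a fixed 64-bit range


def none_safe_hash(key):
    if key is None:
        return 0  # the constant the first version of the patch used
    if isinstance(key, tuple):
        # Illustrative xor/multiply combiner; not CPython's actual tuple hash.
        h = 0x345678
        for item in key:
            h = ((h ^ none_safe_hash(item)) * 1000003) & MASK
        return h ^ len(key)
    # Everything else still goes through hash(), so str/bytes keys remain
    # subject to hash randomization unless PYTHONHASHSEED is fixed.
    return hash(key)
```

For example, `none_safe_hash(("a", None)) % 16` no longer depends on where `None` happens to live in memory, though it still depends on the string hash of `"a"`.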

davies.liu@gmail.com is the original author of this patch. All credit goes to him.

rhettinger · 2013-10-11T04:51:50Z

Instead of 0, pick some large random number that is less likely to collide with other hashes such as hash(0).

tiran · 2013-10-11T05:06:31Z

How about

>>> (78 << 24) + (111 << 16) + (110 << 8) + 101
1315925605
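The proposed constant is simply the ASCII bytes of the string "None" (N=78, o=111, n=110, e=101) packed into a big-endian 32-bit integer, which is easy to check:

```python
# 1315925605 is b"None" read as a 4-byte big-endian unsigned integer.
NONE_HASH = (78 << 24) + (111 << 16) + (110 << 8) + 101

assert bytes([78, 111, 110, 101]) == b"None"
assert NONE_HASH == int.from_bytes(b"None", "big") == 1315925605
```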

The output of hash() is not guaranteed to be consistent between processes. The outcome depends on the hash randomization key, architecture, platform, Python version and perhaps other flags. 32-bit builds of Python generate different hash() values than 64-bit builds. The value might depend on endianness, too. (Not sure about that.)
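Part of the 32-bit vs 64-bit difference is visible from inside the interpreter. In CPython 3.2+, `sys.hash_info` reports the hash width and the prime modulus used to reduce numeric hashes (2**61 - 1 on typical 64-bit builds, 2**31 - 1 on 32-bit ones), so the same large integer hashes differently across builds. The helper names below are hypothetical; `algorithm` only exists from 3.4 on, hence the `getattr` fallback.

```python
import sys


def hash_params():
    """Return the build-dependent parameters CPython reports for hash()."""
    info = sys.hash_info
    # width: bits in a hash value (32 or 64); modulus: prime used to reduce
    # numeric hashes; algorithm: str/bytes hash function (3.4+).
    return {
        "width": info.width,
        "modulus": info.modulus,
        "algorithm": getattr(info, "algorithm", "unknown"),
    }


def int_hash_is_reduced(n):
    """For a non-negative int n, CPython defines hash(n) as n % sys.hash_info.modulus."""
    return hash(n) == n % sys.hash_info.modulus
```

Running `print(hash_params())` on two different builds is a quick way to see why, say, `hash(2**100)` cannot be expected to agree between them.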

hongqn · 2013-10-11T05:54:35Z

The patch returns 1315925605 now :)

pitrou · 2013-10-11T09:09:36Z

Is this something we actually want to support officially? Many other types have non-repeatable hashes, e.g.:

$ PYTHONHASHSEED=1 python3 -c "print(hash((lambda: 0)))"
8771754605115
$ PYTHONHASHSEED=1 python3 -c "print(hash((lambda: 0)))"
8791504743739
$ PYTHONHASHSEED=1 python3 -c "print(hash((lambda: 0)))"
8788491320379
$ PYTHONHASHSEED=1 python3 -c "print(hash((lambda: 0)))"
8792628055611

vstinner · 2013-10-11T09:19:16Z

In the same Python version, hash(None) always gives me the same value. I cannot reproduce your issue on Linux; I tested Python 2.7, 3.3 and 3.4.

$ python2.7 -c "print(hash(None))"
17171842026
$ python2.7 -c "print(hash(None))"
17171842026
$ python2.7 -c "print(hash(None))"
17171842026

$ python3.3 -c "print(hash(None))"
17171873465
$ python3.3 -c "print(hash(None))"
17171873465
$ python3.3 -c "print(hash(None))"
17171873465

$ python3.4 -c "print(hash(None))"
588812
$ python3.4 -c "print(hash(None))"
588812
$ python3.4 -c "print(hash(None))"
588812

vstinner · 2013-10-11T09:20:57Z

"It's wired and make difficulty for distributed systems partitioning data according hash of keys if the system wants the keys support None."

How do you handle the randomization of hash(str)? (python2.7 -R; enabled by default in Python 3.3.)
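The effect of string hash randomization can be observed by computing a hash in fresh interpreter processes, with and without a pinned `PYTHONHASHSEED` (0 disables randomization; any other integer fixes the seed). This is a hypothetical test helper, assuming Python 3.7+ for the `subprocess.run` arguments used.

```python
import os
import subprocess
import sys


def hash_in_child(expr, seed=None):
    """Return hash(<expr>) as computed by a fresh interpreter process.

    seed=None lets the child pick a random hash seed (the Python 3.3+ default);
    an integer pins PYTHONHASHSEED so str/bytes hashes repeat across runs.
    """
    env = dict(os.environ)
    if seed is None:
        env.pop("PYTHONHASHSEED", None)
    else:
        env["PYTHONHASHSEED"] = str(seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({expr}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)
```

Calling `hash_in_child("'spam'")` twice will usually give two different values, while `hash_in_child("'spam'", seed=0)` repeats; a partitioning scheme built on hash(str) therefore has to pin the seed on every worker, None or no None.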

tim-one · 2013-10-12T04:51:34Z

-0.

Since hash(None) is currently based on None's memory address, I appreciate that it's not reliable (e.g., use different releases of the same compiler to build Python, and hash(None) may be different between them).

The docs guarantee little about hash() results, so applications relying on cross-machine (or even same-machine, cross-run) consistency are broken.

It's trivial code bloat to special-case None, but it leaves a world of other hash() behaviors as-is (essentially "undefined"). The portable_hash() function in the DPark source is a start at what needs to be done if an application wants reliable hashes. But it's just a start (e.g., it's apparently relying on cross-platform consistency for hash(integer) and hash(string), etc).
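An application that needs reliable hashes can sidestep hash() entirely and digest a deterministic encoding of the key instead. The sketch below is hypothetical (`stable_hash`, `_feed` and `partition` are illustrative names, not DPark or CPython APIs), supports only a few key types, and uses `hashlib.blake2b` (Python 3.6+) as one arbitrary choice of stable digest.

```python
import hashlib
import struct


def stable_hash(key):
    """Hash a key to a 64-bit int that is the same in every process and build.

    Supports None, bool/int, str, bytes and (nested) tuples of those.
    """
    h = hashlib.blake2b(digest_size=8)
    _feed(h, key)
    return int.from_bytes(h.digest(), "big")


def _feed(h, key):
    # Each value gets a type tag and, where needed, a length prefix so that
    # distinct keys (e.g. ("ab",) vs ("a", "b")) never yield the same byte stream.
    if key is None:
        h.update(b"N")
    elif isinstance(key, int):
        # bool is an int subclass; int(True) == 1 keeps stable_hash(True) ==
        # stable_hash(1), matching True == 1 in Python.
        data = str(int(key)).encode("ascii")
        h.update(b"i" + struct.pack(">I", len(data)) + data)
    elif isinstance(key, str):
        data = key.encode("utf-8")
        h.update(b"s" + struct.pack(">I", len(data)) + data)
    elif isinstance(key, bytes):
        h.update(b"b" + struct.pack(">I", len(key)) + key)
    elif isinstance(key, tuple):
        h.update(b"t" + struct.pack(">I", len(key)))
        for item in key:
            _feed(h, item)
    else:
        # float, frozenset, user classes, ... would need their own encoding.
        raise TypeError(f"unsupported key type: {type(key).__name__}")


def partition(key, num_partitions):
    """Pick a partition index in range(num_partitions) for key."""
    return stable_hash(key) % num_partitions
```

Unlike hash(), the result depends only on the key's value, so `partition(("user", 42, None), 8)` picks the same partition on any machine or Python version. The cost is a slower hash and an encoding the application has to maintain for every key type it accepts, which is exactly the burden CPython declines to take on.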

Since CPython will never promise to make all of those consistent across platforms and releases, I'd rather not even start down that road. Making the promise for hash(None) would be an attractive nuisance.

rhettinger · 2013-10-12T17:11:44Z

There seems to be a pretty good consensus that this is something we don't want to support.

tiran · 2013-10-12T17:18:49Z

Tim has convinced me, too.

tiran added the type-feature label (A feature request or enhancement) Oct 11, 2013

rhettinger closed this as completed Oct 12, 2013

ezio-melotti transferred this issue from another repository Apr 10, 2022
