classification
Title: Random.seed, whose purpose is purportedly determinism, behaves non-deterministically with strings due to hash randomization
Type: behavior Stage: needs patch
Components: Library (Lib), Tests Versions: Python 3.6, Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: Lukasa, Nofar Schnider, glyph, python-dev, rhettinger, terry.reedy, vstinner
Priority: normal Keywords: patch

Created on 2016-08-08 00:15 by glyph, last changed 2016-08-31 22:07 by rhettinger. This issue is now closed.

Files
File name Uploaded
issue27706.patch Nofar Schnider, 2016-08-31 20:13
issue27706.patch Nofar Schnider, 2016-08-31 21:15
Messages (15)
msg272137 - (view) Author: Glyph Lefkowitz (glyph) (Python triager) Date: 2016-08-08 00:15
The purpose of 'seeding' a random number generator is usually to supply a deterministic sequence of values starting from a known point.  This works fine if you seed random.Random with an integer.  Often (for example, see Minecraft's map generation interface) one wants to begin with a human-memorable string as the seed, and superficially it would seem that passing a string to Random.seed would serve exactly this use-case.  In fact in its original incarnation it did.

However, since the introduction of PYTHONHASHSEED in 2.6.8, it's possible that strings now hash to different values from one run to the next, and on 3.3+, where hash randomization is enabled by default, they'll _always_ hash to different values unless otherwise configured (and configuring it otherwise reintroduces the security flaw that motivated this feature in the first place).

Right now the way to work around this is to get some deterministic hash from your string; one mechanism being a truncated SHA256 hash, for example, like this:

Random(struct.unpack("!I", sha256(seed.encode("utf-8")).digest()[:4])[0])

but this strikes me as an obscure trick to require of someone just trying to get their D&D character generator to produce the same values twice in a row for unit testing.
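
For reference, a self-contained sketch of that workaround with its imports spelled out. The helper name random_from_string is hypothetical, and keeping only the first 4 bytes of the digest limits the seed to 32 bits:

```python
import struct
from hashlib import sha256
from random import Random


def random_from_string(seed: str) -> Random:
    """Build a Random instance from a string without relying on hash()."""
    digest = sha256(seed.encode("utf-8")).digest()
    # Interpret the first 4 bytes as a big-endian unsigned 32-bit integer.
    (seed_int,) = struct.unpack("!I", digest[:4])
    return Random(seed_int)


# Two generators built from the same string yield the same values,
# because an integer seed does not depend on PYTHONHASHSEED.
first = [random_from_string("fighter").randrange(1, 21) for _ in range(3)]
rng_a = random_from_string("fighter")
rng_b = random_from_string("fighter")
rolls_a = [rng_a.randrange(1, 21) for _ in range(10)]
rolls_b = [rng_b.randrange(1, 21) for _ in range(10)]
```

Since the seed is derived from SHA-256 rather than from hash(), the same string should give the same rolls in every interpreter run.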

I'm not sure what the resolution should be, but I figured I should report this since I tripped over it.
msg272141 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-08-08 01:37
Thanks for the bug report.  This is a case of something that used to work fine but was affected by an incidental change elsewhere.   

To support the use case for deterministic sequences of values starting from a known point, the docs promise, "If a new seeding method is added, then a backward compatible seeder will be offered. The generator’s random() method will continue to produce the same sequence when the compatible seeder is given the same seed." (See https://docs.python.org/3/library/random.html#notes-on-reproducibility )

The resolution is to have the random module (line 327 in Modules/_randommodule.c) use a new _PyObject_Hash() function that deterministically matches what the old PyObject_Hash() function used to do.

Marking this as "needs patch" and saving it for Nofar Schnider to work on (she's an aspiring core dev).
msg272418 - (view) Author: Nofar Schnider (Nofar Schnider) * (Python triager) Date: 2016-08-11 08:12
On it!
msg272712 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2016-08-15 03:44
I agree with the value of a Reproducibility guarantee.  (The new section appears to have been added -- by Raymond -- in 3.2.)  For instance, people posting play-through videos using reproducible random maps typically post the 'seed' and I have seen memorable phrases rather than ints used.  I can imagine myself publishing seeds in other contexts.

In 3.2, seed gained a 2nd parameter -- 'version'.  "With version 2 (the default), a str, bytes, or bytearray object gets converted to an int and all of its bits are used."  For this case, hashing is not an issue. But 'conversion to int' is. Did it change with the introduction of FSR in 3.3?  It certainly should be frozen now, and the fact noted in the code.

For other non-int objects, a hashable is required.  (I expect anything other than int or string-like to be rare.) The doc does not say so (it should), but the docstring does and experiment with [] confirms.

"With version 1, the hash() of a is used instead."

For hashed objects, whether version is 1 or 2, I guess the best we can do is to restore the fixed hash once used.
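
A small sketch of the call shapes discussed above, assuming Python 3. The helper first_values is hypothetical. Comparing two calls inside one process only illustrates the API; it does not exercise hash randomization, which varies between interpreter runs:

```python
import random


def first_values(seed, version, count=5):
    """Seed a private generator and return its first few digits."""
    rng = random.Random()
    rng.seed(seed, version=version)
    return [rng.randrange(10) for _ in range(count)]


# String seeds are accepted with either seeding version.
v2_first = first_values("memorable phrase", version=2)
v2_second = first_values("memorable phrase", version=2)
v1_first = first_values("memorable phrase", version=1)
v1_second = first_values("memorable phrase", version=1)

# A list is neither int, string-like, nor hashable, so seeding fails.
try:
    random.Random().seed([])
    list_seed_rejected = False
except TypeError:
    list_seed_rejected = True
```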

For a fixed sequence of outputs, both seed and rng have to be fixed.  2.7 still has WichmannHill for this reason.  It is gone in 3.x.  If the rng is significantly changed (different sequence for the same seed), I believe the seed version should be changed also.
msg272928 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-08-17 12:26
"Right now the way to work around this is to get some deterministic hash from your string; one mechanism being a truncated SHA256 hash, ..."

It looks like I missed something. Doesn't Lib/random.py already compute the SHA-512 hash if you pass a string to the random.Random constructor?

Using a string as a seed for random.Random already works as expected in Python 3.6:

haypo@selma$ python3 -c "import random; r=random.Random('abc'); print(''.join(map(str, (r.randrange(10) for x in range(10)))), hash('abc'))"
8755240310 6396067846301608395
haypo@selma$ python3 -c "import random; r=random.Random('abc'); print(''.join(map(str, (r.randrange(10) for x in range(10)))), hash('abc'))"
8755240310 -1771227904188177035
haypo@selma$ python3 -c "import random; r=random.Random('abc'); print(''.join(map(str, (r.randrange(10) for x in range(10)))), hash('abc'))"
8755240310 1726464324144904308
haypo@selma$ python3 -c "import random; r=random.Random('abc'); print(''.join(map(str, (r.randrange(10) for x in range(10)))), hash('abc'))"
8755240310 2069899884777593571
haypo@selma$ python3 -c "import random; r=random.Random('abc'); print(''.join(map(str, (r.randrange(10) for x in range(10)))), hash('abc'))"
8755240310 -8244933646981095152
haypo@selma$ python3 -c "import random; r=random.Random('abc'); print(''.join(map(str, (r.randrange(10) for x in range(10)))), hash('abc'))"
8755240310 -3269879388324739111

It was already the case in Python 2.7.
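
The shell runs above can also be scripted. This is a sketch assuming Python 3.7 or later (for capture_output); the helper run_with_hash_seed is hypothetical, and it sets PYTHONHASHSEED explicitly for each child interpreter instead of relying on the random default:

```python
import os
import subprocess
import sys

# Same one-liner as in the shell session: ten digits, then hash('abc').
SNIPPET = (
    "import random; r = random.Random('abc'); "
    "print(''.join(str(r.randrange(10)) for _ in range(10)), hash('abc'))"
)


def run_with_hash_seed(hash_seed):
    """Run SNIPPET in a child interpreter with a chosen PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env, capture_output=True, text=True, check=True,
    )
    digits, str_hash = result.stdout.split()
    return digits, int(str_hash)


runs = [run_with_hash_seed(seed) for seed in (1, 2, 3)]
sequences = {digits for digits, _ in runs}   # distinct digit strings seen
hashes = {str_hash for _, str_hash in runs}  # distinct hash('abc') values seen
```

On Python 3, sequences should contain a single entry while hashes contains one entry per run.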
msg272972 - (view) Author: Glyph Lefkowitz (glyph) (Python triager) Date: 2016-08-17 17:29
It does seem to be stable on python 3, but on python 2.7 it's definitely a problem:

$ python -Rc "import random; r=random.Random('abc'); print(''.join(map(str, (r.randrange(10) for x in range(10)))), hash('abc'))"
('9553343809', -1972659830997666042)
$ python -Rc "import random; r=random.Random('abc'); print(''.join(map(str, (r.randrange(10) for x in range(10)))), hash('abc'))"
('5519010739', 5520208254012363023)
$ python -Rc "import random; r=random.Random('abc'); print(''.join(map(str, (r.randrange(10) for x in range(10)))), hash('abc'))"
('7519888435', 3560222494758569319)
$ python -Rc "import random; r=random.Random('abc'); print(''.join(map(str, (r.randrange(10) for x in range(10)))), hash('abc'))"
('9612648103', 4134882069837806740)
msg272973 - (view) Author: Glyph Lefkowitz (glyph) (Python triager) Date: 2016-08-17 17:33
Changing the affected version to just 2.7.
msg272987 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-08-17 21:18
> Changing the affected version to just 2.7.

Oh. The request looks like an enhancement. The problem is that if you add a new feature in Python 2.7.n+1, it's not available on Python 2.7.n. To support 2.7.n as well, you have to backport the code in your application.

I'm not sure that it's worth adding such an enhancement to the random module at this point in Python 2.

I suggest that you either upgrade to Python 3 (hello, Python 3!) or implement the SHA-512 approach in your application. I expect that random.seed() is only called in one or maybe two places, so it shouldn't be hard to patch your code ;-)

In short, I suggest closing the issue as "won't fix".
msg272993 - (view) Author: Glyph Lefkowitz (glyph) (Python triager) Date: 2016-08-17 21:32
For what it's worth, I don't much care whether this is fixed or not; I ended up wanting to leak less information from the RNG output anyway so I wrote this:

https://gist.github.com/glyph/ceca96100a3049fefea6f2035abbd9ea

but I felt like it should be reported.
msg274066 - (view) Author: Nofar Schnider (Nofar Schnider) * (Python triager) Date: 2016-08-31 20:13
Adding the patch with seed fix for version=1 and tests (test_random).
msg274067 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-08-31 20:33
Nofar Schnider: "versions: + Python 3.5, - Python 2.7"

I don't get it. This issue is specific to Python 2.7, no?

> issue27706.patch 

If you want to backport this feature from Python 3, I suggest reusing the same code (so SHA-512). You might get the same random sequences on Python 2 and Python 3, but I don't think that matters :-) It's just that I expect SHA-512 to keep more bits of entropy than the Python 2 hash function.
msg274069 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-08-31 20:44
For 2.7, we're going to add a note to the docs.  For 3.5 and 3.6, we're adjusting the version==1 logic to meet the documented guarantee.
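
One way to meet that guarantee without depending on the randomized hash() is to compute a fixed string hash in pure Python. The sketch below is modeled on the classic pre-randomization 64-bit string hash; it is an illustration under that assumption, not a copy of the attached patch, and legacy_string_hash is a hypothetical name:

```python
import random

MASK_64 = 0xFFFFFFFFFFFFFFFF


def legacy_string_hash(text: str) -> int:
    """Deterministic multiply-and-xor string hash, truncated to 64 bits."""
    x = ord(text[0]) << 7 if text else 0
    for code_point in map(ord, text):
        x = ((1000003 * x) ^ code_point) & MASK_64
    x ^= len(text)
    return x


# The result depends only on the characters, never on PYTHONHASHSEED,
# so it can serve as a stable integer seed.
seed_int = legacy_string_hash("abc")
draw_a = random.Random(seed_int).random()
draw_b = random.Random(seed_int).random()
```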
msg274074 - (view) Author: Nofar Schnider (Nofar Schnider) * (Python triager) Date: 2016-08-31 21:15
fixed indentation
msg274076 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-08-31 21:58
New changeset 1f37903e6040 by Raymond Hettinger in branch '2.7':
Issue #27706:  Document that random.seed() is non-deterministic when PYTHONHASHSEED is enabled
https://hg.python.org/cpython/rev/1f37903e6040
msg274077 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-08-31 22:01
New changeset 5ae941fef3be by Raymond Hettinger in branch '3.5':
Issue #27706: Fix regression in random.seed(somestr, version=1)
https://hg.python.org/cpython/rev/5ae941fef3be
History
Date User Action Args
2016-08-31 22:07:06 rhettinger set status: open -> closed; resolution: fixed
2016-08-31 22:01:42 python-dev set messages: + msg274077
2016-08-31 21:58:02 python-dev set nosy: + python-dev; messages: + msg274076
2016-08-31 21:15:41 Nofar Schnider set files: + issue27706.patch; messages: + msg274074; versions: - Python 2.7
2016-08-31 20:44:54 rhettinger set messages: + msg274069; versions: + Python 2.7, Python 3.6
2016-08-31 20:33:23 vstinner set messages: + msg274067
2016-08-31 20:13:45 Nofar Schnider set files: + issue27706.patch; versions: + Python 3.5, - Python 2.7; messages: + msg274066; components: + Tests; keywords: + patch
2016-08-17 21:32:38 glyph set messages: + msg272993
2016-08-17 21:18:45 vstinner set messages: + msg272987
2016-08-17 17:33:29 glyph set messages: + msg272973; versions: - Python 3.5, Python 3.6
2016-08-17 17:29:55 glyph set messages: + msg272972
2016-08-17 12:26:54 vstinner set nosy: + vstinner; messages: + msg272928
2016-08-15 03:44:16 terry.reedy set nosy: + terry.reedy; messages: + msg272712
2016-08-11 08:12:06 Nofar Schnider set nosy: + Nofar Schnider; messages: + msg272418
2016-08-08 07:16:45 Lukasa set nosy: + Lukasa
2016-08-08 01:37:51 rhettinger set stage: needs patch; messages: + msg272141; versions: + Python 2.7, Python 3.5, Python 3.6
2016-08-08 00:27:00 rhettinger set assignee: rhettinger; nosy: + rhettinger
2016-08-08 00:15:14 glyph create