msg272137 - (view) |
Author: Glyph Lefkowitz (glyph) |
Date: 2016-08-08 00:15 |
The purpose of 'seeding' a random number generator is usually to supply a deterministic sequence of values starting from a known point. This works fine if you seed random.Random with an integer. Often (for example, see Minecraft's map generation interface) one wants to begin with a human-memorable string as the seed, and superficially it would seem that passing a string to Random.seed would serve exactly this use-case. In fact in its original incarnation it did.
However, since the introduction of PYTHONHASHSEED in 2.6.8, it's possible that strings now hash to different values from one run to the next, and on 3.3+, they'll _always_ hash to different values unless otherwise configured (and, as per the reason for introducing this feature in the first place, disabling it is a security flaw).
Right now the way to work around this is to get some deterministic hash from your string; one mechanism being a truncated SHA256 hash, for example, like this:
Random(struct.unpack("!I", sha256(seed.encode("utf-8")).digest()[:4])[0])
but this strikes me as an obscure trick to require of someone just trying to get their D&D character generator to produce the same values twice in a row for unit testing.
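A minimal, self-contained sketch of that workaround, assuming Python 3. The helper names seed_to_int and random_from_phrase are made up for illustration and are not part of the random module:

```python
import struct
from hashlib import sha256
from random import Random


def seed_to_int(seed):
    """Map a text seed to a 32-bit integer that does not depend on hash()."""
    digest = sha256(seed.encode("utf-8")).digest()
    # "!I" reads the first 4 bytes as a big-endian unsigned 32-bit integer.
    return struct.unpack("!I", digest[:4])[0]


def random_from_phrase(seed):
    """Return a Random instance seeded from a human-memorable phrase."""
    return Random(seed_to_int(seed))


# Two generators built from the same phrase produce the same rolls.
first = random_from_phrase("correct horse battery staple")
second = random_from_phrase("correct horse battery staple")
rolls_first = [first.randrange(1, 21) for _ in range(5)]
rolls_second = [second.randrange(1, 21) for _ in range(5)]
```

Note that truncating the digest to 4 bytes means at most 2**32 distinct seeds are reachable, which is fine for test fixtures but is a deliberate loss of information.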
I'm not sure what the resolution should be, but I figured I should report this since I tripped over it.
|
msg272141 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2016-08-08 01:37 |
Thanks for the bug report. This is a case of something that used to work fine but was affected by an incidental change elsewhere.
To support the use case for deterministic sequences of values starting from a known point, the docs promise, "If a new seeding method is added, then a backward compatible seeder will be offered. The generator’s random() method will continue to produce the same sequence when the compatible seeder is given the same seed." (See https://docs.python.org/3/library/random.html#notes-on-reproducibility )
The resolution is to have the random module (line 327 in Modules/_randommodule.c) use a new _PyObject_Hash() function that deterministically matches what the old PyObject_Hash() function used to do.
Marking this as "needs patch" and saving it for Nofar Schnider to work on (she's an aspiring core dev).
|
msg272418 - (view) |
Author: Nofar Schnider (Nofar Schnider) * |
Date: 2016-08-11 08:12 |
On it!
|
msg272712 - (view) |
Author: Terry J. Reedy (terry.reedy) * |
Date: 2016-08-15 03:44 |
I agree with the value of a Reproducibility guarantee. (The new section appears to have been added -- by Raymond -- in 3.2.) For instance, people posting play-through videos using reproducible random maps typically post the 'seed' and I have seen memorable phrases rather than ints used. I can imagine myself publishing seeds in other contexts.
In 3.2, seed gained a 2nd parameter -- 'version'. "With version 2 (the default), a str, bytes, or bytearray object gets converted to an int and all of its bits are used." For this case, hashing is not an issue. But 'conversion to int' is. Did it change with the introduction of FSR in 3.3? It certainly should be frozen now, and the fact noted in the code.
For other non-int objects, a hashable is required. (I expect anything other than int or string-like to be rare.) The doc does not say so (it should), but the docstring does, and an experiment with [] confirms it.
"With version 1, the hash() of a is used instead."
For hashed objects, whether version is 1 or 2, I guess the best we can do is to restore the fixed hash once used.
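A small sketch of the behaviour described above, assuming a Python 3 interpreter where seed() accepts the 'version' parameter. The variable names are illustrative only:

```python
import random

# version=2 (the default): a str seed is converted to an int,
# so two generators seeded with the same phrase agree.
a = random.Random()
b = random.Random()
a.seed("memorable phrase", version=2)
b.seed("memorable phrase", version=2)
same_sequence = [a.random() for _ in range(3)] == [b.random() for _ in range(3)]

# An unhashable object such as a list is rejected as a seed.
try:
    random.Random().seed([])
except TypeError:
    list_rejected = True
else:
    list_rejected = False

print(same_sequence, list_rejected)
```

This only checks agreement within one process; whether the same phrase gives the same sequence across processes is the question the rest of this issue is about.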
For a fixed sequence of outputs, both seed and rng have to be fixed. 2.7 still has WichmannHill for this reason. It is gone in 3.x. If the rng is significantly changed (different sequence for the same seed), I believe the seed version should be changed also.
|
msg272928 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2016-08-17 12:26 |
"Right now the way to work around this is to get some deterministic hash from your string; one mechanism being a truncated SHA256 hash, ..."
It looks like I missed something. Doesn't Lib/random.py already compute the SHA-512 hash if you pass a string to the random.Random constructor?
Using a string as a seed for random.Random already works as expected in Python 3.6:
haypo@selma$ python3 -c "import random; r=random.Random('abc'); print(''.join(map(str, (r.randrange(10) for x in range(10)))), hash('abc'))"
8755240310 6396067846301608395
haypo@selma$ python3 -c "import random; r=random.Random('abc'); print(''.join(map(str, (r.randrange(10) for x in range(10)))), hash('abc'))"
8755240310 -1771227904188177035
haypo@selma$ python3 -c "import random; r=random.Random('abc'); print(''.join(map(str, (r.randrange(10) for x in range(10)))), hash('abc'))"
8755240310 1726464324144904308
haypo@selma$ python3 -c "import random; r=random.Random('abc'); print(''.join(map(str, (r.randrange(10) for x in range(10)))), hash('abc'))"
8755240310 2069899884777593571
haypo@selma$ python3 -c "import random; r=random.Random('abc'); print(''.join(map(str, (r.randrange(10) for x in range(10)))), hash('abc'))"
8755240310 -8244933646981095152
haypo@selma$ python3 -c "import random; r=random.Random('abc'); print(''.join(map(str, (r.randrange(10) for x in range(10)))), hash('abc'))"
8755240310 -3269879388324739111
It was already the case in Python 2.7.
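The shell experiment above can also be scripted. This is a sketch, assuming Python 3 and that child interpreters can be launched via sys.executable; the function name run_with_hash_seed is hypothetical. Each child runs with a different, explicit PYTHONHASHSEED so that hash('abc') is forced to vary:

```python
import os
import subprocess
import sys

# Essentially the same one-liner as in the shell session above.
CHILD_CODE = (
    "import random; r = random.Random('abc'); "
    "print(''.join(str(r.randrange(10)) for _ in range(10)), hash('abc'))"
)


def run_with_hash_seed(hash_seed):
    """Run CHILD_CODE in a fresh interpreter with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    output = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    ).stdout
    digits, string_hash = output.split()
    return digits, int(string_hash)


results = [run_with_hash_seed(n) for n in (1, 2, 3)]
digit_strings = {digits for digits, _ in results}
string_hashes = {string_hash for _, string_hash in results}
print(digit_strings, string_hashes)
```

If string seeding is independent of hash(), digit_strings should contain a single value while string_hashes contains several.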
|
msg272972 - (view) |
Author: Glyph Lefkowitz (glyph) |
Date: 2016-08-17 17:29 |
It does seem to be stable on python 3, but on python 2.7 it's definitely a problem:
$ python -Rc "import random; r=random.Random('abc'); print(''.join(map(str, (r.randrange(10) for x in range(10)))), hash('abc'))"
('9553343809', -1972659830997666042)
$ python -Rc "import random; r=random.Random('abc'); print(''.join(map(str, (r.randrange(10) for x in range(10)))), hash('abc'))"
('5519010739', 5520208254012363023)
$ python -Rc "import random; r=random.Random('abc'); print(''.join(map(str, (r.randrange(10) for x in range(10)))), hash('abc'))"
('7519888435', 3560222494758569319)
$ python -Rc "import random; r=random.Random('abc'); print(''.join(map(str, (r.randrange(10) for x in range(10)))), hash('abc'))"
('9612648103', 4134882069837806740)
|
msg272973 - (view) |
Author: Glyph Lefkowitz (glyph) |
Date: 2016-08-17 17:33 |
Changing the affected version to just 2.7.
|
msg272987 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2016-08-17 21:18 |
> Changing the affected version to just 2.7.
Oh. The request looks like an enhancement. The problem is that if you add a new feature in Python 2.7.n+1, it's not available on Python 2.7.n. To support 2.7.n as well, you have to backport the code in your application.
I'm not sure that it's worth adding such an enhancement to the random module at this point in Python 2.
I suggest that you either upgrade to Python 3 (hello, Python 3!) or implement the SHA512 in your application. I expect that random.seed() is only called at one or maybe two places, so it shouldn't be hard to patch your code ;-)
In short, I suggest closing the issue as won't fix.
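An application-level version of the SHA-512 suggestion above might look like the following sketch. The helper name stable_seed is hypothetical. It is written for Python 3 and, as far as I can tell, uses only calls that also exist on 2.7, but it is not claimed to reproduce the sequences that Python 3 itself gives for Random('abc'):

```python
import hashlib
import random


def stable_seed(text):
    """Turn a text seed into a large integer using SHA-512 instead of hash()."""
    data = text.encode("utf-8")
    # hexdigest() plus int(..., 16) avoids int.from_bytes(),
    # which is only available on Python 3.
    return int(hashlib.sha512(data).hexdigest(), 16)


# Seeding with an integer does not involve string hashing at all.
rng = random.Random(stable_seed("abc"))
values = [rng.randrange(10) for _ in range(10)]
print(values)
```

Keeping the full 512-bit digest, rather than truncating it, is what preserves more of the seed's entropy than a machine-word hash would.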
|
msg272993 - (view) |
Author: Glyph Lefkowitz (glyph) |
Date: 2016-08-17 21:32 |
For what it's worth, I don't much care whether this is fixed or not; I ended up wanting to leak less information from the RNG output anyway so I wrote this:
https://gist.github.com/glyph/ceca96100a3049fefea6f2035abbd9ea
but I felt like it should be reported.
|
msg274066 - (view) |
Author: Nofar Schnider (Nofar Schnider) * |
Date: 2016-08-31 20:13 |
Adding the patch with seed fix for version=1 and tests (test_random).
|
msg274067 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2016-08-31 20:33 |
Nofar Schnider: "versions: + Python 3.5, - Python 2.7"
I don't get it. This issue is specific to Python 2.7, no?
> issue27706.patch
If you want to backport this feature from Python 3, I suggest reusing the same code (so SHA-512). You might get the same random sequences on Python 2 and Python 3, but I don't think that it matters :-) It's just that I expect SHA-512 to keep more bits of entropy than the Python 2 hash function.
|
msg274069 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2016-08-31 20:44 |
For 2.7, we're going to add a note to the docs. For 3.5 and 3.6, we're adjusting the version==1 logic to meet the documented guarantee.
|
msg274074 - (view) |
Author: Nofar Schnider (Nofar Schnider) * |
Date: 2016-08-31 21:15 |
fixed indentation
|
msg274076 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2016-08-31 21:58 |
New changeset 1f37903e6040 by Raymond Hettinger in branch '2.7':
Issue #27706: Document that random.seed() is non-deterministic when PYTHONHASHSEED is enabled
https://hg.python.org/cpython/rev/1f37903e6040
|
msg274077 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2016-08-31 22:01 |
New changeset 5ae941fef3be by Raymond Hettinger in branch '3.5':
Issue #27706: Fix regression in random.seed(somestr, version=1)
https://hg.python.org/cpython/rev/5ae941fef3be
|
|
Date | User | Action | Args |
2022-04-11 14:58:34 | admin | set | github: 71893 |
2016-08-31 22:07:06 | rhettinger | set | status: open -> closed; resolution: fixed |
2016-08-31 22:01:42 | python-dev | set | messages: + msg274077 |
2016-08-31 21:58:02 | python-dev | set | nosy: + python-dev; messages: + msg274076 |
2016-08-31 21:15:41 | Nofar Schnider | set | files: + issue27706.patch; messages: + msg274074; versions: - Python 2.7 |
2016-08-31 20:44:54 | rhettinger | set | messages: + msg274069; versions: + Python 2.7, Python 3.6 |
2016-08-31 20:33:23 | vstinner | set | messages: + msg274067 |
2016-08-31 20:13:45 | Nofar Schnider | set | files: + issue27706.patch; versions: + Python 3.5, - Python 2.7; messages: + msg274066; components: + Tests; keywords: + patch |
2016-08-17 21:32:38 | glyph | set | messages: + msg272993 |
2016-08-17 21:18:45 | vstinner | set | messages: + msg272987 |
2016-08-17 17:33:29 | glyph | set | messages: + msg272973; versions: - Python 3.5, Python 3.6 |
2016-08-17 17:29:55 | glyph | set | messages: + msg272972 |
2016-08-17 12:26:54 | vstinner | set | nosy: + vstinner; messages: + msg272928 |
2016-08-15 03:44:16 | terry.reedy | set | nosy: + terry.reedy; messages: + msg272712 |
2016-08-11 08:12:06 | Nofar Schnider | set | nosy: + Nofar Schnider; messages: + msg272418 |
2016-08-08 07:16:45 | Lukasa | set | nosy: + Lukasa |
2016-08-08 01:37:51 | rhettinger | set | stage: needs patch; messages: + msg272141; versions: + Python 2.7, Python 3.5, Python 3.6 |
2016-08-08 00:27:00 | rhettinger | set | assignee: rhettinger; nosy: + rhettinger |
2016-08-08 00:15:14 | glyph | create | |