classification
Title: random.Random.seed() with version=1 does not consistently match Python 2 behavior
Type:
Stage: resolved
Components: Library (Lib)
Versions: Python 3.9, Python 3.8, Python 3.7
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: mark.dickinson, mrled, rhettinger, steven.daprano
Priority: normal
Keywords:

Created on 2020-05-19 14:32 by mrled, last changed 2020-05-20 00:28 by rhettinger. This issue is now closed.

Messages (3)
msg369356 - (view) Author: Micah R Ledbetter (mrled) Date: 2020-05-19 14:32
When using the random.Random class, using the .seed() method with version=1 does not always reproduce the same results as the .seed() method did in Python 2.

From the docs, I expected the results to match, but on closer inspection, I can't tell whether I made a bad assumption or whether there is a bug in the module.

The docs state an intention of compatibility with older versions of Python:
https://docs.python.org/3.9/library/random.html#notes-on-reproducibility

> Most of the random module’s algorithms and seeding functions are subject to change across Python versions, but two aspects are guaranteed not to change:
>
> If a new seeding method is added, then a backward compatible seeder will be offered.
>
> The generator’s random() method will continue to produce the same sequence when the compatible seeder is given the same seed.

It's not clear from the docstring in the code whether this is intended to cover Python 2.7 behavior:
https://github.com/python/cpython/blob/3.9/Lib/random.py#L134

> For version 2 (the default), all of the bits are used if *a* is a str,
> bytes, or bytearray.  For version 1 (provided for reproducing random
> sequences from older versions of Python), the algorithm for str and
> bytes generates a narrower range of seeds.

But the results I've spot checked sometimes do match the Python 2 results, and sometimes are the Python 2 result +1.



I wrote a python script that calls the .seed() method with version=1 under Python 3, and without a version= argument under Python 2. It uses a wordlist I happen to have in /usr/share/dict that I copied to $PWD.

#!/usr/bin/env python
import os
import random
import sys

mydir = os.path.dirname(os.path.abspath(__file__))
r = random.Random()
maxidx = None  # set to an integer to stop once that line index is reached
with open(os.path.join(mydir, 'web2')) as webdict:
    for idx, raw_word in enumerate(webdict):
        word = raw_word.strip()
        if sys.version_info[0] == 2:
            # Python 2 seed() takes no version argument
            r.seed(word)
        elif sys.version_info[0] == 3:
            # version=1 selects the backward-compatible seeder
            r.seed(word, version=1)
        else:
            raise RuntimeError("Unexpected python version")
        print("{}: {}".format(word, r.randrange(0, 65535, 1)))
        if maxidx is not None and idx >= maxidx:
            break



I also wrote a shell script to run my Python script with the Python versions I happen to have installed locally, along with Python 2.7 and 3.4-3.9 in the ci-image Docker container linked from the Python download page.

#!/bin/sh
set -eux
mkdir -p results
/usr/bin/python test.py > results/macos10.15.4.system.python2.7.16
/Library/Frameworks/Python.framework/Versions/3.8/bin/python3 test.py > results/macos10.15.4.system.python3.8.2
docker run -v "$PWD":/testpy:rw -u root -it --rm quay.io/python-devs/ci-image sh -c 'python3.9 /testpy/test.py > /testpy/results/ci-image.python3.9'
docker run -v "$PWD":/testpy:rw -u root -it --rm quay.io/python-devs/ci-image sh -c 'python3.8 /testpy/test.py > /testpy/results/ci-image.python3.8'
docker run -v "$PWD":/testpy:rw -u root -it --rm quay.io/python-devs/ci-image sh -c 'python3.7 /testpy/test.py > /testpy/results/ci-image.python3.7'
docker run -v "$PWD":/testpy:rw -u root -it --rm quay.io/python-devs/ci-image sh -c 'python3.6 /testpy/test.py > /testpy/results/ci-image.python3.6'
docker run -v "$PWD":/testpy:rw -u root -it --rm quay.io/python-devs/ci-image sh -c 'python3.5 /testpy/test.py > /testpy/results/ci-image.python3.5'
docker run -v "$PWD":/testpy:rw -u root -it --rm quay.io/python-devs/ci-image sh -c 'python2.7 /testpy/test.py > /testpy/results/ci-image.python2.7'



I've made a github repo that contains both scripts and the results:
https://github.com/mrled/random-Random-seed-version-testing

I ran the script on my Mac, which means I used the system installed Python binaries that came with macOS x86_64, but the ci-image Python versions are running under an x86_64 Linux virtual machine (because of how Docker for Mac works).
 
To summarize the results:

* The Python 2.7 on my Mac works the same as the Python 2.7 on the ci-image
* The Python 3.8 on my Mac works the same as Pythons 3.5-3.9 on the ci-image
* Python 3.4 is different from both (although it is now unsupported anyway)

A sample of the results. I haven't programmatically analyzed them, but from my spot checks, they all appear to be like this:

> head results/ci-image.python2.7  |  > head results/ci-image.python3.9
A: 8866                            |  A: 8867
a: 56458                           |  a: 56459
aa: 29724                          |  aa: 29724
aal: 11248                         |  aal: 11248
aalii: 16623                       |  aalii: 16623
aam: 62302                         |  aam: 62303
Aani: 31381                        |  Aani: 31381
aardvark: 6397                     |  aardvark: 6397
aardwolf: 32525                    |  aardwolf: 32526
Aaron: 32019                       |  Aaron: 32019
msg369394 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2020-05-19 21:15
3.5 and 3.6 are now only accepting security fixes.

Only the stability of random.random is guaranteed across versions, but you are calling randrange:

https://docs.python.org/3/library/random.html#notes-on-reproducibility

So I am pretty sure that this will not be considered a bug (unless it is a design bug).

Personally I think that the lack of reproducibility of the full range of random methods is a rather large annoyance: if you care about reproducibility, including doctests, you cannot use anything in the module except random.random, but have to write your own implementation (possibly by copying and pasting).

I don't have a good solution for this though.
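
For illustration, here is a minimal sketch of the "write your own implementation" workaround, built only on random(), which is the one method covered by the guarantee. The helper name stable_randrange is hypothetical and not part of the random module. It scales a float and truncates, which as far as I know is what older versions did for small ranges, but I am not claiming it reproduces Python 2's randrange() results in every case.

```python
import random


def stable_randrange(rng, start, stop):
    """Return an integer in [start, stop) using only rng.random().

    Hypothetical helper: it depends solely on random(), so its output
    should follow the documented reproducibility guarantee.
    """
    width = stop - start
    if width <= 0:
        raise ValueError("empty range for stable_randrange()")
    # Scale a float in [0.0, 1.0) up to the range width and truncate.
    return start + int(rng.random() * width)


rng = random.Random()
rng.seed("aardvark", version=1)
first = stable_randrange(rng, 0, 65535)

# Re-seeding with the same value should give the same result again.
rng.seed("aardvark", version=1)
second = stable_randrange(rng, 0, 65535)
print(first == second)
```

The trade-off is that scaling a float is slightly less evenly distributed than the current randrange(), so this is only worth doing when reproducibility matters more than that.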
msg369409 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-05-20 00:28
The parts that are supposed to be stable are the seeding and the output of calls to random().  The sessions shown below show that this is working as intended.

The downstream algorithms such as randrange() are not protected by the reproducibility guarantees.  While we try not to change them unnecessarily, they are allowed to change and to generate different sequences.  

At some point in Python 3's history, we changed randrange() so that it often gives different results than before.  The reason for the change is that the old algorithm wasn't as evenly distributed as it should have been.
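
To make the "evenly distributed" point concrete, here is a toy model, not the actual CPython code. It assumes a source with only 4 random bits and a range of 6, and counts how many source values land on each result when scaling and truncating, versus drawing just enough bits and rejecting out-of-range values. The function names are made up for this example.

```python
from collections import Counter


def scaled_buckets(bits, n):
    """Count source values per result when scaling a bits-wide value to n."""
    # Equivalent to int(fraction * n) where fraction = v / 2**bits.
    counts = Counter((v * n) >> bits for v in range(1 << bits))
    return [counts[i] for i in range(n)]


def rejection_buckets(n):
    """Count source values per result when out-of-range draws are rejected."""
    k = n.bit_length()  # just enough bits to cover 0..n-1
    counts = Counter(v for v in range(1 << k) if v < n)
    return [counts[i] for i in range(n)]


print(scaled_buckets(4, 6))   # [3, 3, 2, 3, 3, 2]
print(rejection_buckets(6))   # [1, 1, 1, 1, 1, 1]
```

With scaling, results 2 and 5 are less likely than the others because 6 does not divide 16. With the 53 bits that random() provides the effect is far smaller for small ranges, but it does not go away.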

------ Sessions showing that the output of random() is stable ------

Python 2.7.17 (v2.7.17:c2f86d86e6, Oct 19 2019, 16:24:34) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license()" for more information.
>>> import random
>>> random.seed('superman123')
>>> [random.random() for i in range(5)]
[0.6740635277890739, 0.3455289115553195, 0.6883176146073614, 0.3824266890084288, 0.9839811707434662]

Python 3.8.3 (v3.8.3:6f8c8320e9, May 13 2020, 16:29:34) 
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license()" for more information.
>>> import random
>>> random.seed('superman123', version=1)
>>> [random.random() for i in range(5)]
[0.6740635277890739, 0.3455289115553195, 0.6883176146073614, 0.3824266890084288, 0.9839811707434662]
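
The same check can be put in a small script that runs unchanged on both interpreters. This is only a sketch: first_values is a hypothetical helper name, and the version test mirrors the one in the original report's script.

```python
import random
import sys


def first_values(seed, count=5):
    """Return the first `count` random() outputs for a string seed."""
    rng = random.Random()
    if sys.version_info[0] == 2:
        # Python 2 seed() takes no version argument.
        rng.seed(seed)
    else:
        # On Python 3, ask for the backward-compatible seeder.
        rng.seed(seed, version=1)
    return [rng.random() for _ in range(count)]


print(first_values("superman123"))
```

Running it under each interpreter and diffing the output compares only random(), the part the guarantee covers.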
History
Date                 |  User              |  Action  |  Args
2020-05-20 00:28:33  |  rhettinger        |  set     |  status: open -> closed; resolution: not a bug; messages: + msg369409; stage: resolved
2020-05-19 21:15:04  |  steven.daprano    |  set     |  nosy: + steven.daprano; messages: + msg369394; versions: - Python 3.5, Python 3.6
2020-05-19 14:42:28  |  serhiy.storchaka  |  set     |  nosy: + rhettinger, mark.dickinson
2020-05-19 14:32:46  |  mrled             |  create  |