This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Replace Mersenne Twister RNG with a PCG family algorithm
Type: enhancement Stage: resolved
Components: Extension Modules Versions: Python 3.9
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: dyedgreen, mark.dickinson, rhettinger, tim.peters
Priority: normal Keywords:

Created on 2019-11-11 17:08 by dyedgreen, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (7)
msg356371 - (view) Author: Tilman Roeder (dyedgreen) Date: 2019-11-11 17:08
Currently, the `random.random()` function / the random module uses the Mersenne Twister algorithm for generating random numbers.

While this algorithm has acceptable statistical properties for most use-cases (and does feature a ridiculously large period), it is slow and very memory intensive when compared with algorithms from the PCG family (see http://www.pcg-random.org). PCG family algorithms also have much better statistical properties than the Mersenne Twister.

I would propose to replace the underlying generator in `random` with a suitable PCG family algorithm, while not changing any of the external interfaces of the module. I think that the change would make the module faster, better in terms of memory usage, and improve the statistical properties of Python's default random numbers.

I have not done anything in the direction in terms of writing any code, but if this sounds like something that would be sensible, I would be happy to work on the change and submit a PR.

Also, this is the first time I am contributing to Python/ writing an issue here, so apologies if this is not the correct place to make a suggestion like this.
msg356375 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2019-11-11 18:08
See discussion in #30880. That issue was closed, though it's possible that it's time to reconsider.
msg356376 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2019-11-11 18:10
Also worth noting that NumPy 1.17 has now adopted PCG for their default BitGenerator.
msg356377 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2019-11-11 18:23
Python is a general-purpose language, and as such I believe it's inappropriate for it to be "a leader" in adopting new PRNGs.  That's for area specialists to pioneer.

If NumPy switched, that is a good reason to evaluate this again.  But for backward compatibility we'd probably still need to support the current Twister anyway (for programs that set the seed, we try to keep results bit-for-bit reproducible across releases).

Note that Python didn't switch _to_ the Twister before it was, in effect, a de facto standard across many scientific libraries.

Something not noted in the earlier report:  TOMS rejected the PCG paper, so it will probably never be published.  I don't much care, but those who do care about peer-reviewed publication may see that as a deal killer.
msg356379 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-11-11 18:38
We had looked at this once before and the proposal was rejected (better to stick with the devil you know than than one that hasn't been extensively studied).

Also, there is some value to having a large state space, shuffle() and sample() consume a lot of entropy to assure that a selection from the entire population is possible.

At best, PCG would be an additional algorithm rather than a replacement -- it is already possible to subclass Random and substitute other generators (the docs link to an example for the complementary-multiply-with-carry generator).
msg356380 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2019-11-11 18:40
FWIW, here's the NumPy GitHub issue that led to PCG64 being chosen as the default BitGenerator. Warning: the comment thread is long (with many contributions from the PCG author).

https://github.com/numpy/numpy/issues/13635
msg356826 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2019-11-17 18:53
Thanks for the NumPy discussion link, Mark!  Did that set a world record for an issue report's length? ;-)  Not complaining - it's a very high quality and informative discussion.

I'd be comfortable adopting whichever PRNGs numpy uses.  numpy has better brainpower to apply to "due diligence" in this area, and the discussion made clear too that they're acutely aware of that most users know next to nothing about the pathologies, so that the defaults have to be ignorance-resistant.

It's cute that you raised good questions about how "independent" PCG streams are, and that PCG's creator invented a new member of the family to address the pathologies your line of questioning uncovered.  No "proof" that the new member is robust, but lots of testing.  That appears to be as good as the state of art allows for now.

I had/have similar concerns about the Twister, but never pursued them.  Much like PCG, in fact, it mixes a simple generator with a more-elaborate permutation of the underlying generator's output sequence (which they call "tempering").
History
Date User Action Args
2022-04-11 14:59:23adminsetgithub: 82948
2019-11-17 18:53:56tim.peterssetmessages: + msg356826
2019-11-11 18:40:42mark.dickinsonsetmessages: + msg356380
2019-11-11 18:38:00rhettingersetstatus: open -> closed
resolution: rejected
messages: + msg356379

stage: resolved
2019-11-11 18:23:31tim.peterssetnosy: + tim.peters
messages: + msg356377
2019-11-11 18:10:26mark.dickinsonsetmessages: + msg356376
2019-11-11 18:08:50mark.dickinsonsetmessages: + msg356375
2019-11-11 17:08:46dyedgreencreate