classification
Title: Pure Python implementation of random
Type: enhancement Stage: patch review
Components: Library (Lib) Versions: Python 3.4
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: Arfrever, alex, brett.cannon, mark.dickinson, rhettinger, scoder, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2012-12-10 22:21 by serhiy.storchaka, last changed 2013-04-16 14:31 by alex. This issue is now closed.

Files
File name Uploaded Description Edit
random_pure_python_3.patch serhiy.storchaka, 2012-12-11 20:27 review
random_pure_python_5.patch serhiy.storchaka, 2012-12-13 19:16 review
Messages (16)
msg177315 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-12-10 22:21
C implemented part of random module does not depend on any specific C level abilities. Here is a patch which implements this part on pure Python. May be this is not a best implementation. And I don't know how write tests for both implementations.
msg177326 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-12-11 08:06
I wonder whether it would make sense to use an array to hold the MT state, for a closer match with the C code.  (Not sure whether this would have a noticeable effect on performance.)
msg177329 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-12-11 10:07
Patch updated. Tests now test both implementation. Thank Ezio for tip.
msg177330 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-12-11 10:42
> I wonder whether it would make sense to use an array to hold the MT state, for a closer match with the C code.

I don't think it makes sense. The algorithm is same and list is more natural for Python. Also arrays a little slower than lists, but it doesn't matter, because Python implementation of random slower C implementation anyway.
msg177337 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2012-12-11 14:11
To see how to write tests that exercise both the C and Python versions look at test_heapq and test_warnings as examples. It will require some refactoring, but it's easy to do.
msg177342 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-12-11 16:52
Yes, I used test_heapq and test_decimal as an example.
msg177386 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-12-12 19:46
Patch updated. Comments and docstrings a little enhanced. Thanks Brett for review.

Also Python implementation of core generator now is threadsafe (in particular random() and getrandbits() methods) as its C implementation and its private members now are more hidden (to prevent unintentional conflicts). See differences between patches for details.
msg177432 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-12-13 19:16
Patch updated. One bug fixed.

Also I made some benchmarks. The pure Python random() about 100 times slower and getrandbits(32) about 13 times slower than the C implementation.
msg178645 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2012-12-31 07:02
Brett, if all the other Python implementations already have a working implementation of random, what purpose is served by adding a pure python version of the Mersenne Twister than won't be run by anyone?
msg178655 - (view) Author: Stefan Behnel (scoder) * Date: 2012-12-31 09:13
FWIW, PyPy has an (R)Python implementation already:

https://bitbucket.org/pypy/pypy/src/default/pypy/rlib/rrandom.py
msg178676 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2012-12-31 14:11
In response to Raymond:

First, Serhiy is a core developer now, so if he wants to commit this code and maintain it I have no objections as it doesn't detract from anything and the maintenance burden is his if he wants it. Whether any of us view it as the best use of his time or not is not our call and we can't stop him. =)

Second, while PyPy may have an RPython implementation, it's originally from 2006, has already been patched by them twice in 2011 for bugs, and may not be needed by them anymore based on current performance characteristics of PyPy today in lieu of this code (and that's assuming they wrote the code in RPython originally for a specific reason compared to just needing something that worked, but this is all a guess w/o actually benchmarking).

Third, I can't predict future VMs and their needs. It might not be used by a VM today (unless PyPy starts using it for their py3k work), but who knows what the future holds? As I said, Serhiy already wrote the code and is the core dev who will maintain it if it goes in so I don't see a maintenance burden here that is being hoisted upon python-dev.

Fourth, I added a comment to issue #16651 to state that people should see what the other VMs already have and to start a conversation first before moving forward with a Python port to make sure no one views it as a waste of time.
msg178684 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-12-31 14:53
I don't want to make a decision on the inclusion of this code. However, I will undertake to maintain it. I'm going to fix one algorithmic bug in current implementation and add C implementations for some methods which significantly slowed in Python implementation. This can be done without the committing of this patch, but the dual test the two implementations will make the code more reliable. Even if Python implementation is not to be used, it will help in maintaining C implementation.
msg178722 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-01-01 00:37
Okay thanks.  I'll look over the patch in detail over the next few days.  

FYI, I'm the maintainer of this module (and was the one to add the original Mersenne Twister code).
msg183916 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-03-11 02:53
[Serhiy Storchaka]
> I don't want to make a decision on the inclusion of this code.

It was a tough call, but I don't want to add this code.  You've a great job with it, but I don't think it is a worthwhile endeavor.  

* The algorithm was designed with C level operations in mind and is awkward in Python. 

* The code we use for MT has been widely distributed and tested by others.  We would just be getting away from the canonical reference version.

* The other implementations already have an MT, so this code won't end-up being used and would just add to the maintenance burden.  If anyone ever did use it, it would be dog slow.

-----------

P.S. I did have a question about the patch.  

The code uses an RLock.  Where are the places that can trigger reentracy?

With in a single method, successive calls to _genrand_int32() are ordered and can't be interleaved with reentrancy while still keeping the original order of generated random numbers.
msg187084 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2013-04-16 14:26
I was talking with Alex Gaynor about the Python implementation of operator (http://bugs.python.org/issue16694) and asked about this bug since Raymond said the fact PyPy had an RPython implementation was a knock against bothering with this. Alex said if performance could be shown to be good then PyPy would be willing to consider dropping their accelerated version and switch to this (http://bugs.python.org/issue16694). So if someone is so motivated, doing a benchmark showing whether PyPy's accelerated version (which is relatively new so you would need to probably grab 2.0 or maybe 1.9) is faster/same/slower than this pure Python version would be nice to have.
msg187085 - (view) Author: Alex Gaynor (alex) * (Python committer) Date: 2013-04-16 14:31
Looking at the patch (haven't actually benchmarked it), I have two concerns with respect to performance:

a) The need for locking, this doesn't exist in the C or RPython versions because of the GIL. That locking is going to be distinctly un-free.
b) The need for manually masking overflowing arithmetic (yes I know, everything is a long, but just looking at it algorithmically, we really want the 2s complement).

I don't have an opinion about how to solve either of these, but without a solution I doubt performance will ever be competitive. I think it would be a mistake to assume these issues are specific to this patch, they strike me as generally applicable issues.
History
Date User Action Args
2013-04-16 14:31:41alexsetmessages: + msg187085
2013-04-16 14:26:47brett.cannonsetnosy: + alex
messages: + msg187084
2013-03-11 02:53:35rhettingersetstatus: open -> closed
resolution: rejected
messages: + msg183916
2013-01-01 00:37:02rhettingersetassignee: serhiy.storchaka -> rhettinger
messages: + msg178722
2012-12-31 14:53:22serhiy.storchakasetmessages: + msg178684
2012-12-31 14:11:45brett.cannonsetmessages: + msg178676
2012-12-31 09:13:08scodersetnosy: + scoder
messages: + msg178655
2012-12-31 07:02:01rhettingersetmessages: + msg178645
2012-12-29 21:59:34serhiy.storchakasetassignee: serhiy.storchaka
2012-12-14 08:26:41Arfreversetnosy: + Arfrever
2012-12-13 19:16:26serhiy.storchakasetfiles: + random_pure_python_5.patch

messages: + msg177432
2012-12-13 19:12:19serhiy.storchakasetfiles: - random_pure_python_4.patch
2012-12-12 19:46:35serhiy.storchakasetfiles: + random_pure_python_4.patch

messages: + msg177386
2012-12-11 20:28:59serhiy.storchakasetfiles: - random_pure_python_2.patch
2012-12-11 20:27:24serhiy.storchakasetfiles: + random_pure_python_3.patch
2012-12-11 16:53:06serhiy.storchakasetfiles: - random_pure_python.patch
2012-12-11 16:52:44serhiy.storchakasetmessages: + msg177342
2012-12-11 14:11:01brett.cannonsetmessages: + msg177337
2012-12-11 10:42:51serhiy.storchakasetmessages: + msg177330
stage: needs patch -> patch review
2012-12-11 10:42:16serhiy.storchakalinkissue16651 dependencies
2012-12-11 10:07:47serhiy.storchakasetfiles: + random_pure_python_2.patch

messages: + msg177329
2012-12-11 08:06:14mark.dickinsonsetmessages: + msg177326
2012-12-11 07:39:25mark.dickinsonsetnosy: + mark.dickinson
2012-12-10 22:21:06serhiy.storchakacreate