Title: random.getrandbits is limited to 2**31-1 bits on 64-bit Windows
Type: enhancement Stage: patch review
Components: Extension Modules, Windows Versions: Python 3.6
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: Steven.Barker, ideasman42, mark.dickinson, paul.moore, rhettinger, serhiy.storchaka, steve.dower, tim.golden, tim.peters, zach.ware
Priority: normal Keywords: patch

Created on 2016-05-21 06:07 by Steven.Barker, last changed 2016-06-06 00:17 by rhettinger. This issue is now closed.

File name Uploaded Description Edit
getrandbits.diff Steven.Barker, 2016-05-21 06:07 review
Messages (7)
msg265987 - (view) Author: Steven Barker (Steven.Barker) * Date: 2016-05-21 06:07
The C implementation of `_random.Random.getrandbits` is unnecessarily limited in the number of bits it can produce on 64-bit Windows systems. I learned about this issue in discussion of my answer to this stack overflow question:

The argument parsing code in `getrandbits` forces its Python `int` argument to fit into a C `int` variable. On my 64-bit Windows system, any value larger than `2**31-1` causes a `OverflowError`.

Since the number of bits is directly related to how much memory we need to allocate (in the non-fast case), I think `Py_ssize_t` would be more appropriate type than a regular `int`. This probably isn't an issue on non-Windows or 64-bit systems, where `int` and `Py_ssize_t` will have the same size.

I'm attaching a very simple patch that changes the types of the relevant variables and the format code in the call to `PyArg_ParseTuple`. The code works and still passes its tests with the patch. I considered adding an additional test for this issue, but passing test cases would require allocations of several gigabytes of memory which seems a rather unfriendly thing to add in a test for a fairly minor issue.

This issue doesn't effect the pure Python implementation of `random.SystemRandom.getrandbits`, which already worked fine when large numbers of bits were requested. The documentation for `random.getrandbits` doesn't mention any limitation on the number of bits provided, so I don't imagine there will be backwards compatibility issues. I also don't expect the change to have any impact on third party `Random` replacement classes.

For convenience, here's the contents of the very short patch (which I'll also attach):

diff --git a/Modules/_randommodule.c b/Modules/_randommodule.c
index fd6b230..3bf564f 100644
--- a/Modules/_randommodule.c
+++ b/Modules/_randommodule.c
@@ -348,12 +348,12 @@ random_setstate(RandomObject *self, PyObject *state)
 static PyObject *
 random_getrandbits(RandomObject *self, PyObject *args)
-    int k, i, words;
+    Py_ssize_t k, i, words;
     PY_UINT32_T r;
     PY_UINT32_T *wordarray;
     PyObject *result;

-    if (!PyArg_ParseTuple(args, "i:getrandbits", &k))
+    if (!PyArg_ParseTuple(args, "n:getrandbits", &k))
         return NULL;

     if (k <= 0) {
msg266034 - (view) Author: Campbell Barton (ideasman42) * Date: 2016-05-21 23:30
> This probably isn't an issue on non-Windows or 64-bit systems.

In fact it is, the limitation applies to 64bit Linux too. (tested in CPython 3.5.1)
msg266036 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-05-22 01:21
Since the downstream calls to PyMem_Malloc and _PyLong_FromByteArray both accept size_t for their sizing, there isn't a problem there.

That said, I think the current limitation nicely protects us from harm.  If you were to run getrandbits(2**60) it would take a long time, eat all your memory, trigger swaps until your harddrive was full, and you wouldn't be able to break out of the tight loop with a keyboard interrupt.

Even with the current limit, the resultant int object is ridiculously big in a way that is awkward to manipulate after it is created (don't bother trying to print it, jsonify it, or doing any interesting math it).

Also, if a person wants a lot of bits, it is effortless to make repeated calls getrandbits() using the current API.  Doing so would likely improve their code and be a better design (consuming bits as generated rather than creating them all at once and extracting them later).

In short, just because we can do it, doesn't mean we should.
msg266039 - (view) Author: Campbell Barton (ideasman42) * Date: 2016-05-22 01:40
@rhettinger, agree that very large ints in this case aren't going to give very usable results.

On the other hand, this limit isn't imposed elsewhere (you can power-of operator to create bigger numbers).

Nevertheless this isn't going to give good/usable performance.


Might having a method added to `randome.Random` that returns random bits (as os.urandom does), be something Python project would consider accepting?
msg266044 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-05-22 02:12
> On the other hand, this limit isn't imposed elsewhere.

There are a number of places in the language with these limits.  In general, we're opening them up to wider limits if there are valid use cases and if it doesn't immediately shoot you in the foot like it does here.

Practicality is a reasonable design consideration.  Mitigation of harm is also reasonable design consideration.  Foolish consistently is, well, you know how the saying goes :-)

> Might having a method added to `randome.Random` that returns 
> random bits (as os.urandom does), be something Python project
> would consider accepting?

It isn't needed.  As far as I know, there has never been a user request for this functionality nor a presentation of use cases that benefit it.  The API is already bloated and we already have one way to do it with int.to_bytes().  Also, we should keep this tracker entry focused on the OP's report and not meander.
msg266054 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-05-22 06:29
The only case I know that would benefit is generating random data for tests. On my computer generating 2*28 bits with getrandbits() takes 2 sec (including 1 sec for converting from bytes to int), plus 1.4 sec for converting from int to bytes. Special API would speed up generating random bytes by 3.4 times. I would request for this functionality, but I don't want to clutter the API.

As for getrandbits.diff, I think that we should follow the principle "we're all consenting adults here".
msg267492 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-06-06 00:17
Sorry Steven, I'm going to mark this as rejected on the grounds that it is likely to do more harm than good.  We could in fact make the range larger but it easily creates terrible effects (encouraging bad design and creating a non-interruptable, long-running, total-memory-filling call).  

While we do allow ``2 ** 50 ** 50``, that call is more deliberately asking for trouble than getrandbits(2**60).  If someone really needed that number of bits, it isn't hard to multiple calls to getrandbits() and combine the results, deliberately and interruptably.
Date User Action Args
2016-06-06 00:17:45rhettingersetstatus: open -> closed
resolution: rejected
messages: + msg267492
2016-05-22 06:29:18serhiy.storchakasetmessages: + msg266054
2016-05-22 02:52:47rhettingersetnosy: + tim.peters
2016-05-22 02:12:16rhettingersetmessages: + msg266044
2016-05-22 01:40:46ideasman42setmessages: + msg266039
2016-05-22 01:21:09rhettingersetmessages: + msg266036
2016-05-22 00:40:53rhettingersetnosy: + serhiy.storchaka
2016-05-22 00:37:34rhettingersetassignee: rhettinger
2016-05-21 23:30:01ideasman42setnosy: + ideasman42
messages: + msg266034
2016-05-21 06:23:08SilentGhostsetnosy: + rhettinger, paul.moore, mark.dickinson, tim.golden, zach.ware, steve.dower

components: + Extension Modules, Windows, - Library (Lib)
stage: patch review
2016-05-21 06:07:50Steven.Barkercreate