Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

random.getrandbits is limited to 2**31-1 bits on 64-bit Windows #71259

Closed
BlckKnght mannequin opened this issue May 21, 2016 · 7 comments
Closed

random.getrandbits is limited to 2**31-1 bits on 64-bit Windows #71259

BlckKnght mannequin opened this issue May 21, 2016 · 7 comments
Assignees
Labels
extension-modules C modules in the Modules dir OS-windows type-feature A feature request or enhancement

Comments

@BlckKnght
Copy link
Mannequin

BlckKnght mannequin commented May 21, 2016

BPO 27072
Nosy @tim-one, @rhettinger, @pfmoore, @mdickinson, @tjguk, @ideasman42, @zware, @serhiy-storchaka, @zooba, @BlckKnght
Files
  • getrandbits.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/rhettinger'
    closed_at = <Date 2016-06-06.00:17:45.520>
    created_at = <Date 2016-05-21.06:07:50.081>
    labels = ['extension-modules', 'type-feature', 'OS-windows']
    title = 'random.getrandbits is limited to 2**31-1 bits on 64-bit Windows'
    updated_at = <Date 2016-06-06.00:17:45.519>
    user = 'https://github.com/BlckKnght'

    bugs.python.org fields:

    activity = <Date 2016-06-06.00:17:45.519>
    actor = 'rhettinger'
    assignee = 'rhettinger'
    closed = True
    closed_date = <Date 2016-06-06.00:17:45.520>
    closer = 'rhettinger'
    components = ['Extension Modules', 'Windows']
    creation = <Date 2016-05-21.06:07:50.081>
    creator = 'Steven.Barker'
    dependencies = []
    files = ['42919']
    hgrepos = []
    issue_num = 27072
    keywords = ['patch']
    message_count = 7.0
    messages = ['265987', '266034', '266036', '266039', '266044', '266054', '267492']
    nosy_count = 10.0
    nosy_names = ['tim.peters', 'rhettinger', 'paul.moore', 'mark.dickinson', 'tim.golden', 'ideasman42', 'zach.ware', 'serhiy.storchaka', 'steve.dower', 'Steven.Barker']
    pr_nums = []
    priority = 'normal'
    resolution = 'rejected'
    stage = 'patch review'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue27072'
    versions = ['Python 3.6']

    @BlckKnght
    Copy link
    Mannequin Author

    BlckKnght mannequin commented May 21, 2016

    The C implementation of _random.Random.getrandbits is unnecessarily limited in the number of bits it can produce on 64-bit Windows systems. I learned about this issue in discussion of my answer to this stack overflow question:

    http://stackoverflow.com/questions/37356338/is-there-a-predictable-replacement-for-os-urandom-using-pythons-random-module

    The argument parsing code in getrandbits forces its Python int argument to fit into a C int variable. On my 64-bit Windows system, any value larger than 2**31-1 causes a OverflowError.

    Since the number of bits is directly related to how much memory we need to allocate (in the non-fast case), I think Py_ssize_t would be more appropriate type than a regular int. This probably isn't an issue on non-Windows or 64-bit systems, where int and Py_ssize_t will have the same size.

    I'm attaching a very simple patch that changes the types of the relevant variables and the format code in the call to PyArg_ParseTuple. The code works and still passes its tests with the patch. I considered adding an additional test for this issue, but passing test cases would require allocations of several gigabytes of memory which seems a rather unfriendly thing to add in a test for a fairly minor issue.

    This issue doesn't effect the pure Python implementation of random.SystemRandom.getrandbits, which already worked fine when large numbers of bits were requested. The documentation for random.getrandbits doesn't mention any limitation on the number of bits provided, so I don't imagine there will be backwards compatibility issues. I also don't expect the change to have any impact on third party Random replacement classes.

    For convenience, here's the contents of the very short patch (which I'll also attach):

    diff --git a/Modules/_randommodule.c b/Modules/_randommodule.c
    index fd6b230..3bf564f 100644
    --- a/Modules/_randommodule.c
    +++ b/Modules/_randommodule.c
    @@ -348,12 +348,12 @@ random_setstate(RandomObject *self, PyObject *state)
     static PyObject *
     random_getrandbits(RandomObject *self, PyObject *args)
     {
    -    int k, i, words;
    +    Py_ssize_t k, i, words;
         PY_UINT32_T r;
         PY_UINT32_T *wordarray;
         PyObject *result;
    • if (!PyArg_ParseTuple(args, "i:getrandbits", &k))
      + if (!PyArg_ParseTuple(args, "n:getrandbits", &k))
      return NULL;
         if (k <= 0) {

    @BlckKnght BlckKnght mannequin added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels May 21, 2016
    @SilentGhost SilentGhost mannequin added extension-modules C modules in the Modules dir OS-windows and removed stdlib Python modules in the Lib dir labels May 21, 2016
    @ideasman42
    Copy link
    Mannequin

    ideasman42 mannequin commented May 21, 2016

    This probably isn't an issue on non-Windows or 64-bit systems.

    In fact it is, the limitation applies to 64bit Linux too. (tested in CPython 3.5.1)

    @rhettinger rhettinger self-assigned this May 22, 2016
    @rhettinger
    Copy link
    Contributor

    Since the downstream calls to PyMem_Malloc and _PyLong_FromByteArray both accept size_t for their sizing, there isn't a problem there.

    That said, I think the current limitation nicely protects us from harm. If you were to run getrandbits(2**60) it would take a long time, eat all your memory, trigger swaps until your harddrive was full, and you wouldn't be able to break out of the tight loop with a keyboard interrupt.

    Even with the current limit, the resultant int object is ridiculously big in a way that is awkward to manipulate after it is created (don't bother trying to print it, jsonify it, or doing any interesting math it).

    Also, if a person wants a lot of bits, it is effortless to make repeated calls getrandbits() using the current API. Doing so would likely improve their code and be a better design (consuming bits as generated rather than creating them all at once and extracting them later).

    In short, just because we can do it, doesn't mean we should.

    @ideasman42
    Copy link
    Mannequin

    ideasman42 mannequin commented May 22, 2016

    @rhettinger, agree that very large ints in this case aren't going to give very usable results.

    On the other hand, this limit isn't imposed elsewhere (you can power-of operator to create bigger numbers).

    Nevertheless this isn't going to give good/usable performance.

    ----

    Might having a method added to randome.Random that returns random bits (as os.urandom does), be something Python project would consider accepting?

    @rhettinger
    Copy link
    Contributor

    On the other hand, this limit isn't imposed elsewhere.

    There are a number of places in the language with these limits. In general, we're opening them up to wider limits if there are valid use cases and if it doesn't immediately shoot you in the foot like it does here.

    Practicality is a reasonable design consideration. Mitigation of harm is also reasonable design consideration. Foolish consistently is, well, you know how the saying goes :-)

    Might having a method added to randome.Random that returns
    random bits (as os.urandom does), be something Python project
    would consider accepting?

    It isn't needed. As far as I know, there has never been a user request for this functionality nor a presentation of use cases that benefit it. The API is already bloated and we already have one way to do it with int.to_bytes(). Also, we should keep this tracker entry focused on the OP's report and not meander.

    @serhiy-storchaka
    Copy link
    Member

    The only case I know that would benefit is generating random data for tests. On my computer generating 2*28 bits with getrandbits() takes 2 sec (including 1 sec for converting from bytes to int), plus 1.4 sec for converting from int to bytes. Special API would speed up generating random bytes by 3.4 times. I would request for this functionality, but I don't want to clutter the API.

    As for getrandbits.diff, I think that we should follow the principle "we're all consenting adults here".

    @rhettinger
    Copy link
    Contributor

    Sorry Steven, I'm going to mark this as rejected on the grounds that it is likely to do more harm than good. We could in fact make the range larger but it easily creates terrible effects (encouraging bad design and creating a non-interruptable, long-running, total-memory-filling call).

    While we do allow 2 ** 50 ** 50, that call is more deliberately asking for trouble than getrandbits(2**60). If someone really needed that number of bits, it isn't hard to multiple calls to getrandbits() and combine the results, deliberately and interruptably.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    extension-modules C modules in the Modules dir OS-windows type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants