Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os.urandom() should call getrandom(2) not getentropy(2) on Solaris 11.3 and newer #69191

Closed
jbeck mannequin opened this issue Sep 4, 2015 · 30 comments
Closed

os.urandom() should call getrandom(2) not getentropy(2) on Solaris 11.3 and newer #69191

jbeck mannequin opened this issue Sep 4, 2015 · 30 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-security A security issue

Comments

@jbeck
Copy link
Mannequin

jbeck mannequin commented Sep 4, 2015

BPO 25003
Nosy @tim-one, @vstinner, @dstufft
Files
  • remove_get_entropy.patch
  • urandom_solaris.patch
  • getrandom_solaris.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/vstinner'
    closed_at = <Date 2015-10-01.16:13:39.666>
    created_at = <Date 2015-09-04.21:44:28.198>
    labels = ['type-security', 'interpreter-core']
    title = 'os.urandom() should call getrandom(2) not getentropy(2) on Solaris 11.3 and newer'
    updated_at = <Date 2015-10-01.16:13:39.665>
    user = 'https://bugs.python.org/jbeck'

    bugs.python.org fields:

    activity = <Date 2015-10-01.16:13:39.665>
    actor = 'vstinner'
    assignee = 'vstinner'
    closed = True
    closed_date = <Date 2015-10-01.16:13:39.666>
    closer = 'vstinner'
    components = ['Interpreter Core']
    creation = <Date 2015-09-04.21:44:28.198>
    creator = 'jbeck'
    dependencies = []
    files = ['40397', '40412', '40437']
    hgrepos = []
    issue_num = 25003
    keywords = ['patch']
    message_count = 30.0
    messages = ['249837', '250131', '250142', '250211', '250241', '250243', '250246', '250247', '250257', '250258', '250315', '250321', '250323', '250403', '250415', '250463', '250478', '250496', '250497', '250681', '250983', '250984', '250993', '250995', '250996', '252005', '252007', '252008', '252035', '252037']
    nosy_count = 5.0
    nosy_names = ['tim.peters', 'vstinner', 'python-dev', 'dstufft', 'jbeck']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'needs patch'
    status = 'closed'
    superseder = None
    type = 'security'
    url = 'https://bugs.python.org/issue25003'
    versions = ['Python 2.7', 'Python 3.4', 'Python 3.5', 'Python 3.6']

    @jbeck
    Copy link
    Mannequin Author

    jbeck mannequin commented Sep 4, 2015

    A recent Solaris build upgrade resulted in a massive slowdown of a package operation (our package client is written in Python). Some DTrace revealed this was because os.urandom() calls had slowed down by a factor of over 300. This turned out to be caused by an upgrade of Python from 2.7.9 to 2.7.10 which included:

    • Issue bpo-22585: On OpenBSD 5.6 and newer, os.urandom() now calls getentropy(),
      instead of reading /dev/urandom, to get pseudo-random bytes.

    By adding ac_cv_func_getentropy=no to our configure options, we were able to get back the performance we had lost. But our security experts warned:

    ---

    OpenBSD only has getentropy(2) and we are compatible with that.
    Linux has both getentropy(2) and getrandom(2)
    Solaris has getentropy(2) and getrandom(2).

    The bug is in Python it should not use getentropy(2) for the implementation of os.urandom() unless it builds its own DRBG (Deterministic Random Bit Generator) around that - which will mean it is "caching" and thus only calling getentropy() for seeding very infrequently.

    You can not substitute getentropy() for getrandom(), if you need randomness then entropy does not suffice.

    They are using getentropy(2) correctly in the implementation of _PyRandom_Init().

    I would personally recommend that the upstream adds os.getentropy() changes
    os.urandom() to use getrandom(buf, sz, GRND_NONBLOCK) and os.random() to use
    getrandom(buf, sz, GRND_RANDOM).

    As it is the upstream implementation is arguably a security vulnerability since you can not use entropy where you need randomness.
    ---

    where by "upstream" he meant "python.org". So I am filing this bug to request those changes. I can probably prototype this in the reasonable near future, but I wanted to get the bug filed sooner rather than later.

    @jbeck jbeck mannequin added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Sep 4, 2015
    @bitdancer bitdancer added the type-security A security issue label Sep 4, 2015
    @vstinner
    Copy link
    Member

    vstinner commented Sep 7, 2015

    Attached patch (for Python 3.4) removes code to use getentropy() in os.urandom().

    @gvanrossum
    Copy link
    Member

    I got several long private emails from Theo De Raadt about this issue. I think the gist of it all is that most likely (a) the app most likely shouldn't be calling os.urandom() that often, and (b) Solaris getentropy() is apparently stunningly slow. Theo also seems to imply that "entropy" as a concept is debunked, and OpenBSD getentropy() and getrandom() are essentially the same thing except that the former's interface is faster (because it doesn't use a FD). So it is also possible that using getentropy() instead of getrandom() was unnecessary; possibly the speed difference would be noticeable via Python. Perhaps the getentropy() check can explicitly rule out Solaris (either at the autoconf level or in the random.c source code) if you prefer to keep the getentropy() call on OpenBSD.

    @gvanrossum
    Copy link
    Member

    Also, Theo believes that our Mersenne Twister is outdated and os.urandom() is the only reasonable alternative. So we might as well keep it fast.

    @tim-one
    Copy link
    Member

    tim-one commented Sep 8, 2015

    Guido, you're clearly talking with someone who knows too much ;-) If we're using the Twister for _anything_ related to crypto-level randomness, then I'd appalled - it was utterly unsuitable for any such purpose from day 1. But as a general-purpose generator for non-crypto uses, it remains an excellent choice, and still a nearly de facto standard for "almost all" such purposes. There are general-purpose generators I like better now, but won't push in that direction until the Twister's "nearly de facto standard" status fades. Better to be a follower than a leader in this area.

    @gvanrossum
    Copy link
    Member

    To Theo it's probably inconceivable that anyone would be using random numbers for anything besides crypto security. :-) His favorite is arc4random() (but he notes it's not based on RC4 anymore, but uses chacha -- I have no idea what any of that means :-). He did say userland PRNG a few times.

    @dstufft
    Copy link
    Member

    dstufft commented Sep 8, 2015

    (A)RC4 and ChaCha are just two stream ciphers that let you encrypt some data, they work by essentially producing a psuedo-random stream of data in a deterministic manner based off of a key, and than that is XOR'd with the data you want to encrypt. arc4random (ab)uses this and uses "real" entropy (e.g. randomness pulled from random noise on the network and such) as the "key" and then uses the psuedo-random stream of data as the values you get when you ask arc4random for some random data. The actual process is quite a bit more complex then that, but that's the basic gist.

    Userspace PRNG's are not a very good idea for reasons better explained by an expert: http://sockpuppet.org/blog/2014/02/25/safely-generate-random-numbers/

    And yea, using MT for anything that needs a CSPRNG (that is, a Cryptographically Secure Psuedo Random Number Generator) is a real bad idea, because the numbers it outputs are not "really" random. I'm of a mind that the APIs should default to CSPRNGs (so random should default to SystemRandom) and using something like MT should be opt in via something like "UnsafeFastRandom) or something. That ship is almost certainly sailed at this point though.

    @dstufft
    Copy link
    Member

    dstufft commented Sep 8, 2015

    Oh yea, and (A)RC4 is broken and shouldn't be used for anything anymore, ChaCha is much better and is pretty great.

    @vstinner
    Copy link
    Member

    vstinner commented Sep 8, 2015

    @vstinner
    Copy link
    Member

    vstinner commented Sep 8, 2015

    Perhaps the getentropy() check can explicitly rule out Solaris (either at the autoconf level or in the random.c source code) if you prefer to keep the getentropy() call on OpenBSD.

    Ok, here is a patch implementing this option. It keeps getentropy() on OpenBSD for os.urandom(), but it disables getentropy() on Solaris for os.urandom().

    I don't know if my py_getrandom() function calling syscall(SYS_getrandom, buffer, size, 0) works on Solaris. The flags are hardcoded, and I don't know if the <sys/syscall.h> include is enough to get the syscall() function. (Does Solaris uses the GNU C library?)

    @jbeck: Can you please test this patch on the default branch of Python? Can you tell if the HAVE_GETRANDOM_SYSCALL check succeed on Solaris? (do you have "#define HAVE_GETRANDOM_SYSCALL 1" in pyconfig.h?)

    The configure scripts tries to compile the following C program:

        #include <sys/syscall.h>
    
        int main() {
            const int flags = 0;
            char buffer[1];
            int n;
            /* ignore the result, Python checks for ENOSYS at runtime */
            (void)syscall(SYS_getrandom, buffer, sizeof(buffer), flags);
            return 0;
        }

    @jbeck
    Copy link
    Mannequin Author

    jbeck mannequin commented Sep 9, 2015

    @Haypo: Yes, that patch works (applied to 3.5.0rc3; proxy problems are preventing me from updating my clone of the default branch at the moment) i.e., pyconfig.h shows:

    #define HAVE_GETRANDOM_SYSCALL 1

    To answer your other questions:

    • Calling syscall(SYS_getrandom, buffer, size, 0) does work on Solaris.
    • Our <sys/syscall.h> does declare syscall().
    • Solaris does not use the GNU C library, but our own libc, which does
      support getrandom(3) as of Solaris 11.3 (which will be released in
      the near future) and Solaris 12 (which is in Beta but has a release
      date in the more distant future).

    Prior to applying this patch, I had needed to tweak py_getrandom() to recognize EINVAL in addition to ENOSYS as sufficient reason to turn off getrandom_works. I still have that tweak in place, but have not yet tried backing it out to see if things still behave with your patch.

    Let me know if there is any more information I can provide, and thanks for your attention to this issue.

    @vstinner
    Copy link
    Member

    vstinner commented Sep 9, 2015

    Prior to applying this patch, I had needed to tweak py_getrandom() to recognize EINVAL in addition to ENOSYS as sufficient reason to turn off getrandom_works.

    In which case getrandom() fails with EINVAL?

    @jbeck
    Copy link
    Mannequin Author

    jbeck mannequin commented Sep 9, 2015

    Yes, the syscall was failing with EINVAL.

    @vstinner
    Copy link
    Member

    Yes, the syscall was failing with EINVAL.

    Sorry, I don't understand. Do you mean that calling a non-existant syscall on Solaris fails with EINVAL? Calling the getrandom() syscall on Solaris < 11.3 fails with EINVAL?

    @vstinner vstinner changed the title os.urandom() should call getrandom(2) not getentropy(2) os.urandom() should call getrandom(2) not getentropy(2) on Solaris Sep 10, 2015
    @vstinner vstinner changed the title os.urandom() should call getrandom(2) not getentropy(2) on Solaris os.urandom() should call getrandom(2) not getentropy(2) on Solaris 11.3 and newer Sep 10, 2015
    @jbeck
    Copy link
    Mannequin Author

    jbeck mannequin commented Sep 10, 2015

    Sorry, let me try to clarify. ENOSYS is the correct errno value to check for, for the situation (e.g., Solaris < 11.3) of a non-existent system call. But EINVAL should also be checked for to make sure the system call was invoked with proper parameters. My builds of Python 3.5.0.X (don't recall whether X was a late Beta or release candidate) were core dumping because Python was making that syscall but not checking for EINVAL, and thus assuming the call was valid, when it was not. Sorry, I did not try to debug why it was invalid. But once I added EINVAL, alongside ENOSYS, as a reason to set getrandom_works to 0, then python started behaving properly.

    @vstinner
    Copy link
    Member

    But EINVAL should also be checked for to make sure the system call was invoked with proper parameters.

    py_getrandom() calls Py_FatalError() or raises an OSError on EINVAL. The error is not silently ignored.

    My builds of Python 3.5.0.X (don't recall whether X was a late Beta or release candidate) were core dumping because Python was making that syscall but not checking for EINVAL,

    Py_FatalError() calls abort(). Usually, operating systems dump a core dump in this case. But it is the expected behaviour. Python now refuses to start if the OS random number generator doesn't work. There are similar checks on the system and monotonic clocks for example.

    ... and thus assuming the call was valid, when it was not.

    Ah! Finally you explain the problem :-) I wrote py_getrandom() for Linux. I didn't expect other operating systems to implement the getrandom() syscall. I hardcoded the flags (0) for example.

    py_getrandom() calls directly the syscall, because I like the new cool getrandom() syscall of Linux: it avoids the need of a private file descriptor. It would be much better to call a function of the C library, but the GNU C library didn't expose the function yet...

    On Solaris, the function is available in C, no need to call directly the syscall. It would be better to call the C function, and check if it's available in configure.

    Can you please try remove_get_entropy.patch + urandom_solaris.patch?

    @jbeck
    Copy link
    Mannequin Author

    jbeck mannequin commented Sep 11, 2015

    Yes, those patches work, with a caveat. While testing this, I found out why I had needed EINVAL earlier (and still do, for now): there is a bug in the Solaris implementation of getrandom(2). If flags are 0 and the buffer size > 1024, then it fails with EINVAL. That is only supposed to happen for a buffer that big if GNRD_RANDOM is set in flags but GNRD_NONBLOCK is not set. So until that bug is fixed, I have to patch py_getrandom() to treat EINVAL like ENOSYS and fall back to using /dev/urandom as if getrandom(2) were not supported.

    @vstinner
    Copy link
    Member

    While testing this, I found out why I had needed EINVAL earlier (and still do, for now): there is a bug in the Solaris implementation of getrandom(2). If flags are 0 and the buffer size > 1024, then it fails with EINVAL. That is only supposed to happen for a buffer that big if GNRD_RANDOM is set in flags but GNRD_NONBLOCK is not set. So until that bug is fixed, ...

    Ah! So it was useful to dig the EINVAL issue, it's a bug in the kernel :-)

    You wrote that the final version of Solaris 11.3 and 12 was not released yet. Can we expect a fix in the kernel before the final version?

    @jbeck
    Copy link
    Mannequin Author

    jbeck mannequin commented Sep 11, 2015

    I have queried the engineer who owns the kernel bug and will post an update once I hear back from him. But as it is already almost midnight on Friday in his geo, it may well be Monday before I hear back.

    @jbeck
    Copy link
    Mannequin Author

    jbeck mannequin commented Sep 14, 2015

    The owner of the Solaris kernel bug has confirmed that he plans to get
    the fix into both Solaris 12 and 11.3.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Sep 18, 2015

    New changeset 8b0c2c7cb4a7 by Victor Stinner in branch 'default':
    Issue bpo-25003: On Solaris 11.3 or newer, os.urandom() now uses the getrandom()
    https://hg.python.org/cpython/rev/8b0c2c7cb4a7

    @vstinner
    Copy link
    Member

    Ok, I pushed a first fix for Python 3.6. The changeset 8b0c2c7cb4a7 keeps getentropy() on OpenBSD, but it explicitly excludes getentropy() on Solaris ("!defined(sun)"). It adds support for the getrandom() function. As the EINVAL error will be fixed in the final version of Solaris 11.3, I chose to not support this specific error.

    @john Beck: Does it look good to you? Can you test it? You may have to manually modify the code to not pass a value too large to getrandom() until the Solaris kernel is fixed.

    If John and buildbots are happy, I will backport the change to Python 2.7, 3.4 and 3.5.

    @jbeck
    Copy link
    Mannequin Author

    jbeck mannequin commented Sep 18, 2015

    I have tested your patch with 3.5, and it works fine, although I did have to make a minor change to get test_os to pass. In test_os.py you had:

    ...
    USE_GETENTROPY = ((sysconfig.get_config_var('HAVE_GETENTROPY') == 1)
    and not sys.platform.startswith("sunos"))
    HAVE_GETRANDOM = (sysconfig.get_config_var('HAVE_GETRANDOM_SYSCALL') == 1)

    @unittest.skipIf(USE_GETENTROPY,
    "getentropy() does not use a file descriptor")
    @unittest.skipIf(HAVE_GETRANDOM,
    "getrandom() does not use a file descriptor")
    ...

    whereas I came up with this:

    ...
    USE_GETENTROPY = ((sysconfig.get_config_var('HAVE_GETENTROPY') == 1)
    and not sys.platform.startswith("sunos"))
    HAVE_GETRANDOM = (sysconfig.get_config_var('HAVE_GETRANDOM') == 1)
    HAVE_GETRANDOM_SYSCALL = (sysconfig.get_config_var('HAVE_GETRANDOM_SYSCALL') == 1)

    @unittest.skipIf(USE_GETENTROPY,
    "getentropy() does not use a file descriptor")
    @unittest.skipIf(HAVE_GETRANDOM,
    "getrandom() does not use a file descriptor")
    @unittest.skipIf(HAVE_GETRANDOM_SYSCALL,
    "getrandom() does not use a file descriptor")
    ...

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Sep 18, 2015

    New changeset 221e09b89186 by Victor Stinner in branch 'default':
    Issue bpo-25003: Skip test_os.URandomFDTests on Solaris 11.3 and newer
    https://hg.python.org/cpython/rev/221e09b89186

    @vstinner
    Copy link
    Member

    "I have tested your patch with 3.5, and it works fine, although I did have to make a minor change to get test_os to pass. In test_os.py you had: (...)"

    Oh right, I fixed test_os too.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Oct 1, 2015

    New changeset 835085cc28cd by Victor Stinner in branch '3.5':
    Issue bpo-25003: On Solaris 11.3 or newer, os.urandom() now uses the getrandom()
    https://hg.python.org/cpython/rev/835085cc28cd

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Oct 1, 2015

    New changeset 202c827f86df by Victor Stinner in branch '2.7':
    Issue bpo-25003: os.urandom() doesn't use getentropy() on Solaris because
    https://hg.python.org/cpython/rev/202c827f86df

    New changeset 83dc79eeaf7f by Victor Stinner in branch '3.4':
    Issue bpo-25003: os.urandom() doesn't use getentropy() on Solaris because
    https://hg.python.org/cpython/rev/83dc79eeaf7f

    @vstinner
    Copy link
    Member

    vstinner commented Oct 1, 2015

    Ok, I pushed fixes for Python 2.7, 3.4, 3.5 and 3.6.

    Summary for Solaris:

    • Python 2.7, 3.4: read from /dev/urandom (use a private file descriptor)
    • Python 3.5, 3.6: use the getrandom() function on Solaris 11.3 and newer (no file descriptor), or fallback on reading from /dev/urandom (use a private file descriptor)

    getentropy() is no more used on Solaris.

    @john Beck: It would be great if you could run test_os on the development branches 2.7, 3.4, 3.5 and default (3.6).

    @jbeck
    Copy link
    Mannequin Author

    jbeck mannequin commented Oct 1, 2015

    Confirmed that test_os runs cleanly on Solaris 12, for each of:

    • 2.7.10 (plus your patch from 98454:202c827f86df)
    • 3.4.3 (plus your patch from 98455:83dc79eeaf7f)
    • 3.5.0 (plus your patch from 98452:835085cc28cd)
    • 3.6 (tip)

    @vstinner
    Copy link
    Member

    vstinner commented Oct 1, 2015

    Thanks! I close the issue, it's now fixed.

    @vstinner vstinner closed this as completed Oct 1, 2015
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) type-security A security issue
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants