classification
Title: os.urandom() should call getrandom(2) not getentropy(2) on Solaris 11.3 and newer
Type: security Stage: needs patch
Components: Interpreter Core Versions: Python 3.6, Python 3.4, Python 3.5, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: vstinner Nosy List: dstufft, jbeck, python-dev, tim.peters, vstinner
Priority: normal Keywords: patch

Created on 2015-09-04 21:44 by jbeck, last changed 2015-10-01 16:13 by vstinner. This issue is now closed.

Files
File name Uploaded Description Edit
remove_get_entropy.patch vstinner, 2015-09-07 21:54 review
urandom_solaris.patch vstinner, 2015-09-08 21:46 review
getrandom_solaris.patch vstinner, 2015-09-11 11:19 review
Messages (30)
msg249837 - (view) Author: John Beck (jbeck) Date: 2015-09-04 21:44
A recent Solaris build upgrade resulted in a massive slowdown of a package operation (our package client is written in Python).  Some DTrace revealed this was because os.urandom() calls had slowed down by a factor of over 300.  This turned out to be caused by an upgrade of Python from 2.7.9 to 2.7.10 which included:

- Issue #22585: On OpenBSD 5.6 and newer, os.urandom() now calls getentropy(),
  instead of reading /dev/urandom, to get pseudo-random bytes.

By adding ac_cv_func_getentropy=no to our configure options, we were able to get back the performance we had lost.  But our security experts warned:

---

OpenBSD only has getentropy(2) and we are compatible with that.
Linux has both getentropy(2) and getrandom(2)
Solaris has getentropy(2) and getrandom(2).

The bug is in Python it should not use getentropy(2) for the implementation of os.urandom() unless it builds its own DRBG (Deterministic Random Bit Generator) around that - which will mean it is "caching" and thus only calling getentropy() for seeding very infrequently.

You can not substitute getentropy() for getrandom(), if you need randomness then entropy does not suffice.

They are using getentropy(2) correctly in the implementation of _PyRandom_Init().

I would personally recommend that the upstream adds os.getentropy() changes
os.urandom() to use getrandom(buf, sz, GRND_NONBLOCK) and os.random() to use
getrandom(buf, sz, GRND_RANDOM).

As it is the upstream implementation is arguably a security vulnerability since you can not use entropy where you need randomness.
---

where by "upstream" he meant "python.org".  So I am filing this bug to request those changes.  I can probably prototype this in the reasonable near future, but I wanted to get the bug filed sooner rather than later.
msg250131 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-09-07 21:54
Attached patch (for Python 3.4) removes code to use getentropy() in os.urandom().
msg250142 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2015-09-08 00:39
I got several long private emails from Theo De Raadt about this issue. I think the gist of it all is that most likely (a) the app most likely shouldn't be calling os.urandom() that often, and (b) Solaris getentropy() is apparently stunningly slow. Theo also seems to imply that "entropy" as a concept is debunked, and OpenBSD getentropy() and getrandom() are essentially the same thing except that the former's interface is faster (because it doesn't use a FD). So it is also possible that using getentropy() instead of getrandom() was unnecessary; possibly the speed difference would be noticeable via Python. Perhaps the getentropy() check can explicitly rule out Solaris (either at the autoconf level or in the random.c source code) if you prefer to keep the getentropy() call on OpenBSD.
msg250211 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2015-09-08 14:39
Also, Theo believes that our Mersenne Twister is outdated and os.urandom() is the only reasonable alternative. So we might as well keep it fast.
msg250241 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2015-09-08 17:20
Guido, you're clearly talking with someone who knows too much ;-)  If we're using the Twister for _anything_ related to crypto-level randomness, then I'd appalled - it was utterly unsuitable for any such purpose from day 1.  But as a general-purpose generator for non-crypto uses, it remains an excellent choice, and still a nearly de facto standard for "almost all" such purposes.  There are general-purpose generators I like better now, but won't push in that direction until the Twister's "nearly de facto standard" status fades.  Better to be a follower than a leader in this area.
msg250243 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2015-09-08 17:29
To Theo it's probably inconceivable that anyone would be using random numbers for anything besides crypto security. :-) His favorite is arc4random() (but he notes it's not based on RC4 anymore, but uses chacha -- I have no idea what any of that means :-). He did say userland PRNG a few times.
msg250246 - (view) Author: Donald Stufft (dstufft) * (Python committer) Date: 2015-09-08 17:50
(A)RC4 and ChaCha are just two stream ciphers that let you encrypt some data, they work by essentially producing a psuedo-random stream of data in a deterministic manner based off of a key, and than that is XOR'd with the data you want to encrypt. arc4random (ab)uses this and uses "real" entropy (e.g. randomness pulled from random noise on the network and such) as the "key" and then uses the psuedo-random stream of data as the values you get when you ask arc4random for some random data. The actual process is quite a bit more complex then that, but that's the basic gist.

Userspace PRNG's are not a very good idea for reasons better explained by an expert: http://sockpuppet.org/blog/2014/02/25/safely-generate-random-numbers/

And yea, using MT for anything that needs a CSPRNG (that is, a Cryptographically Secure Psuedo Random Number Generator) is a real bad idea, because the numbers it outputs are not "really" random. I'm of a mind that the APIs should default to CSPRNGs (so ``random`` should default to SystemRandom) and using something like MT should be opt in via something like "UnsafeFastRandom) or something. That ship is almost certainly sailed at this point though.
msg250247 - (view) Author: Donald Stufft (dstufft) * (Python committer) Date: 2015-09-08 17:53
Oh yea, and (A)RC4 is broken and shouldn't be used for anything anymore, ChaCha is much better and is pretty great.
msg250257 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-09-08 21:29
Thread on python-dev: https://mail.python.org/pipermail/python-dev/2015-September/141439.html
msg250258 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-09-08 21:46
> Perhaps the getentropy() check can explicitly rule out Solaris (either at the autoconf level or in the random.c source code) if you prefer to keep the getentropy() call on OpenBSD.

Ok, here is a patch implementing this option. It keeps getentropy() on OpenBSD for os.urandom(), but it disables getentropy() on Solaris for os.urandom().

I don't know if my py_getrandom() function calling syscall(SYS_getrandom, buffer, size, 0) works on Solaris. The flags are hardcoded, and I don't know if the <sys/syscall.h> include is enough to get the syscall() function. (Does Solaris uses the GNU C library?)

@jbeck: Can you please test this patch on the default branch of Python? Can you tell if the HAVE_GETRANDOM_SYSCALL check succeed on Solaris? (do you have "#define HAVE_GETRANDOM_SYSCALL 1" in pyconfig.h?)

The configure scripts tries to compile the following C program:

    #include <sys/syscall.h>

    int main() {
        const int flags = 0;
        char buffer[1];
        int n;
        /* ignore the result, Python checks for ENOSYS at runtime */
        (void)syscall(SYS_getrandom, buffer, sizeof(buffer), flags);
        return 0;
    }
msg250315 - (view) Author: John Beck (jbeck) Date: 2015-09-09 15:17
@haypo: Yes, that patch works (applied to 3.5.0rc3; proxy problems are preventing me from updating my clone of the default branch at the moment) i.e., pyconfig.h shows:

#define HAVE_GETRANDOM_SYSCALL 1

To answer your other questions:
* Calling syscall(SYS_getrandom, buffer, size, 0) does work on Solaris.
* Our <sys/syscall.h> does declare syscall().
* Solaris does not use the GNU C library, but our own libc, which does
  support getrandom(3) as of Solaris 11.3 (which will be released in
  the near future) and Solaris 12 (which is in Beta but has a release
  date in the more distant future).

Prior to applying this patch, I had needed to tweak py_getrandom() to recognize EINVAL in addition to ENOSYS as sufficient reason to turn off getrandom_works.  I still have that tweak in place, but have not yet tried backing it out to see if things still behave with your patch.

Let me know if there is any more information I can provide, and thanks for your attention to this issue.
msg250321 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-09-09 16:18
> Prior to applying this patch, I had needed to tweak py_getrandom() to recognize EINVAL in addition to ENOSYS as sufficient reason to turn off getrandom_works.

In which case getrandom() fails with EINVAL?
msg250323 - (view) Author: John Beck (jbeck) Date: 2015-09-09 16:27
Yes, the syscall was failing with EINVAL.
msg250403 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-09-10 18:59
> Yes, the syscall was failing with EINVAL.

Sorry, I don't understand. Do you mean that calling a non-existant syscall on Solaris fails with EINVAL? Calling the getrandom() syscall on Solaris < 11.3 fails with EINVAL?
msg250415 - (view) Author: John Beck (jbeck) Date: 2015-09-10 19:40
Sorry, let me try to clarify.  ENOSYS is the correct errno value to check for, for the situation (e.g., Solaris < 11.3) of a non-existent system call. But EINVAL should also be checked for to make sure the system call was invoked with proper parameters.  My builds of Python 3.5.0.X (don't recall whether X was a late Beta or release candidate) were core dumping because Python was making that syscall but not checking for EINVAL, and thus assuming the call was valid, when it was not.  Sorry, I did not try to debug why it was invalid.  But once I added EINVAL, alongside ENOSYS, as a reason to set getrandom_works to 0, then python started behaving properly.
msg250463 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-09-11 11:19
> But EINVAL should also be checked for to make sure the system call was invoked with proper parameters.

py_getrandom() calls Py_FatalError() or raises an OSError on EINVAL. The error is not silently ignored.


> My builds of Python 3.5.0.X (don't recall whether X was a late Beta or release candidate) were core dumping because Python was making that syscall but not checking for EINVAL,

Py_FatalError() calls abort(). Usually, operating systems dump a core dump in this case. But it is the expected behaviour. Python now refuses to start if the OS random number generator doesn't work. There are similar checks on the system and monotonic clocks for example.

> ... and thus assuming the call was valid, when it was not.

Ah! Finally you explain the problem :-) I wrote py_getrandom() for Linux. I didn't expect other operating systems to implement the getrandom() syscall. I hardcoded the flags (0) for example.

py_getrandom() calls directly the syscall, because I like the new cool getrandom() syscall of Linux: it avoids the need of a private file descriptor. It would be much better to call a function of the C library, but the GNU C library didn't expose the function yet...

On Solaris, the function is available in C, no need to call directly the syscall. It would be better to call the C function, and check if it's available in configure.

Can you please try remove_get_entropy.patch + urandom_solaris.patch?
msg250478 - (view) Author: John Beck (jbeck) Date: 2015-09-11 17:11
Yes, those patches work, with a caveat.  While testing this, I found out why I had needed EINVAL earlier (and still do, for now): there is a bug in the Solaris implementation of getrandom(2).  If flags are 0 and the buffer size > 1024, then it fails with EINVAL.  That is only supposed to happen for a buffer that big if GNRD_RANDOM is set in flags but GNRD_NONBLOCK is not set.  So until that bug is fixed, I have to patch py_getrandom() to treat EINVAL like ENOSYS and fall back to using /dev/urandom as if getrandom(2) were not supported.
msg250496 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-09-11 21:12
>  While testing this, I found out why I had needed EINVAL earlier (and still do, for now): there is a bug in the Solaris implementation of getrandom(2).  If flags are 0 and the buffer size > 1024, then it fails with EINVAL.  That is only supposed to happen for a buffer that big if GNRD_RANDOM is set in flags but GNRD_NONBLOCK is not set.  So until that bug is fixed, ...

Ah! So it was useful to dig the EINVAL issue, it's a bug in the kernel :-)

You wrote that the final version of Solaris 11.3 and 12 was not released yet. Can we expect a fix in the kernel before the final version?
msg250497 - (view) Author: John Beck (jbeck) Date: 2015-09-11 21:18
I have queried the engineer who owns the kernel bug and will post an update once I hear back from him.  But as it is already almost midnight on Friday in his geo, it may well be Monday before I hear back.
msg250681 - (view) Author: John Beck (jbeck) Date: 2015-09-14 16:46
The owner of the Solaris kernel bug has confirmed that he plans to get
the fix into both Solaris 12 and 11.3.
msg250983 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-09-18 13:38
New changeset 8b0c2c7cb4a7 by Victor Stinner in branch 'default':
Issue #25003: On Solaris 11.3 or newer, os.urandom() now uses the getrandom()
https://hg.python.org/cpython/rev/8b0c2c7cb4a7
msg250984 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-09-18 13:43
Ok, I pushed a first fix for Python 3.6. The changeset 8b0c2c7cb4a7 keeps getentropy() on OpenBSD, but it explicitly excludes getentropy() on Solaris ("!defined(sun)"). It adds support for the getrandom() function. As the EINVAL error will be fixed in the final version of Solaris 11.3, I chose to not support this specific error.

@John Beck: Does it look good to you? Can you test it? You may have to manually modify the code to not pass a value too large to getrandom() until the Solaris kernel is fixed.

If John and buildbots are happy, I will backport the change to Python 2.7, 3.4 and 3.5.
msg250993 - (view) Author: John Beck (jbeck) Date: 2015-09-18 14:17
I have tested your patch with 3.5, and it works fine, although I did have to make a minor change to get test_os to pass.  In test_os.py you had:

...
USE_GETENTROPY = ((sysconfig.get_config_var('HAVE_GETENTROPY') == 1)
                  and not sys.platform.startswith("sunos"))
HAVE_GETRANDOM = (sysconfig.get_config_var('HAVE_GETRANDOM_SYSCALL') == 1)
 
@unittest.skipIf(USE_GETENTROPY,
                 "getentropy() does not use a file descriptor")
@unittest.skipIf(HAVE_GETRANDOM,
                 "getrandom() does not use a file descriptor")
...

whereas I came up with this:

...
USE_GETENTROPY = ((sysconfig.get_config_var('HAVE_GETENTROPY') == 1)
                  and not sys.platform.startswith("sunos"))
HAVE_GETRANDOM = (sysconfig.get_config_var('HAVE_GETRANDOM') == 1)
HAVE_GETRANDOM_SYSCALL = (sysconfig.get_config_var('HAVE_GETRANDOM_SYSCALL') == 1)

@unittest.skipIf(USE_GETENTROPY,
                 "getentropy() does not use a file descriptor")
@unittest.skipIf(HAVE_GETRANDOM,
                 "getrandom() does not use a file descriptor")
@unittest.skipIf(HAVE_GETRANDOM_SYSCALL,
                 "getrandom() does not use a file descriptor")
...
msg250995 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-09-18 14:26
New changeset 221e09b89186 by Victor Stinner in branch 'default':
Issue #25003: Skip test_os.URandomFDTests on Solaris 11.3 and newer
https://hg.python.org/cpython/rev/221e09b89186
msg250996 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-09-18 14:27
"I have tested your patch with 3.5, and it works fine, although I did have to make a minor change to get test_os to pass.  In test_os.py you had: (...)"

Oh right, I fixed test_os too.
msg252005 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-10-01 07:51
New changeset 835085cc28cd by Victor Stinner in branch '3.5':
Issue #25003: On Solaris 11.3 or newer, os.urandom() now uses the getrandom()
https://hg.python.org/cpython/rev/835085cc28cd
msg252007 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-10-01 08:02
New changeset 202c827f86df by Victor Stinner in branch '2.7':
Issue #25003: os.urandom() doesn't use getentropy() on Solaris because
https://hg.python.org/cpython/rev/202c827f86df

New changeset 83dc79eeaf7f by Victor Stinner in branch '3.4':
Issue #25003: os.urandom() doesn't use getentropy() on Solaris because
https://hg.python.org/cpython/rev/83dc79eeaf7f
msg252008 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-10-01 08:06
Ok, I pushed fixes for Python 2.7, 3.4, 3.5 and 3.6.

Summary for Solaris:

- Python 2.7, 3.4: read from /dev/urandom (use a private file descriptor)
- Python 3.5, 3.6: use the getrandom() function on Solaris 11.3 and newer (no file descriptor), or fallback on reading from /dev/urandom (use a private file descriptor)

getentropy() is no more used on Solaris.

@John Beck: It would be great if you could run test_os on the development branches 2.7, 3.4, 3.5 and default (3.6).
msg252035 - (view) Author: John Beck (jbeck) Date: 2015-10-01 16:08
Confirmed that test_os runs cleanly on Solaris 12, for each of:
* 2.7.10 (plus your patch from 98454:202c827f86df)
* 3.4.3 (plus your patch from 98455:83dc79eeaf7f)
* 3.5.0 (plus your patch from 98452:835085cc28cd)
* 3.6 (tip)
msg252037 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-10-01 16:13
Thanks! I close the issue, it's now fixed.
History
Date User Action Args
2015-10-01 16:13:39vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg252037
2015-10-01 16:08:58jbecksetmessages: + msg252035
2015-10-01 08:06:26vstinnersetmessages: + msg252008
2015-10-01 08:02:03python-devsetmessages: + msg252007
2015-10-01 07:51:55python-devsetmessages: + msg252005
2015-09-18 14:27:04vstinnersetmessages: + msg250996
2015-09-18 14:26:39python-devsetmessages: + msg250995
2015-09-18 14:17:49jbecksetmessages: + msg250993
2015-09-18 13:43:30vstinnersetmessages: + msg250984
2015-09-18 13:38:54python-devsetnosy: + python-dev
messages: + msg250983
2015-09-14 16:46:17jbecksetmessages: + msg250681
2015-09-11 21:18:08jbecksetmessages: + msg250497
2015-09-11 21:12:06vstinnersetmessages: + msg250496
2015-09-11 17:11:41jbecksetmessages: + msg250478
2015-09-11 11:19:23vstinnersetfiles: + getrandom_solaris.patch

messages: + msg250463
2015-09-10 19:40:25jbecksetmessages: + msg250415
2015-09-10 18:59:21vstinnersettitle: os.urandom() should call getrandom(2) not getentropy(2) on Solaris -> os.urandom() should call getrandom(2) not getentropy(2) on Solaris 11.3 and newer
2015-09-10 18:59:15vstinnersettitle: os.urandom() should call getrandom(2) not getentropy(2) -> os.urandom() should call getrandom(2) not getentropy(2) on Solaris
2015-09-10 18:59:04vstinnersetmessages: + msg250403
2015-09-09 16:27:45jbecksetmessages: + msg250323
2015-09-09 16:18:42vstinnersetmessages: + msg250321
2015-09-09 15:17:06jbecksetmessages: + msg250315
2015-09-08 21:46:17vstinnersetfiles: + urandom_solaris.patch

messages: + msg250258
2015-09-08 21:29:53vstinnersetmessages: + msg250257
2015-09-08 18:07:56gvanrossumsetnosy: - gvanrossum
2015-09-08 17:53:04dstufftsetmessages: + msg250247
2015-09-08 17:50:25dstufftsetnosy: + dstufft
messages: + msg250246
2015-09-08 17:29:12gvanrossumsetmessages: + msg250243
2015-09-08 17:20:55tim.peterssetnosy: + tim.peters
messages: + msg250241
2015-09-08 14:39:21gvanrossumsetmessages: + msg250211
2015-09-08 00:39:27gvanrossumsetnosy: + gvanrossum
messages: + msg250142
2015-09-07 21:54:51vstinnersetfiles: + remove_get_entropy.patch
keywords: + patch
messages: + msg250131
2015-09-05 21:40:10rhettingersetassignee: vstinner
2015-09-04 22:22:57r.david.murraysetnosy: + vstinner

type: security
stage: needs patch
2015-09-04 21:44:28jbeckcreate