classification
Title: Add random.cryptorandom() and random.pseudorandom, deprecate os.urandom()
Type: enhancement
Stage: resolved
Components: Library (Lib)
Versions: Python 3.6
process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To:
Nosy List: larry, lemburg, ncoghlan
Priority: normal
Keywords:

Created on 2016-06-09 07:54 by lemburg, last changed 2016-09-07 02:35 by ncoghlan. This issue is now closed.

Messages (12)
msg267970 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2016-06-09 07:54
I propose to deprecate os.urandom() altogether due to all the
issues we've discussed on all those recent tickets, see e.g. 
#26839, #27250, #25420.

Unlike what we've told people for many years, it's clear that in the
age of VMs/containers getting booted/started every few seconds, it's
no longer the generic correct answer to people asking for random
data, since it doesn't distinguish between crypto random and
pseudo random data.

By far most use cases only need pseudo random data and only very
few applications require crypto random data.

Instead, let's define something everybody can start to use correctly
and get sane behavior on most if not all platforms. As Larry
suggested in #27266, getrandom() is a good starting point for this,
since its adoption is spreading fast and it provides the necessary
features we need for the two new APIs.

I propose these new APIs:

 * random.cryptorandom() for getting OS crypto random data

 * random.pseudorandom() for getting OS pseudo random data

Crypto applications will then clearly know that random.cryptorandom()
is the right choice for them and everyone else can use 
random.pseudorandom().

random.cryptorandom() will guarantee that the returned data
is safe for crypto applications on all platforms, blocking or
raising an exception if necessary to make sure only safe
data is returned. The API should get a parameter to determine
whether to raise or block.

random.pseudorandom() will guarantee to not block and always
return random data that can be used as seeds for simulations, 
games, tests, monte carlo, etc.

The APIs should use the getrandom() C API, where available,
with appropriate default settings, i.e. blocking or raising
for random.cryptorandom() and non-blocking, non-raising for
random.pseudorandom().

The existing os.urandom() would then be deprecated to guide
new developments to these new APIs, getting rid of the
ambiguities and problems this interface has on several platforms
(see all the other tickets and https://en.wikipedia.org/wiki//dev/random
for details).
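
A minimal sketch of how the two proposed functions might sit on top of getrandom(), using os.getrandom() and os.GRND_NONBLOCK as exposed by Python 3.6+ on Linux. The function names follow the proposal; the signatures, the block parameter, the helper _getrandom_exact() and the time-seeded fallback are assumptions for illustration only, since the actual design was left to a PEP.

```python
import os
import random
import time


def _getrandom_exact(n, flags=0):
    """Call os.getrandom() until exactly n bytes have been collected."""
    buf = b""
    while len(buf) < n:
        # os.getrandom() may return fewer bytes than requested.
        buf += os.getrandom(n - len(buf), flags)
    return buf


def cryptorandom(n, block=True):
    """Hypothetical: return n bytes that are safe for crypto use.

    Blocks until the kernel pool is initialised, or raises
    BlockingIOError instead when block=False.
    """
    if hasattr(os, "getrandom"):
        return _getrandom_exact(n, 0 if block else os.GRND_NONBLOCK)
    # Assumption: on other platforms os.urandom() is the best source.
    return os.urandom(n)


def pseudorandom(n):
    """Hypothetical: return n bytes without blocking on the entropy pool."""
    if hasattr(os, "getrandom"):
        try:
            return _getrandom_exact(n, os.GRND_NONBLOCK)
        except OSError:
            # Pool not initialised (BlockingIOError) or syscall missing:
            # fall back to a weak, time-seeded Mersenne Twister.
            # Acceptable for simulations, never for secrets.
            seed = int(time.time() * 1e9) ^ os.getpid()
            rng = random.Random(seed)
            return bytes(rng.getrandbits(8) for _ in range(n))
    return os.urandom(n)
```

The point of the split is that only cryptorandom() is ever allowed to wait or fail; pseudorandom() trades quality for availability.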
msg267972 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2016-06-09 07:57
Fleshing out the API signatures and implementation details will have to be done in a PEP. 

The topic is (as all the related tickets show) too complex for discussions on a bug tracker.

I just opened this ticket for reference to the idea.
msg267977 - Author: Christian Heimes (christian.heimes) (Python committer) Date: 2016-06-09 08:02
-1

os.urandom() is just fine. Let's not confuse users and make it even harder to write secure software.
msg267978 - Author: Larry Hastings (larry) (Python committer) Date: 2016-06-09 08:07
I +1 on new functions that are designated the best-practice places to get your pseudo-random numbers.

(IDK if the best place for both is in random; maybe the crypto one should be in secrets?)

To be precise: on most OSes what you're calling "crypto random data" is actually "crypto-quality pseudo-random data".  Very few platforms provide genuine random data--rather, they seed a CPRNG and give you data from that.  (And then the cryptographers have endless arguments about whether the CPRNG is really crypto-safe.)

I'm -1 on actually deprecating os.urandom().  It should be left alone, as a thin wrapper around /dev/urandom.  I imagine your cryptorandom() and pseudorandom() functions would usually be written in Python and just import and use the appropriate function on a platform-by-platform basis.
msg267983 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2016-06-09 08:12
Some resources:

 * getrandom() man page:
   http://man7.org/linux/man-pages/man2/getrandom.2.html

 * nice write-up on how getrandom() was introduced:
   https://lwn.net/Articles/606141/

 * random devices implementation on Linux:
   http://lxr.free-electrons.com/source/drivers/char/random.c

 * getrandom() implementation on Linux:
   http://lxr.free-electrons.com/source/drivers/char/random.c#L1601
   (built straight on top of the device APIs)
msg267991 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2016-06-09 08:30
On 09.06.2016 10:07, Larry Hastings wrote:
> 
> I +1 on new functions that are designated the best-practice places to get your pseudo-random numbers.
> 
> (IDK if the best place for both is in random; maybe the crypto one should be in secrets?)

All up for discussion. As long as we get the separation clear,
I'm fine with any location in the stdlib.

> To be precise: on most OSes what you're calling "crypto random data" is actually "crypto-quality pseudo-random data".  Very few platforms provide genuine random data--rather, they seed a CPRNG and give you data from that.  (And then the cryptographers have endless arguments about whether the CPRNG is really crypto-safe.)

Yes, I know, this should be documented in the docs for
random.cryptorandom().

We might even make the available entropy available as
additional API, on platforms where this is possible,
or even add APIs to access the entropy daemon where available:

http://egd.sourceforge.net/

(the necessary API is available via OpenSSL:
http://linux.die.net/man/3/rand_egd)

Some crypto applications do need to know a bit more about where
the random data is coming from, e.g. for generation of root
certificates and secure one time pads.
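
As a rough illustration of what such an "available entropy" API could look like on Linux, here is a sketch that reads the kernel's own estimate from procfs. The helper name available_entropy_bits() is hypothetical, and the sketch assumes the /proc/sys/kernel/random/entropy_avail file, which is Linux-specific; on other platforms it simply returns None.

```python
from pathlib import Path

# Assumption: Linux exposes the kernel's entropy estimate at this path.
ENTROPY_AVAIL = Path("/proc/sys/kernel/random/entropy_avail")


def available_entropy_bits():
    """Hypothetical helper: kernel entropy estimate in bits, or None."""
    try:
        return int(ENTROPY_AVAIL.read_text().strip())
    except (OSError, ValueError):
        # File missing (non-Linux), unreadable, or not a number.
        return None
```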

> I'm -1 on actually deprecating os.urandom().  It should be left alone, as a thin wrapper around /dev/urandom.  I imagine your cryptorandom() and pseudorandom() functions would usually be written in Python and just import and use the appropriate function on a platform-by-platform basis.

Fair enough. I don't feel strongly about this part. The main idea
here was to move people away from thinking that we can fix a broken
system, which is not under our control (because it's a shim on an
OS device).

How we implement the functions is up for debate as well. I could
imagine that we expose the getrandom() function as e.g.
random._getrandom() and then use this from Python where available,
with fallbacks to other solutions where necessary. This would
also make it possible to have similar functionality on non-CPython
platforms and opens up the door for future changes without
breaking applications again.
msg267995 - Author: Christian Heimes (christian.heimes) (Python committer) Date: 2016-06-09 08:42
On 2016-06-09 10:30, Marc-Andre Lemburg wrote:
> We might even make the available entropy available as
> additional API, on platforms where this is possible,
> or even add APIs to access the entropy daemon where available:

EGD died about 15 years ago. Please don't reanimate it.

> Some crypto applications do need to know a bit more about where
> the random data is coming from, e.g. for generation of root
> certificates and secure one time pads.

No, that is not how applications deal with certificates or OTPs. When an
application is really, REALLY concerned with RNG source on that level,
it will never ever use Python or even a Kernel CSPRNG to generate
private keys. Instead it will use a certified, industrial grade HSM
(hardware security module) to offload all cryptographic operations on a
secure device.
msg268005 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2016-06-09 09:21
Some more resources for FreeBSD:

 * /dev/random and /dev/urandom (symlink to /dev/random) man page:
   https://www.freebsd.org/cgi/man.cgi?query=random&sektion=4

 * Developer discussion about /dev/random and its future from 2013:
   https://wiki.freebsd.org/201308DevSummit/Security/DevRandom
 
 * FreeBSD uses the Yarrow CPRNG for /dev/random:
   https://www.usenix.org/legacy/events/bsdcon02/full_papers/murray/murray_html/
   https://www.schneier.com/academic/archives/2000/01/yarrow-160.html

 * FreeBSD will likely switch to the new Fortuna successor of Yarrow in an upcoming release:
   https://www.schneier.com/academic/fortuna/
msg268010 - Author: Larry Hastings (larry) (Python committer) Date: 2016-06-09 09:32
>  * FreeBSD will likely switch to the new Fortuna successor of Yarrow in an upcoming release:

A little more information about that.

FreeBSD did a major refactoring of their /dev/urandom (etc) support, which landed in October 2014:

https://svnweb.freebsd.org/base?view=revision&revision=273872

This kept Yarrow but also added Fortuna.  You can switch between them with a kernel option.

FreeBSD 10 shipped in January 2014, so clearly this rework didn't make it in.

I see several references to "let's make Fortuna the default in FreeBSD 11".  FreeBSD 11 hasn't shipped yet.  However, the "what's new in FreeBSD 11" wiki page doesn't mention changing this default.  So I don't know whether or not it's happening for 11.
msg268014 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2016-06-09 10:38
Resources for entropy gathering sources:

 * Kernel based devices such as /dev/random:
   https://en.wikipedia.org/wiki//dev/random

 * EGD - old entropy gathering daemon; blocks when out of
   entropy
   http://egd.sourceforge.net/
   (not maintained anymore)

   Important here is not the original implementation, but
   the Unix domain socket interface, which many applications
   support.

 * PRNGD - provides the EGD interface, but feeds entropy into
   the OpenSSL pool; essentially a CPRNG with EGD interface.
   http://prngd.sourceforge.net/
   
 * Virtio RNG - paravirtualized device for passing host RNG
   to guest VMs (running under KVM)
   https://fedoraproject.org/wiki/Features/Virtio_RNG
   
 * haveged - entropy daemon which feeds entropy into the
   Linux /dev/random pool
   http://www.issihosts.com/haveged/
   https://wiki.archlinux.org/index.php/Haveged
   
   Whether this is useful on VMs is contested, due to the way
   haveged works (reliance on rdtsc instructions which don't work
   well in VMs)
   http://security.stackexchange.com/questions/34523/is-it-appropriate-to-use-haveged-as-a-source-of-entropy-on-virtual-machines
   
 * Hardware RNG in Raspberry Pi:
   https://sites.google.com/site/astudyofentropy/project-definition/raspberry-pi-internal-hardware-random-number-generator
   
 * rng-tools - provides the rngd daemon to feed entropy from
   hardware RNGs into the Linux /dev/random pool
   https://wiki.archlinux.org/index.php/Rng-tools
   http://linux.die.net/man/8/rngd
msg268038 - Author: Nick Coghlan (ncoghlan) (Python committer) Date: 2016-06-09 16:52
As with other proposals to add new APIs, I think this is an overreaction to a Linux specific problem. Linux system boot could deadlock with 3.5.0 and 3.5.1 due to:

- CPython startup using os.urandom() when it wasn't necessary
- systemd invoking a Python script before the OS entropy pool had been initialised (which then deadlocked until the Python invocation timed out)

As long as we switch the internal hash algorithm to seeding from a non-blocking random source, and also ensure that importing the random module doesn't implicitly call os.urandom, then any other software that only needs pseudorandom data can just use the random module APIs.
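
To illustrate the last point, here is a small sketch of a simulation that relies only on the random module. The estimate_pi() function is a made-up example; because it is given an explicit seed, its private generator does not depend on the OS entropy pool and its results are reproducible.

```python
import random


def estimate_pi(samples, seed):
    """Monte Carlo estimate of pi using a private, explicitly seeded PRNG."""
    # An explicit seed means this instance needs no OS entropy.
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # point falls inside the quarter circle
            inside += 1
    return 4.0 * inside / samples
```

Calling estimate_pi() twice with the same seed gives the same result, which is usually what simulations and tests want and exactly what crypto code must avoid.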
msg274711 - Author: Nick Coghlan (ncoghlan) (Python committer) Date: 2016-09-07 02:35
PEP 524 has been implemented for 3.6b1 in #27776, so os.urandom() itself will now do the right thing for cryptographic use cases on Linux.
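
For cryptographic use on 3.6 and later, a short sketch of what that looks like in practice. The variable names are illustrative only; secrets is the standard library module added in 3.6 that builds on the same OS source.

```python
import os
import secrets

# 256 bits of key material straight from the OS CSPRNG.
key = os.urandom(32)

# 16 random bytes rendered as 32 hex characters, e.g. for a session token.
token = secrets.token_hex(16)
```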
History
Date                 User              Action  Args
2016-09-07 02:35:31  ncoghlan          set     status: open -> closed; type: enhancement; messages: + msg274711; resolution: rejected; stage: resolved
2016-06-12 11:34:38  christian.heimes  set     nosy: - christian.heimes
2016-06-09 16:52:13  ncoghlan          set     nosy: + ncoghlan; messages: + msg268038
2016-06-09 10:38:26  lemburg           set     messages: + msg268014
2016-06-09 09:32:17  larry             set     messages: + msg268010
2016-06-09 09:21:23  lemburg           set     messages: + msg268005
2016-06-09 08:42:18  christian.heimes  set     messages: + msg267995
2016-06-09 08:30:30  lemburg           set     messages: + msg267991
2016-06-09 08:12:03  lemburg           set     messages: + msg267983
2016-06-09 08:07:00  larry             set     nosy: + larry; messages: + msg267978
2016-06-09 08:02:56  christian.heimes  set     nosy: + christian.heimes; messages: + msg267977
2016-06-09 07:57:30  lemburg           set     messages: + msg267972
2016-06-09 07:54:24  lemburg           create