
classification
Title: Python 3.5 running on Linux kernel 3.17+ can block at startup or on importing the random module on getrandom()
Type: behavior
Stage: resolved
Components: Interpreter Core
Versions: Python 3.6, Python 3.5

process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: Colm Buckley, Lukasa, Theodore Tso, alex, doko, dstufft, larry, lemburg, martin.panter, matejcik, ncoghlan, ned.deily, pitti, python-dev, rhettinger, skrah, thomas-petazzoni, vstinner, ztane
Priority: release blocker
Keywords: patch

Created on 2016-04-24 19:04 by doko, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name | Uploaded | Description
nonblocking-getrandom.diff | Colm Buckley, 2016-05-13 08:18 | Patch to py_getrandom to use nonblocking system call, and associated plumbing.
getrandom-nonblocking-v2.patch | Colm Buckley, 2016-05-14 01:51 | Patch random.c to use nonblocking getrandom()
getrandom-nonblocking-v3.patch | Colm Buckley, 2016-06-06 20:45 | Patch random.c to use nonblocking getrandom() (cleaned-up version).
getrandom_nonblocking_v4.patch | Colm Buckley, 2016-06-06 23:32 |
nonblocking_urandom_noraise.patch | Colm Buckley, 2016-06-07 14:14 |
no-urandom-by-default.diff | dstufft, 2016-06-07 18:29 |
Messages (172)
msg264121 - (view) Author: Matthias Klose (doko) * (Python committer) Date: 2016-04-24 19:04
[forwarded from https://bugs.debian.org/822431]

This regression / change of behaviour was seen between 20160330 and 20160417 on the 3.5 branch. The only check-in which could affect this is the fix for issue #26735.

3.5.1-11 = 20160330
3.5.1-12 = 20160417

Martin writes:
"""
I just debugged the adt-virt-qemu failure with python 3.5.1-11 and
tracked it down to python3.5 hanging for a long time when it gets
called before the kernel initializes its RNG (which can take a minute
in VMs which have low entropy sources).

With 3.5.1-10:

  $ strace -e getrandom python3 -c 'True'
  +++ exited with 0 +++

With -11:
  $ strace -e getrandom python3 -c 'True'
  getrandom("\300\0209\26&v\232\264\325\217\322\303:]\30\212Q\314\244\257t%\206\"", 24, 0) = 24
  +++ exited with 0 +++

When you do this with -11 right after booting a VM, the getrandom()
can block for a long time, until the kernel initializes its random
pool:

   11:21:36.118034 getrandom("/V#\200^O*HD+D_\32\345\223M\205a\336/\36x\335\246", 24, 0) = 24
   11:21:57.939999 ioctl(0, TCGETS, 0x7ffde1d152a0) = -1 ENOTTY (Inappropriate ioctl for device)

   [    1.549882] [TTM] Initializing DMA pool allocator
   [   39.586483] random: nonblocking pool is initialized

(Note the time stamps in the strace in the first paragraph)

This is really unfriendly -- it essentially means that you stop being
able to use python3 early in the boot process or even early after
booting. It would be better to initialize that random stuff lazily,
until/if things actually need it.

In the diff between -10 and -11 I do see some getrandom() fixes to
supply the correct buffer size (but that should be irrelevant as in
-10 getrandom() wasn't called in the first place), and a new call
which should apply to Solaris only (#ifdef sun), so it's not entirely
clear where that comes from or how to work around it.

It's very likely that this is the same cause as for #821877, but the
description of that is both completely different and also very vague,
so I file this separately for now.
"""
msg264122 - (view) Author: Matthias Klose (doko) * (Python committer) Date: 2016-04-24 19:06
other issues fixed between these dates:

    - Issue #26659: Make the builtin slice type support cycle collection.
    - Issue #26718: super.__init__ no longer leaks memory if called multiple
      times.  NOTE: A direct call of super.__init__ is not endorsed!
    - Issue #25339: PYTHONIOENCODING now has priority over locale in setting
      the error handler for stdin and stdout.
    - Issue #26717: Stop encoding Latin-1-ized WSGI paths with UTF-8.
    - Issue #26735: Fix :func:`os.urandom` on Solaris 11.3 and newer when
      reading more than 1,024 bytes: call ``getrandom()`` multiple times with
      a limit of 1024 bytes per call.
    - Issue #16329: Add .webm to mimetypes.types_map.
    - Issue #13952: Add .csv to mimetypes.types_map.
    - Issue #26709: Fixed Y2038 problem in loading binary PLists.
    - Issue #23735: Handle terminal resizing with Readline 6.3+ by installing
      our own SIGWINCH handler.
    - Issue #26586: In http.server, respond with "413 Request header fields too
      large" if there are too many header fields to parse, rather than killing
      the connection and raising an unhandled exception.
    - Issue #22854: Change BufferedReader.writable() and
      BufferedWriter.readable() to always return False.
    - Issue #6953: Rework the Readline module documentation to group related
      functions together, and add more details such as what underlying Readline
      functions and variables are accessed.
msg264126 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-04-24 19:37
Python 3 uses os.urandom() at startup to randomize the hash function. os.urandom() now uses the new Linux getrandom() function, which blocks until the Linux kernel has been fed enough entropy. It's a deliberate choice.

The workaround is simple: set the PYTHONHASHSEED environment variable to use a fixed seed. For example, PYTHONHASHSEED=0 disables hash randomization.
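
A minimal sketch of this workaround, assuming a child interpreter is launched from Python; the helper name hash_in_child is hypothetical. It starts two fresh interpreters with PYTHONHASHSEED=0 and compares the string hash they compute, which should match because hash randomization is disabled.

```python
import os
import subprocess
import sys


def hash_in_child(text, seed):
    """Return hash(text) as computed by a fresh interpreter run with PYTHONHASHSEED=seed."""
    # Copy the current environment and pin the hash seed for the child only.
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash({!r}))".format(text)],
        env=env,
    )
    return int(out)


first = hash_in_child("spam", 0)
second = hash_in_child("spam", 0)
print(first == second)
```

The same environment variable can be set in a unit file or init script for programs that run early in boot, at the cost of predictable hashes for that process.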

If you use virtualization and Linux is not fed enough entropy, you have security issues.

> I just debugged the adt-virt-qemu failure (...)

If you use qemu, you can use virtio-rng to provide good entropy to the VM from the host kernel.
msg264258 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-04-26 11:47
See also the issue #25420 which is similar but specific to "import random".
msg264265 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-04-26 12:11
The issue #25420 has been closed as a duplicate of this issue.

Copy of the latest message:

msg264262 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2016-04-26 12:05

I still believe the underlying system API use should be fixed rather than all the different instances where it gets used.

getrandom() should not block. If it does on a platform, that's a bug on that platform and Python should revert to the alternative of using /dev/urandom directly (or whatever other source of randomness is available).

Disabling hash randomization is not a good workaround for the issue, since it will definitely pop up in other areas as well.
msg264267 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2016-04-26 12:22
Hmm. Why does os.urandom(), which should explicitly not block, use a blocking getrandom() function?

This is quite unexpected on Linux.
msg264270 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2016-04-26 12:35
Wow, it's by design:

" os.urandom(n)

    Return a string of n random bytes suitable for cryptographic use."


``man urandom'':

"A read from the /dev/urandom device will not block waiting for more entropy.  As a result,  if  there  is
       not sufficient entropy in the entropy pool, the returned values are theoretically vulnerable to a crypto-
       graphic attack on the algorithms used by the driver."
msg264271 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-04-26 12:37
"Hmm. Why does os.urandom(), which should explicitly not block, use a blocking getrandom() function? This is quite unexpected on Linux."

I modified os.urandom() in issue #22181 to use the new getrandom() syscall of Linux 3.17. The syscall blocks until the Linux kernel entropy pool is *initialized* with enough entropy. In a healthy system, that should never happen.

To be clear: you can read 10 MB (or 1 GB or more) of random data using os.urandom() even if the entropy pool is empty. You can test:

* In a terminal 1, run "dd if=/dev/random of=random" to ensure that the entropy pool is empty
* In a terminal 2, run "while true; do cat /proc/sys/kernel/random/entropy_avail; sleep 1; done" to see that the entropy pool is empty (or very low, like less than 100 bytes)
* In a terminal 3, get a lot of random data using os.urandom(): ./python -c 'import os; x=os.urandom(1024*1024*10)'

=> it works, you *can* get 10 MB of random data even if the kernel entropy pool is empty.
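
The check in terminals 2 and 3 can be approximated from a single script. This is only a sketch: the helper entropy_estimate is hypothetical, and the /proc path is the Linux-only one quoted above, so the helper returns None elsewhere.

```python
import os

# Linux-only path taken from the steps above; absent on other systems.
ENTROPY_AVAIL = "/proc/sys/kernel/random/entropy_avail"


def entropy_estimate():
    """Return the kernel's entropy estimate, or None if it cannot be read."""
    try:
        with open(ENTROPY_AVAIL) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


before = entropy_estimate()
data = os.urandom(10 * 1024 * 1024)  # 10 MB, as in the test above
after = entropy_estimate()
print(len(data), before, after)
```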

Reminder: getrandom() is used to avoid a file descriptor which caused various issues (see issue #22181 for more information).

Ok, now this issue. The lack of entropy is a known problem in virtual machines. It's common that SSH, HTTPS, or other operations block because of the lack of entropy. On bare metal, the Linux entropy pool is fed by physical events like interrupts, keyboard strokes, mouse moves, etc. On a virtual machine, there is *no* source of entropy.

The problem is not only known but also solved, at least for qemu: you must attach a virtio-rng device to your virtual machine. See for example https://fedoraproject.org/wiki/Features/Virtio_RNG The VM can now read fresh and good quality entropy from the host.

To come back to Python: getrandom() syscall only blocks until the entropy pool is *initialized* with enough entropy.

The getrandom() syscall has a GRND_NONBLOCK flag that makes it fail with EAGAIN if reading from /dev/random (not /dev/urandom) would block because the entropy pool does not have enough entropy:
http://man7.org/linux/man-pages/man2/getrandom.2.html

IMHO it's a deliberate choice to block in getrandom() when reading /dev/urandom while the entropy pool is not initialized with enough entropy yet.

Ok, now the question is: should Python do anything to support badly configured VMs (with no real source of entropy)?

It looks like the obvious change is to not use getrandom() but revert code to use a file descriptor and read from /dev/urandom. We will get bad entropy, but Python will be able to start.

I am not excited by this idea. The os.urandom() private file descriptor caused other kinds of issues and bad quality entropy is also an issue.
msg264284 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2016-04-26 13:44
It is clear how /dev/urandom works. I just think that securing enough
entropy on startup should be done by the init scripts (if systemd still
allows that :) and not by an application.

[Unless the application is gpg or similar.]
msg264289 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-04-26 14:14
For many years, Linux systems have stored entropy on disk to quickly feed the
entropy pool at startup.

It doesn't magically create entropy on a VM where you start with zero entropy
at the first boot.
msg264292 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2016-04-26 14:31
I did not claim that it magically creates entropy. -- Many VMs are throwaway test beds. It would be annoying to setup some entropy
gathering mechanism just so that Python starts.
msg264303 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2016-04-26 15:11
As mentioned on the other issue #25420, this is a regression and a change in documented behavior of os.urandom(), which is expected to be non-blocking, regardless of whether entropy is available or not.

The fix should be easy (from reading the man page http://man7.org/linux/man-pages/man2/getrandom.2.html): set the GRND_NONBLOCK flag on getrandom(); then, if the function returns -1 and sets EAGAIN, fallback to reading from /dev/urandom directly.
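
The logic proposed here can be sketched in Python. This is an illustration only, not the C patch: it assumes os.getrandom() and os.GRND_NONBLOCK, which were added in Python 3.6 on Linux and did not exist in 3.5, and the function name urandom_nonblocking is hypothetical.

```python
import os


def urandom_nonblocking(n):
    """Return n random bytes without waiting for the kernel pool to initialize."""
    if hasattr(os, "getrandom"):
        try:
            # With GRND_NONBLOCK the call raises BlockingIOError (EAGAIN)
            # instead of waiting for the entropy pool to be initialized.
            data = os.getrandom(n, os.GRND_NONBLOCK)
            if len(data) == n:
                return data
        except OSError:
            # Covers BlockingIOError as well as a missing or forbidden syscall.
            pass
    try:
        # Fallback: read the device directly; this does not block on Linux.
        with open("/dev/urandom", "rb") as f:
            return f.read(n)
    except OSError:
        # No such device (e.g. non-Unix platform): defer to the OS default.
        return os.urandom(n)


print(len(urandom_nonblocking(24)))
```

The trade-off is the one debated in this thread: the fallback keeps the caller running but may return bytes drawn from a pool that is not yet fully initialized.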
msg265427 - (view) Author: Colm Buckley (Colm Buckley) * Date: 2016-05-12 21:18
It's worth noting that this issue now affects every installation of Debian testing track with systemd and systemd-cron installed; the python program /lib/systemd/system-generators/systemd-crontab-generator is called very early in the boot process; it imports hashlib (although only .md5() is used) and blocks on getrandom(), delaying boot time until a 90s timeout has occurred.

Suggestions: modify hashlib to avoid calling getrandom() until entropy is actually required, rather than on import; change the logic to use /dev/urandom (or an in-process PRNG) when getrandom() blocks; or both.
msg265430 - (view) Author: Colm Buckley (Colm Buckley) * Date: 2016-05-12 21:48
Oh; it's not actually hashlib which is calling getrandom(), it's the main runtime - the initialization of the per-process secret hash seed in _PyRandom_Init

Don't know enough about the internal logic here to comment on what the Right Thing is; but I second the suggestion of msg264303. This might just require setting "flags" to GRND_NONBLOCK in py_getrandom() assuming that's portable to other OS.
msg265452 - (view) Author: Colm Buckley (Colm Buckley) * Date: 2016-05-13 08:18
The attached patch (against 20160330) addresses the issue for me on Linux; it has not been tested on other platforms. It adds the GRND_NONBLOCK flag to the getrandom() call and sends the appropriate failure return if it returns due to lack of entropy. The enclosing functions fall back to reading from /dev/urandom in this case.

Affected files:

Python/random.c - changes to py_getrandom()
configure.ac and pyconfig.h.in - look for linux/random.h for inclusion

Can this, or something similar, be considered for integration with mainline?
msg265477 - (view) Author: Colm Buckley (Colm Buckley) * Date: 2016-05-13 14:39
A couple of things to note:

* Despite the earlier title; this does not just apply to VMs; any system with a potentially-blocking getrandom() (including all Linux 3.17+ and Solaris 11+) is affected.

* It's true that getrandom() only blocks on Linux when called before the RNG entropy pool is initialized. However, Python should not be limited to only being called after this initialization.

* In particular, systemd-cron relies on a Python script being called very early in the boot process (before the urandom pool is initialized), this is now prevalent on the Debian testing track; causing a 90-second boot delay.

* The patch I supplied causes getrandom() to be only called in nonblocking mode; this seems consistent with the desired semantics of os.urandom and _PyRandom_Init.

Hope this helps.

Colm
msg265481 - (view) Author: Colm Buckley (Colm Buckley) * Date: 2016-05-13 16:31
@haypo - yes, it's strange that Linux's getrandom() might block even when reading the urandom pool. However, I think we need to just cope with this and add the GRND_NONBLOCK flag rather than attempting to force a change in the Linux kernel
msg265485 - (view) Author: Colm Buckley (Colm Buckley) * Date: 2016-05-13 19:35
See https://lwn.net/Articles/606141/ for an explanation of the blocking behavior of getrandom(). This makes sense to me - before the pool has initialized, /dev/urandom will be readable but will return highly predictable data - ie: it should not be considered safe. In other words, I think that getrandom() offers a sensible API.

The only circumstances where we hit the EAGAIN in getrandom() should be when it's called extremely early in the boot process (as is the case for the systemd-cron generator script I mentioned earlier). I think this is safe enough; a more thorough approach would be to flag that the per-process hash seed (_Py_HashSecret) is predictable and shouldn't be used.
msg265496 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-05-13 23:24
Please elaborate the comment in the patch:

- explain that the RNG is not initialized yet with enough entropy
- add a reference to this issue
- explain that it's a deliberate choice to use weak (non initialized) RNG
for practical reasons
msg265500 - (view) Author: Colm Buckley (Colm Buckley) * Date: 2016-05-14 01:51
@haypo - new version of patch attached with comments and references per your request.
msg265549 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-05-14 22:49
getrandom-nonblocking-v2.patch:

+		/* Alternative might be to return all-zeroes as a strong
+		 * signal that these are not random data. */

I don't understand why you propose that in a comment of your change. I don't recall that this idea was proposed or discussed here.

IMHO it's a very bad idea to fill the buffer with zeros, the caller simply has no idea how to check the quality of the entropy. A buffer filled with zeros is "possible" even with high quality RNG, but it's really very very rare :-)
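
To put a number on "very very rare", here is a small worked calculation, assuming the bytes are independent and uniformly distributed; the helper name is hypothetical. Each byte is zero with probability 1/256, so an n-byte buffer is all zeros with probability 2 ** -(8 * n).

```python
from fractions import Fraction


def all_zero_probability(nbytes):
    """Probability that nbytes uniform random bytes are all zero."""
    return Fraction(1, 2 ** (8 * nbytes))


# 24 bytes is the size of the hash secret read at startup in this issue.
p = all_zero_probability(24)
print(float(p))
```

For the 24-byte read seen in the strace output, that is 2 ** -192, roughly 1.6e-58, so an all-zero buffer could not serve as a reliable signal either way.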

If you consider that a strong signal is required, you must raise an exception. But it looks like users don't care of the quality of the RNG, they request that Python "just works".
msg265555 - (view) Author: Colm Buckley (Colm Buckley) * Date: 2016-05-14 23:09
@haypo - yes, I think you're right. Can you delete those two lines (or I can upload another version if you prefer).

I think the pragmatic thing here is to proceed by reading /dev/urandom (as we've discussed). It's not safe to raise an exception in py_getrandom from what I can see; a thorough effort to signal the lack of randomness to outer functions needs more code examination than I have time to carry out at the moment.

From looking at when _PyRandom_Init is called and how the hash secret is used; I think it is safe to proceed with /dev/urandom. The general understanding is that urandom has a lower entropy quotient than random, so it's hopefully not going to be used in strong crypto contexts.
msg266216 - (view) Author: Colm Buckley (Colm Buckley) * Date: 2016-05-24 02:04
@haypo - just wondering where things stand with this? Is this patch going to get pushed to the mainline?
msg267455 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2016-06-05 18:43
Since 3.5.2 is almost upon us, I'm setting this to "release blocker" status so we can make a decision about whether this should be changed for 3.5.2 or not.  @haypo, do you have an opinion about the patch?
msg267504 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-06-06 02:29
Minor thing: the patch has tabbed indentation in places rather than spaces.

As I understand it, if there is no entropy initialized, this patch will fall back to reading /dev/urandom, which will return predictable data (opposite of “random” data!). But since we take this non-strict fallback in other cases (e.g. no OS support), there is a decent argument for also taking the predictable fallback path when entropy is uninitialized.
msg267511 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-06-06 02:39
Maybe an alternative would be to add a special PYTHONHASHSEED=best-effort (or whatever) value that says if there is no entropy available, use a predictable hash seed. That would force whoever starts the Python process to be aware of the problem.
msg267537 - (view) Author: Antti Haapala (ztane) * Date: 2016-06-06 15:52
I don't think setting environment variables is a solution, as it is not always clear which script occurs early in the boot process, or even that which program has components written in Python. However I'd want to be notified of failure as well, perhaps a warning should be emitted.
msg267539 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2016-06-06 16:13
I think such warnings should be emitted at application level, similar to the case when a program refuses to run under UID 0.

If admins wish, they can also integrate such checks into the system startup sequence (e.g. runlevel 3 is only reached if randomness is actually available).
msg267546 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2016-06-06 20:24
Speaking as the 3.5 RM, I suppose I have to have an opinion.  I don't think "Python now uses a better source of randomness to seed the random module at startup" is a major feature.  It's a nice-to-have, not a must-have.  And people who care about good randomness (e.g. people doing crypto with the random module) shouldn't be relying on the freebie initialization they get just by importing.

So I think changing the default is fine, especially if the new default is "seed from the entropy pool, but if it's empty failover to the not-as-good source of random bits".  If you think that's a bad move, please add your comments here--I'm willing to have my mind changed about this.

I'll remind you: the schedule says I tag 3.5.2 RC 1 this coming Saturday (almost exactly six days from now).  Naturally I'd prefer to make the release on time.
msg267550 - (view) Author: Colm Buckley (Colm Buckley) * Date: 2016-06-06 20:45
@larry -

Thank you for joining in. I'm uploading a third version of the patch (against clean 3.5.1 source, with correct whitespace and a less confusing comment) which implements the following:

* configure.ac / pyconfig.h.in : looks for linux/random.h and sets HAVE_LINUX_RANDOM_H if present.

* random.c : calls getrandom() with the GRND_NONBLOCK flag; if that fails, fall back to reading /dev/urandom which will have insufficient entropy but will at least return some data.

I feel that there is no consistent way to signal to higher-level applications that the random data has sub-standard entropy; but that this at least preserves the expected semantics, and doesn't block on startup in the event of an uninitialized entropy pool.
msg267554 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2016-06-06 20:50
>  I'm uploading a third version of the patch (against clean 3.5.1 source

Not against the 3.5 branch from hg.python.org/cpython ?  If not, why not?
msg267571 - (view) Author: Colm Buckley (Colm Buckley) * Date: 2016-06-06 23:32
@larry

Short version; I'm not set up on HG and don't have enough time to get there from here. The patch I submitted applies cleanly to the HG tip as of 15 minutes ago (rev 101768) with only offset changes; the attached v4 version includes the necessary offset changes.
msg267608 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-06-07 09:27
New changeset 9de508dc4837 by Victor Stinner in branch '3.5':
os.urandom() doesn't block on Linux anymore
https://hg.python.org/cpython/rev/9de508dc4837
msg267609 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-06-07 09:39
Sorry for the delay.

getrandom_nonblocking_v4.patch LGTM, but I made a major change: if getrandom() fails with EAGAIN, it now *always* falls back on reading /dev/urandom.

I also documented the change in os.urandom() documentation and in Misc/NEWS.

I pushed the fix to Python 3.5 and 3.6. Python 2.7 doesn't use getrandom() and so doesn't need the fix. I now consider that the bug is fixed.

If you consider that it's important enough to retry calling getrandom() each time os.urandom() is called, please open a new issue with a patch and elaborate your rationale :-)
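
The "always fall back" behaviour described in this message can be sketched as a sticky flag. This is a Python illustration under assumptions, not the committed C code: the class UrandomSource is hypothetical, os.getrandom() requires Python 3.6+ on Linux, and the attribute name getrandom_works is borrowed from the discussion in this thread.

```python
import os


class UrandomSource:
    """Random byte source that stops using getrandom() after its first failure."""

    def __init__(self):
        self.getrandom_works = hasattr(os, "getrandom")

    def _read_device(self, n):
        try:
            with open("/dev/urandom", "rb") as f:
                return f.read(n)
        except OSError:
            # No such device on this platform: defer to the OS default.
            return os.urandom(n)

    def read(self, n):
        if self.getrandom_works:
            try:
                data = os.getrandom(n, os.GRND_NONBLOCK)
                if len(data) == n:
                    return data
            except OSError:
                # Includes BlockingIOError (EAGAIN): never retry getrandom()
                # for the rest of the process lifetime.
                self.getrandom_works = False
        return self._read_device(n)


source = UrandomSource()
print(len(source.read(16)), source.getrandom_works)
```

Not retrying gives consistent behaviour over the lifetime of the process, at the price of staying on the device path even after the pool has been initialized.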
msg267610 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-06-07 09:40
Manual check to ensure that getrandom() syscall is used on Linux:

$ strace -o trace ./python -c 'import os; os.urandom(16); os.urandom(16)' && grep getrandom trace 
getrandom("...", 24, GRND_NONBLOCK) = 24
getrandom("...", 16, GRND_NONBLOCK) = 16
getrandom("...", 16, GRND_NONBLOCK) = 16

The first read of 24 bytes is to initialize the randomized hash function.
msg267611 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-06-07 09:55
I'm the author of the os.urandom() change which introduced the usage of the new getrandom() syscall: see the issue #22181. My motivation was to avoid the internal file descriptor to read /dev/urandom. In some corner cases (issue #18756), creating a file descriptor fails with EMFILE. Python introduced a workaround keeping the file descriptor open (issue #18756), but this change introduced new issues (issue #21207)...

When I modified os.urandom(), I was aware that getrandom() can block at startup, but I saw this feature as a good thing for Python. It doesn't seem like a good idea to generate low quality random numbers. I expected that such a system *quickly* gets enough good entropy. With this issue, we now have more information: "quickly" means in fact longer than 1 minute! ("causing a 90-second boot delay", msg265477).

Blocking Python startup longer than 1 minute just to get good quality random numbers doesn't seem worth it to me. It is clearly seen as a regression compared to Python 2 (which doesn't use getrandom() but reads /dev/urandom). I understand that the behaviour is also seen as a bug when compared to other programming languages or applications.

For all these reasons, I pushed Colm's change.
msg267612 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-06-07 10:01
Colm Buckley: "I feel that there is no consistent way to signal to higher-level applications that the random data has sub-standard entropy; but that this at least preserves the expected semantics, and doesn't block on startup in the event of an uninitialized entropy pool."

I chose to document the behaviour of os.urandom().


Stefan Krah (msg267539): "If admins wish, they can also integrate such checks into the system startup sequence (e.g. runlevel 3 is only reached if randomness is actually available)."

Maybe we need something like time.get_clock_info(), sys.float_info and sys.thread_info for os.urandom(): a string describing the implementation of os.urandom(). It would allow the developer to decide what to do when getrandom() is not used.
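
One possible shape for such an info object, purely as a hypothetical sketch: no urandom_info() function exists in the standard library, and the fields below only report what is visible from Python (whether os.getrandom is exposed and whether the device node exists), not which path os.urandom() actually takes.

```python
import os
import types


def urandom_info():
    """Hypothetical helper: describe which random sources appear available."""
    return types.SimpleNamespace(
        # True when this Python build exposes the getrandom() wrapper.
        getrandom_available=hasattr(os, "getrandom"),
        # True when the urandom device node exists on this system.
        urandom_device_present=os.path.exists("/dev/urandom"),
    )


info = urandom_info()
print(info)
```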

Reminder: getrandom() feature is specific to Linux. I understand that all other operating systems don't warn if the urandom entropy pool is not initialized yet!
msg267614 - (view) Author: Colm Buckley (Colm Buckley) * Date: 2016-06-07 10:09
@haypo - I concur with all of your comments. I didn't have a strong opinion on whether to modify getrandom_works; your proposal looks fine to me (and will give consistent behavior over the lifetime of the process).

Thanks all for your help with this issue; much appreciated.
msg267616 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-06-07 10:14
Martin Panter (msg267504): "As I understand it, if there is no entropy initialized, this patch will fall back to reading /dev/urandom, which will return predictable data (opposite of “random” data!)."

No, I don't think so.

Linux uses a lot of random sources, but some of them are seen as untrusted and so are added with a very low estimation of their entropy. Linux even adds some random values with an estimation of 0 bits of entropy. For example, drivers can add serial numbers as random numbers.

So even if getrandom() blocks, if the urandom entropy pool is not considered as fully initialized yet, I expect that /dev/urandom still generates *random* numbers, even if these numbers are not suitable to generate cryptographic keys.

Please double check, I'm not sure of what I wrote :-)

See also http://www.2uo.de/myths-about-urandom/ (but this article doesn't describe how urandom is initialized).
msg267617 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-06-07 10:15
Martin Panter (msg267511): "Maybe an alternative would be to add a special PYTHONHASHSEED=best-effort (or whatever) value that says if there is no entropy available, use a predictable hash seed. That would force whoever starts the Python process to be aware of the problem."

In my experience, it's better if users don't touch security :-) It's better if Python simply makes the best choices regarding security.
msg267621 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2016-06-07 10:44
On Tue, Jun 07, 2016 at 10:01:16AM +0000, STINNER Victor wrote:
> Maybe need something like time.get_clock_info(), sys.float_info and sys.thread_info for os.urandom(): a string describing the implementation of os.urandom(). It would allow the developer to decide what to do when getrandom() is not used.

I think this is a very good idea. Just a flag for "cryptographically secure" or not.
msg267623 - (view) Author: Alex Gaynor (alex) * (Python committer) Date: 2016-06-07 11:27
This doesn't look correct to me. Despite what the Linux maintainers insist, it's a _bug_ that /dev/urandom will return immediately if the system's entropy pool has never been seeded; one of the whole points of the getrandom syscall is that it has the correct behavior (which is the same behavior as BSDs).

IMO the patch landed this morning should be reverted and it should be left as is.
msg267624 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2016-06-07 11:31
On 07.06.2016 13:27, Alex Gaynor wrote:
> 
> This doesn't look correct to me. Despite what the Linux maintainers insist, it's a _bug_ that /dev/urandom will return immediately if the system's entropy pool has never been seeded; one of the whole points of the getrandom syscall is that it has the correct behavior (which is the same behavior as BSDs).
> 
> IMO the patch landed this morning should be reverted and it should be left as is.

I'm with Victor and the others on this one. Practicality beats
purity.
msg267625 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-06-07 11:32
Alex: "IMO the patch landed this morning should be reverted and it should be left as is."

Sorry, but you should elaborate a little bit more, see my rationale:
https://bugs.python.org/issue26839#msg267611

There are multiple issues.
msg267626 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-06-07 11:34
Stefan Krah: "I think this is a very good idea. Just a flag for "cryptographically secure" or not."

If you consider it is worth it, please open a new issue.

I dislike the idea of a boolean. The quality of each system RNG has been discussed long enough to be able to say that the "cryptographically secure" term depends a lot on your use case, and experts disagree among themselves :-) You might even have to consider the version of the Linux kernel to decide if /dev/urandom is good enough or not for your use case. The implementation has changed in recent years.
msg267627 - (view) Author: Cory Benfield (Lukasa) * Date: 2016-06-07 11:35
This patch explicitly violates several of the documented constraints of the Python standard library.

For example, random.SystemRandom uses os.urandom to generate its random numbers. SystemRandom is then used by the secrets module to generate *its* random numbers. This means that os.urandom *is* explicitly used by the Python standard library to generate cryptographically secure random numbers. It was done so in part expressly because the call to random() could block.

If Python needs a non-blocking RNG for internal purposes, that's totally fine, a new function should be written that does exactly that. But any code that is calling secrets or random.SystemRandom is expecting the documented guarantees of that module: that is, that the security profile of the random numbers generated by those objects are cryptographically secure. This patch ensures that that guarantee is *violated* on Linux systems run on cloud servers, which is more than a little alarming to me.
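The dependency chain described here can be checked from Python 3.6 or later, where the secrets module exists. A small sketch: secrets re-exports the same SystemRandom class as random, so both draw on the operating system's random source.

```python
import random
import secrets

# secrets re-exports random.SystemRandom rather than defining its own class.
same_class = secrets.SystemRandom is random.SystemRandom
print(same_class)

# Both of these ultimately draw on the operating system's random source.
token = secrets.token_bytes(16)
value = random.SystemRandom().randrange(10)
print(len(token), value)
```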
msg267628 - (view) Author: Donald Stufft (dstufft) * (Python committer) Date: 2016-06-07 11:36
I agree with Alex here.

The documentation of ``os.urandom`` states: Return a string of n random bytes suitable for cryptographic use. However the old behavior prior to using the ``getrandom()`` call and the behavior with this patch makes that documentation a lie. It's now a string of n random bytes that may or may not be suitable for cryptographic use, but we have no idea which one it is.

Nowhere in the documentation of ``os.urandom`` does it ever promise it will not block. In fact, on systems like FreeBSD where their /dev/urandom is better than Linux's it always blocked on start up because that's just the way their /dev/urandom works.
msg267629 - (view) Author: Donald Stufft (dstufft) * (Python committer) Date: 2016-06-07 11:39
I also agree with Cory :) If CPython needs a non-blocking RNG for start up, then it should add a new function that does that; breaking the promise that ``os.urandom`` returns cryptographically suitable random data is not the way to do that.
msg267630 - (view) Author: Thomas Petazzoni (thomas-petazzoni) Date: 2016-06-07 11:41
The original problem is that Python wants to generate random numbers at *startup*. Are those random numbers really used for crypto-related activities? I doubt it.

So isn't the proper solution to have two functions, one delivering random numbers that are usable for crypto-related activities, and which would potentially block, and a second one that delivers random numbers that are not appropriate for crypto stuff. This second function can be used at Python startup to replace what is done currently.

It is most likely perfectly fine if Python blocks when explicitly asked to generate cryptographically secure random numbers. But not when simply starting the interpreter.
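
The two-function split proposed above could look roughly like the following sketch. The names crypto_bytes and startup_bytes are hypothetical and are not part of any patch on this issue. The sketch also assumes os.getrandom() and os.GRND_NONBLOCK, which only exist on Python 3.6 and later on Linux, so it guards for their absence.

```python
import os
import random
import time


def crypto_bytes(n):
    """Random bytes for keys and tokens; may wait for the kernel pool."""
    return os.urandom(n)


def startup_bytes(n):
    """Random bytes for non-security uses; never waits for entropy."""
    if hasattr(os, "getrandom"):
        try:
            # GRND_NONBLOCK makes the call fail instead of blocking
            # when the entropy pool is not initialized yet.
            data = os.getrandom(n, os.GRND_NONBLOCK)
            if len(data) == n:
                return data
        except OSError:
            pass  # pool not ready (BlockingIOError) or syscall unavailable
    # Weak, predictable fallback: NOT suitable for anything secret.
    weak = random.Random(int(time.time() * 1e6) ^ os.getpid())
    return bytes(weak.getrandbits(8) for _ in range(n))
```

Under this split only interpreter-internal consumers such as the hash seed would call startup_bytes(), while os.urandom() callers keep whatever guarantee the platform gives them.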
msg267631 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2016-06-07 11:45
On 07.06.2016 13:36, Donald Stufft wrote:
> Nowhere in the documentation of ``os.urandom`` does it ever promise it will not block. In fact, on systems like FreeBSD, where /dev/urandom is better than Linux's, it has always blocked on start up because that's just the way their /dev/urandom works.

The whole purpose of urandom is to have a non-blocking source of
random numbers:

http://man7.org/linux/man-pages/man4/random.4.html

and os.urandom() has always stated: "This function returns random bytes
from an OS-specific randomness source. The returned data should be
unpredictable enough for cryptographic applications, though its exact
quality depends on the OS implementation."

That's pretty much in line with what the implementation now
does. There's no promise on the quality of the data it returns.
msg267632 - (view) Author: Donald Stufft (dstufft) * (Python committer) Date: 2016-06-07 11:51
> That's pretty much in line with what the implementation now does.

Literally the first line of the os.urandom documentation is "Return a string of n random bytes suitable for cryptographic use.". There is absolutely a promise that, as long as your OS isn't broken, this will provide cryptographically safe random numbers. As Cory pointed out, random.SystemRandom and the new secrets module are both relying on this promise of cryptographically safe numbers to provide their functionality, as is a number of other, external Python programs.

This patch is a regression in the safety of this function, flat out, no way around it.

Modern *nixes other than Linux have all already made /dev/urandom block on start up until it's been initialized. The only reason Linux hasn't is because Ted Ts'o has bad opinions, but that doesn't change the fact that people should always use urandom, and you should block until it's been initialized.

I fail to understand why, if CPython start up needs non-blocking random to the point that it would rather have cryptographically unsafe random than block, a function that does that shouldn't be added instead of causing every other use of ``os.urandom`` to be potentially unsafe.
msg267633 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-06-07 11:53
Cory Benfield: "For example, random.SystemRandom uses os.urandom to
generate its random numbers. SystemRandom is then used by the secrets
module to generate *its* random numbers. This means that os.urandom
*is* explicitly used by the Python standard library to generate
cryptographically secure random numbers. It was done so in part
expressly because the call to random() could block."

IMHO you should read http://www.2uo.de/myths-about-urandom/ which
explains that the property of blocking or not blocking doesn't matter
for the quality of the RNG. /dev/urandom is good enough to generate
cryptographic keys. Can we please stay focused on the *uninitialized
entropy pool* case?

Please see my message:
https://bugs.python.org/issue26839#msg267612
"Reminder: getrandom() feature is specific to Linux. I understand that
all other operating systems don't warn if the urandom entropy pool is
not initialized yet!"

IMHO you are expecting too much from os.urandom(). *If* you consider
that secrets require an initialized entropy pool, IMHO you should help
Stephan to implement a function to retrieve the implementation of
os.urandom() and then take a decision *in the secrets module*. For
example, raise an exception. It's the best way to warn users that
something goes wrong. I don't think that *blocking* is a good choice.
msg267634 - (view) Author: Donald Stufft (dstufft) * (Python committer) Date: 2016-06-07 11:54
> Reminder: getrandom() feature is specific to Linux. I understand that all other operating systems don't warn if the urandom entropy pool is not initialized yet!

As far as I know, all other modern OSs *ALWAYS* block until their entropy pool is initialized. It's Linux that refuses to get with the program.
msg267635 - (view) Author: Donald Stufft (dstufft) * (Python committer) Date: 2016-06-07 12:00
> IMHO you should read http://www.2uo.de/myths-about-urandom/ which explains that the property of blocking or not blocking doesn't matter for the quality of the RNG. /dev/urandom is good enough to generate cryptographic keys. Can we please stay focused on the *uninitialized entropy pool* case?

Cory wasn't speaking about (non)blocking in general, but the case where (apparently) it's desired to not block even if that means you don't get cryptographically secure random in the CPython interpreter start up. Nobody here wants ``os.urandom`` to behave like ``/dev/random`` does on Linux. We just want ``os.urandom`` to always return cryptographically safe random numbers.

> IMHO you are expecting too much from os.urandom(). *If* you consider that secrets require an initialized entropy pool, IMHO you should help Stephan to implement a function to retrieve the implementation of os.urandom() and then take a decision *in the secrets module*. For example, raise an exception. It's the best way to warn users that something goes wrong. I don't think that *blocking* is a good choice.

I think this is a pretty crappy way of handling it-- blocking for a short amount of time is almost always going to be the right thing for people here, particularly since it only matters right at the start of a fresh VM and no other time. I think that it's wrong to let an edge case of PYTHONHASHSEED reduce the security and the ability to reason about the return value of os.urandom for basically every other application of it.
msg267636 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-06-07 12:02
Donald Stufft: "As far as I know, all other modern OSs *ALWAYS* block
until their entropy pool is initialized. It's Linux that refuses to get
with the program."

Ah? I didn't know. Anyway, it doesn't change anything to the problem.

I don't think that security matters enough to block Python at startup.
Python has a long history of being a thin wrapper on top of the OS.
Usually, Python doesn't work around design issues of OSes, but exposes
functions as they are.

If you think that Linux is broken, please fix Linux, not Python.

--

If security matters in your application, you should work around the
Linux behaviour (bug?) in your application, but not in Python. For
example, raise a fatal error with an error written in capital letters.
Or block. Python *cannot* make this choice for you. It's part of
Python design to not take such decision for you.

Python is used in various areas, and in many areas, security doesn't
matter at all.

To me, it's just a major bug that python3 -c 'print("Hello World")'
blocks until Linux has enough entropy. In some embedded devices, you
can wait forever, you will *never* get enough entropy to see the hello
world message...

--

Trying to decide if os.urandom() and /dev/urandom are "secure" or not
is a waste of time. To me it's now clear that it's impossible to
decide :-) It depends on your expectation from security. Don't start
to lose time discussing this forever ;-)
msg267637 - (view) Author: Cory Benfield (Lukasa) * Date: 2016-06-07 12:04
Victor Stinner: I found that comment to be pretty patronising, but I'm assuming that wasn't the intent. However, your characterisation of my comment was not as I intended it: when I said "because it can block", I meant because on almost every system urandom will block if there is insufficient randomness to seed the kernel CSPRNG.

On FreeBSD, /dev/urandom blocks on startup until sufficiently seeded. On OS X, /dev/urandom behaves exactly the same as /dev/random (from the man page: "the two devices behave identically"), which is to say it blocks until the CSPRNG is sufficiently seeded. On Windows, CryptGenRandom (used by this code) specifies no blocking guarantees and the opinion of the wider community is that it too will block until sufficient entropy is gathered from startup.

So, let me say this: if the purpose of this patch was to prevent long startup delays, *it failed*. On all the systems above os.urandom may continue to block system startup. If the purpose of this patch is to prevent the system blocking at startup then you *must not use urandom at Python interpreter startup*.

This is why I object to this patch: it weakens the Linux interpreter while not fixing the actual problem. If Python does not need a CSPRNG at startup, then it should not block waiting for one *on any system*. If it does need a CSPRNG, then it should block until seeded. I don't see why some weird in-between solution is good enough here.
msg267638 - (view) Author: Alex Gaynor (alex) * (Python committer) Date: 2016-06-07 12:05
Repeating what a few other folks have said: os.urandom's callers shouldn't have to pay for the hash seed implementation. If Python internally is ok with suboptimal entropy, it should use a different function. Or early-boot Python users should set PYTHONHASHSEED.
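
A small sketch of the PYTHONHASHSEED workaround mentioned above. In CPython, setting PYTHONHASHSEED to an integer fixes the seed used for str and bytes hashing instead of drawing it at random at startup. The helper name child_hash is hypothetical, and the sketch assumes sys.executable points at a usable interpreter.

```python
import os
import subprocess
import sys


def child_hash(seed):
    """Return hash('abc') as computed by a child interpreter
    started with PYTHONHASHSEED set to the given value."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", "print(hash('abc'))"],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return int(result.stdout)


# With a fixed seed, string hashes are reproducible across processes.
first = child_hash(1)
second = child_hash(1)
```

The trade-off is that a fixed seed gives up the protection hash randomization is meant to provide, so this only suits scripts that do not process untrusted input.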
msg267640 - (view) Author: Donald Stufft (dstufft) * (Python committer) Date: 2016-06-07 12:05
> I don't think that security matters enough to block Python at startup.
> Python has a long history of being a thin wrapper on top of the OS.
> Usually, Python doesn't work around design issues of OSes, but exposes
> functions as they are.

That's fine, so make a new function that will return "maybe random data maybe not, who knows" instead of taking the function for producing cryptographically secure random data and making it less suitable for that task. This is the problem, not that Python start up is blocking, but that this patch takes that edge case, and declares that its behavior is the correct behavior for everyone trying to get cryptographically secure random numbers.
msg267642 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-06-07 12:06
Since many people seem to disagree, I reopen the issue, even if I still consider it as fixed ;-)

I also opened the issue #27249: "Add os.urandom_info".
msg267643 - (view) Author: Donald Stufft (dstufft) * (Python committer) Date: 2016-06-07 12:09
To be clear, I don't have a problem adding ``os.urandom_info`` but I don't think that it solves the problem, we shouldn't force people to introspect ``os.urandom`` to figure out if CPython decided to make it less secure on this invocation or not.
msg267644 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2016-06-07 12:18
On 07.06.2016 13:51, Donald Stufft wrote:
> 
> Donald Stufft added the comment:
> 
>> That's pretty much in line with what the implementation now does.
> 
> Literally the first line of the os.urandom documentation is "Return a string of n random bytes suitable for cryptographic use.". There is absolutely a promise that, as long as your OS isn't broken, this will provide cryptographically safe random numbers. As Cory pointed out, random.SystemRandom and the new secrets module are both relying on this promise of cryptographically safe numbers to provide their functionality, as is a number of other, external Python programs.

Ah, that's what you call taking quotes out of context :-) The full
documentation reads:

"""
Return a string of n random bytes suitable for cryptographic use.

This function returns random bytes from an OS-specific randomness
source. The returned data should be unpredictable enough for
cryptographic applications, though its exact quality depends on the OS
implementation.

On Linux, getrandom() syscall is used if available and the urandom
entropy pool is initialized (getrandom() does not block). On a Unix-like
system this will query /dev/urandom.
"""
https://docs.python.org/3.5/library/os.html?highlight=urandom#os.urandom

Note how the documentation emphasizes on os.urandom() not blocking.

I like the idea that Victor brought to allow applications to
check whether os.urandom() reverted to non-blocking /dev/urandom
or not. That way applications can make the right choices, while
still assuring that Python doesn't block on startup just to
init hash randomization (which has its own set of issues).

BTW: /dev/urandom doesn't make many promises as to the quality
of the data on Linux. For crypto applications relying on real
entropy, it's better to gather data from a hardware source
with known properties, e.g.
http://fios.sector16.net/hardware-rng-on-raspberry-pi/, not on
/dev/random or /dev/urandom:
https://www.schneier.com/blog/archives/2013/10/insecurities_in.html
msg267645 - (view) Author: Donald Stufft (dstufft) * (Python committer) Date: 2016-06-07 12:19
> Note how the documentation emphasizes on os.urandom() not blocking.

That line about not-blocking was added by the patch that Victor committed that we're objecting to.
msg267648 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-06-07 12:24
Thomas Petazzoni: "The original problem is that Python wants to generate random numbers at *startup*. Are those random numbers really used for crypto-related activities? I doubt it."

Python's randomized hash function and random.Random (Mersenne Twister, instantiated when "import random" is called) don't need high quality random. Poor entropy is enough ;-)

Thomas Petazzoni: "So isn't the proper solution to have two functions, one delivering random numbers that are usable for crypto-related activities, and which would potentially block, and a second one that delivers random numbers that are not appropriate for crypto stuff. This second function can be used at Python startup to replace what is done currently."

Sure, that's the obvious change: I proposed the issue #27250.

I forgot about the new secrets module. I agree that *this* module must require high-quality entropy.
msg267650 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-06-07 12:25
(Oh, in fact I wanted to reopen the issue, so Status must be changed, not Resolution, so the issue is visible again in the main list of issues.)
msg267654 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2016-06-07 12:36
On 07.06.2016 14:19, Donald Stufft wrote:
> 
>> Note how the documentation emphasizes on os.urandom() not blocking.
> 
> That line about not-blocking was added by the patch that Victor committed that we're objecting to.

Ah, sorry, I was looking at the online docs and the selector was
set to 3.5.1, so I was under the impression of looking at the 3.5.1
docs, not a later version.

In any case, the point still stands: os.urandom() has always
been documented as an interface to /dev/urandom on Linux, which again
is defined as a non-blocking interface. The change in 3.5.0 to
use getrandom() broke this and Victor's patch restored the previously
documented and expected behavior, so I don't see what the problem is.

People looking for a blocking random number source should use
/dev/random (or better: a hardware entropy device). Even with
initialized entropy pool, /dev/urandom will happily return
pseudo random numbers if the entropy pool lacks entropy, so
you don't really win much:

"""
       A read from the /dev/urandom device will not block waiting for more
       entropy.  If there is not sufficient entropy, a pseudorandom number
       generator is used to create the requested bytes.
"""
http://man7.org/linux/man-pages/man4/random.4.html

It's simply the wrong source to use for crypto random data,
since you can never be sure that the data returned by
/dev/urandom originates from true entropy or some
approximation.

In that light, using it to seed hash randomization is the
right approach, since you don't need a crypto RNG to seed
this part of the Python runtime.
msg267656 - (view) Author: Donald Stufft (dstufft) * (Python committer) Date: 2016-06-07 12:40
(Basically) nobody should ever use /dev/random (and cryptographers agree!). The thing you want to use is /dev/urandom and the fact that /dev/urandom on Linux doesn't block before the pool is initialized has long been considered by cryptographers to be a fairly large flaw. The ``getrandom()`` calls were added explicitly to allow programs to get the correct behavior out of the system random.

For more information see http://sockpuppet.org/blog/2014/02/25/safely-generate-random-numbers/ or http://www.2uo.de/myths-about-urandom/. The /dev/urandom man page is wrong, and it's wrong for political reasons and because Ted Ts'o has bad opinions.
msg267660 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-06-07 12:52
Currently, os.urandom() doesn't block anymore, which means secrets should be updated. If we revert os.urandom(), Python must be patched to use a non-blocking urandom to initialize the hash secret and random.Random (when the random module is imported).

In both cases, something should be changed. I suggest to move the discussion to the issue #27250 to try to identify which parts of Python requires secure RNG, which parts of Python don't require a secure RNG, and how to expose secure and not secure RNG in Python.
msg267661 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2016-06-07 13:01
On 07.06.2016 14:40, Donald Stufft wrote:
> 
> Donald Stufft added the comment:
> 
> (Basically) nobody should ever use /dev/random (and cryptographers agree!). The thing you want to use is /dev/urandom and the fact that /dev/urandom on Linux doesn't block before the pool is initialized has long been considered by cryptographers to be a fairly large flaw. The ``getrandom()`` calls were added explicitly to allow programs to get the correct behavior out of the system random.

Sounds to me that what you really want is os.getrandom() and not
a change in the implementation of os.urandom().

I think that would be a better solution overall: we get os.getrandom()
with access to all options and have os.urandom be the non-blocking
interface to /dev/urandom it has always been.

> For more information see http://sockpuppet.org/blog/2014/02/25/safely-generate-random-numbers/ or http://www.2uo.de/myths-about-urandom/. The /dev/urandom man page is wrong, and it's wrong for political reasons and because Ted Ts'o has bad opinions.

I'm not sure what you are trying to tell me with those blog
posts or comments. The concept of trying to measure entropy
in an entropy pool is certainly something that people can have
different opinions about, but it's not wrong per se when you
don't have easy access to a hardware device providing truly
random data (as in the Raspi SoC).

IMO, blocking is never a good strategy, since it doesn't increase
security - in fact, it lowers it because it opens up a denial
of service attack vector. Raising an exception is, as is providing other
ways of letting the application decide.
msg267663 - (view) Author: Donald Stufft (dstufft) * (Python committer) Date: 2016-06-07 13:07
What I'm trying to tell you is that /dev/random is a bad implementation and practically every cryptographer agrees that everyone should use /dev/urandom and they all also agree that on Linux /dev/urandom has a bad wart of giving bad randomness at the start of the system. The behavior of getrandom is a fix to that. In addition, almost nobody needs hardware RNG, /dev/urandom (minus the initialization problem on Linux) is the right answer for almost every single application (and if it's not the right answer, you're a cryptographer who knows that it's not the right answer). On most systems, /dev/random and /dev/urandom have the exact same behavior (which is the behavior of getrandom()-- blocks on initialization, otherwise doesn't), it's just Linux being brain dead here.
msg267664 - (view) Author: Donald Stufft (dstufft) * (Python committer) Date: 2016-06-07 13:12
Since there's obviously contention about what the right answer here is, I suggest we should revert the patch (since the old behavior already exists in 3.5 and is shipped to thousands of people already, and status quo wins) and then continue the discussion about what to do further beyond that. At the very least, if something isn't decided prior to Larry cutting a release, then it should be reverted then.
msg267665 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2016-06-07 13:16
On 07.06.2016 15:12, Donald Stufft wrote:
> 
> Since there's obviously contention about what the right answer here is, I suggest we should revert the patch (since the old behavior already exists in 3.5 and is shipped to thousands of people already, and status quo wins) and then continue the discussion about what to do further beyond that. At the very least, if something isn't decided prior to Larry cutting a release, then it should be reverted then.

Wait. Under that argument, every regression we introduce
would be deemed fine and not bug, because the "status quo
wins". I'm sorry, but that's nonsense.

Python 3.5 introduced a regression with respect to the behavior of
os.urandom() compared to Python 3.4 and older releases.

If someone wants getrandom() behavior, we should add a new
API for this and fix the regression in os.urandom().
msg267666 - (view) Author: Donald Stufft (dstufft) * (Python committer) Date: 2016-06-07 13:21
The patch causes a regression because I'm relying on the 3.5 behavior of getting secure randomness from ``os.urandom`` via the ``getrandom()`` system call (behavior that was documented in the What's New in 3.5). The 3.5 behavior also makes ``os.urandom`` behave the same on Windows, FreeBSD, OpenBSD, etc, basically every major OS except for Linux.

And yes, it's not unusual for "bugs" to not be fixed if there is contention about whether or not they are bugs at all and if they should be fixed. The typical resolution path is to not change anything unless there's broad agreement; if that can't happen on bugs.p.o then escalate to python-dev, and if it can't happen there then escalate to a PEP for a BDFL pronouncement.
msg267667 - (view) Author: Colm Buckley (Colm Buckley) * Date: 2016-06-07 13:33
Here's where things stand, as I currently see it.

Under certain circumstances, under Linux at least, calls to getrandom() will block. Specifically, they will block if the system's PRNG has been insufficiently initialized. Depending on the nature of the host, this block can be for a long time - delays of over 90 seconds have been observed.

This does not just affect VMs; physical systems are also affected.

Even if an application does not explicitly request random data, Python calls this function early in its execution to initialize the per-process hash seed (in _PyRandom_Init()).

The net effect is that *all* invocations of Python will block at startup if the system RNG is blocking. The only reason this is being called out as Linux-specific is that the behavior has been noticed in Linux.

I posit that it's *highly undesirable* to have Python block on the system RNG even when called in non-crypto contexts. (The specific trigger for this bug is the current behavior of Debian's 'testing' branch which calls a Python script very shortly after boot to parse crontab entries.)

Regardless of what behavior is expected of module functions such as os.urandom(), a blocking RNG must not be used to initialize _Py_HashSecret, even at the expense of predictability.

I agree that ultimately a variety of interfaces should be exposed, to allow developers to choose sensible compromises between performance and security. However, the current patchset makes the minimal change to system behavior while allowing a practical path forward.

My personal preference would be for os.urandom(n) to favor non-blocking operation over cryptographic security, and either add os.random() or add an optional parameter to os.urandom() to make the opposite trade-off.

Given the deadline requested by Larry for 3.5.2, however, can I suggest that the minimal solution is the one already proposed by myself and Victor?

(Aside: "you should fix Linux" is not really within the realm of practicality.)
msg267668 - (view) Author: Colm Buckley (Colm Buckley) * Date: 2016-06-07 13:36
Donald: please note - the current behavior is that Python *will not start at all* if getrandom() blocks (because the hash secret initialization fails). If you are relying on the current behavior with a functioning application, then you are invoking it well after the system PRNG has been initialized, and the situation doesn't apply.
msg267669 - (view) Author: Donald Stufft (dstufft) * (Python committer) Date: 2016-06-07 13:40
> My personal preference would be for os.urandom(n) to favor non-blocking operation over cryptographic security, and either add os.random() or add an optional parameter to os.urandom() to make the opposite trade-off.

Insecure by default is very rarely the right trade off. There are thousands of projects using ``os.urandom()`` already assuming it's going to give them cryptographically strong numbers. If we want a "maybe random" function or option, then it should be the new thing, not the other way around.

I have no problem with SipHash using a possibly insecure random so that Python can start up quickly even in the face of an uninitialized urandom on Linux. I do have a problem with infecting every single call to os.urandom with that same choice.

> The current behavior is that Python *will not start at all* if getrandom() blocks (because the hash secret initialization fails).

It starts just fine, it just can possibly take a while.
msg267670 - (view) Author: Donald Stufft (dstufft) * (Python committer) Date: 2016-06-07 13:43
> The net effect is that *all* invocations of Python will block at startup if the system RNG is blocking. The only reason this is being called out as Linux-specific is that the behavior has been noticed in Linux.

The fix is Linux specific, because all other modern OSs don't allow you to read randomness from /dev/urandom until the pool is initialized. So the problem that this ticket has exists on all platforms *by design* because every modern platform but Linux has a good implementation of /dev/urandom with regards to the behavior prior to the pool being fully initialized. IOW this patch on OpenBSD (where this syscall also exists) will cause us to try getrandom(), fail because it would have to block, then open up /dev/urandom and then... block again because OpenBSD made reasonable choices and doesn't allow people to read bad random off its random device.
msg267671 - (view) Author: Colm Buckley (Colm Buckley) * Date: 2016-06-07 13:49
Donald -

With the greatest respect, you're talking about introducing multi-minute delays into the startup times of hundreds of millions of systems, regardless of whether they have a proximate requirement for cryptographically-secure RNG sources. I don't think that's reasonable. My servers start up in about fifteen seconds with this patch applied, or over two minutes without.

Note; it's perfectly possible for getrandom() to block *indefinitely* - in the trigger case here (systemd's crontab generator), it times out after 90 seconds rather than eventually succeeding. If (for example), a Python script is called before device initialization, it's quite possible that there will *never* be enough entropy in the system to satisfy getrandom(), resulting in a non-booting system.

To reiterate; the overwhelming majority of applications (in particular, anything which is called after the entropy pool is initialized, which typically happens once networking, USB etc. are running) will use perfectly acceptable random sources. The only applications affected by this patch are those which call getrandom() very early in the boot process.

I feel you're tilting at a very impractical windmill.
msg267672 - (view) Author: Alex Gaynor (alex) * (Python committer) Date: 2016-06-07 13:51
Colm -- how is that situation not addressed by fixing the hash seed generation specifically, rather than patching all consumers of os.urandom?
msg267673 - (view) Author: Donald Stufft (dstufft) * (Python committer) Date: 2016-06-07 13:51
> With the greatest respect, you're talking about introducing multi-minute delays into the startup times of hundreds of millions of systems, regardless of whether they have a proximate requirement for cryptographically-secure RNG sources.

No I'm not. I'm talking about implementing this fix so that it *only* affects the Python interpreter start up, in particular things like SipHash initialization, instead of affecting the security of every other user of os.urandom.
msg267674 - (view) Author: Colm Buckley (Colm Buckley) * Date: 2016-06-07 13:58
I have no objection to *deliberate* invocations of the system RNG blocking if needed. Presumably this behavior can be codified into the various APIs.

My objection is *entirely* to _PyRandom_Init() calling a potentially-blocking RNG source, before script parsing even begins. This basically prohibits Python from starting on systems where the system RNG is blocking. Linux is the only affected system *now* because the systemd crontab generator is the only Python script called before the RNG has initialized. Exactly the same issue would apply to any of the BSDs or Solaris, if /dev/urandom blocks as you describe. I want to be clear - this is not a Linux-specific issue.
msg267675 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2016-06-07 14:06
On 07.06.2016 15:21, Donald Stufft wrote:
> 
> The patch causes a regression because I'm relying on the 3.5 behavior of getting secure randomness from ``os.urandom`` via the ``getrandom()`` system call (behavior that was documented in the What's New in 3.5). The 3.5 behavior also makes ``os.urandom`` behave the same on Windows, FreeBSD, OpenBSD, etc, basically every major OS except for Linux.

The contention is not about using getrandom() to fetch data,
but the newly introduced and unwanted blocking nature during
system startup.

This was not documented anywhere and it's
a regression that causes major problems with using Python 3.5
on container and VM systems where startup time is of the
essence.

You get the same blocking behavior when importing the random
module (see http://bugs.python.org/issue25420), even though it's
just seeding the global PRNG.

All these instances assume that os.urandom() does *not* block
and rightly so, since at the time they were written, os.urandom()
was a direct interface to /dev/urandom, which is documented
and designed to not block on Linux.

> And yes, it's not unusual for "bugs" to not be fixed if there is contention about whether or not they are bugs at all and if they should be fixed. The typical resolution path is to not change anything unless there's broad agreement; if that can't happen on bugs.p.o then escalate to python-dev, and if it can't happen there then escalate to a PEP for a BDFL pronouncement.

Sure, we can have the discussion again on python-dev. I just
don't understand why you are apparently not willing to even
consider compromises.
msg267676 - (view) Author: Donald Stufft (dstufft) * (Python committer) Date: 2016-06-07 14:09
> I have no objection to *deliberate* invocations of the system RNG blocking if needed. Presumably this behavior can be codified into the various APIs.
>
> My objection is *entirely* to _PyRandom_Init() calling a potentially-blocking RNG source, before script parsing even begins.

It sounds like we might (somewhat?) be in violent agreement then.

If someone calls os.urandom() (or calls something that causes it to be called, e.g. secrets.py, random.SystemRandom, etc) then they should not get randomness from an un-initialized /dev/urandom by default. I have a preference for blocking until randomness is available, but an exception would be OK too.

I have no problem with the interpreter start up not blocking on entropy because no user invoked code caused that, and the security properties of SipHash that need really good random only really matter for long lived processes that process a lot of user input-- IOW stuff that's unlikely to be started prior to the pool being initialized.
msg267677 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2016-06-07 14:09
I'm with Donald here. Python must not reduce security just for a special case. It doesn't mean that we should not address and fix this special case -- just treat it as special.

1) For your use case, the hash randomization key for the SipHash PRN doesn't need to be 4 or 8 bytes of CPRNG. Since you are not dealing with lots of untrusted input from a malicious remote source, any unpredictable or even predictable value will do.

2) Your use case might be special enough to use a special build of Python. Too bad https://www.python.org/dev/peps/pep-0432/ is not ready yet. 

3) #21470 causes 'import random' to read os.urandom(2500) in order to initialize the MT state of random.random. I really don't understand why MT needs 2500 bytes of distinct CPRNG data. The module should rather read less data and then stretch it into a larger init vector. We could use SipHash for the job. In fact why does the MT use a CPRNG at all? It's not designed as CPRNG source and could be initialized from other sources (id(self), time()...) instead.
msg267678 - (view) Author: Colm Buckley (Colm Buckley) * Date: 2016-06-07 14:14
The attached patch (against tip) is a very quick attempt to implement the desired behavior:

* add a "nonblocking" argument to py_getrandom()
* dev_urandom_python() sets this to 0, to request blocking operation
* dev_urandom_noraise() sets this to 1 to request non-blocking

As far as I can tell, dev_urandom_noraise() is only called by _PyRandom_Init() so only the hash secret initialization will be affected.

Victor; can you take a look and let me know what you think? This probably needs some comments etc. before pushing. I'm also a little unsure about the rest of the logic in dev_urandom_noraise - it should probably also be modified to not block in the event that urandom is unavailable.
msg267679 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-06-07 14:17
nonblocking_urandom_noraise.patch doesn't fix the issue #21470.
msg267680 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-06-07 14:18
> nonblocking_urandom_noraise.patch doesn't fix the issue #21470.

Sorry, it's the issue #25420: "import random" blocks on entropy collection on Linux with low entropy
msg267681 - (view) Author: Donald Stufft (dstufft) * (Python committer) Date: 2016-06-07 14:18
> I just don't understand why you are apparently not willing to even consider compromises.

I have one thing that I hold immutable here: that os.urandom should use the best interfaces provided by the OS to ensure that it always returns cryptographically random data.

I don't care if SipHash is allowed to use lesser data.
I don't care if os.urandom raises an exception or if it blocks if enough entropy isn't available.
I don't care if people are given an option that will let them maybe get bad data (but will only work on Linux or older *nixes).

All I care about is that the default behavior of os.urandom gets data that is generated using the best practices for that system, because that's what people have been told to use for years and years.
msg267682 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2016-06-07 14:19
man urandom:

"A read from the /dev/urandom device will not block waiting for more entropy. As a result, if there is not sufficient entropy in the entropy pool, the returned values are theoretically vulnerable to a cryptographic attack on the algorithms used by the driver. Knowledge of how to do this is not available in the current unclassified literature, but it is theoretically possible that such an attack may exist. If this is a concern in your application, use /dev/random instead."


There was never any guarantee on Linux. Python is a language and not an application. Security checks should be done by applications or better during the OS startup.  Any properly configured Linux server will not have a problem, but it is not up to a language implementation to check for that.
msg267684 - (view) Author: Colm Buckley (Colm Buckley) * Date: 2016-06-07 14:43
Victor -

I see three options for 3.5.2:

* continue with the 3.5.1 behaviour, which blocks all python invocations in low-entropy situations. I think this is highly undesirable.

* apply my patches, which fix the hash secret initialization but not 'import random'. This at least allows current Debian testing-track systems to boot properly ;)

* attempt to find a solution for #25420 which also addresses this issue. The original patch we submitted fixed both, but has encountered community objections from Donald and others.

The situation we're encountering is that it is *not possible* to use a sound PRNG under certain circumstances - if the system doesn't have entropy, it doesn't have entropy and there's not a lot to be done about it apart from wait.

I posit that an application which uses the random module has higher expectations of unpredictability, and therefore should take userspace measures to ensure entropy availability (as you suggest in msg253163 for example).

Note that the previous behavior (reading /dev/urandom) returns potentially unsafe data (as Donald and others point out). The only resolution to me seems to be modifying the behavior of the random module so that the buffer is initialized lazily (at first use, rather than at module import). This should be relatively straightforward, but I haven't had time to unpick all the logic of random.py to determine The Right Thing. Maybe Raymond can take a look at this?
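For illustration only, the lazy-initialization idea above could look roughly like the sketch below. It is not taken from random.py or from any patch attached to this issue; the `LazyRandom` class and its `_get()` helper are hypothetical names. The point is simply that constructing the wrapper does no seeding, so any read from the system RNG is deferred until the first request for a random number.

```python
import random


class LazyRandom:
    """Hypothetical wrapper: create and seed the generator on first use."""

    def __init__(self):
        # Nothing is seeded at construction (i.e. "import") time.
        self._inst = None

    def _get(self):
        if self._inst is None:
            # random.Random() seeds itself here, so any wait for the
            # system RNG happens at first use rather than at import.
            self._inst = random.Random()
        return self._inst

    def random(self):
        return self._get().random()

    def randint(self, a, b):
        return self._get().randint(a, b)


lazy = LazyRandom()
seeded_before = lazy._inst is not None   # no generator exists yet
value = lazy.random()                    # first use triggers seeding
seeded_after = lazy._inst is not None
```

A script that never asks for a random number would never reach the seeding step under this design.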

In summary: I propose that the fix for this issue be implemented using the patches already discussed in this thread, and the fix for #25420 be implemented by modifying random.py.

Is this acceptable to everyone?
msg267685 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2016-06-07 14:47
PSRT VETO!

This ticket is turning into a bike-shedding discussion. In the light of the upcoming release 3.5.2 I'm now putting on my PSRT hat (Python Security Response Team) and proclaim a veto against any and all changes to os.urandom(). The security properties of os.urandom() must not be modified or reduced compared to 3.5.1. Please restore the behavior of os.urandom().

Reasoning:
The security of our general audience is much more important than this special case. I agree that the problem of Python blocking in an early boot phase should be fixed. But under no circumstances must the fix affect security. For now please work around the issue with PYTHONHASHSEED or forwarding the host's entropy source into your virtualization environment.

Any change to os.urandom(), _Py_HashSecret (I'm the author of PEP 456) and Mersenne-Twister initialization of random.random() shall go through a formal PEP process. I'm willing to participate in writing the PEP. A formal PEP also enables us to ask trained security experts for review.
msg267686 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2016-06-07 14:49
On 07.06.2016 16:18, Donald Stufft wrote:
> 
> Donald Stufft added the comment:
> 
>> I just don't understand why you are apparently not willing to even consider compromises.
> 
> I have one thing that I hold immutable here: that os.urandom should use the best interfaces provided by the OS to ensure that it always returns cryptographically random data.
> 
> I don't care if SipHash is allowed to use lesser data.
> I don't care if os.urandom raises an exception or if it blocks if enough entropy isn't available.
> I don't care if people are given an option that will let them maybe get bad data (but will only work on Linux or older *nixes).
> 
> All I care about is that the default behavior of os.urandom gets data that is generated using the best practices for that system, because that's what people have been told to use for years and years.

Fine, but that's just your personal desire.

It's not something that we have documented anywhere - all these
years, we've told people that os.urandom() is an interface to
/dev/urandom and this was now changed in an incompatible way
without giving notice to anyone.

Why not expose os.getrandom() and then have folks that care
about having it block in the case of an uninitialized
entropy buffer use that ?

This gives people a clear choice and doesn't cause people
to have to reconsider using the random module or wait for
Python hash randomization to initialize itself when using
Python during VM/container/system startup.

I don't really appreciate this approach to break Python in
cloud setups just because some entropy pool is not initialized,
which only a tiny fraction of users care about. It doesn't
make Python land a better place.

This is similar to the approach that Python3 should always
determine the I/O encoding from the environment, without providing
sane defaults to even startup the interactive console or run a
pipe. It wasn't helping with the adoption of Python. We've
resolved that by making useful assumptions. We should do
the same here.
msg267687 - (view) Author: Colm Buckley (Colm Buckley) * Date: 2016-06-07 14:57
Christian -

I would like to make one further comment:

The only reason getrandom() was used instead of /dev/urandom was to avoid wasting a file descriptor. The previous behavior was in use for many years with no security issues; it was changed for FD conservation reasons, not security reasons.

The change between 3.5 and 3.5.1 caused a very notable regression; the initialization of the hash secret can block indefinitely under circumstances which unfortunately are fairly common.

Persisting with the 3.5.1 behavior, in my opinion, violates the principle of least surprise - Python blocks at startup waiting for random data even when none is actually required by the application. The fallback to 3.5 behavior is only invoked under the single case where the system PRNG is uninitialized.

You are within your rights to request the reversion; however I want to point out again that the implications are the introduction of multi-minute delays into the startup times of hundreds of millions of systems, due to a change in *Python's* behavior.

Colm
msg267688 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2016-06-07 14:59
On 2016-06-07 16:49, Marc-Andre Lemburg wrote:
> This gives people a clear choice and doesn't cause people
> to have to reconsider using the random module or wait for
> Python hash randomization to initialize itself when using
> Python during VM/container/system startup.
> 
> I don't really appreciate this approach to break Python in
> cloud setups just because some entropy pool is not initialized,
> which only a tiny fraction of users care about. It doesn't
> make Python land a better place.

VM and cloud setups without a proper CPRNG source are plain broken. True
fact!

Secure entropy sources are a fundamental resource for all modern
applications. Please start treating CPRNG like RAM, CPU or disks. You
wouldn't add a workaround for broken CPU instructions to math.c or
semi-functional network card to socket.c, would you?
msg267689 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2016-06-07 15:10
On 07.06.2016 16:59, Christian Heimes wrote:
> 
> Christian Heimes added the comment:
> 
> On 2016-06-07 16:49, Marc-Andre Lemburg wrote:
>> This gives people a clear choice and doesn't cause people
>> to have to reconsider using the random module or wait for
>> Python hash randomization to initialize itself when using
>> Python during VM/container/system startup.
>>
>> I don't really appreciate this approach to break Python in
>> cloud setups just because some entropy pool is not initialized,
>> which only a tiny fraction of users care about. It doesn't
>> make Python land a better place.
> 
> VM and cloud setups without a proper CPRNG source are plain broken. True
> fact!
> 
> Secure entropy sources are a fundamental resource for all modern
> applications. Please start treating CPRNG like RAM, CPU or disks. You
> wouldn't add a workaround for broken CPU instructions to math.c or
> semi-functional network card to socket.c, would you?

For security relevant applications, I agree and for those
I question the use of os.urandom() altogether (see my other
replies), but for everything else, a PRNG is just fine.

I'm repeating myself, but making users believe that an
entropy source is more important than preventing a
denial of service just won't work out.

Your position is quite similar to the one that others
have taken with the I/O encoding in Python3. Their stance
was "fix your system and it'll work". Well, tell that
to 9-year olds who want to learn Python.

Likewise, it may be easy for you to track down the reason
why Python 3.5.1 isn't working in your VM or container,
but not everyone knows that there's an entropy source
which needs to be connected to your VM or container to
make things work - and even if they do know about the problem
they may not have the means to do so.

It's pretty much the same situation and the reason why
we have "practicality beats purity".

A hanging Python process is the worst of all user
experiences.
msg267690 - (view) Author: Colm Buckley (Colm Buckley) * Date: 2016-06-07 15:10
Christian -

Please note: this is *not* just a VM/cloud issue. This is observed on physical standalone systems.

The issue (on Debian) is that the Python script xxxx is called very early in the boot process; in particular before most hardware initialization is done. As there are yet no network or USB devices configured, there is no entropy pool to speak of. We observe that getrandom() blocks apparently indefinitely under these circumstances (even though this script has no requirement for random data apart from the hash secret).

My final suggestion is that we return to using a command-line flag to indicate our preferences regarding hash seed initialization; although reverse the sense compared to the -R flag in 3.2.3 (ie: default is to use strong initialization, but allow the user to over-ride just as though PYTHONHASHSEED were set in the environment.)
msg267693 - (view) Author: Colm Buckley (Colm Buckley) * Date: 2016-06-07 15:23
... the script is /lib/systemd/system-generators/systemd-crontab-generator, although that's not hugely germane to the discussion. Arranging for PYTHONHASHSEED to be set while it's called wouldn't be impossible of course, although a command-line flag would be easier and probably safer.
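As a sketch of the PYTHONHASHSEED workaround mentioned above (an illustration, not part of any patch on this issue), a caller can pin the hash seed for a child interpreter through its environment. The helper name `hash_in_child` is hypothetical. With a fixed seed, string hashes become reproducible across runs, which is what the check at the end relies on.

```python
import os
import subprocess
import sys


def hash_in_child(seed):
    """Run a child interpreter with PYTHONHASHSEED set; return its hash('abc')."""
    # Copy the current environment and pin the hash seed for the child only.
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", "print(hash('abc'))"],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return int(result.stdout.decode())


# Two separate interpreters started with the same seed agree on the hash.
first = hash_in_child(0)
second = hash_in_child(0)
```

Note that this relies on the environment variable being honoured; an interpreter started with -I ignores PY* variables, which is one reason a command-line flag was suggested instead.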
msg267694 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2016-06-07 15:32
> PSRT VETO!

This is an amusing concept, but membership in the PSRT does not empower you with a "veto".

On the other hand, being Release Manager does give me some say here.


>  You wouldn't add a workaround for broken CPU instructions to math.c or semi-functional network card to socket.c, would you?

Well, yes, of course we would, if we had to.  Consider the F00F bug.  Happily the operating systems handled that one for us.

It is unreasonable for Python startup to take 90 seconds, poorly-configured cloud virtual machine or otherwise.  And there are many, many uses of the random module and hashlib that don't require CPRNG.

On the other hand, people who need cryptographic-strength random bits should be able to get them.  And the documentation literally does state that os.urandom() is a source of cryptographically-suitable random bytes.

ISTM that the happy middle ground would be:
 * seed the random module with non-cryptographically-secure random bits
 * lazily seed hashlib

Am I missing something, besides the anxiety of making this sort of change four days before I tag 3.5.2 RC2?
msg267695 - (view) Author: Donald Stufft (dstufft) * (Python committer) Date: 2016-06-07 15:35
> Please note: this is *not* just a VM/cloud issue. This is observed on physical standalone systems.

But it should only occur on initial boot I believe? AFAIK all of the major linux vendors have stored a seed file once the machine has been booted and the pool has been initialized to use to seed the pool on subsequent boots. I suppose it's possible that /lib/systemd/system-generators/systemd-crontab-generator is running prior to Debian reseeding the pool from that seed file, but it seems like it should be doing the reseeding as early as possible?
msg267696 - (view) Author: Donald Stufft (dstufft) * (Python committer) Date: 2016-06-07 15:39
>  ISTM that the happy middle ground would be:
> * seed the random module with non-cryptographically-secure random bits
> * lazily seed hashlib

I don't think it was actually hashlib that was causing the problem, but rather the initialization of SipHash, it just so happened that hashlib was the first import and thus was getting the blame. I could be wrong about that though. In any case, I think it's perfectly reasonable to seed the random module with non-cryptographically-secure random bits and if applicable lazily seed hashlib.

I also think it's reasonable to seed SipHash with possibly non-cryptographically-secure random bits since it's likely to only be a problem early on in the boot cycle. Another option is, as Colm suggested, allowing the inverse of the old -R flag, to turn off hash seed randomization from the CLI so that scripts that run early on in the boot process can disable hash seed randomization and not require reading from urandom (assuming of course, they don't do something that explicitly calls os.urandom).
msg267699 - (view) Author: Colm Buckley (Colm Buckley) * Date: 2016-06-07 16:04
Donald -

To be clear - no import of random or of hashlib is required to trigger this issue. The null script alone triggers the issue; the Python hash secret is initialized at startup regardless of script contents.

Yes, there is a race condition at system boot which we can probably resolve with userspace manipulations. I still feel that having Python hang indefinitely under certain circumstances, even when the application does not require any entropy, is a violation of the principle of least surprise. At the very least, there should be a command-line flag to disable "secure" initialization of the hash secret.
msg267705 - (view) Author: Colm Buckley (Colm Buckley) * Date: 2016-06-07 17:16
Larry -

I see at least two issues here, although they are related:

* blocking initialization of the hash secret. This occurs regardless of script contents; at present Python simply can't be used at all in low-entropy situations. I feel that this issue is a release blocker.

Possible resolutions:
  * accept possible low-entropy initialization of the hash secret; using the patches supplied here by myself and Victor.
  * add a command-line flag to disable "strong" initialization of the hash secret (or revive the old -R flag).
  * simply require user-space workarounds like setting PYTHONHASHSEED


* blocking random reads during import hashlib or import random. This is more complex, as we need to take developer intentions into account. I do *not* think that these are release blockers as there are reasonably easy workarounds, however the fact remains that there has been a regression in Python's behavior on Linux.

Possible resolutions:

  * accept Victor's existing changeset without my nonblocking_urandom_noraise patch, which makes _PyOS_URandom nonblocking in all Linux cases.
  * resolve as above (both Victor's and my patches), and require that applications be modified to work correctly
  * require modifications to hashlib.py and random.py to use nonblocking sources and/or raise exceptions accordingly.

I see these largely as policy decisions rather than technical ones. The security implications of the first issue are fairly small (I would be interested in PSRT's assessment of an actual attack on a predictable hash secret); of the second issue rather larger and probably unquantifiable.
msg267707 - (view) Author: Donald Stufft (dstufft) * (Python committer) Date: 2016-06-07 17:36
> Possible resolutions:
>  * accept possible low-entropy initialization of the hash secret; using the patches supplied here by myself and Victor.
>  * add a command-line flag to disable "strong" initialization of the hash secret (or revive the old -R flag).
>  * simply require user-space workarounds like setting PYTHONHASHSEED

I think either the first or second here are good solutions, the third is kind of crummy on its own because it's not always possible to pass in an environment variable. Pairing the third with a CLI flag option might work out nice though, perhaps a -XPYTHONHASHSEED=(random/int()) or something. Then folks who are in early boot can easily just hardcode a hash seed, removing the need to hit the entropy pools while still maintaining strong random for everyone else.

So I guess I would lean towards adding a CLI flag, but just allowing SipHash to fall back to possibly bad randomness for its initialization is OK.

>  * accept Victor's existing changeset without my nonblocking_urandom_noraise patch, which makes _PyOS_URandom nonblocking in all Linux cases.
>  * resolve as above (both Victor's and my patches), and require that applications be modified to work correctly
>  * require modifications to hashlib.py and random.py to use nonblocking sources and/or raise exceptions accordingly.

Of these, I think random.py should just not use a CSPRNG, it's not required for it so there's no reason to do it. I don't think there's actually any problem with hashlib, I don't see any use of random in it.


> I would be interested in PSRT's assessment of an actual attack on a predictable hash secret 

For something like systemd-crontab-generator, basically nothing-- for anything short lived or which does not provide a means for arbitrary users to put data into a dictionary. IOW, it's largely persistent network services.
msg267709 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2016-06-07 17:46
Thank you for summarizing the debate. It made it a lot easier to 

> * blocking initialization of the hash secret. This occurs regardless of script contents; at present Python simply can't be used at all in low-entropy situations. I feel that this issue is a release blocker.
>
> Possible resolutions:
>   * accept possible low-entropy initialization of the hash secret; using the patches supplied here by myself and Victor.
>   * add a command-line flag to disable "strong" initialization of the hash secret (or revive the old -R flag).
>   * simply require user-space workarounds like setting PYTHONHASHSEED

The latter two approaches are unacceptable IMO.  They result in a poor user experience.  Python should do the "right" thing by default; the "right" thing includes not taking 90 seconds to start up.

By process of elimination, this leaves only the first approach as viable.  Ergo, let's do that.

The hash secret is a 32-bit integer, even on 64-bit builds of Python.  It is not and cannot be cryptographically secure.  It's frankly ridiculous to fret about "strong" initialization of it at the cost of a 90 second startup time.

(For posterity: when people mention "SipHash", they're talking about the hashing algorithm used for str/dict/etc.  The seed for SipHash is the "hash secret" we're talking about here.)


> * blocking random reads during import hashlib or import random. This is more complex, as we need to take developer intentions into account. I do *not* think that these are release blockers as there are reasonably easy workarounds, however the fact remains that there has been a regression in Python's behavior on Linux.
> 
> Possible resolutions:
> 
>   * accept Victor's existing changeset without my nonblocking_urandom_noraise patch, which makes _PyOS_URandom nonblocking in all Linux cases.
>   * resolve as above (both Victor's and my patches), and require that applications be modified to work correctly
>   * require modifications to hashlib.py and random.py to use nonblocking sources and/or raise exceptions accordingly.

I don't follow whose patch does what.  But here's what I find acceptable, from a high level.

* The semantics as presented by the documentation must be preserved.  os.urandom() and other operations that declare they're safe for cryptographic use must remain safe for cryptographic use.
* "import random" must not block.
* "import hashlib" must not block.

Is there a patch set that accomplishes that?

--

If this means that random.random() may be seeded with poor-quality random bits, so be it.  As I think I already stated in this thread: there are many non-cryptographic uses for random.random().  And the documentation for the random module already states that it's not suitable for cryptography.  So making it block in order to procure a cryptographically-strong seed is counterproductive.

Also, I think the constraint "import hashlib must not block" is a non-issue.  I preserved it above just in case I missed something.  But Colm was the one who suggested "import hashlib" was blocking, and he quickly said afterwards that he was mistaken.  In any case a quick review of the code suggests that hashlib never uses getrandom and thus should not currently block.  Unless someone says otherwise, I'll assume hashlib is fine, never blocks on import, and thus requires no modification.
msg267710 - (view) Author: Donald Stufft (dstufft) * (Python committer) Date: 2016-06-07 17:52
> I don't follow whose patch does what.  But here's what I find acceptable, from a high level.
> 
> * The semantics as presented by the documentation must be preserved.  os.urandom() and other operations that declare they're safe for cryptographic use must remain safe for cryptographic use.
> * "import random" must not block.
> * "import hashlib" must not block.
>
> Is there a patch set that accomplishes that?

I *think* nonblocking_urandom_noraise.patch will solve the 90+ second start up without affecting os.urandom which should solve the first one (once the already applied patch gets reverted), but I'm afraid I don't know C well enough to meaningfully review that for accuracy.

None of the current patches solve the second without invalidating the first, but it would be, I believe, an additional patch on top of nonblocking_urandom_noraise.patch.

The third is already the case.
msg267711 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2016-06-07 17:53
On 2016-06-07 19:46, Larry Hastings wrote:
> 
> Larry Hastings added the comment:
> 
> Thank you for summarizing the debate. It made it a lot easier to 
> 
>> * blocking initialization of the hash secret. This occurs regardless of script contents; at present Python simply can't be used at all in low-entropy situations. I feel that this issue is a release blocker.
>>
>> Possible resolutions:
>>   * accept possible low-entropy initialization of the hash secret; using the patches supplied here by myself and Victor.
>>   * add a command-line flag to disable "strong" initialization of the hash secret (or revive the old -R flag).
>>   * simply require user-space workarounds like setting PYTHONHASHSEED
> 
> The latter two approaches are unacceptable IMO.  They result in a poor user experience.  Python should do the "right" thing by default; the "right" thing includes not taking 90 seconds to start up.
> 
> By process of elimination, this leaves only the first approach as viable.  Ergo, let's do that.
> 
> The hash secret is a 32-bit integer, even on 64-bit builds of Python.  It is not and cannot be cryptographically secure.  It's frankly ridiculous to fret about "strong" initialization of it at the cost of a 90 second startup time.
> 
> (For posterity: when people mention "SipHash", they're talking about the hashing algorithm used for str/dict/etc.  The seed for SipHash is the "hash secret" we're talking about here.)

The secret for SipHash is composed of two 64bit integers. The entire
_Py_HashSecret_t struct is 24 bytes. The remaining 8 bytes are used for
XML hash randomization of libexpat. Only the manual seed with
PYTHONHASHSEED is a 32bit integer which is stretched to 24 bytes with an LCG.
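
To make the "stretched with an LCG" step concrete, here is a minimal Python sketch of expanding a 32-bit seed into 24 bytes with a linear congruential generator. It is an illustration, not a copy of the C code: the function name `lcg_stretch` is hypothetical, and the multiplier and increment (214013 and 2531011) are assumed for the example rather than quoted from this thread.

```python
def lcg_stretch(seed, size):
    """Expand a 32-bit seed into `size` bytes using a simple LCG."""
    x = seed & 0xFFFFFFFF
    out = bytearray()
    for _ in range(size):
        # One LCG step modulo 2**32; the constants are an assumption.
        x = (x * 214013 + 2531011) & 0xFFFFFFFF
        # Emit one byte per step, taken from the middle bits of the state.
        out.append((x >> 16) & 0xFF)
    return bytes(out)


# 24 bytes, matching the size given above for the whole secret struct.
secret = lcg_stretch(42, 24)
```

The output is fully determined by the seed, so at most 2**32 distinct secrets are reachable this way no matter how many bytes are produced.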

Christian
msg267712 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2016-06-07 17:59
On 2016-06-07 19:36, Donald Stufft wrote:
> 
> Donald Stufft added the comment:
> 
>> Possible resolutions:
>>  * accept possible low-entropy initialization of the hash secret; using the patches supplied here by myself and Victor.
>>  * add a command-line flag to disable "strong" initialization of the hash secret (or revive the old -R flag).
>>  * simply require user-space workarounds like setting PYTHONHASHSEED
> 
> I think either the first or second here are good solutions, the third is kind of crummy on its own because it's not always possible to pass in an environment variable. Pairing the third with a CLI flag option might work out nice though, perhaps a -XPYTHONHASHSEED=(random/int()) or something. Then folks who are in early boot can easily just hardcode a hash seed, removing the need to hit the entropy pools while still maintaining strong random for everyone else.
> 
> So I guess I would lean towards adding a CLI flag, but just allowing SipHash to fall back to possibly bad randomness for its initialization is OK.

I don't like the fact that applications can fall back to insecure RNG
without user involvement or warning.

Therefore I'm in favor of a command line argument that allows pyhash.c
to fall back to a less secure RNG. System scripts must use the -I option
(isolated mode without user-site dir and PY* env vars) anyway. The new
option would enable less secure RNG as fallback and -I.
msg267715 - (view) Author: Donald Stufft (dstufft) * (Python committer) Date: 2016-06-07 18:29
I've attached a minimal patch for making it so ``import random`` does not block. It does this by changing what the default instance of Random() is seeded with, from os.urandom() to the time based fallback it currently employs. It does not change any documented behavior that I can see (it's documented that calling seed(None) or seed() will use urandom if available).

This could be improved by mixing in id(self) and using SipHash or LCG on the value, but this represents a minimal patch that is already possible in cases where os.urandom doesn't exist.
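
A rough sketch of that idea, for illustration only (this is not the attached diff): the helper `weak_seed` is a hypothetical name, and the `time.time() * 256` expression is assumed to mirror the time based fallback referred to above. The resulting seed is explicitly non-cryptographic.

```python
import random
import time


def weak_seed(obj):
    """Hypothetical non-cryptographic seed: current time mixed with id(obj)."""
    # Fractional seconds give some variation between nearby start times;
    # id(obj) varies between objects within one process.
    return int(time.time() * 256) ^ id(obj)


rng = random.Random(0)        # placeholder seed, replaced on the next line
rng.seed(weak_seed(rng))      # explicit integer seed, no call to os.urandom()
sample = rng.random()
```

Code that needs unpredictable values would still go through random.SystemRandom or os.urandom() directly; only the default generator's initial state is affected by a change like this.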
msg267716 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2016-06-07 18:34
Everybody: let's drop discussing "hashlib" unless someone says it actually is a problem.  I think it was always, as we say in English, a "red herring".


> The secret for SipHash is composed of two 64bit integers. The entire _Py_HashSecret_t struct is 24 bytes. The remaining 8 bytes are used for XML hash randomization of libexpat. Only the manual seed with PYTHONHASHSEED is a 32bit integer which is stretched to 24 bytes with a LCG.

Okay, I have misunderstood the code.  Have I misunderstood the strength of SipHash?  Is it regarded as "cryptographically secure"?

Predictability of the hash function on web servers was the original use case of the "hash seed"; I remember a demonstration of an attack where the attacker produced pathologically bad hash behavior on a Python-based web server with very little data.  So it seems like web servers running on cloud instances are exactly the sort of use case where we'd want less-predictable hashing.

--

Nevertheless, a 90 second startup time is simply unacceptable.  I am officially making a pronouncement as Release Manager: Python 3.5 *must not* take 90 seconds to start up under *any* circumstances.  I view this as a performance regression, and it is and will remain a release blocker for 3.5.2.

Python *must not* require special command-line flags to avoid a 90 second startup time.  Python *must not* require a special environment-variable to avoid a 90 second startup time.  This is no longer open to debate, and I will only be overruled by Guido.

--

If I understand the technical issues correctly, here's how I expect it to work.  For seeding the hash randomization, and seeding the _inst in the random module, we will use getrandom() in a non-blocking way (GRND_NONBLOCK?).  If it succeeds, we use those bits.  If it fails because it would have blocked (EAGAIN?), we fall back to a less-random source of random bits.  Under no circumstances will Python block when seeding the hash randomization function or seeding the MT for the random module.
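
The flow described above can be sketched in Python, assuming an interpreter that exposes os.getrandom() and os.GRND_NONBLOCK (Linux, Python 3.6 or later). The real change would live in C in Python/random.c; the helper names and the clock/pid fallback below are hypothetical stand-ins for "a less-random source of random bits".

```python
import os
import time


def weak_fallback_bytes(n):
    """Low-quality bytes derived from the clock and the pid. NOT for cryptography."""
    state = (int(time.time() * 1000000) ^ (os.getpid() << 16)) & 0xFFFFFFFF
    out = bytearray()
    for _ in range(n):
        # Simple LCG step; the constants are illustrative only.
        state = (state * 214013 + 2531011) & 0xFFFFFFFF
        out.append((state >> 16) & 0xFF)
    return bytes(out)


def startup_seed(n=24):
    """Return (data, source) for seeding at startup, without ever blocking."""
    getrandom = getattr(os, "getrandom", None)
    nonblock = getattr(os, "GRND_NONBLOCK", None)
    if getrandom is not None and nonblock is not None:
        try:
            data = getrandom(n, nonblock)
            if len(data) == n:
                return data, "getrandom"
        except BlockingIOError:
            # EAGAIN: the kernel pool is not initialized yet.
            pass
        except OSError:
            # For example, the syscall is unavailable on this kernel.
            pass
    # Either getrandom() is missing or it would have blocked.
    return weak_fallback_bytes(n), "fallback"
```

Returning the source alongside the data lets a caller log or test which path was taken; the interpreter itself would presumably fall back silently.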

This means cloud instances may inadvertently use lower-quality hash randomization seeds.  I judge this as obviously better than cloud instances taking 90 seconds to start up.  Also, as Christian points out, the people running these cloud instances should be managing their entropy pools anyway.  Additionally, there are many uses of cloud instances that aren't exposed to tainted data that permit these predictable-hash abuses.

--

As a final note, let me steer you towards this comment in Python/random.c:

/* Issue #25003: Don't use getentropy() on Solaris (available since
 * Solaris 11.3), it is blocking whereas os.urandom() should not block. */

Yes: we already had this discussion for Solaris, nine months ago, on issue #25003.  Both Guido and Tim Peters were involved in the discussion.  The decision there: use lower-quality random bits to seed the MT when importing the random module.  Keeping the slowdown was so obviously wrong it wasn't even debated.
msg267718 - (view) Author: Donald Stufft (dstufft) * (Python committer) Date: 2016-06-07 18:40
> As a final note, let me steer you towards this comment in Python/random.c:
>
> /* Issue #25003: Don't use getentropy() on Solaris (available since
>  * Solaris 11.3), it is blocking whereas os.urandom() should not block. */
>
> Yes: we already had this discussion for Solaris, nine months ago, on issue #25003.  Both Guido and Tim Peters were involved in the discussion.  The decision there: use lower-quality random bits to seed the MT when importing the random module.  Keeping the slowdown was so obviously wrong it wasn't even debated.

I will point out that that was a somewhat different situation, as ``getentropy`` on Solaris is more like /dev/random in that it tries to decide how much randomness is in the pool and will randomly block throughout the execution of the program. The ``getrandom()`` call on Linux (and Solaris) will, by default, only block on the first boot at the very beginning before the kernel has collected enough entropy.

I don't think this changes anything, I just want to be clear because there are two kinds of "blocking" in this discussion, one that only occurs in very specific scenarios and one that occurs regularly in the operation of the program.
msg267720 - (view) Author: Colm Buckley (Colm Buckley) * Date: 2016-06-07 18:45
Larry -

To the first point:

The combination of Victor's changeset 9de508dc4837 (based on my patch) and my most recent nonblocking_urandom_noraise patch (which is on top of 9de508dc4837) will do what you suggest for the hash secret initialization - ie: it is allowed to fall back to predictable sources when there is insufficient entropy to securely seed it.

I suspect that it is simply impossible to reconcile "os.urandom will never block" with "os.urandom is always cryptographically reasonable". If the system has no entropy, it has no entropy. The only escape I see is to add an exception condition, instead of the silent fallback which some platforms currently have. There is a judgement call to be made here; whether silent fallback is acceptable or not.

As Donald points out, this will fail only in very unusual circumstances (specifically, early in the boot process, although not I think just on the first boot of a system; Debian at least by default does not attempt to preserve its entropy pool across a reboot.)

This should not affect things like web servers etc. as they start much later in the boot process; in particular after networking has started, which I believe is the principal source of entropy for /dev/urandom.

Colm
msg267721 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2016-06-07 18:46
That reminds me.  I want to be clear: I think it's preferable that os.urandom() blocks when insufficient entropy is available.  If Victor's patch changed that, it should be backed out.

(Since non-blocking urandom is useful, perhaps in 3.6 os.urandom() should take a new "block=True" parameter.  But it's too late to add it for 3.5.)
msg267723 - (view) Author: Matthias Klose (doko) * (Python committer) Date: 2016-06-07 18:51
On 07.06.2016 16:47, Christian Heimes wrote:
> 
> Christian Heimes added the comment:
> 
> PSRT VETO!
> 
> In the light of the upcoming release 3.5.2 I'm now putting on my PSRT hat (Python Security Response Team) and proclaim a veto against any and all changes to os.urandom(). The security properties of os.urandom() must not be modified or reduced compared to 3.5.1. Please restore the behavior of os.urandom().

So you are intentionally accepting a new vector for DoS attacks, and calling
this non-reduced security?
msg267725 - (view) Author: Colm Buckley (Colm Buckley) * Date: 2016-06-07 18:55
To clarify what the various patches do:

3.5.1 as released: os.urandom and hash secret initialization both attempt getrandom() in preference to reading /dev/urandom. Under certain circumstances, this will block, possibly indefinitely.

Changeset 9de508dc4837: both os.urandom and hash secret initialization call getrandom() in nonblocking mode, falling back to (possibly low-entropy) /dev/urandom should getrandom() block due to lack of entropy.

Changeset 9de508dc4837 + nonblocking_urandom_noraise.patch: hash secret initialization calls getrandom() in nonblocking mode (ie: will always succeed, although with a silent fallback to low-entropy data if called when the system has no entropy). os.urandom will always block until there's enough entropy.

I think this final case implements what you need for the 3.5.2 RC.

The issue of "import random" still needs to be resolved; maybe we should de-merge #25420 and pursue Donald's approach there.

Thanks,

Colm
msg267726 - (view) Author: Donald Stufft (dstufft) * (Python committer) Date: 2016-06-07 19:00
> specifically, early in the boot process, although not I think just on the first boot of a system; Debian at least by default does not attempt to preserve its entropy pool across a reboot.)

Look at /etc/init.d/urandom in the initscripts package in Jessie (https://anonscm.debian.org/cgit/collab-maint/sysvinit.git/tree/debian/src/initscripts/etc/init.d/urandom).
msg267728 - (view) Author: Colm Buckley (Colm Buckley) * Date: 2016-06-07 19:03
Donald -

Oh, that's interesting; thank you. I guess that system-crontab-generator is being called before that in the boot process.

The most common trigger case, I guess, will then be cloud containers and VMs which are spun up for single applications. I think Larry's comments are still very valid.

Colm
msg267729 - (view) Author: Cory Benfield (Lukasa) * Date: 2016-06-07 19:03
> So you are intentionally accepting a new vector for DoS attacks, and calling this non-reduced security?

This is only a DoS vector if you can hit the server so early in the boot process that it doesn't have enough entropy. The *second* enough entropy has been gathered getrandom() will never block again.

In essence, then, the situation where it becomes possible to DoS a server is entirely outside an attacker's control and extremely unlikely to ever actually occur in real life: you can only DoS the server if you can demand entropy before the system has gathered enough, and if the server has managed to *boot* by then, the alternative is that it is incapable of generating secure random numbers and shouldn't be running exposed against the web anyway.
msg267730 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2016-06-07 19:10
> This is only a DoS vector if you can hit the server so early in the boot process that it doesn't have enough entropy.

Python hash randomization only happens once.  So it's not a matter of how early we try the attack, it's a matter of how early we seed Python hash randomization.
msg267731 - (view) Author: Cory Benfield (Lukasa) * Date: 2016-06-07 19:12
> Python hash randomization only happens once.  So it's not a matter of how early we try the attack, it's a matter of how early we seed Python hash randomization.

Sorry Larry, I was insufficiently clear (relying on context from earlier). I totally agree that Python startup should not block. I'm saying that having getrandom() called in "blocking mode" for os.urandom, random.SystemRandom, and secrets is not a DoS vector.
msg267735 - (view) Author: Colm Buckley (Colm Buckley) * Date: 2016-06-07 19:50
I've spoken with Ted Ts'o (one advantage of working for Google) and taken a look in the Linux kernel source, and things are actually better than we'd feared.

Firstly, calling getrandom() with GRND_NONBLOCK and a buffer size of less than or equal to 32 bytes will always succeed (so, for the hash seed initialization at least, the EAGAIN logic is superfluous - it's still possibly needed for the general case and other operating systems, though).

Secondly, the quality of the getrandom data *before* the kernel PRNG is initialized is still pretty good - it's seeded from a combination of RDRAND, interrupt timing, several kernel parameters like uname -a, and RTC. Ted is confident that at least 24 bytes of real entropy will be present by a few seconds into boot time (due to interrupts etc), and that the predictability of the data will be very low.

Finally - note that any network-facing applications are *extremely* unlikely to encounter this issue, as they will be started well after networking and other good entropy sources have started. In particular, getrandom() will no longer block once fastinit has completed (on my system, this was less than one second after kernel load).

In other words, I think we are very safe to proceed with changeset 9de508dc4837 + the nonblocking_urandom_noraise.patch

Note that this solves the problem for *Linux* - if other operating systems do indeed have blocking /dev/urandom reads, this still needs to be addressed. I am not aware of any reports from non-Linux systems, though.
msg267737 - (view) Author: Donald Stufft (dstufft) * (Python committer) Date: 2016-06-07 19:54
Colm,

Great, then I think there's general agreement, we just need someone to review the nonblocking_urandom_noraise.patch (which my C is not strong enough to feel comfortable doing). That still leaves the `import random` issue, but I think we can reopen #25420 and figure out over there what the right answer is for that.
msg267739 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2016-06-07 19:55
I fear I may be changing my mind a little bit.  However, I skipped breakfast--and now it's looking like a late lunch--so I simply have to step away for a while.  Expect me to post in about two hours when I get some calories down and finally make up my tiny mind.
msg267740 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2016-06-07 20:01
On 07.06.2016 21:12, Cory Benfield wrote:
> 
>> Python hash randomization only happens once.  So it's not a matter of how early we try the attack, it's a matter of how early we seed Python hash randomization.
> 
> Sorry Larry, I was insufficiently clear (relying on context from earlier). I totally agree that Python startup should not block. I'm saying that having getrandom() called in "blocking mode" for os.urandom, random.SystemRandom, and secrets is not a DoS vector.

I'm not sure I follow. A delay of 90 seconds on startup of a VM
or container can easily lead the supervisor style management tool
to think that something is wrong and issue a retry. Depending on
the configuration it'll then try this a couple of times and
give up.

Overall, I have a hard time following the arguments.

To be clear: getrandom() on Linux is just a wrapper with some
additional control around /dev/random and /dev/urandom.

http://lxr.free-electrons.com/source/drivers/char/random.c#L1601

Unlike /dev/urandom, getrandom() without flag GRND_NONBLOCK will
block, but only in the case where the entropy pool has not been
initialized yet. Once this has been done, it will never block
again, and happily send you poor random data if the entropy pool
has been completely wiped of any entropy data - without telling
you.

So now, you're all arguing: oh my, it's so insecure to use
data from /dev/urandom when the entropy pool is not initialized.
But you're not worried about os.urandom() happily sending you data
which is no longer based on any external entropy half an hour later:

http://lxr.free-electrons.com/source/drivers/char/random.c#L1458

This doesn't make sense. Either you're worried all the time,
or you're not :-)

The whole discussion is centering around whether to block
on an uninitialized entropy pool or not. This can only happen
during startup. By falling back to reading /dev/urandom
in case of an uninitialized pool, you are reading data from
a not fully initialized pool, but you still get random data.
That's really all that's needed for basic operations like
hash seeding or seeding the PRNG in the random module.
And it's limited to Python processes which are run very
early in the VM/container startup phase.

Note that "uninitialized" only means that the kernel entropy
pool has not yet reached an "entropy level" of 128 (whatever
that means):

http://lxr.free-electrons.com/source/drivers/char/random.c#L676

It does not mean that you're just reading a list of zeros.

So given all this information, why is it that you get so
tangled up in wanting os.urandom() to block during system
startup ?

Or put differently: Where is the attack vector that blocking
behavior of os.urandom() would help remedy ?
msg267741 - (view) Author: Donald Stufft (dstufft) * (Python committer) Date: 2016-06-07 20:07
> Once this has been done, it will never block again, and happily send you poor random data if the entropy pool has been completely wiped of any entropy data - without telling you.

This doesn't actually happen in real life, once urandom has been initialized you will never be able to get "poor random" out of it. You will get cryptographically secure random out of it always. *ACTUAL* Cryptographers pretty much universally agree on this statement. You can even use them for cryptographic keys, no matter how long it's been since your system booted as long as the urandom pool has had a chance to initialize.

> Or put differently: Where is the attack vector that blocking behavior of os.urandom() would help remedy ?

Someone attempting to use cryptographic random before the urandom pool has been sufficiently initialized to provide said random.
msg267745 - (view) Author: Theodore Tso (Theodore Tso) Date: 2016-06-07 20:27
Hi.   Colm alerted me to this bug, so I thought I would chime in as the author of Linux's getrandom(2) function.

First of all, if you are OK with reading from /dev/urandom, then you might as well use getrandom's GRND_NONBLOCK flag.  They are logically equivalent.

Secondly, when I decided to add this behavior to getrandom(2), it was because people were really worried that people would be using /dev/urandom for security-critical things (e.g., initializing ssh host session keys, when they'd _really_ rather the NSA not be able to trivially pwn the server) before it had been completely initialized.   (And if it is not completely initialized, it would be trivially and embarrassingly easy.  See https://factorable.net/weakkeys12.extended.pdf for an example of where this was rather disastrous.)

Why didn't I make /dev/urandom blocking?  Because a lot of people would whine and complain.   But getrandom(2) was a new interface, and so this was something I could do.   Now, before I decided to do this, I did do some benchmarks, and pre-systemd in practice on real hardware (e.g., x86 servers and laptops), I observed that you would actually see a message indicating that we had gathered 128 bits of entropy long before the root file system had been mounted.    With systemd, I observed that udevd was trying to read from /dev/urandom when we had only gathered an estimated 7 bits of entropy --- but I devoutly hoped that udevd wasn't doing anything super security critical, and trying to get the systemd people to change what they are doing is mostly like trying to teach a pig to sing, so I let it be.    However, in practice within a single digit number of seconds, the kernel printk indicating that random driver had considered itself initialized came quickly enough that I figured it would be safe to do.

If people are claiming that they are seeing cases where it takes over 90 seconds for the random number generator to initialize itself, please contact me directly; I'd love to know more, because that's input I would very much like to have.

However, at the end of the day, on certain hardware, if you don't have a source of initial entropy because the system doesn't have enough real hardware with real sources of entropy --- or if you don't trust your friendly cloud provider to provide you with some entropy from the hypervisor's entropy pool via virtio-random --- you can either (a) decide to pretend you are secure, when you really aren't, (b) wait, or (c) decide that you don't *really* need a secure source of randomness because you're really just initializing a hash for some associative array, and in fact srandom(time(0)) would have been fine, and you were using getrandom(2) or /dev/urandom just because you wanted to feel like one of the cool kids.

That being said, I do know of one potential issue: if you happen to be using Microsoft Azure, then because of the way the virtualized interrupt works, we weren't actually getting any entropy, and this was something I didn't discover until someone sent me a patch.  I have a patch[1] queued up in the random.git tree for the next kernel merge window to address that issue for Microsoft Azure servers.

[1] http://git.kernel.org/cgit/linux/kernel/git/tytso/random.git/commit/?h=dev&id=8748971b4f5e322236154981827bf43dec4dc470

On a Google Compute Engine (GCE) system, I just did a quick test, and the "random: non-blocking pool initialized" message appears 5.64 seconds after the system is booted.  The changes I have queued up in random.git should reduce that to under a second.

All of this is neither here nor there, though.  The big question is *what* does Python expect to do with the randomness.  If you are just using it for computational simulation, you can do whatever you want.   If you are using it to create long-lived secrets that are intended to be secure against the depredations of a Nation-State's intelligence service, and you are on a system which really has almost no entropy available to be collected, then falling back to reading from /dev/urandom or using GRND_NONBLOCK is going to be the equivalent of saying La-La-La-La-La-Nobody-Knows-How-Secure-I-Am while keeping your ears plugged.    (Now, if you are on an Intel system with RDRAND, and you trust Intel not to have given a back door to the NSA, you probably are safe, because we do actually mix in RDRAND.  On the other hand, if you are using some crappy ARM SOC for some Internet of Things device, and are firing up Python right after the system boots for the first time, and creating long-lived RSA private keys within milliseconds after the system is first booted --- please tell me so, I can avoid your product like the Plague.  :-)
msg267746 - (view) Author: Donald Stufft (dstufft) * (Python committer) Date: 2016-06-07 20:34
Thanks for weighing in Theodore, I think that matches what Colm's last suggestion was, and what I was personally OK with. To seed our SipHash function using GRND_NONBLOCK since it's likely that will be fine, and worst case we're just using it for some hash tables.

Then for our os.urandom binding, we should use getrandom() without GRND_NONBLOCK since we don't know why someone is calling os.urandom, but we know in practice people are using it for cryptographic keys and the like.
msg267749 - (view) Author: Colm Buckley (Colm Buckley) * Date: 2016-06-07 21:19
Ted -

I'd suggest the following to test.

Boot an arbitrary Linux system with init=/usr/bin/python3 (assuming filesystems mounted etc). Python 3.5.1 (on Linux) will call getrandom() in blocking mode very early in its startup; if this happens before the pool is initialized, Python will fail to start. Given that ~nothing else will be happening, I'm interested to see what happens to the entropy pool, and whether getrandom() returns.

Haven't tried this myself, but it should work.

Colm
msg267750 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2016-06-07 21:24
On 07.06.2016 22:27, Theodore Tso wrote:
> 
> Secondly, when I decided to add this behavior to getrandom(2), it was because people were really worried that people would be using /dev/urandom for security-critical things (e.g., initializing ssh host session keys, when they'd _really_ rather the NSA not be able to trivially pwn the server) before it had been completely initialized.   (And if it is not completely initialized, it would be trivially and embarrassingly easy.  See https://factorable.net/weakkeys12.extended.pdf for an example of where this was rather disastrous.)

Thanks, Theodore, for this paper reference. It provides convincing
arguments that going back to the Python 3.4 behavior is indeed not
a good idea - even though I'm still not convinced that the main
use case for os.urandom() is cryptography. Most people will
simply use it to seed their Mersenne Twisters, like the random
module does too.

Now, raising an exception instead of blocking would likely cause
even more breakage, so I'm with Colm in keeping Victor's patch
and applying the fix to not block in dev_urandom_noraise().

We still need to fix the random module issue, though.

For 3.6, I wish we could have the getrandom() API exposed as
os.getrandom(), with all options available to applications.
That way, the application can decide what is best for them.
msg267751 - (view) Author: Theodore Tso (Theodore Tso) Date: 2016-06-07 22:45
I ran the experiment Colm asked me to run --- and yes, if you boot a system with Python 3.5.1 with the boot options "init=/usr/bin/python3", you're going to have a bad time.   The problem is that in a KVM environment where things are very quiet, especially if you are using a tickless kernel, if python calls getrandom(2), it will block since the entropy pool hasn't been initialized yet.   But since we aren't doing anything, the system becomes completely quiescent and so no entropy can be collected.  If systemd tries to run a python script very early in the boot process, and does this in a way where no further boot time activity takes place until the python script exits, you can indeed deadlock the system.

The solution is, I think, what Donald suggested in msg267746, which is to use GRND_NONBLOCK for initializing the hash which gets used for the dict, or whatever it's used for.   My understanding is that this is not a long-term cryptographic secret, and indeed it will be thrown away as soon as the python interpreter exits.  Since this is before networking has been brought up, the denial of service attack or whatever requires that you use a strong SipHash for your Python dictionaries shouldn't be a problem.   (Which I gather has something to do with this?   https://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf)

Now, I can see people being concerned that if Python *always* initializes its hash dictionaries using getrandom with GRND_NONBLOCK, it might be opening up a DOS attack.   Well, in practice, once the boot sequence continues and the system is actually doing some real work, within a few seconds the random number generator will be initialized so in practice it won't be an issue once the system has booted.

If you want to be really paranoid, I suppose you could give some kind of command-line flag which tells Python to use GRND_NONBLOCK for the purposes of initializing its hash table for its dictionary, and only use it in the boot path.   In practice, I suspect very early in the systemd boot path, before it actually starts running the boot scripts in parallel, is the only place where you are likely going to run into this problem, so making it be a flag that only systemd scripts have to call is probably the right thing to do.   But I'll let someone else have the joys of negotiating with Lennart, and I won't blame the Python devs if using GRND_NONBLOCK unconditionally is less painful than having to work with the systemd folks.  :-)
msg267752 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2016-06-07 23:15
Thanks Theodore!

Your analysis was exactly what I was aiming for when I requested a thorough security analysis in the form of a PEP. The correct choice of CPRNG is important for the overall security. I'm mostly concerned with the behavior of os.urandom(), which is Python's preferred CPRNG for any critical data from session keys to key material. Several people including me have been pushing developers towards os.urandom() for many years. For that reason I put up a veto against any hasty change. Larry called my bluff.

The hash randomization for hashing of strings and bytes is explained in PEP 456, https://www.python.org/dev/peps/pep-0456 . I wrote the PEP and added DJB's and JP Aumasson's SipHash24 as PRF. The 24 byte Py_HashSecret struct contains two keys for SipHash24 and another 8 byte key for randomization of expat XML library, https://hg.python.org/cpython/file/tip/Include/pyhash.h#l34 .

For short-running scripts early in the boot phase, hash randomization is not required at all. It is only relevant for applications that read untrusted data from potentially malicious peers. GET dict of HTTP requests is a famous example. Hash randomization can already be disabled or set to a fixed value by using an env var. I argued to add an option that falls back to a different CPRNG and sets other options at the same time (#16499) but Larry (release manager of 3.5) is against any option. He wants Python to always start in a timely fashion without any extra arguments.
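
As a small illustration of the env var mentioned above: PYTHONHASHSEED=0 disables hash randomization, and any other integer in the accepted range gives a fixed seed. The helper below is a hypothetical sketch that starts a fresh interpreter with the variable set and reports the resulting string hash, so the effect can be observed across processes.

```python
import os
import subprocess
import sys


def hash_in_child(text, hashseed):
    """Return hash(text) as computed by a new interpreter with PYTHONHASHSEED set."""
    # Copy the current environment and override only the hash seed.
    env = dict(os.environ, PYTHONHASHSEED=str(hashseed))
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return int(result.stdout.decode())
```

Two calls with the same seed should agree, for example `hash_in_child("abc", 0) == hash_in_child("abc", 0)`, whereas without the variable the value normally differs between interpreter runs.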

Your suggestion should fix the issue on Linux (GRND_NONBLOCK, fall back to srandom()), although I would rather use gettimeofday() with t.tv_sec + t.tv_usec. I'm still concerned how we should address the issue on BSD. As far as I am familiar with BSD, all reads from the Kernel's CPRNG are blocking until the CPRNG is seeded.

I can bring up the issue with Lennart, if it is really necessary (one advantage of working for Red Hat ;) ). I'm going to ping JP Aumasson to get his feedback.
msg267803 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2016-06-08 08:04
I've revised my thinking on the subject.

First, my previous statements stand: Python startup must not block.  "import random" must not block.

Now I'm thinking that "os.urandom()" must not block, too.  Here's my reasoning.

--

If you read #25003, it's clear that /dev/urandom is a well-known UNIX facility with well-known, predictable behavior.  One behavior that I'll draw particular attention to now: it will never block.  If the system is low on entropy, /dev/urandom will produce lower-quality random bits.

Again I repeat myself: this is the *expected* behavior.  It is so completely the expected behavior, that today's special celebrity guest on the issue, Mr. Theodore Ts'o himself, added /dev/random specifically so it would be permitted to block when the system was low on entropy.  He *did not change* the behavior of /dev/urandom.  He added a *new device*.

A well-informed engineer would see "os.urandom()" and predict (correctly) that Python has provided a thin layer over /dev/urandom.  Thus os.urandom() should provide the same well-known, predictable behavior as /dev/urandom.

It's fine to enhance os.urandom().  For example, it's fine to provide higher-quality bits where available.  It's fine to provide the function on Windows which doesn't have a /dev/urandom object.

What is *not* fine is to degrade its behavior.  /dev/urandom is known to never, ever block.  This is a *feature*.  os.urandom(), therefore, must also never, ever block.

Yes, this means that on these cloud instances with no entropy (yet), os.urandom() may return these low-quality random bits.  Just like /dev/urandom does.

If I understand the APIs correctly, I'm fine with os.urandom() calling getrandom(,,GRND_RANDOM|GRND_NONBLOCK).  If that fails with EAGAIN it should fall back to reading from /dev/urandom, or getrandom(,,GRND_NONBLOCK) if that makes sense.  (IDK if that's Linux-specific; if it is I suppose /dev/urandom is the more cross-platform way to go.)

--

If this is seen as the end of the world by the crypto guys in the thread, let me say that I'm willing to consider adding a new function in 3.5.2.  I would propose it be spelled "os.getrandom(n, block=True)".  Crypto code could use this function if available, and fall back to os.urandom() where it was not.  This means you're covered: in 3.5.0 and 3.5.1 you use os.urandom(), and in 3.5.2+ you use os.getrandom(), and in both circumstances you'll block if there's insufficient entropy.
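
A sketch of the "use it if available, otherwise fall back" pattern described above. Note the assumption: the function that eventually shipped in Python 3.6 is os.getrandom(size, flags=0) and is only available on some platforms; the "block=True" spelling is just the proposal in this message. The wrapper name `blocking_random_bytes` is hypothetical.

```python
import os


def blocking_random_bytes(n):
    """Return n random bytes, preferring a blocking getrandom() call.

    Falls back to os.urandom() when os.getrandom() is missing or fails.
    """
    getrandom = getattr(os, "getrandom", None)
    if getrandom is None:
        # Older Python or a platform without getrandom().
        return os.urandom(n)
    out = bytearray()
    while len(out) < n:
        try:
            # flags=0 waits until the kernel pool has been initialized.
            # The call may return fewer bytes than requested, hence the loop.
            out += getrandom(n - len(out), 0)
        except OSError:
            # For example, the syscall is not supported by this kernel.
            return os.urandom(n)
    return bytes(out)
```

Code written this way keeps working on interpreters where only os.urandom() exists, which is the compatibility story the paragraph above is aiming for.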

--

p.s. Colm Buckley: you notice how dstufft's patch got a "review" link, and none of the patches you posted got one?  That's because his is based on the current 3.5 repo and yours aren't.  This "review" link is very useful in reading your patches.  Please in the future try to base your patches against 3.5 trunk.  It's easy:
% hg clone https://hg.python.org/cpython/ 
% cd cpython
% hg up -r 3.5
(do your work here)
% hg diff > patchfile
(upload patchfile)
msg267804 - (view) Author: Cory Benfield (Lukasa) * Date: 2016-06-08 08:16
> If you read #25003, it's clear that /dev/urandom is a well-known UNIX facility with well-known, predictable behavior.  One behavior that I'll draw particular attention to now: it will never block.  If the system is low on entropy, /dev/urandom will produce lower-quality random bits.

That's not accurate.

/dev/urandom is a well-known UNIX facility, yes, but it does not have consistent behaviour across Unices. The behaviour you've described here Larry is a well-known *Linux* behaviour.

However, on other Unices the behaviour is different. Here, for example, is the relevant man page from Mac OS X ("man 4 random"):

     /dev/urandom is a compatibility nod to Linux. On Linux, /dev/urandom will
     produce lower quality output if the entropy pool drains, while
     /dev/random will prefer to block and wait for additional entropy to be
     collected.  With Yarrow, this choice and distinction is not necessary,
     and the two devices behave identically. You may use either.

Note the specific wording here: "the two devices behave identically". That is to say, on OS X both /dev/random and /dev/urandom are identical devices, and that includes the fact that both will in principle block if used without sufficient entropy.

OS X's implementation is a direct descendant of FreeBSD's, so the same caveats apply there, and in fact all the BSDs have this exact same behaviour.

So, again, I repeat my objection from above: if the concern is that starting Python must never block, then Python must *never* read from /dev/urandom on startup. Otherwise, Python *can* block on BSDs (OS X included in principle, though in practice I doubt Apple will use Python that early in boot).

At this point I literally no longer care whether os.urandom() is just a wrapper around /dev/urandom: we can look back on this in 10 years and see how we feel about the choices made by core dev at that time. But if we're arguing that this issue is about "Python must never block at startup", then we really have to acknowledge that /dev/urandom *can block* on some Unices, and so is entirely unacceptable for reading at startup.
msg267805 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2016-06-08 08:19
Are you certain that /dev/urandom will block on Mac OS X if sufficient entropy is not available?  The dismissive tone ("this choice and distinction is not necessary") suggests that *their* implementation is superior, and it could hardly be seen as superior if sometimes it blocks.
msg267806 - (view) Author: Cory Benfield (Lukasa) * Date: 2016-06-08 08:29
I have never seen it block in person, but I also wouldn't expect to. OS X's blocking guarantees are the same as FreeBSD's: that is, both /dev/random and /dev/urandom block until sufficient entropy has been gathered at startup and then never block again.

This means that, on OS X, in practice, /dev/urandom never blocks, because you basically can't run user code early enough to encounter this problem.

FreeBSD again has the exact same behaviour: /dev/urandom is a symlink to /dev/random, and both will block at startup until sufficient entropy is gathered and never again. This is the bigger risk for Python, because if Linux people want to use Python in their init system it's not unreasonable for FreeBSD folks to want to do it too.

This is why I'm concerned about this "solution": while there's no question that adding getrandom() made the situation worse on Linux, it has drawn our attention to the fact that Python is relying on Linux-only semantics of /dev/urandom on all Unices. That's probably not a good plan.

(The above is all facts. Everything in these parentheticals is opinion. Please disregard as appropriate.

I agree with the OS X devs in that I believe their implementation *is* better than Linux's: sorry Ted! There is no reason to be concerned about using a good kernel CSPRNG once sufficient entropy has been gathered *once*. The CSPRNG essentially "stretches" the entropy out into a long sequence of numbers, much like a cipher like AES "stretches" the entropy in the key across the entire cipherstream. Talking about "running out" of entropy in one of these devices is weird to me: as a blog post linked earlier mentions, it's like talking about "running out of key" in an encryption algorithm.

It seems to me, then, that Linux's /dev/random is wrong in most situations (because it sometimes blocks), and /dev/urandom is wrong in some situations (because it'll run before it has enough entropy to properly seed the CSPRNG and it won't tell you that that is what has happened). On OS X, the best of both worlds occurs: you get no random numbers until sufficient entropy has been gathered to seed the CSPRNG, and then you get good random numbers from that point on.

Please note: I am not a trained cryptographer. However, trained cryptographers have agreed with this set of sentiments, so I think I'm on pretty good ground here.)
msg267807 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2016-06-08 08:38
So, in short, you don't know.

#25003 is about Solaris, and the reporter clearly had the expectation that /dev/urandom would never block.  The documentation on Linux is clear: /dev/urandom will never block.  That's two.

This "StackExchange" discussion:
  http://security.stackexchange.com/questions/42952/how-can-i-measure-and-increase-entropy-on-mac-os-x
suggests that the Yarrow-based /dev/random and /dev/urandom on OS X will *both* degrade to PRNG if insufficient entropy is present.  Thus they are *both* like /dev/urandom, and *neither* will ever block.

The salient quote is this, from the random(4) manpage on OS X:
"If the SecurityServer system daemon fails for any reason, output quality will suffer over time without any explicit indication from the random device itself."

That sure sounds like bad quality PRNG random bits to me.  So that's three.

Again: ISTM that the universal expectation is that /dev/urandom will never block.  Therefore os.urandom() should also never block.  That it blocks in 3.5.0 and 3.5.1 is a performance regression and should be fixed.
msg267808 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2016-06-08 08:43
Cory, thanks for summing it up for us. I totally agree with you. In my opinion it is troublesome to have different behavior on platforms. We can implement a workaround for Linux, but not for BSD. Or would O_NONBLOCK cause read() to fail with EWOULDBLOCK on /dev/urandom device?
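The O_NONBLOCK question can be probed empirically. The following is only a minimal sketch: `read_urandom_nonblocking` is a hypothetical helper, not part of any patch on this issue, and whether a given platform's /dev/urandom honours O_NONBLOCK at all is exactly the open question, so a successful read proves nothing about early-boot behaviour.

```python
import os


def read_urandom_nonblocking(n, path="/dev/urandom"):
    """Return n bytes from `path`, or None if the read would block.

    Hypothetical helper for experimentation only. None is also returned
    when the device cannot be opened or yields no data.
    """
    try:
        fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    except OSError:
        return None
    try:
        chunks = []
        remaining = n
        while remaining > 0:
            try:
                data = os.read(fd, remaining)
            except BlockingIOError:
                # EAGAIN/EWOULDBLOCK: the device refused to block.
                return None
            if not data:
                return None
            chunks.append(data)
            remaining -= len(data)
        return b"".join(chunks)
    finally:
        os.close(fd)


result = read_urandom_nonblocking(16)
print("would block or unavailable" if result is None else result.hex())
```

To be meaningful, this would have to be run on the platform in question before its random device is seeded.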

It might be secure enough to use srandom() / random() instead of /dev/urandom on some platforms. It still won't do any good on platforms like Raspberry Pi since the SoC has no RTC. Without an RTC the clock is not set yet. It happens much later in the boot phase when network is available.

I don't see a cross-platform solution that is able to handle this super-special case without opening a potential security issue for the majority of users.
msg267809 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-06-08 08:44
To see how long it takes to initialize the urandom pool, you can grep your kernel logs. On my physical PC with real hardware, interrupts etc. it's quite fast: 5 seconds.

-- Logs begin at mar. 2016-01-26 07:54:37 CET, ...
...
juin 06 18:34:47 smithers kernel: random: systemd urandom read with 2 bits of entropy available
...
juin 06 18:34:52 smithers kernel: random: nonblocking pool is initialized

I understand that the "kernel: random: systemd urandom read with 2 bits of entropy available" message comes from the kernel when systemd reads from /dev/urandom while the pool is not yet initialized.
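The same measurement can be scripted. This is a rough sketch that assumes the two kernel message texts shown above and an HH:MM:SS timestamp on each line; `seconds_until_pool_ready` is a hypothetical helper, and it ignores logs that cross midnight.

```python
import re
from datetime import datetime

# Matches the HH:MM:SS part of a journal line, whatever the locale of the date.
TIME_RE = re.compile(r"\b(\d{2}:\d{2}:\d{2})\b")


def seconds_until_pool_ready(lines):
    """Seconds between the first early urandom read and pool initialization.

    Returns None if either message is missing from `lines`.
    """
    first_read = ready = None
    for line in lines:
        match = TIME_RE.search(line)
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        if first_read is None and "urandom read with" in line:
            first_read = stamp
        elif "nonblocking pool is initialized" in line:
            ready = stamp
            break
    if first_read is None or ready is None:
        return None
    return (ready - first_read).total_seconds()


log = [
    "juin 06 18:34:47 smithers kernel: random: systemd urandom read with 2 bits of entropy available",
    "juin 06 18:34:52 smithers kernel: random: nonblocking pool is initialized",
]
print(seconds_until_pool_ready(log))
```

The message wording depends on the kernel version, so the two substrings should be treated as assumptions to adjust.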
msg267810 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2016-06-08 08:48
Larry,  /dev/urandom blocks on BSD when it hasn't been seeded yet. But it looks like we can use sysctl to fetch the seed state from kern.random.sys.seeded.

https://www.freebsd.org/cgi/man.cgi?query=random&sektion=4

     The software generator will start in an unseeded state, and will block
     reads until it is (re)seeded.  This may cause trouble at system boot when
     keys and the like are generated from /dev/random so steps should be taken
     to ensure a reseed as soon as possible.  The sysctl(8) controlling the
     seeded status (see below) may be used if security is not an issue or for
     convenience during setup or development.
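A rough sketch of how that seed state could be queried from Python, assuming the `sysctl` command-line tool is on PATH and that the kern.random.sys.seeded name from the man page above exists on the running FreeBSD version (it may not on others). `freebsd_random_seeded` is a hypothetical helper, not an existing API.

```python
import subprocess


def freebsd_random_seeded(name="kern.random.sys.seeded"):
    """Return True/False for the seeded flag, or None if it cannot be read.

    Hypothetical helper: shells out to sysctl(8) instead of calling
    sysctl(3) directly, which keeps the sketch short.
    """
    try:
        proc = subprocess.run(
            ["sysctl", "-n", name],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
            timeout=5,
        )
    except (OSError, subprocess.SubprocessError):
        # sysctl is missing or did not answer in time.
        return None
    if proc.returncode != 0:
        return None
    value = proc.stdout.strip()
    if value not in ("0", "1"):
        return None
    return value == "1"


print(freebsd_random_seeded())
```

On platforms without this sysctl name the helper returns None, so a caller can treat "unknown" separately from "not seeded".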
msg267811 - (view) Author: Colm Buckley (Colm Buckley) * Date: 2016-06-08 08:59
[[ Larry - thanks for the Mercurial pointers. I was starting from the Debian sources because I initially assumed this was a Debian problem. Will switch to hg in future. ]]
msg267812 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2016-06-08 09:01
I don't know if anyone literally still uses BSD.  But on FreeBSD, /dev/urandom can block.

So let me revise my statement slightly.  Developers on platform X know how *their* /dev/urandom behaves.  They should rightly expect that os.urandom() is a thin wrapper around their local /dev/urandom.  If their /dev/urandom doesn't block, then os.urandom() shouldn't block.  If their /dev/urandom blocks, then it's acceptable that their os.urandom() would block.

What I'm trying to avoid here is the surprising situation where someone is using Python on a system where /dev/urandom will never block, and os.urandom() blocks.
msg267813 - (view) Author: Cory Benfield (Lukasa) * Date: 2016-06-08 09:12
> What I'm trying to avoid here is the surprising situation where someone is using Python on a system where /dev/urandom will never block, and os.urandom() blocks.

At this point I literally do not understand what issue we're trying to solve then.

If the problem is that os.urandom() must behave exactly like /dev/urandom, then sure, that got regressed.

However, I've been talking based on your two previous pronouncements:

> First, my previous statements stand: Python startup must not block.  "import random" must not block.

and

> I am officially making a pronouncement as Release Manager: Python 3.5 *must
> not* take 90 seconds to start up under *any* circumstances.  I view this as
> a performance regression, and it is and will remain a release blocker for
> 3.5.2.
> 
> Python *must not* require special command-line flags to avoid a 90 second
> startup time.  Python *must not* require a special environment-variable to
> avoid a 90 second startup time.  This is no longer open to debate, and I
> will only be overruled by Guido.

Now, if that's the case, then this patch does not fix that problem. It fixes that problem *on Linux*, but not on BSDs.

Perhaps you meant to say that those pronouncements only apply to Linux. That's fine, it's your prerogative. But as written, they don't: they're unconditional. And if they are unconditional, then again I feel like we have to say that /dev/urandom should get *out* of the call path on interpreter startup, because it absolutely can block. And based on Colm's original problem around gathering entropy, which is almost certainly not a Linux-specific concern, I see no reason to believe that this is a hypothetical concern on the BSDs.

So, let me ask a very direct question: does the position about 90s startup apply only to Linux?
msg267815 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-06-08 09:21
Hi. I created the issue #27266 "Add block keyword-only optional parameter to os.urandom()" which is a compromise between all proposed solutions and should fix *all* urandom issues ;-)

* os.urandom() remains secure by default, as asked by our security experts
* Python startup (hash secret) and "import random" don't block anymore

Happy?
msg267816 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2016-06-08 09:25
You're right, it's remotely possible that on platforms where /dev/urandom could block, Python startup could therefore also block.  And I'm not proposing we fix that, as so far nobody has reported it as a problem.

This suggests to me that yes I'm talking specifically about the regression on Linux in the 3.5 series.

But honestly it's too late for me to say for sure one way or another.  I need to go to bed.

p.s. if we have to slip RC1 by a day or two in order to get this settled, that's okay, but hopefully I can keep final on schedule.
msg267817 - (view) Author: Cory Benfield (Lukasa) * Date: 2016-06-08 09:30
> You're right, it's remotely possible that on platforms where /dev/urandom
> could block, Python startup could therefore also block.  And I'm not
> proposing we fix that, as so far nobody has reported it as a problem.
> 
> This suggests to me that yes I'm talking specifically about the regression
> on Linux in the 3.5 series.

Ok, so with that clarification I personally would prefer Victor's patch from #27266, but can also understand wanting to leave the codebase as-is. Either way would be consistent with your goals, Larry. Victor's patch is more secure, but does cause os.urandom to diverge from the semantics of /dev/urandom in extreme conditions (specifically, early boot) on Linux.

That's your tradeoff to make, Larry. =) I think both sides have been well-argued here. Thanks for clarifying.
msg267818 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2016-06-08 09:31
> Hi. I created the issue #27266 "Add block keyword-only optional parameter
> to os.urandom()" which is compromise between all proposed solutions and
> should fix *all* urandom issues ;-)
> 
> * os.urandom() remains secure by default, as asked by our security experts
> * Python startup (hash secret) and "import random" don't block anymore
> 
> Happy?

Probably not.

What is the default value of the "block" parameter?

If called with block=False on FreeBSD, where /dev/urandom may block sometimes, what does the function do?
msg267819 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-06-08 09:39
no-urandom-by-default.diff uses a very weak source of entropy for random.Random :-( I have been fighting against weak sources of entropy for many years...

This change introduces a bug similar to the OpenSSL RAND_bytes() bug (two processes with the same pid can produce the same random sequence): two Python processes started "at the same time" (with a resolution of 1/256 sec ~= 3.9 ms) produce the same random sequence.

With my script:
---
import subprocess, sys
args = [sys.executable, '-S', '-c', 'import random; print([random.randint(0, 999) for _ in range(4)])']
numbers = set()
procs = [subprocess.Popen(args, stdout=subprocess.PIPE) for _ in range(10)]
for proc in procs:
    stdout = proc.communicate()[0]
    numbers.add(stdout.rstrip())
for line in numbers:
    print(line.decode())
print("duplicates", len(procs) - len(numbers))
---

Output:
---
[68, 812, 821, 421]
[732, 506, 562, 439]
[70, 711, 476, 230]
[411, 474, 729, 837]
[530, 161, 699, 521]
[818, 897, 582, 38]
[42, 132, 359, 275]
[630, 863, 370, 288]
[497, 716, 61, 93]
duplicates 1
---
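The collision can also be reproduced deterministically in a single process. This is only an illustration: the `int(timestamp * 256)` formula in the hypothetical `time_seed` helper is an assumption derived from the 1/256 sec resolution mentioned above, not a copy of the code in the patch.

```python
import random


def time_seed(timestamp):
    """Hypothetical time-based seed with 1/256 s resolution (assumed formula)."""
    return int(timestamp * 256)


def sequence(seed):
    """Same draw as in the script above, from an explicitly seeded generator."""
    rng = random.Random(seed)
    return [rng.randint(0, 999) for _ in range(4)]


t1 = 1465378740.0   # arbitrary start time of the first process
t2 = t1 + 0.003     # second process starts 3 ms later, same 1/256 s bucket
t3 = t1 + 0.010     # third process starts 10 ms later, different bucket

print(time_seed(t1) == time_seed(t2))                       # same seed
print(sequence(time_seed(t1)) == sequence(time_seed(t2)))   # same sequence
print(time_seed(t1) == time_seed(t3))                       # different seed
```

Two generators that receive the same seed always produce the same sequence, so any seed source with coarse resolution turns "started at almost the same time" into "identical output".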
msg267823 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-06-08 09:44
Larry Hastings:

> > Hi. I created the issue #27266 "Add block keyword-only optional parameter to os.urandom()" (...)
> > Happy?

> Probably not. (...)

I replied on the issue #27266. Sorry I'm unable to follow this issue, there are too many messages now :-(
msg267825 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2016-06-08 09:46
On 2016-06-08 11:39, STINNER Victor wrote:
> 
> STINNER Victor added the comment:
> 
> no-urandom-by-default.diff uses a very weak source of entropy for random.Random :-( I have been fighting against weak sources of entropy for many years...

It is totally fine to init the MT of random.random() from a weak entropy
source. Just keep in mind that a Mersenne Twister is not a CPRNG. There
is simply no reason why you want to init a MT from a CPRNG.
msg267831 - (view) Author: Donald Stufft (dstufft) * (Python committer) Date: 2016-06-08 10:23
Larry,

I would greatly prefer it if we would allow os.urandom to block on Linux, at least by default. This will make os.urandom behave similarly on most modern platforms. The cases where this is going to matter are extreme edge cases, for most users they'll just silently be a bit more secure-- important for a number of use cases of Python (think for instance, if someone has an SSH server written in Twisted that generates its own host keys, a perfectly reasonable use of os.urandom). We've been telling people that os.urandom is the right source for generating randomness for cryptographic use for ages, and I think it is important to use the tools provided to us by the platform to best satisfy that use case by default-- in this case, getrandom() in blocking mode is the best tool provided by the Linux platform.

People writing Python code cannot expect that os.urandom will not block, because on most platforms it *will* block on initialization. However, the cases where it will block are a fairly small window, so by allowing it to block we're giving a better guarantee for very little downside-- essentially that something early on in the boot process shouldn't call os.urandom(), which is the right behavior on Linux (and any other OS) anyways.

The problem is that the Python interpreter itself (essentially) calls os.urandom() as part of its start up sequence which makes it unsuitable for use in very early stage boot programs. In the abstract, it's not possible to fix this on every single platform without removing all use of os.urandom from Python start up (which I think would be a bad idea). I think Colm's nonblocking_urandom_noraise.patch is a reasonable trade off (perhaps not the one I would personally make, but I think it's reasonable). If we wish to ensure that Python interpreter start up never blocks on Linux without needing to supply any command line flags or environment variables, then I would strongly urge us to adopt his patch, but allow os.urandom to still block.

In other words, please let's not let systemd's design weaken the security guarantees of os.urandom (generate cryptographically secure random bytes using the best tools provided by the platform). Let's make a targeted fix.
msg267836 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2016-06-08 11:34
Even though it may sound like a minor problem that os.urandom()
blocks on startup, I think the problem is getting more and more
something to consider given how systems are used nowadays.

Today, we no longer have the case where you keep a system up
and running for multiple years as we had in the past. VM,
containers and other virtualizations are spun up and down at
a high rate, so the boot cycle becomes more and more important.

FreeBSD, for example, is also concerned about the blocking issue
they have in their implementation:

https://wiki.freebsd.org/201308DevSummit/Security/DevRandom

and they are trying to resolve this by making sure to add as
much entropy they can find very early on in the process.

Now, most applications you run early on in the boot process
are not going to be applications that need crypto random
numbers and this is where I think the problem originates.

We've been telling everybody to use os.urandom() for seeding,
and so everyone uses it, including many many applications that
don't even require crypto random seeding.

The random module is the perfect example.

Essentially, we'd need to educate people that there's a difference
in requesting crypto random data and pseudo random data.

While we can fix the cases in the stdlib and
the interpreter that don't need crypto random data to use
other means of seeding (e.g. reading straight from /dev/urandom
on Linux or gathering other data to mix into a seed), existing
applications out there will continue to use os.urandom() for
things that don't need crypto random numbers - after all, we told
them to use it.

Some of these will eventually be hit by the blocking problem,
even for applications such as Monte Carlo simulations that
don't need crypto random and should thus not have to wait for
some entropy pool to get initialized.

Now, applications that do need crypto random data should be
able to request this from Python via the stdlib and os.urandom()
may sound like a good basis, but since this is designed as
interface to /dev/urandom, it doesn't block on Linux, so
not such a good choice.

Using /dev/random probably doesn't work either, because this can
block unexpectedly even after initialization.

IMO, the best way forward and to educate application writers
about the problems is to introduce two new APIs in 3.6:

os.cryptorandom() for getting OS crypto random data
os.pseudorandom() for getting OS pseudo random data

Crypto applications will then clearly know that
os.cryptorandom() is the right choice for them and
everyone else can use os.pseudorandom().

The APIs would on Linux and other platforms then use getrandom()
with appropriate default settings, i.e. blocking or raising
for os.cryptorandom() and non-blocking, non-raising for
os.pseudorandom().

As for solving the current issue, we will have to
give people some way to get at non-blocking pseudo random data,
if they need it early in the boot process. With the
proposed change, this is still possible via reading
/dev/urandom directly on Linux, so not everything is
lost.

BTW: Wikipedia has a good overview of how the different
implementations of /dev/random work across platforms:

https://en.wikipedia.org/wiki//dev/random
msg267837 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2016-06-08 11:49
I'm unsubscribing from this ticket for the second time. This form of discussion is becoming toxic for me because I strongly believe that it is not the correct way to handle a security-related problem.
msg267846 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-06-08 13:35
Donald: "Cory wasn't speaking about (non)blocking in general, but the case where (apparently) it's desired to not block even if that means you don't get cryptographically secure random in the CPython interpreter start up. (...)"

Oh sorry, I misunderstood his message.
msg267850 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-06-08 13:44
Cory Benfield (msg267637): "if the purpose of this patch was to prevent long startup delays, *it failed*. On all the systems above os.urandom may continue to block system startup."

I don't pretend fixing the issue on all operating systems. As stated in the issue title, this issue is specific to Linux. I understand that the bug still exists on other platforms and is likely to require a specific fix for each platform.
msg267853 - (view) Author: Theodore Tso (Theodore Tso) Date: 2016-06-08 13:58
One of the reasons why trying to deal with randomness is hard is because a lot of it is about trust.  Did Intel backdoor RDRAND to help out the NSA?   You might have one answer if you work for the NSA, and perhaps if you are willing to assume the worst about the NSA balancing its equities between its signals intelligence mission and providing a secure infrastructure for its customers and keeping the US computing industry strong.   Etc., etc.

It is true that OS developers are trying to make their random number generators be initialized more quickly at boot time.  Part of this is because of the dynamic which we can all see at work on the discussion of this bug.  Some people care very much about not blocking; some people want Python to be useful during the boot sequences; some people care very much about security above all else; some people don't trust application programmers.  (And if you fit in that camp; congratulations, now you know how I often feel when I worry about user space programmers doing potentially crazy things and I have no way of even knowing about them until the security researchers publish a web site such as http://www.factorable.net)

From the OS's perspective, one of the problems is that it's very hard to know when you have actually achieved a securely initialized random number generator.  Sure, we can say we've done this once we have accumulated at least 128 bits of entropy, but that begs the question of when you've collected a bit of entropy.  There's no way to know for sure.  On current systems, we assume that each interrupt gathers 1/64th of a bit of entropy on average.  This is an incredibly conservative number, and on real hardware, assuming the normal bootup activity, we achieve that within about 5 seconds (plus/minus 2 seconds) after boot.   On Intel, on real hardware, I'm comfortable cutting this to 1 bit of entropy per interrupt, which will speed up things considerably.  In an ARM SOC, or if you are on a VM and you don't trust the hypervisor so you don't use virtio-rng, is one bit of entropy per interrupt going to be good enough?  It's hard to say.
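The arithmetic behind those numbers, as a small sketch. The only inputs are the figures quoted above (a 128-bit target, 1/64 bit or 1 bit credited per interrupt); `interrupts_needed` is a hypothetical helper name.

```python
import math
from fractions import Fraction


def interrupts_needed(target_bits, bits_per_interrupt):
    """Interrupts required to credit `target_bits` of entropy."""
    return math.ceil(Fraction(target_bits) / Fraction(bits_per_interrupt))


conservative = interrupts_needed(128, Fraction(1, 64))
relaxed = interrupts_needed(128, 1)
print(conservative)  # 8192 interrupts at 1/64 bit each
print(relaxed)       # 128 interrupts at 1 bit each
```

So the conservative estimate needs 64 times as many interrupts as the relaxed one before the pool counts as initialized.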

On the other hand, if we use too conservative a number, there is a risk that userspace programmers (such as some have advocated on the discussion on this bug) will simply always use GRND_NONBLOCK, or fall back to /dev/urandom, and then if there's a security exposure, they'll cast the blame on the OS developers.  The reality is that we really need to work together, because the real problem is the clueless people writing python scripts at boot time to create long-term RSA private keys for IOT devices[1].  :-)

So when people assert that developers at FreeBSD are at work trying to speed up /dev/random initialization, folks need to understand that there's no magic here.  What's really happening is that we're all trying to figure out which defaults work the best.  In some ways the FreeBSD folks have it easier, because they support a much smaller range of platforms.  It's a lot easier to get things right on x86, where we have instructions like RDTSC and RDRAND to help us out.  It's a lot harder to be sure you have things right for ARM SOC's.   There are other techniques such as trying to carry entropy over from previous boot sessions, but (a) this requires support from the boot loaders, and on an OS with a large number of architectures, that means adding support to a large number of different ways of booting the kernel --- and (b) it doesn't solve the "consumer device generating keys after a cold start when the device is freshly removed from the packaging" problem.

As far as adding knobs, such as "blocking vs non-blocking", etc., keep in mind that as you add knobs, you increase the knowledge of the system that you force onto the next layer of the stack.  So this goes to the question of whether you trust application programmers will be able to get things right.

So Ted, why does Linux expose /dev/random vs /dev/urandom?  Historical reasons; some people don't believe that relying on cryptographic random number generators is sufficient, they *want* to use entropy which has minimal reliance on the belief that NSA ***probably*** didn't leave a back door into SHA-1, for example.  It is for that reason that /dev/random exists.  These days, the number of people who believe that to be true is very small, but I didn't want to make changes in existing interfaces.  For similar reasons I didn't want to suddenly make /dev/urandom block.   The fact that getrandom(2) blocks only until the cryptographic RNG has been initialized, and that it depends on a cryptographic RNG, is the consensus that *most* people have come to, and it reflects my recommendations that unless you ***really*** know what you are doing, the right thing to do is to call getrandom(2) with the flags field set to zero, and to be happy.   Of course, many more people are sure they know what they need to do than there are people who really *do* know what they are doing, which is why in BSD, they simply don't give people a choice with their getentropy(2) system call.  If you assume that application/user-space programmers should never be trusted, and API's should come with a strong point of view, that's a reasonable design choice.   At some level this is the same choice which is before the Python developer community.  I'm not going to presume to tell you what the right thing to do is here, because it's filled with engineering and design tradeoffs.  Hopefully this additional perspective is useful, though.

[1]  This is a joke, folks.  We need to all work together, even the application programmers.  Some may say that means we're doomed from a security perspective, but security really has to be a collective responsibility if we don't want our "home of the future" to be completely pwned by the bad guys.....
msg267855 - (view) Author: Theodore Tso (Theodore Tso) Date: 2016-06-08 14:21
Oh --- and about people wondering whether os.urandom is being used for cryptographic purposes "most of the time" or not --- again, welcome to my world.  I get complaints all the time from people who try to do "dd if=/dev/urandom of=/dev/hdX bs=4k" and then complain this is too slow.

Creating an os.cryptorandom and os.pseudorandom might be a useful way to go here.  I've often considered whether I should create a /dev/frandom for the crazies who want to use dd as a way to wipe a disk, but to date I haven't thought it was worth the effort, and I didn't want to encourage them.  Besides, isn't the obviously right answer to create a quickie python script?  :-)

Splitting os.urandom does beg the question of what os.urandom should do, however.  If you go down that path, I'd suggest defaulting to the secure-but-slow choice.

I'd also suggest assuming it's OK to put the onus on the people who are trying to run python scripts during early boot to have to either add some command flags to the python interpreter, or to otherwise make adjustments, as being completely fair.  But again, that's my bias, and if people don't want to deal with trying to ask the systemd folks to make a change in their code, I'd _completely_ understand.

My design preference is that outside of boot scripts, having os.urandom block in the name of security is completely fair, since in that case you won't deadlock the system.  People of good will may disagree, of course, and I'm not on the Python development team, so take that with whatever grain of salt you wish.   At the end of the day, this is all about tradeoffs, and you know your customer/developer base better than I do.

Cheers!
msg267856 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-06-08 14:27
>> The current behavior is that Python *will not start at all* if getrandom() blocks (because the hash secret initialization fails).
> It starts just fine, it just can possibly take a while.

In my experience, connecting to a VM using SSH with low entropy can take longer than 1 minute. As a user, I considered that the host was down. Longer than 1 minute is simply too long.

It's unclear to me if getrandom() can succeed (return random bytes) on embedded devices without hardware RNG. Can it take longer than 1 minute?

Is it possible that getrandom() simply blocks forever?
msg267857 - (view) Author: Colm Buckley (Colm Buckley) * Date: 2016-06-08 14:30
Victor -

Yes, it is possible for it to block forever - see the test I proposed for Ted upthread. The triggering case (systemd-crontab-generator) delays for 90 seconds, but is *killed* by systemd after that time; it doesn't itself time out.

Colm
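Because the wait has no upper bound of its own, a caller that wants one has to impose it. A minimal sketch, assuming a Python with os.getrandom() and os.GRND_NONBLOCK (Linux, 3.6+); `wait_for_kernel_rng` is a hypothetical helper, not something proposed in any patch here.

```python
import os
import time


def wait_for_kernel_rng(timeout, poll_interval=0.1):
    """Poll until the kernel RNG is initialized or `timeout` seconds pass.

    Returns True if it is ready, False on timeout, and None when
    getrandom() is not usable on this platform.
    """
    if not hasattr(os, "getrandom"):
        return None
    deadline = time.monotonic() + timeout
    while True:
        try:
            os.getrandom(1, os.GRND_NONBLOCK)
            return True
        except BlockingIOError:
            # Pool not initialized yet: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_interval)
        except OSError:
            # e.g. the syscall is unavailable on this kernel
            return None


print(wait_for_kernel_rng(timeout=1.0))
```

What to do on False (fail, or continue with a weaker seed) is the policy question this issue is about; the sketch only makes the timeout explicit.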
msg267863 - (view) Author: Colm Buckley (Colm Buckley) * Date: 2016-06-08 15:23
Just to re-state; I think we have three problems:

1) _Py_HashSecret initialization blocking. Affects all Python invocations; already a substantial issue on Debian testing track (90s startup delay).

* there seems to be general agreement that this does not need a 'strong' secret in a script called at/near startup.
* On Linux, getrandom(GRND_NONBLOCK) *or* /dev/urandom are sufficient for this initialization.
* On other OS, we don't have a non-blocking kernel PRNG; this is probably not an issue for Solaris or OS X, and only a possible issue for OpenBSD.
* Is it acceptable to fall back to in-process seed generation for the cases where initialization via /dev/urandom fails? (NB: there have been no reports of this type of failure in the wild.)

* existing tip with or without nonblocking_urandom_noraise.patch addresses this for Linux. Solution for other OS remains to be written.
* Possibly this can be considered not release-blocking for other OS, as there has been no recent regression in behavior.

2) Blocking on 'import random' and/or os.urandom. I don't see a clear consensus on the Right Thing for this case. Existing tip (without nonblocking_urandom_noraise.patch) addresses it for Linux, but solution is not universally accepted. Unclear whether this is a 3.5.2 blocker.

3) Design of future APIs for >= 3.6. The most frequent suggestion is something like os.pseudorandom() (guaranteed nonblocking) and os.cryptorandom() (guaranteed entropy); I guess this needs to go to the dev list for full discussion - is it safely out of scope for this bug?

My suggestion (for what it's worth): accept Victor's changeset plus nonblocking_urandom_noraise for 3.5.2 (I'll submit a proper patch shortly), recommend userspace workarounds for the blocking urandom issue, propose new APIs for 3.6 on the dev list.
msg267873 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-06-08 16:52
I spent almost my whole day reading this issue, some related issues, and some more related links. WOW! Amazing discussion. Sorry that Christian decided to quit the discussion (twice) :-(

Here is my summary: http://haypo-notes.readthedocs.io/pep_random.html

tl;dr "The issue is to find a solution to not block Python startup on such case, and keep getrandom() enhancement for os.urandom()."

--

Status of Python 3.5.2: http://haypo-notes.readthedocs.io/pep_random.html#status-of-python-3-5-2

My summary: "With the changeset 9de508dc4837: Python doesn’t block at startup anymore (issues #25420 and #26839 are fixed) and os.urandom() is as secure as Python 2.7, Python 3.4 and any application reading /dev/urandom."

=> STOP! don't touch anything, it's now fine ;-) (but maybe follow my link for more information)

--

To *enhance* os.urandom() so that it always uses the getrandom() syscall on Linux, I opened issue #27266. I changed the title to "Always use getrandom() in os.random() on Linux and add block=False parameter to os.urandom()" to make my intent more explicit.

As some of you have already noticed, this is not easy to implement! There are technical obstacles to implementing os.urandom(block=False).

In fact, this issue tries to fix two different but closely related problems:

(a) Always use getrandom() for os.urandom() on Linux
(b) Implement os.urandom(block=False) on *all* platforms

The requirement for (a) is to not reopen bug #25420 (block on "import random"). dstufft proposed no-urandom-by-default.diff (attached to this issue), but IMHO it makes the random module worse than before. I proposed (b) as the correct fix. It's a work in progress; please come to issue #27266 to help me!

--

Please contact me if you want to fix/enhance my doc http://haypo-notes.readthedocs.io/pep_random.html

Right now, I'm not interested in converting this summary into a real PEP. It looks like you agree on solutions. We should now invest our time in solutions rather than listing all the issues again ;-)

I know that it's really hard, but I suggest abandoning this issue (since, again, it's closed!), focusing on more specific issues, and working on fixing them. No? What do you think?

--

IMHO the problem in this discussion is that it started with a very well-defined issue (Python blocks at startup on Debian Testing in a script started by systemd when running in a VM) and grew into a wide discussion about all RNGs, all kinds of RNG-related issues, and, a little bit, security in general.
msg267887 - (view) Author: Martin Pitt (pitti) Date: 2016-06-08 20:35
> you could give some kind of command-line flag

That already exists -- set PYTHONHASHSEED=0.

> But I'll let someone else have the joys of negotiating with Lennart, and I won't blame the Python devs if using GRND_NONBLOCK unconditionally is less painful than having to work with the systemd folks. 

In case it's of any relief: This has nothing to do with having to change anything in systemd itself -- none of the services that systemd ships use Python. The practical case where this bug appeared was cloud-init (which is written in Python), and that wants to run early in the boot sequence even before the network is up (so that tools like "pollinate" which gather entropy from the cloud host don't kick in yet). So if there's any change needed at all, it would be in cloud-init and similar services which run Python during early boot.
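
As a rough illustration of the PYTHONHASHSEED=0 setting mentioned above, the sketch below launches a child interpreter with that variable set and reads back a string hash. The helper name child_hash is made up for this example; the only behavior assumed is the documented one, that PYTHONHASHSEED=0 disables hash randomization.

```python
import os
import subprocess
import sys


def child_hash(text, hashseed="0"):
    """Run a child Python with PYTHONHASHSEED set and return hash(text) from it.

    Hypothetical helper for illustration only.
    """
    env = dict(os.environ, PYTHONHASHSEED=hashseed)
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
    )
    return int(out)


if __name__ == "__main__":
    # With randomization disabled, two separate interpreters agree on the hash.
    print(child_hash("abc") == child_hash("abc"))
```

A service started during early boot could set the same variable in its environment, at the cost of losing hash randomization for that process.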
msg267890 - (view) Author: Colm Buckley (Colm Buckley) * Date: 2016-06-08 20:41
@pitti -

We already discussed this; there are cases where it's not practical to set an environment variable. The discussion eventually converged on "it is not desirable that Python should block on startup, regardless of system RNG status".

Re: the triggering bug; it was actually /lib/systemd/system-generators/systemd-crontab-generator (in systemd-cron) which caused the behavior to be noticed in Debian. It wasn't a change in systemd behavior, per se (that has been a Python script for some time), it was the fact that it was being called before the system PRNG had been initialized. With the change from /dev/urandom to getrandom() in 3.5.1, this caused a deadlock at boot.
msg267893 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2016-06-08 20:49
I am increasingly convinced that I'm right.

--

First, consider that the functions in the os module, as a rule, are a thin shell over the equivalent function provided by the operating system.  If Python exposes a function called os.XYZ(), and it calls the OS, then with few exceptions it does so by calling a function called XYZ().**

This has several ramifications, and these are effectively guarantees for the Python programmer:

* You can read your local man pages (or equivalent) to see how the function behaves on your system.  Python occasionally improves on the functionality provided; os.utime() provides a lot more functionality than POSIX utime.  But it never *degrades* the functionality provided.

* It's implied, and strongly preferred, that the function is atomic: it will make exactly one system call.  I once proposed simulating behavior for an os module function using a series of system calls, and this approach was rejected because it wasn't atomic.  So if you see a function os.XYZ(), you may predict that Python will call XYZ() exactly once, and with only a few exceptions you'll be right.

Now read this snippet of the documentation for os.urandom():

"The returned data should be unpredictable enough for cryptographic applications, though its exact quality depends on the OS implementation. On a Unix-like system this will query /dev/urandom, and on Windows it will use CryptGenRandom()."

That text has been in the documentation for os.urandom() since at least Python 2.6.  (That's as old as we have on the web site; I didn't go hunting for older documentation.)

Thus the documentation for os.urandom():

* explicitly says it uses /dev/urandom, and

* explicitly *does not* guarantee cryptographic strength random numbers on all platforms at all times.

Thus, while it's laudable to try and give the user higher-quality random bits when they call os.urandom(), you cannot degrade the behavior of the system's /dev/urandom when doing so.  On Linux /dev/urandom is *guaranteed* to never block.  This guarantee is so strong, Mr. Ts'o had to add a separate facility to Linux (/dev/random) to permit blocking.  os.urandom() *must* replicate this behavior.

What I'm proposing is that os.urandom() may use getrandom(GRND_NONBLOCK) to attempt to get higher-quality random bits, but it *must not block*.  If it fails, it will use /dev/urandom, *exactly as it is documented to do*.

(Naturally this flunks the "atomic operation" test.  But in the case of procuring random bits, the atomicity of its operation is obviously irrelevant.)


** The exception to this, naturally, is Windows.  Internally the os module is called "posixmodule"--and this is no coincidence.  AFAIK every platform supported by CPython is POSIX-based except Windows.  The choice was made long ago to simulate POSIX behavior on Windows so as to present a consistent API to the programmer.  If you're curious about this, and have the time, read the implementation of os.stat for Windows.  What a rush!

--

Second, I invoke the "consenting adults" rule.  Python provides well-documented behavior for os.urandom().  You cannot make assumptions about the use case of the caller and decide for them that they would prefer the function block in an unbounded fashion rather than provide low-quality random bits.

And yes, unbounded.  As covered earlier in the thread, it only blocked for 90 seconds before systemd killed it.  We don't know how long it would actually have blocked.  This is completely unacceptable--for startup, for "import random", and for "os.urandom()" on Linux.

--

Third, because the os module is in general a thin wrapper over what the OS provides, I disapprove of "cryptorandom()" and "pseudorandom()" going into the os module.  There are no functions with these names on any OS of which I'm aware.  This is why I proposed "os.getrandom(n, block=True)".  From its signature, the function it calls on your OS will be obvious, and its semantics on your OS will be documented by your OS.

Thus I am completely unwilling to add os.cryptorandom() and os.pseudorandom() in 3.5.2.
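
To make the proposed "os.getrandom(n, block=True)" signature concrete, here is a minimal sketch of a wrapper with that shape. It is only an illustration of the proposal: the real function that later shipped in Python 3.6 is os.getrandom(size, flags=0), and the block parameter below is an assumption layered on top of it for this example.

```python
import os


def getrandom(n, block=True):
    """Hypothetical wrapper with the signature proposed in this message.

    block=True  -> may wait until the kernel entropy pool is initialized.
    block=False -> raises BlockingIOError instead of waiting.
    """
    if not hasattr(os, "getrandom"):
        raise NotImplementedError("getrandom() is not available on this platform")
    flags = 0 if block else os.GRND_NONBLOCK
    return os.getrandom(n, flags)


if __name__ == "__main__":
    try:
        print(len(getrandom(16, block=False)))
    except BlockingIOError:
        print("entropy pool not initialized yet")
    except NotImplementedError as exc:
        print(exc)
```

The caller, not the library, decides whether waiting for entropy is acceptable, which is the "consenting adults" position argued above.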
msg267897 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2016-06-08 21:04
On 08.06.2016 22:49, Larry Hastings wrote:
> 
> Third, because the os module is in general a thin wrapper over what the OS provides, I disapprove of "cryptorandom()" and "pseudorandom()" going into the os module.  There are no functions with these names on any OS of which I'm aware.  This is why I proposed "os.getrandom(n, block=True)".  From its signature, the function it calls on your OS will be obvious, and its semantics on your OS will be documented by your OS.
> 
> Thus I am completely unwilling to add os.cryptorandom() and os.pseudorandom() in 3.5.2.

That was a sketch for 3.6 to resolve the ambiguity between the
different use cases.

You're right, it's better to move such things to the random
module.
msg267898 - (view) Author: Colm Buckley (Colm Buckley) * Date: 2016-06-08 21:26
Larry -

Regardless of the behavior of os.urandom (and 'import random'), is it agreed that the current state of _PyRandom_Init is acceptable for 3.5.2?

The current behavior (as of 9de508dc4837) is that it will never block on Linux, but could still block on other OS if called before /dev/urandom is initialized. We have not determined a satisfactory solution for other operating systems. Note that no other OS have reported a problem 'in the wild', probably because of their extreme rarity in VM/container environments and the lack of Python in their early init sequence.

Colm
msg267913 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-06-08 22:19
I opened issue #27272: "random.Random should not read 2500 bytes from urandom".
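
As a user-level illustration of the concern (not the change made in #27272), an application can seed its own generator explicitly from a small read. The helper name make_rng and the 16-byte default are assumptions chosen for this sketch.

```python
import os
import random


def make_rng(seed_bytes=16):
    """Return a random.Random seeded from a small os.urandom() read.

    Hypothetical helper; passing an explicit integer seed means
    random.Random does not go looking for OS entropy on its own.
    """
    seed = int.from_bytes(os.urandom(seed_bytes), "big")
    return random.Random(seed)


if __name__ == "__main__":
    rng = make_rng()
    print(rng.random())
```

This is meant for simulation-style use only; random.Random is not a cryptographic generator regardless of how it is seeded.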
msg267914 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-06-08 22:22
> The current behavior (as of 9de508dc4837) is that it will never block on Linux, but could still block on other OS if called before /dev/urandom is initialized.

In practice, only Linux is impacted. See the rationale:
https://haypo-notes.readthedocs.io/pep_random.html#scope-of-the-python-blocks-at-startup-issue


> We have not determined a satisfactory solution for other operating systems.

Stoooop. This issue is specific to Linux. If you want to fix the issue on other operating systems, please open a new issue.

Oh, you know what? I already opened such issue :-) The issue #27266 wants to fix the issue on all platforms, not only Linux. Open a second issue if you prefer.
msg267939 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2016-06-09 00:07
> Regardless of the behavior of os.urandom (and 'import random'), is it agreed that the current state of _PyRandom_Init is acceptable for 3.5.2?

I'll get back to you with a specific yes or no.  What I want is for the behavior to be removed where "import random" can block unboundedly on Linux because it's waiting for the entropy pool to fill.  If the code behaves like that, then yes, but I'm not giving it my official blessing until I read it.
msg268018 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2016-06-09 11:26
I just posted to python-dev and asked Guido to make a BDFL ruling.  I only represented my side, both because I worried I'd do a bad job of representing *cough* literally everybody else *cough*, and because it already took me so long to write the email.  All of you who disagree with me, I'd appreciate it if you'd reply to my python-dev posting and state your point of view.
msg268201 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2016-06-11 08:46
Colm Buckley: I've read the code, *and* stepped through it, and AFAICT it is no longer even possible for Python on Linux to call getrandom() in a blocking way.  Thanks for doing this!  I'm marking the issue as closed.
msg268591 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2016-06-14 23:35
One last fix needed to fully revert this is to remove the mention from the Python 3.5 What's New documentation: https://docs.python.org/3.5/whatsnew/3.5.html#os
msg268593 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-06-14 23:40
Nick Coghlan: "One last fix needed to fully revert this is to remove the mention from the Python 3.5 What's New documentation: https://docs.python.org/3.5/whatsnew/3.5.html#os"

This sentence?

"The urandom() function now uses the getrandom() syscall on Linux 3.17 or newer, and getentropy() on OpenBSD 5.6 and newer, removing the need to use /dev/urandom and avoiding failures due to potential file descriptor exhaustion."

Why remove it? It's still correct that getrandom() is used by os.urandom() in the common case.

The corner case (urandom entropy pool not initialized) is already documented (including a "Changed in version 3.5.2: ..."):
https://docs.python.org/3.5/library/os.html#os.urandom

I don't think that it's worth mentioning the corner case in What's New in Python 3.5.
msg268627 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2016-06-15 18:14
Sorry, with all the different proposals kicking around, I somehow got the impression we'd reverted entirely to just reading from /dev/urandom without ever using the new syscall.

Re-reviewing your patch, I agree the What's New comment is still accurate.
msg268629 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-06-15 20:00
> Re-reviewing your patch, I agree the What's New comment is still accurate.

Thanks for double checking ;-)
History
Date User Action Args
2022-04-11 14:58:30adminsetgithub: 71026
2016-06-15 20:00:44vstinnersetmessages: + msg268629
2016-06-15 18:14:29ncoghlansetmessages: + msg268627
2016-06-14 23:40:51vstinnersetmessages: + msg268593
2016-06-14 23:35:05ncoghlansetmessages: + msg268591
2016-06-11 08:46:07larrysetstatus: open -> closed

messages: + msg268201
stage: patch review -> resolved
2016-06-09 11:26:39larrysetmessages: + msg268018
2016-06-09 00:07:00larrysetmessages: + msg267939
2016-06-08 23:32:05ncoghlansetnosy: + ncoghlan
2016-06-08 22:22:03vstinnersetmessages: + msg267914
2016-06-08 22:19:25vstinnersetmessages: + msg267913
2016-06-08 21:26:27Colm Buckleysetmessages: + msg267898
2016-06-08 21:25:20ppperrysettitle: Python 3.5 running on Linux kernel 3.17+ can block at startup or on importing /arguinthe random module on getrandom() -> Python 3.5 running on Linux kernel 3.17+ can block at startup or on importing the random module on getrandom()
2016-06-08 21:04:48lemburgsetmessages: + msg267897
2016-06-08 20:49:31larrysetmessages: + msg267893
2016-06-08 20:41:46Colm Buckleysetmessages: + msg267890
2016-06-08 20:35:23pittisetnosy: + pitti

messages: + msg267887
title: Python 3.5 running on Linux kernel 3.17+ can block at startup or on importing the random module on getrandom() -> Python 3.5 running on Linux kernel 3.17+ can block at startup or on importing /arguinthe random module on getrandom()
2016-06-08 16:52:10vstinnersetmessages: + msg267873
2016-06-08 15:23:32Colm Buckleysetmessages: + msg267863
2016-06-08 14:30:46Colm Buckleysetmessages: + msg267857
2016-06-08 14:27:06vstinnersetmessages: + msg267856
2016-06-08 14:21:34Theodore Tsosetmessages: + msg267855
2016-06-08 13:58:17Theodore Tsosetmessages: + msg267853
2016-06-08 13:44:38vstinnersetmessages: + msg267850
2016-06-08 13:35:53vstinnersetmessages: + msg267846
2016-06-08 11:50:08christian.heimessetnosy: - christian.heimes
2016-06-08 11:49:55christian.heimessetnosy: lemburg, rhettinger, doko, vstinner, larry, christian.heimes, matejcik, ned.deily, alex, skrah, python-dev, martin.panter, ztane, dstufft, Lukasa, thomas-petazzoni, Colm Buckley, Theodore Tso
messages: + msg267837
2016-06-08 11:34:42lemburgsetmessages: + msg267836
2016-06-08 10:23:58dstufftsetmessages: + msg267831
2016-06-08 09:46:41christian.heimessetmessages: + msg267825
2016-06-08 09:44:12vstinnersetmessages: + msg267823
2016-06-08 09:39:04vstinnersetmessages: + msg267819
2016-06-08 09:31:10larrysetmessages: + msg267818
2016-06-08 09:30:10Lukasasetmessages: + msg267817
2016-06-08 09:25:08larrysetmessages: + msg267816
2016-06-08 09:21:05vstinnersetmessages: + msg267815
2016-06-08 09:12:33Lukasasetmessages: + msg267813
2016-06-08 09:01:24larrysetmessages: + msg267812
2016-06-08 08:59:52Colm Buckleysetmessages: + msg267811
2016-06-08 08:48:10christian.heimessetmessages: + msg267810
2016-06-08 08:44:56vstinnersetmessages: + msg267809
2016-06-08 08:43:00christian.heimessetmessages: + msg267808
2016-06-08 08:38:47larrysetmessages: + msg267807
2016-06-08 08:29:17Lukasasetmessages: + msg267806
2016-06-08 08:19:30larrysetmessages: + msg267805
2016-06-08 08:16:34Lukasasetmessages: + msg267804
2016-06-08 08:04:33larrysetmessages: + msg267803
2016-06-07 23:15:54christian.heimessetnosy: + christian.heimes
messages: + msg267752
2016-06-07 22:45:02Theodore Tsosetmessages: + msg267751
2016-06-07 21:24:48lemburgsetmessages: + msg267750
2016-06-07 21:19:28Colm Buckleysetmessages: + msg267749
2016-06-07 20:34:03dstufftsetmessages: + msg267746
2016-06-07 20:27:03Theodore Tsosetnosy: + Theodore Tso
messages: + msg267745
2016-06-07 20:07:35dstufftsetmessages: + msg267741
2016-06-07 20:06:42christian.heimessetnosy: - christian.heimes
2016-06-07 20:01:29lemburgsetmessages: + msg267740
2016-06-07 19:55:26larrysetmessages: + msg267739
2016-06-07 19:54:29dstufftsetmessages: + msg267737
2016-06-07 19:50:22Colm Buckleysetmessages: + msg267735
2016-06-07 19:12:00Lukasasetmessages: + msg267731
2016-06-07 19:10:18larrysetmessages: + msg267730
2016-06-07 19:03:48Lukasasetmessages: + msg267729
2016-06-07 19:03:05Colm Buckleysetmessages: + msg267728
2016-06-07 19:00:53dstufftsetmessages: + msg267726
2016-06-07 18:55:37Colm Buckleysetmessages: + msg267725
2016-06-07 18:51:52dokosetmessages: + msg267723
2016-06-07 18:46:22larrysetmessages: + msg267721
2016-06-07 18:45:11Colm Buckleysetmessages: + msg267720
2016-06-07 18:40:30dstufftsetmessages: + msg267718
2016-06-07 18:34:05larrysetmessages: + msg267716
2016-06-07 18:29:04dstufftsetfiles: + no-urandom-by-default.diff

messages: + msg267715
2016-06-07 17:59:54christian.heimessetmessages: + msg267712
2016-06-07 17:53:19christian.heimessetmessages: + msg267711
2016-06-07 17:52:01dstufftsetmessages: + msg267710
2016-06-07 17:46:03larrysetmessages: + msg267709
2016-06-07 17:36:53dstufftsetmessages: + msg267707
2016-06-07 17:16:13Colm Buckleysetmessages: + msg267705
2016-06-07 16:04:21Colm Buckleysetmessages: + msg267699
2016-06-07 15:39:12dstufftsetmessages: + msg267696
2016-06-07 15:35:49dstufftsetmessages: + msg267695
2016-06-07 15:32:08larrysetmessages: + msg267694
2016-06-07 15:23:32Colm Buckleysetmessages: + msg267693
2016-06-07 15:10:54Colm Buckleysetmessages: + msg267690
2016-06-07 15:10:02lemburgsetmessages: + msg267689
2016-06-07 14:59:16christian.heimessetmessages: + msg267688
2016-06-07 14:57:51Colm Buckleysetmessages: + msg267687
2016-06-07 14:49:28lemburgsetmessages: + msg267686
2016-06-07 14:47:40christian.heimessetmessages: + msg267685
2016-06-07 14:43:43Colm Buckleysetmessages: + msg267684
2016-06-07 14:19:10skrahsetmessages: + msg267682
2016-06-07 14:18:35dstufftsetmessages: + msg267681
2016-06-07 14:18:09vstinnersetmessages: + msg267680
2016-06-07 14:17:02vstinnersetmessages: + msg267679
2016-06-07 14:14:59Colm Buckleysetfiles: + nonblocking_urandom_noraise.patch

messages: + msg267678
2016-06-07 14:09:44christian.heimessetmessages: + msg267677
2016-06-07 14:09:10dstufftsetmessages: + msg267676
2016-06-07 14:06:05lemburgsetmessages: + msg267675
2016-06-07 13:58:38Colm Buckleysetmessages: + msg267674
2016-06-07 13:51:51dstufftsetmessages: + msg267673
2016-06-07 13:51:40alexsetmessages: + msg267672
2016-06-07 13:49:23Colm Buckleysetmessages: + msg267671
2016-06-07 13:43:55dstufftsetmessages: + msg267670
2016-06-07 13:40:16dstufftsetmessages: + msg267669
2016-06-07 13:37:21christian.heimessetnosy: + christian.heimes
2016-06-07 13:36:37Colm Buckleysetmessages: + msg267668
2016-06-07 13:33:13Colm Buckleysetmessages: + msg267667
2016-06-07 13:21:08dstufftsetmessages: + msg267666
2016-06-07 13:16:35lemburgsetmessages: + msg267665
2016-06-07 13:12:05dstufftsetmessages: + msg267664
2016-06-07 13:07:10dstufftsetmessages: + msg267663
2016-06-07 13:01:44lemburgsetmessages: + msg267661
2016-06-07 12:52:17vstinnersetmessages: + msg267660
2016-06-07 12:40:32dstufftsetmessages: + msg267656
2016-06-07 12:36:59lemburgsetmessages: + msg267654
2016-06-07 12:25:12vstinnersetstatus: closed -> open
resolution: fixed
messages: + msg267650
2016-06-07 12:24:19vstinnersetmessages: + msg267648
2016-06-07 12:19:31dstufftsetmessages: + msg267645
2016-06-07 12:18:18lemburgsetmessages: + msg267644
2016-06-07 12:09:01dstufftsetmessages: + msg267643
2016-06-07 12:06:36vstinnersetresolution: fixed -> (no value)
messages: + msg267642
2016-06-07 12:05:33dstufftsetmessages: + msg267640
2016-06-07 12:05:15alexsetmessages: + msg267638
2016-06-07 12:04:52Lukasasetmessages: + msg267637
2016-06-07 12:02:52vstinnersetmessages: + msg267636
2016-06-07 12:00:01dstufftsetmessages: + msg267635
2016-06-07 11:54:45dstufftsetmessages: + msg267634
2016-06-07 11:53:20vstinnersetmessages: + msg267633
2016-06-07 11:51:54dstufftsetmessages: + msg267632
2016-06-07 11:45:29lemburgsetmessages: + msg267631
2016-06-07 11:41:58thomas-petazzonisetmessages: + msg267630
2016-06-07 11:39:59dstufftsetmessages: + msg267629
2016-06-07 11:36:17dstufftsetnosy: + dstufft
messages: + msg267628
2016-06-07 11:35:49Lukasasetnosy: + Lukasa
messages: + msg267627
2016-06-07 11:34:35vstinnersetmessages: + msg267626
2016-06-07 11:32:00vstinnersetmessages: + msg267625
2016-06-07 11:31:47lemburgsetmessages: + msg267624
2016-06-07 11:27:33alexsetnosy: + alex
messages: + msg267623
2016-06-07 10:44:50skrahsetmessages: + msg267621
2016-06-07 10:15:40vstinnersetmessages: + msg267617
2016-06-07 10:14:36vstinnersetmessages: + msg267616
2016-06-07 10:09:48Colm Buckleysetmessages: + msg267614
2016-06-07 10:01:16vstinnersetmessages: + msg267612
2016-06-07 09:55:53vstinnersetmessages: + msg267611
2016-06-07 09:40:22vstinnersetmessages: + msg267610
2016-06-07 09:39:57vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg267609
2016-06-07 09:27:24python-devsetnosy: + python-dev
messages: + msg267608
2016-06-06 23:32:39Colm Buckleysetfiles: + getrandom_nonblocking_v4.patch

messages: + msg267571
2016-06-06 20:50:09larrysetmessages: + msg267554
2016-06-06 20:45:55Colm Buckleysetfiles: + getrandom-nonblocking-v3.patch

messages: + msg267550
2016-06-06 20:24:44larrysetmessages: + msg267546
2016-06-06 16:13:23skrahsetnosy: + skrah
messages: + msg267539
2016-06-06 15:52:41ztanesetnosy: + ztane
messages: + msg267537
2016-06-06 12:43:03socketpairsetnosy: - socketpair
2016-06-06 02:39:33martin.pantersetmessages: + msg267511
2016-06-06 02:29:41martin.pantersetnosy: + martin.panter
messages: + msg267504
2016-06-05 18:44:14ned.deilysetstage: patch review
2016-06-05 18:43:43ned.deilysetnosy: + larry
2016-06-05 18:43:07ned.deilysetpriority: normal -> release blocker
nosy: + ned.deily
messages: + msg267455

2016-05-24 02:04:21Colm Buckleysetmessages: + msg266216
2016-05-14 23:09:28Colm Buckleysetmessages: + msg265555
2016-05-14 22:49:20vstinnersetmessages: + msg265549
2016-05-14 01:51:30Colm Buckleysetfiles: + getrandom-nonblocking-v2.patch

messages: + msg265500
2016-05-13 23:24:10vstinnersetmessages: + msg265496
2016-05-13 19:40:42skrahsetnosy: - skrah
2016-05-13 19:35:12Colm Buckleysetmessages: + msg265485
2016-05-13 16:31:41Colm Buckleysetmessages: + msg265481
2016-05-13 14:39:59Colm Buckleysettype: behavior
messages: + msg265477
title: Python 3.5 running in a virtual machine with Linux kernel 3.17+ can block at startup or on importing the random module on getrandom() -> Python 3.5 running on Linux kernel 3.17+ can block at startup or on importing the random module on getrandom()
2016-05-13 08:18:43Colm Buckleysetfiles: + nonblocking-getrandom.diff
keywords: + patch
messages: + msg265452
2016-05-12 21:48:02Colm Buckleysetmessages: + msg265430
2016-05-12 21:18:22Colm Buckleysetnosy: + Colm Buckley
messages: + msg265427
2016-04-26 15:11:14lemburgsetmessages: + msg264303
2016-04-26 14:31:57skrahsetmessages: + msg264292
2016-04-26 14:14:14vstinnersetmessages: + msg264289
2016-04-26 13:44:11skrahsetmessages: + msg264284
2016-04-26 12:40:38vstinnersettitle: Python 3.5 running in a virtual machine with Linux kernel 3.17+ can block at startup or on importing the random module -> Python 3.5 running in a virtual machine with Linux kernel 3.17+ can block at startup or on importing the random module on getrandom()
2016-04-26 12:40:18vstinnersettitle: Python 3.5 running in a virtual machine blocks at startup or on importing the random module -> Python 3.5 running in a virtual machine with Linux kernel 3.17+ can block at startup or on importing the random module
2016-04-26 12:38:40vstinnersettitle: python always calls getrandom() at start, causing long hang after boot -> Python 3.5 running in a virtual machine blocks at startup or on importing the random module
2016-04-26 12:37:32vstinnersetmessages: + msg264271
2016-04-26 12:35:11skrahsetmessages: + msg264270
2016-04-26 12:22:23skrahsetnosy: + skrah
messages: + msg264267
2016-04-26 12:12:26vstinnersetnosy: + lemburg, rhettinger, matejcik, socketpair, thomas-petazzoni
2016-04-26 12:11:47vstinnersetmessages: + msg264265
2016-04-26 12:11:44vstinnerlinkissue25420 superseder
2016-04-26 11:47:19vstinnersetmessages: + msg264258
2016-04-24 19:37:03vstinnersetmessages: + msg264126
2016-04-24 19:06:20dokosetmessages: + msg264122
2016-04-24 19:04:14dokocreate