classification
Title: "import random" should import hashlib on demand (nor load OpenSSL)
Type: Stage: resolved
Components: Library (Lib) Versions: Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: brett.cannon, christian.heimes, inada.naoki, rhettinger, vstinner, xdegaye
Priority: normal Keywords: patch

Created on 2019-04-08 13:11 by vstinner, last changed 2019-04-10 20:18 by rhettinger. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 12728 closed vstinner, 2019-04-08 13:13
PR 12742 merged christian.heimes, 2019-04-09 10:04
Messages (15)
msg339637 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-04-08 13:11
Currently, when the random module is imported, the hashlib module is always imported which loads the OpenSSL library, whereas hashlib is only needed when a Random() instance is created with a string seed.

For example, "rnd = random.Random()" and "rnd = random.Random(12345)" don't need hashlib.

Example on Linux:

$ python3
Python 3.7.2 (default, Mar 21 2019, 10:09:12) 
>>> import os, sys

>>> 'hashlib' in sys.modules
False
>>> res=os.system(f"grep ssl /proc/{os.getpid()}/maps")

>>> import random
>>> 'hashlib' in sys.modules
True
>>> res=os.system(f"grep ssl /proc/{os.getpid()}/maps")
7f463ec38000-7f463ec55000 r--p 00000000 00:2a 5791335                    /usr/lib64/libssl.so.1.1.1b
7f463ec55000-7f463eca5000 r-xp 0001d000 00:2a 5791335                    /usr/lib64/libssl.so.1.1.1b
...


Attached PR only imports hashlib on demand.

Note: I noticed this issue while working on adding OpenSSL 1.1.1 support to Python 3.4 :-)
msg339638 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-04-08 13:16
In the past, some developers complained when an import has been removed in a stdlib module. I vaguely recall code using "import os" to get the "errno" module from "os.errno". That's "What's New in Python 3.7" contains:

"Several undocumented internal imports were removed. One example is that os.errno is no longer available; use import errno directly instead. Note that such undocumented internal imports may be removed any time without notice, even in micro version releases."

For this reason, I don't think that stable versions (2.7 and 3.7) should be modified.
msg339652 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-04-08 15:35
Why do we care about this particular import?  It doesn't see slow in any way.  In general, we don't do deferred imports unless there is a compelling reason (i.e. it is very slow or it is sometimes unavailable).  Otherwise, it is a false optimization.
msg339666 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2019-04-08 19:36
Could you explain a bit more, Victor, about why you want to avoid importing hashlib and OpenSSL so much?
msg339673 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-04-08 20:32
Raymond:
> In general, we don't do deferred imports unless there is a compelling reason (i.e. it is very slow or it is sometimes unavailable).

While I was working on adding OpenSSL 1.1.1 to Python 3.4, my _hashopenssl module was broken. In that case, "import random" fails with ImportError because of hashlib failures. I was surprised, since random doesn't need hashlib at startup.

I would like to be able to use "import random" even if hashlib is broken. For me, random is a key component, whereas I see hashlib more as optional. (Even if in practice, it should always be available).


Raymond:
> Otherwise, it is a false optimization.

Brett:
> Could you explain a bit more, Victor, about why you want to avoid importing hashlib and OpenSSL so much?

Well, OpenSSL is not a random tiny library. For example, loading it increases Python RSS of around 2.7 MiB.

Example with script x.py:
---
import os
os.system(f"grep ^VmRSS /proc/{os.getpid()}/status")
import random
os.system(f"grep ^VmRSS /proc/{os.getpid()}/status")
---

Output without the change on Fedora 29:

VmRSS:	    7396 kB
VmRSS:	   11796 kB  # +4.4 MiB

With the change:

VmRSS:	    7272 kB
VmRSS:	    8988 kB  # +1.7 MiB


Another example is that OpenSSL loads the libz.so dynamic library by dependency.

I would prefer to minimize Python footprint if possible.
msg339674 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-04-08 20:48
Raymond:
> Why do we care about this particular import?  It doesn't see slow in any way.

I ran a benchmark:

$ ./python -m perf command -o ref.json -v -- ./python -c 'import random'
$ # apply patch
$ ./python -m perf command -o patch.json -v -- ./python -c 'import random'
$ ./python -m perf compare_to ref.json patch.json 
Mean +- std dev: [ref] 13.8 ms +- 0.2 ms -> [patch] 11.8 ms +- 0.2 ms: 1.18x faster (-15%)

My PR 12728 makes Python startup 2 ms faster which I consider as quite significant on 13.8 ms.

I know that some users are fighting to get a faster Python startup. Mercurial is just one example ;-)
msg339689 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-04-09 01:19
In general, deferred imports are code smell that should avoided unless really necessary. They create an on-going maintenance burden (there's a reason most modules don't do this and put their imports at the top).

FWIW, a broken hashlib is a localized bug, not an optimization problem. It doesn't affect any user with a build that passes the test suite.

Running "python -v" shows that "random" is not part of the normal startup, so deferring the import saves zero for normal startup. It only affects modules that specifically import random.

IIRC, Mercurial uses hashing extensively, so deferring the import doesn't help them at all.

This is minor change, so I suppose we could let it go through; however, it seems somewhat arbitrary and the reasons offered seem dubious. For the most part, it isn't a good practice.
msg339690 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-04-09 01:20
One other thought. This code has been present for over a decade.  There is no evidence that anyone has ever wanted random to defer one of its imports. This seems like an invented problem.
msg339720 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2019-04-09 09:45
You could also use the internal _sha512 module. It's always present, small, lean and provides a SHA512 implementation with sufficient performance.
msg339759 - (view) Author: Xavier de Gaye (xdegaye) * (Python triager) Date: 2019-04-09 15:05
> You could also use the internal _sha512 module. It's always present

This is not true at the moment, the _sha512 module is not present when openssl is missing. This is a bug in setup.py that prevents building the _sha512 module when openssl is missing. See issue 36544.

It is possible that this issue (since it started because of hashlib failures) and also issue 36414 are caused by this problem.
msg339763 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2019-04-09 15:21
Thanks for pointing to the other issue. This is clearly a regression and should be fixed ASAP. I have ACKed your PR and pushed it.
msg339791 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-04-09 17:28
Note: *Technically*, you can disable the compilation of the _sha512 module using "*disabled*" in Modules/Setup, but I'm not sure if it's a common use case. At least, it makes sense to me when we are sure that OpenSSL and _hashlib are available ;-) I didn't want to mention that since I'm not sure that it's really relevant in this discussion. (I was aware of the regression, and hopefully it's now fixed!)
msg339822 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-04-10 01:24
> You could also use the internal _sha512 module. 
> It's always present, small, lean and provides a SHA512
> implementation with sufficient performance.

I suppose we could do this but it borders on telling folks that we're worried about using our own public APIs, that importing hashlib is bad for them.  It shouldn't be that way, hashlib is a collection of hash functions -- it is clear why this import isn't small and fast.  It suggests that something is wrong with the implementation.

The focus on client code in random seems like the wrong focus.  That is just typical of what other clients would do.
msg339834 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2019-04-10 07:57
* Is hashlib large, slow to import library? -- Yes.
* random module use hashlib only for specific (rare) use case? -- Yes.
* Does user of random module needs hashlib in most case? -- No.  For example, tmpfile user may not need hashlib.

I'm +1 on PR 12728.
msg339887 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-04-10 20:18
New changeset d914596a671c4b0f13641359cf43aa0d6fc05070 by Raymond Hettinger (Christian Heimes) in branch 'master':
bpo-36559: random module: optimize sha512 import (GH-12742)
https://github.com/python/cpython/commit/d914596a671c4b0f13641359cf43aa0d6fc05070
History
Date User Action Args
2019-04-10 20:18:43rhettingersetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2019-04-10 20:18:23rhettingersetmessages: + msg339887
2019-04-10 07:57:29inada.naokisetnosy: + inada.naoki
messages: + msg339834
2019-04-10 01:24:01rhettingersetmessages: + msg339822
2019-04-09 17:28:40vstinnersetmessages: + msg339791
2019-04-09 15:21:11christian.heimessetmessages: + msg339763
2019-04-09 15:05:26xdegayesetnosy: + xdegaye
messages: + msg339759
2019-04-09 10:04:32christian.heimessetpull_requests: + pull_request12665
2019-04-09 09:45:08christian.heimessetnosy: + christian.heimes
messages: + msg339720
2019-04-09 01:20:56rhettingersetmessages: + msg339690
2019-04-09 01:19:13rhettingersetmessages: + msg339689
2019-04-08 20:48:32vstinnersetmessages: + msg339674
2019-04-08 20:32:55vstinnersetmessages: + msg339673
2019-04-08 19:36:48brett.cannonsetnosy: + brett.cannon
messages: + msg339666
2019-04-08 15:35:08rhettingersetnosy: + rhettinger
messages: + msg339652
2019-04-08 13:16:37vstinnersetmessages: + msg339638
2019-04-08 13:13:37vstinnersetkeywords: + patch
stage: patch review
pull_requests: + pull_request12651
2019-04-08 13:11:01vstinnercreate