This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: import hashlib makes many programs slow
Type: Stage: resolved
Components: Library (Lib), SSL Versions: Python 3.7, Python 3.6, Python 3.3, Python 3.4, Python 3.5, Python 2.7
process
Status: closed Resolution: third party
Dependencies: Superseder:
Assigned To: christian.heimes Nosy List: bmwiedemann, christian.heimes, vstinner
Priority: normal Keywords:

Created on 2017-05-05 03:23 by bmwiedemann, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (13)
msg293039 - (view) Author: Bernhard M. Wiedemann (bmwiedemann) * Date: 2017-05-05 03:23
Steps to Reproduce:
echo import hashlib > test.py
time python -m cProfile -s tottime test.py 2>&1 | head

Actual Results:
shows 27ms spent in hashlib.py


The problem goes away when dropping everything after line 133
in hashlib.py

see also issue 21288
msg293073 - (view) Author: Bernhard M. Wiedemann (bmwiedemann) * Date: 2017-05-05 08:16
traced it down a bit further.
nearly all that time is spent in
import _hashlib
which probably means, it is spent in functions like PyInit__hashlib in Modules/_hashopenssl.c

I can see in strace that after loading libssl, libcrypto and libz,
it calls getpid and opens /dev/urandom 8 times
and most of the time is spent in the middle of these:

grep open...dev.uran strace.out
1493970277.542629 open("/dev/urandom", O_RDONLY|O_CLOEXEC) = 4
1493970277.542847 open("/dev/urandom", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_CLOEXEC) = 4
1493970277.543370 open("/dev/urandom", O_RDONLY|O_CLOEXEC) = 4
1493970277.543967 open("/dev/urandom", O_RDONLY|O_CLOEXEC) = 4
1493970277.592232 open("/dev/urandom", O_RDONLY|O_CLOEXEC) = 4
1493970277.592518 open("/dev/urandom", O_RDONLY|O_CLOEXEC) = 4
1493970277.597778 open("/dev/urandom", O_RDONLY|O_CLOEXEC) = 4
1493970277.598629 open("/dev/urandom", O_RDONLY|O_CLOEXEC) = 4
msg293087 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2017-05-05 09:42
I cannot reproduce the open /dev/urandom calls locally. You seem to be running an old Python version and/or old Kernel. Python prefers getrandom() over /dev/urandom. The urandom code caches the fd for /dev/urandom.

$ strace ./python -c 'import hashlib' 2>&1 | grep random
getrandom("\xe0\xa5\x00\xda\xe1\xce\x1a\x9c\xe7\x61\x71\xf2\x1d\x15\x63\x19\xdc\x60\x70\x3c\xef\xf6\xcb\xdb", 24, GRND_NONBLOCK) = 24
open("/dev/urandom", O_RDONLY|O_NOCTTY|O_NONBLOCK) = 3

The O_RDONLY|O_NOCTTY|O_NONBLOCK|O_CLOEXEC comes from OpenSSL.

What's your Python version, OpenSSL version, Kernel version and which platform/distribution are you on?
msg293090 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-05-05 10:00
What is your Python version? It looks like an old version.
msg293091 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-05-05 10:01
What is your OS? Which kind of computer is it? I'm surprised that you noticed open calls on /dev/urandom, it should be very quick, no?
msg293092 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2017-05-05 10:12
I rather suspect it's OpenSSL. It takes a bit to load and initialize libcrypto.so. On my system it takes about 17ms to import _hashlib. There is nothing we can do about it in CPython.
msg293102 - (view) Author: Bernhard M. Wiedemann (bmwiedemann) * Date: 2017-05-05 12:18
I'm running openSUSE Tumbleweed with python-2.7.13 and python3-3.6.1
on an Intel Pentium N3530 with linux-4.10.12
There the total time of the "import _hashlib" oneliner varies between 100 and 250 ms (python3 is a bit slower and the machine is slower when running on battery)

the urandom calls themselves are not slow, but probably the processing in between. Can't that be done on demand on first use?

This is (one minor part of) slowing down python CLI tools like OpenStack's
cinder help
msg293103 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-05-05 12:51
> the urandom calls themselves are not slow, but probably the processing in between. Can't that be done on demand on first use?

If the time is spent on loading OpenSSL, maybe you should try to
import hashlib on demand?
msg293114 - (view) Author: Bernhard M. Wiedemann (bmwiedemann) * Date: 2017-05-05 15:57
This might work in some places, but if you look at real-world users like
urllib3/util/ssl_.py
> from hashlib import md5, sha1, sha256

or
twisted/words/protocols/jabber/xmlstream.py
> from hashlib import sha1

It would probably not need any openssl at all, but still gets the full penalty time from the import.
So you get all or nothing.
It would be nice to just get access to those few hashes you need for your protocal (because you certainly do not want to reimplement it).
msg293117 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2017-05-05 16:08
If import of hashlib is slowed down by OpenSSL, then import of ssl module will be even slower.

Am 5. Mai 2017 17:57:41 MESZ schrieb "Bernhard M. Wiedemann" <report@bugs.python.org>:
>
>Bernhard M. Wiedemann added the comment:
>
>This might work in some places, but if you look at real-world users
>like
>urllib3/util/ssl_.py
>> from hashlib import md5, sha1, sha256
>
>or
>twisted/words/protocols/jabber/xmlstream.py
>> from hashlib import sha1
>
>It would probably not need any openssl at all, but still gets the full
>penalty time from the import.
>So you get all or nothing.
>It would be nice to just get access to those few hashes you need for
>your protocal (because you certainly do not want to reimplement it).
>
>----------
>
>_______________________________________
>Python tracker <report@bugs.python.org>
><http://bugs.python.org/issue30276>
>_______________________________________
msg293123 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-05-05 16:42
I ran a quick benchmark on my laptop, Fedora 25:

haypo@selma$ python3 -m perf command -o ref.json -- python3 -c 'pass' && python3 -m perf command -o hashlib.json -- python3 -c 'import hashlib' && python3 -m perf compare_to ref.json hashlib.json  --table

+-----------+---------+------------------------------+
| Benchmark | ref     | hashlib                      |
+===========+=========+==============================+
| command   | 26.7 ms | 30.2 ms: 1.13x slower (+13%) |
+-----------+---------+------------------------------+

Importing hashlib only takes 3.5 ms on my laptop.
msg293129 - (view) Author: Bernhard M. Wiedemann (bmwiedemann) * Date: 2017-05-05 17:18
getting to a similar size on Debian, so I'll move this into the openSUSE bugtracker to find out why it is so slow there.
msg293555 - (view) Author: Bernhard M. Wiedemann (bmwiedemann) * Date: 2017-05-12 16:17
Tracking this in https://bugzilla.opensuse.org/show_bug.cgi?id=1038906
now.
It turned out to be a problem with (our?) openssl-1.0.2
which Debian stable does not have yet
and Debian/9 (stretch) has openssl-1.1.0 without this problem

The nicest python-based reproducer so far is
time python -S -c '' ; time python -S -c 'import _hashlib'
History
Date User Action Args
2022-04-11 14:58:46adminsetgithub: 74462
2017-05-12 16:17:46bmwiedemannsetmessages: + msg293555
2017-05-05 17:18:32bmwiedemannsetstatus: open -> closed
resolution: third party
messages: + msg293129

stage: resolved
2017-05-05 16:42:42vstinnersetmessages: + msg293123
2017-05-05 16:08:52christian.heimessetmessages: + msg293117
2017-05-05 15:57:41bmwiedemannsetmessages: + msg293114
2017-05-05 12:51:07vstinnersetmessages: + msg293103
2017-05-05 12:18:11bmwiedemannsetmessages: + msg293102
2017-05-05 10:12:54christian.heimessetmessages: + msg293092
2017-05-05 10:01:53vstinnersetmessages: + msg293091
2017-05-05 10:00:45vstinnersetmessages: + msg293090
2017-05-05 09:42:29christian.heimessetnosy: + vstinner
messages: + msg293087
2017-05-05 08:16:13bmwiedemannsetmessages: + msg293073
2017-05-05 03:23:43bmwiedemanncreate