Message 385784 - Python tracker


Author	abo
Recipients	abo, mark.dickinson, rhettinger
Date	2021-01-27.15:42:48
Message-id	<1611762169.0.0.14277838671.issue43040@roundup.psfhosted.org>
In-reply-to

Content

I first noticed the problem when migrating a program that makes lots of randrange(2**32) calls from python2 (run with pypy -O) to python3 (run with pypy3 -O) on Debian. The timings were:


pypy -O
real    3m58.621s
user    3m58.501s
sys     0m0.085s

pypy3 -O
real    19m57.337s
user    19m57.011s
sys     0m0.131s

So about 5x slower. The execution times for python2 and python3 were too long for me to wait for (I'm guessing 30x those numbers). I realize pypy and pypy3 are not the same as cpython, but they use the same random.py module. After profiling the program under pypy, pypy3, python2, and python3, it became apparent that pypy and pypy3 were effectively reducing my program to mostly random() (python2) or getrandbits() (python3) calls from under randrange(), and that python3 (and pypy3) was calling it 2x as many times. The general overheads of python2 and python3 made the speed differences harder to notice, as the main bottleneck shifted from getrandbits() to general Python interpreting, but they were still there. To get a decent comparison between python2 and python3 without waiting until my retirement, I changed my program to do a fraction of the total work and timed them again, getting:

$ time pypy -O ./chunker.py chunker
real    0m2.315s
user    0m2.246s
sys     0m0.074s

$ time pypy3 -O ./chunker.py chunker
real    0m12.922s
user    0m12.850s
sys     0m0.073s

$ time python2 -O ./chunker.py chunker
real    0m59.631s
user    0m59.620s
sys     0m0.018s

$ time python3 -O ./chunker.py chunker
real    1m2.588s
user    1m2.536s
sys     0m0.037s

Some of the speed difference seems to come from python2's random() being a little faster than python3's getrandbits(), and maybe from python2 int vs python3 long int speed differences, but the 2x calls seemed to be the main killer.
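To illustrate where the 2x calls could come from, here is a minimal sketch of a rejection-sampling loop that sizes its request with n.bit_length(). The function randbelow_sketch() and the helper average_calls() are hypothetical names written for this illustration; they are modeled on my reading of random.py and are not a copy of it. For n = 2**32, bit_length() is 33, so about half of the 33-bit draws fall outside [0, n) and get rejected, which averages out to roughly two getrandbits() calls per result.

```python
import random


def randbelow_sketch(n, getrandbits):
    """Rejection-sample an int in [0, n).

    Returns (value, number of getrandbits calls). Illustrative only.
    """
    k = n.bit_length()      # 33 for n == 2**32: one bit more than needed
    calls = 1
    r = getrandbits(k)      # 0 <= r < 2**k
    while r >= n:           # reject draws outside [0, n) and try again
        r = getrandbits(k)
        calls += 1
    return r, calls


def average_calls(n, trials=20000, seed=12345):
    """Average number of getrandbits calls per sample for a given n."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        value, calls = randbelow_sketch(n, rng.getrandbits)
        assert 0 <= value < n
        total += calls
    return total / trials


avg_pow2 = average_calls(2**32)        # exact power of two: about 2 calls
avg_below = average_calls(2**32 - 1)   # just under a power of two: about 1 call
print(f"n=2**32:   {avg_pow2:.3f} calls per sample")
print(f"n=2**32-1: {avg_below:.3f} calls per sample")
```

Under these assumptions, exact powers of two are the worst case for this loop, while a range of 2**32 - 1 almost never rejects.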

I also stumbled onto several blogs talking about Python's random number generation being slow, including the following, where I first spotted the problem:

https://eli.thegreenplace.net/2018/slow-and-fast-methods-for-generating-random-integers-in-python/

So it seems other people have noticed this is slow too. After reading this blog I switched to calling getrandbits(32) directly, and the timings went to:

$ time pypy -O ./chunker.py chunker
real    0m4.164s
user    0m4.121s
sys     0m0.049s

$ time pypy3 -O ./chunker.py chunker
real    0m4.786s
user    0m4.714s
sys     0m0.076s

$ time python2 -O ./chunker.py chunker
real    0m44.869s
user    0m44.826s
sys     0m0.044s

$ time python3 -O ./chunker.py chunker
real    0m44.018s
user    0m43.998s
sys     0m0.019s

So changing from randrange(2**32) to getrandbits(32) made pypy 0.55x as fast, i.e. slower (random() vs getrandbits() under the hood), pypy3 2.7x faster, python2 1.3x faster, and python3 1.4x faster.
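For reference, the ratios can be recomputed from the "real" times reported in the two sets of runs. This is only arithmetic on the numbers already quoted; the helper seconds() and the dictionary names are made up for this sketch.

```python
def seconds(minutes, secs):
    """Convert a 'XmY.YYYs' reading from time(1) into seconds."""
    return 60 * minutes + secs


# (randrange(2**32) run, getrandbits(32) run), "real" times from the runs above
timings = {
    "pypy": (seconds(0, 2.315), seconds(0, 4.164)),
    "pypy3": (seconds(0, 12.922), seconds(0, 4.786)),
    "python2": (seconds(0, 59.631), seconds(0, 44.869)),
    "python3": (seconds(1, 2.588), seconds(0, 44.018)),
}

# Speedup > 1 means the getrandbits(32) version was faster.
speedups = {name: before / after for name, (before, after) in timings.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
```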

Some of that comes from bypassing the call layers between randrange() and getrandbits(), and profiling tells me the bit_length() call it skips was also pretty expensive, but the 2x getrandbits() calls were definitely most of it.
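A quick way to compare the two calls in isolation, away from the rest of chunker.py, is a timeit micro-benchmark along these lines. This is a rough sketch, not the measurement method used for the numbers above: the iteration count and seed are arbitrary, and the results will vary by interpreter and version, so no expected output is shown.

```python
import random
import timeit

rng = random.Random(12345)   # arbitrary seed, only for repeatability
N = 100_000                  # arbitrary iteration count for the sketch

# Time the high-level call that goes through randrange()'s call layers.
t_randrange = timeit.timeit(
    "randrange(n)", globals={"randrange": rng.randrange, "n": 2**32}, number=N
)

# Time the direct call that asks for exactly 32 random bits.
t_getrandbits = timeit.timeit(
    "getrandbits(32)", globals={"getrandbits": rng.getrandbits}, number=N
)

print(f"randrange(2**32): {t_randrange:.3f}s for {N} calls")
print(f"getrandbits(32):  {t_getrandbits:.3f}s for {N} calls")
print(f"ratio:            {t_randrange / t_getrandbits:.2f}x")
```

Both calls produce integers in the same range [0, 2**32), so swapping one for the other should not change what the program sees, only how many generator calls it costs.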

It is interesting that random(), used by python2, is a bit cheaper than getrandbits() too. Perhaps the default should be _randbelow_without_getrandbits()? I guess it has more limitations on range etc.
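One such range limitation can be shown with a small sketch. It assumes a float-based approach of the form int(random() * n), which is a simplification and not the actual body of _randbelow_without_getrandbits(); the function name randbelow_float_sketch() is hypothetical. On CPython, random() returns a multiple of 2**-53, so for n = 2**60 every result is a multiple of 2**7 and most values in the range can never be produced.

```python
import random


def randbelow_float_sketch(n, rng):
    """Float-based pick in [0, n); illustrative simplification only."""
    return int(rng.random() * n)


rng = random.Random(12345)   # arbitrary seed for repeatability
n = 2**60                    # larger than the 53 bits a float random() carries

samples = [randbelow_float_sketch(n, rng) for _ in range(1000)]

# random() is m / 2**53 for an integer m, so random() * 2**60 == m * 2**7:
# the low 7 bits of every sample are zero.
low_bits_always_zero = all(s % 2**7 == 0 for s in samples)
print(low_bits_always_zero)
```

For ranges well below 2**53 this effect does not show up, which fits with the float path looking fine for a range like 2**32.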

History
Date	User	Action	Args
2021-01-27 15:42:49	abo	set	recipients: + abo, rhettinger, mark.dickinson
2021-01-27 15:42:48	abo	set	messageid: <1611762169.0.0.14277838671.issue43040@roundup.psfhosted.org>
2021-01-27 15:42:48	abo	link	issue43040 messages
2021-01-27 15:42:48	abo	create