Issue43040
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2021-01-27 13:07 by abo, last changed 2022-04-11 14:59 by admin. This issue is now closed.
Pull Requests | |||
---|---|---|---|
URL | Status | Linked | Edit |
PR 24354 | closed | abo, 2021-01-27 14:26 |
Messages (10) | |||
---|---|---|---|
msg385769 - (view) | Author: Donovan Baarda (abo) * | Date: 2021-01-27 13:07 | |
I encountered a very significant slowdown migrating some code from python2.7 to python3.9 that I tracked down to randrange() being slow. After digging I noticed that _randbelow_with_getrandbits() calls getrandbits() 2x as many times, and asks for one more bit each time, than is actually required if the range requested is a power of 2. I have a GitHub PR on the way to fix this... |
|||
msg385772 - (view) | Author: Mark Dickinson (mark.dickinson) * ![]() |
Date: 2021-01-27 13:34 | |
See also #37000. |
|||
msg385777 - (view) | Author: Donovan Baarda (abo) * | Date: 2021-01-27 14:33 | |
Thanks @mark.dickinson for the heads up on that issue. Comments in the code hinted that people had tried to tackle this but IMHO they missed the real fix. I've submitted #24354 on github. It's missing an addition to the NEWS.d because I wasn't sure if something this small required it. I've signed the CLA and it's just waiting to go through. |
|||
msg385780 - (view) | Author: Mark Dickinson (mark.dickinson) * ![]() |
Date: 2021-01-27 14:59 | |
@Donovan: Please get Raymond Hettinger's sign-off before merging anything. In #37000, the decision not to change things wasn't because we missed a fix, but rather because it was decided that it was better to leave things as they were. I don't think anything's changed since then. In particular, changes that affect reproducibility shouldn't be taken lightly, and this is such a change. You mention a "very significant slowdown". Do you have numbers for the pre-PR and post-PR performance? |
|||
msg385784 - (view) | Author: Donovan Baarda (abo) * | Date: 2021-01-27 15:42 | |
I first noticed the problem when migrating a program doing lots of randrange(2**32) calls from python2 (using pypy -O) to python3 (using pypy3 -O) on Debian. The time results were; pypy -O real 3m58.621s user 3m58.501s sys 0m0.085s pypy3 -O real 19m57.337s user 19m57.011s sys 0m0.131s So 5x slower. The execution times for python2 and python3 were too long for me to wait for (I'm guessing 30x those numbers). I realize pypy and pypy3 are not the same as cpython, but they use the same python random.py module, and after fiddling around with profiling it running under all of pypy, pypy3, python2, and python3 it became apparent that pypy and pypy3 was effectively reducing my program to mostly random() (python2) or getrandombits() (python3) calls from under randrange(), and python3 (and pypy3) was calling it 2x as many times. The general overheads of python2 and python3 made the speed differences harder to notice, as the main bottleneck shifted from getrandombits() to general-python-interpreting, but they were still there. So I could get a decent comparison between python2 and python3 without waiting till my retirement, I changed my program to do a fraction of the total work and timed them again, getting; $ time pypy -O ./chunker.py chunker real 0m2.315s user 0m2.246s sys 0m0.074s $ time pypy3 -O ./chunker.py chunker real 0m12.922s user 0m12.850s sys 0m0.073s $ time python2 -O ./chunker.py chunker real 0m59.631s user 0m59.620s sys 0m0.018s $ time python3 -O ./chunker.py chunker real 1m2.588s user 1m2.536s sys 0m0.037s Some of the speed difference seems to be a bit of python2's random() is a little faster than python3's getrandombits(), and maybe python2 int vs python3 longint speed differences, but the 2x calls seemed to be the main killer. I also stumbled onto several blogs talking about Python's random number generation being slow, including the following where I first spotted the problem; https://eli.thegreenplace.net/2018/slow-and-fast-methods-for-generating-random-integers-in-python/ So it seems other people have noticed this is slow too. After reading this blog I switched to just calling getrandbits(32) directly and the timings went to; $ time pypy -O ./chunker.py chunker real 0m4.164s user 0m4.121s sys 0m0.049s $ time pypy3 -O ./chunker.py chunker real 0m4.786s user 0m4.714s sys 0m0.076s $ time python2 -O ./chunker.py chunker real 0m44.869s user 0m44.826s sys 0m0.044s $ time python3 -O ./chunker.py chunker real 0m44.018s user 0m43.998s sys 0m0.019s So changing from randrange(2**32) to getrandbits(32) made pypy 0.55x as fast (random() vs getrandbits() under the hood), pypy3 2.7x faster, python2 1.3x faster and python3 1.4x faster. Some of that is bypassing the call-layers between getrandbits() and randrange(), and profiling tells me the bit_length() call it skips was also pretty expensive, but the 2x getrandbits() bit was definitely most of it. It is interesting random() used by python2 is a bit cheaper than getrandbits() too. Perhaps the default should be _randbelow_without_getrandbits()? I guess it has more limitations on range etc. |
|||
msg385815 - (view) | Author: Donovan Baarda (abo) * | Date: 2021-01-27 22:48 | |
FTR, more articles about the slowness of generating random integers in Python; https://lemire.me/blog/2016/03/21/ranged-random-number-generation-is-slow-in-python/ https://www.reddit.com/r/Python/comments/jn0bb/randomrandint_vs_randomrandom_why_is_one_15x/ |
|||
msg385820 - (view) | Author: Donovan Baarda (abo) * | Date: 2021-01-27 23:24 | |
I had a look at the C implementation of getrandbits() and spotted this; https://github.com/python/cpython/blob/64fc105b2d2faaeadd1026d2417b83915af6622f/Modules/_randommodule.c#L487 In my case I was also being hit by randrange(2**32) calling getrandbits(33) just pushing it past using the optimized fast-path for ranges needing <=32 bits. So not only was it calling getrandbits() 2x as many times as necessary, under the hood it was generating 64 random bits each time using a slow path. |
|||
msg385839 - (view) | Author: Raymond Hettinger (rhettinger) * ![]() |
Date: 2021-01-28 05:57 | |
Donovan, if the speed of the power of two case is important to you, I recommend using getrandbits() instead. It is ten times faster. $ python3.9 -m timeit -r 11 -s 'from random import randrange, getrandbits' 'randrange(2**32)' 1000000 loops, best of 11: 362 nsec per loop $ python3.9 -m timeit -r 11 -s 'from random import randrange, getrandbits' 'getrandbits(32)' 10000000 loops, best of 11: 32 nsec per loop The two referenced articles aren't pertinent to this issue, the power of two case for randrange(). Both articles boil down to finding that a single task, arity zero, C coded method is significantly faster than a pure python method with error checking, a complex signature, support for arbitrarily large integers, and extra effort to assure an equal distribution. It is unsurprising random() is faster than randrange(). Code that does less work is faster than code that does more work. Mark is correct that the current code wasn't an accident and that we intensionally chose to keep reproducibility. I appreciate your suggestion, but it essentially same one that was rejected in bpo-37000. As Mark pointed out, nothing has changed since then (though some other changes have been added to 3.10 that improved the speed of rangerange() without affecting reproducibility). Note, even if the PR had been accepted, you would still be much better off with getrandbits(). |
|||
msg385846 - (view) | Author: Donovan Baarda (abo) * | Date: 2021-01-28 06:41 | |
Some points to note; I first noticed this as an apparently 5x performance regression for randrange() between pypy and pypy3. This seemed pretty significant, but admittedly the difference is largely masked by other python overheads when comparing python2 and python3. When I mentioned random() is faster, I meant that python2's implementation of randrange() that uses random() under the hood was noticeably faster than python3's randrange() that uses getrandbits() under the hood. I know getrandbits() is significantly faster than randrange(), but randrange() doesn't have to be that bad. However, I agree that changing repeatability is potentially a big issue. |
|||
msg385874 - (view) | Author: Raymond Hettinger (rhettinger) * ![]() |
Date: 2021-01-28 18:32 | |
> python2's implementation of randrange() that uses random() > under the hood was noticeably faster than python3's > randrange() that uses getrandbits() under the hood. Yes, that was a conscious decision. See https://bugs.python.org/issue9025 . We traded performance in order to gain correctness. As noted in the docs: "Changed in version 3.2: randrange() is more sophisticated about producing equally distributed values. Formerly it used a style like int(random()*n) which could produce slightly uneven distributions." |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:59:40 | admin | set | github: 87206 |
2021-01-28 18:32:25 | rhettinger | set | messages: + msg385874 |
2021-01-28 06:41:37 | abo | set | messages: + msg385846 |
2021-01-28 06:07:28 | rhettinger | set | assignee: rhettinger |
2021-01-28 05:57:52 | rhettinger | set | status: open -> closed superseder: _randbelow_with_getrandbits function inefficient with powers of two messages: + msg385839 resolution: duplicate stage: patch review -> resolved |
2021-01-27 23:24:41 | abo | set | messages: + msg385820 |
2021-01-27 22:48:33 | abo | set | messages: + msg385815 |
2021-01-27 15:42:48 | abo | set | messages: + msg385784 |
2021-01-27 14:59:20 | mark.dickinson | set | messages: + msg385780 |
2021-01-27 14:33:56 | abo | set | messages: + msg385777 |
2021-01-27 14:26:45 | abo | set | keywords:
+ patch stage: patch review pull_requests: + pull_request23174 |
2021-01-27 13:34:07 | mark.dickinson | set | messages: + msg385772 |
2021-01-27 13:14:10 | serhiy.storchaka | set | nosy:
+ rhettinger, mark.dickinson |
2021-01-27 13:07:13 | abo | create |