classification
Title: random._randbelow optimization
Type: performance Stage: resolved
Components: Library (Lib) Versions: Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: mark.dickinson, rhettinger, serhiy.storchaka, tim.peters, wolma
Priority: normal Keywords: patch

Created on 2018-03-26 15:00 by wolma, last changed 2018-05-08 12:47 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description
randbelow.patch wolma, 2018-03-26 15:00
Pull Requests
URL Status Linked
PR 6291 merged wolma, 2018-03-28 14:49
PR 6563 merged serhiy.storchaka, 2018-04-21 14:35
Messages (16)
msg314455 - Author: Wolfgang Maier (wolma) * Date: 2018-03-26 15:00
Given that the random module goes a long way to ensure optimal performance, I was wondering why the check for a match between the random and getrandbits methods is performed per call of Random._randbelow, when it could also be done at instantiation time (the attached patch uses __init_subclass__ for that purpose and, in my hands, gives 10-25% speedups for calls to methods relying on _randbelow).
Is it really necessary to guard against someone monkey patching the methods rather than using inheritance?
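A minimal sketch of the mechanism being proposed, using hypothetical names (`Chooser`, `fast`, `fallback`, `pick`, `slow_source`) rather than the real `random.Random` internals: the selection logic runs once, when the subclass is defined, and every later call is an ordinary attribute lookup with no type checks.

```python
class Chooser:
    """Hypothetical stand-in for random.Random; not the stdlib class."""

    decisions = 0  # how many times the selection logic has run

    def fast(self, n):
        return "fast"

    def fallback(self, n):
        return "fallback"

    pick = fast  # what Chooser itself uses

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        Chooser.decisions += 1
        # Hypothetical rule: a subclass defining `slow_source` gets the fallback.
        cls.pick = cls.fallback if "slow_source" in vars(cls) else cls.fast


class Custom(Chooser):
    slow_source = True


obj = Custom()
# Ten thousand calls, zero re-checks: `pick` is already the chosen function.
results = {obj.pick(10) for _ in range(10_000)}
print(results, Chooser.decisions)  # {'fallback'} 1
```

One consequence of this design: the decision is baked in when the class statement runs, so methods patched onto the class or an instance afterwards are not noticed.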
msg314489 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-03-27 01:25
> it could also be done at instantiation time (the attached patch
> uses __init_subclass__ for that purpose 

FWIW, a 10-25% speedup is only possible because the remaining code is already somewhat fast.  All that is being proposed is removing a couple of lines that elsewhere would be considered somewhat thin:

        random = self.random
        if type(random) is BuiltinMethod \
           or type(getrandbits) is Method:

Overall, the idea of doing the check only once at instantiation time seems promising.  That said, I have unspecific general worries about using __init_subclass__ and patching the subclass.  Perhaps Serhiy, Tim, or Mark will have thoughts on whether this sort of self-patching is something we want to be doing in the standard library, whether it would benefit PyPy, and whether it has risks to existing code, to debugging and testing, and to future maintenance.

If I were the one to go the route of making a single pre-check, my instinct would be to just set a flag in __init__, so that the above code would simplify to:

        if self._valid_getrandbits:
            ...
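For comparison, the flag-in-`__init__` alternative can be sketched as follows. This assumes CPython's `_random` module (the C base class of `random.Random`) and uses hypothetical classes `FlagRandom` and `OnlyRandom`; the flag reproduces the per-call type test quoted above but evaluates it once per instance. The random()-only fallback is deliberately simplified.

```python
import types
import _random  # CPython's C base class behind random.Random


class FlagRandom(_random.Random):
    """Hypothetical sketch of the 'set a flag in __init__' idea."""

    def __init__(self):
        super().__init__()  # let the C base class seed the generator
        # The old per-call test, evaluated once per instance instead.
        self._valid_getrandbits = (
            type(self.random) is types.BuiltinMethodType
            or type(self.getrandbits) is types.MethodType
        )

    def _randbelow(self, n):
        if self._valid_getrandbits:
            # Rejection sampling on k-bit integers.
            k = n.bit_length()
            r = self.getrandbits(k)
            while r >= n:
                r = self.getrandbits(k)
            return r
        # Simplified random()-only fallback (slightly biased for very large n).
        return int(self.random() * n)


class OnlyRandom(FlagRandom):
    def random(self):  # overrides random() but not getrandbits()
        return 0.5


print(FlagRandom()._valid_getrandbits, OnlyRandom()._valid_getrandbits)  # True False
print(OnlyRandom()._randbelow(10))  # 5
```

The flag still costs one attribute lookup on every call, and it misses methods patched onto an instance after `__init__` has run.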
msg314494 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2018-03-27 03:10
I don't see anything objectionable about the class optimizing the implementation of a private method.

I'll note that there's a speed benefit beyond just removing the two type checks in the common case:  the optimized `_randbelow()` also avoids populating its locals with 5 unused formal arguments (which are just "a trick" to change what would otherwise have been global accesses into local accesses).  So it actually makes the method implementation cleaner & clearer too.

But it's really the speed that matters here.
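The default-argument trick Tim mentions can be seen with `dis`. In this sketch (hypothetical functions, not the `random` module's code), binding `int` and `len` as default arguments turns global lookups into fast local loads, at the cost of extra formal parameters in the signature.

```python
import dis


def uses_globals(n):
    # `int` and `len` are resolved through globals, then builtins, on each call.
    return int(n) + len("ab")


def uses_defaults(n, int=int, len=len):
    # Defaults are evaluated once at definition time and read as fast locals.
    return int(n) + len("ab")


def opnames(func):
    return {ins.opname for ins in dis.get_instructions(func)}


print("LOAD_GLOBAL" in opnames(uses_globals))   # True
print("LOAD_GLOBAL" in opnames(uses_defaults))  # False
print(uses_defaults.__code__.co_argcount)       # 3: two extra "trick" parameters
```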
msg314496 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-03-27 03:30
> the optimized `_randbelow()` also avoids populating its locals 
> with 5 unused formal arguments

Yes, that clean-up would be nice as well :-)

Any thoughts on having __init__ set a flag versus using __init_subclass__ to backpatch the subclass?  To me, the former looks like plain Python and the latter doesn't seem like something that would normally be done in the standard library.
msg314498 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2018-03-27 03:41
I'm the wrong guy to ask about that.  Since I worked at Zope Corp, my natural inclination is to monkey-patch everything - but knowing full well that will offend everyone else ;-)

That said, this optimization seems straightforward to me:  two distinct method implementations for two very different approaches that have nothing in common besides the method name & signature.
msg314502 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-03-27 07:04
I think this is an excellent application of __init_subclass__. It is common to patch an instance method in __init__, but this can create a reference loop if it is patched with another method of the same instance. In this case the choice doesn't depend on the arguments of __init__, and can be made at class creation time.
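The reference loop can be illustrated with hypothetical classes on CPython (the immediate freeing in the second case relies on reference counting). Storing a bound method on the instance makes the instance refer to itself, so only the cyclic garbage collector can reclaim it; choosing the function once on the class avoids that.

```python
import gc
import weakref


class PatchesInstance:
    def __init__(self):
        # The bound method holds a strong reference to self:
        # self -> __dict__ -> bound method -> self.
        self._randbelow = self._randbelow_fast

    def _randbelow_fast(self, n):
        return 0


class PatchesClass:
    def _randbelow_fast(self, n):
        return 0

    # Chosen once on the class: instances never reference themselves.
    _randbelow = _randbelow_fast


gc.disable()  # keep automatic collections from hiding the difference
try:
    a = PatchesInstance()
    ref_a = weakref.ref(a)
    del a
    survives_refcounting = ref_a() is not None  # kept alive by the cycle
    gc.collect()  # an explicit collection runs even while gc is disabled
    freed_by_gc = ref_a() is None

    b = PatchesClass()
    ref_b = weakref.ref(b)
    del b
    freed_immediately = ref_b() is None  # no cycle to break
finally:
    gc.enable()

print(survives_refcounting, freed_by_gc, freed_immediately)  # True True True
```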

I like the idea in general, but have comments about the implementation.

__init_subclass__ should take **kwargs and pass it to super().__init_subclass__(). type(cls.random) is not the same as type(self.random). I would use the condition `cls.random is _random.Random.random` instead, or check if the method is in cls.__dict__.

This will break the case when random or getrandbits methods are patched after class creation or per instance, but I think we have no need to support this.
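The first point, that `__init_subclass__` should accept `**kwargs` and pass them on, matters as soon as another class in the hierarchy consumes a class keyword argument. A minimal sketch with hypothetical classes:

```python
class Tagged:
    """Hypothetical mixin that consumes a `tag=` class keyword."""

    def __init_subclass__(cls, tag=None, **kwargs):
        super().__init_subclass__(**kwargs)
        cls.tag = tag


class Cooperative:
    def __init_subclass__(cls, **kwargs):
        # Accept unknown class keywords and hand them to the next class in the MRO.
        super().__init_subclass__(**kwargs)
        cls.configured = True


class NonCooperative:
    def __init_subclass__(cls):
        # Neither accepts nor forwards class keywords.
        cls.configured = True


class Good(Cooperative, Tagged, tag="demo"):
    pass


print(Good.tag, Good.configured)  # demo True

try:
    class Bad(NonCooperative, Tagged, tag="demo"):
        pass
    bad_failed = False
except TypeError:
    bad_failed = True  # NonCooperative rejected the `tag` keyword

print(bad_failed)  # True
```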

We could also support the following cases:

1.
    class Rand1(Random):
        def random(self): ...
        # _randbelow should use random()

    class Rand2(Rand1):
        def getrandbits(self): ...
        # _randbelow should use getrandbits()
        # this is broken in the current patch

2.
    class Rand1(Random):
        def getrandbits(self): ...
        # _randbelow should use getrandbits()

    class Rand2(Rand1):
        def random(self): ...
        # _randbelow should use random()
        # this is broken in the current code
msg314534 - Author: Wolfgang Maier (wolma) * Date: 2018-03-27 15:34
Serhiy:

> I like the idea in general, but have comments about the implementation.
> 
> __init_subclass__ should take **kwargs and pass it to super().__init_subclass__(). type(cls.random) is not the same as type(self.random). I would use the condition `cls.random is _random.Random.random` instead, or check if the method is in cls.__dict__.
> 
> This will break the case when random or getrandbits methods are patched after class creation or per instance, but I think we have no need to support this.
> 

My bad, sorry, and thanks for catching all these issues!

You're absolutely right about the class type checks not being equivalent to the original ones at the instance level.
Actually, this is due to the fact that I first moved the checks out of _randbelow and into __init__ just as Raymond would have done and tested this, but then I realized that __init_subclass__ looked just like the right place and moved them again - this time without testing on derived classes again.
From a quick experiment it looks like types.MethodDescriptorType would be the correct type to check cls.random against and types.FunctionType would need to be checked against cls.getrandbits, but that starts to look rather esoteric to me - so you are probably right that something with a cls.__dict__ check or the alternative suggestion of `cls.random is _random.Random.random` are better solutions, indeed.

> We could support also the following cases:
> 
> 1.
>      class Rand1(Random):
>          def random(self): ...
>          # _randbelow should use random()
> 
>      class Rand2(Rand1):
>          def getrandbits(self): ...
>          # _randbelow should use getrandbits()
>          # this is broken in the current patch
> 

Right, hadn't thought of this situation.

> 2.
>      class Rand1(Random):
>          def getrandbits(self): ...
>          # _randbelow should use getrandbits()
> 
>      class Rand2(Rand1):
>          def random(self): ...
>          # _randbelow should use random()
>          # this is broken in the current code
> 

May be worth fixing, too.
msg314536 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-03-27 15:41
Wolfgang, can you submit this as a PR?
msg314537 - Author: Wolfgang Maier (wolma) * Date: 2018-03-27 15:46
Thanks, Raymond. I'll do that once I've addressed Serhiy's points.
msg314601 - Author: Wolfgang Maier (wolma) * Date: 2018-03-28 14:57
So, the PR implements the behaviour suggested by Serhiy as his cases 1 and 2.
Case 2 changes *existing* behaviour because before it was sufficient to have a user-defined getrandbits anywhere in the inheritance tree, while with the PR it has to be defined in a more derived class than the random method (or in the same class).
I'm not 100% sold on this particular aspect so if you think the old behaviour is better, then that's fine with me. In most real situations it would not make a difference anyway (or do people build complex inheritance hierarchies on top of random.Random?).
msg314602 - Author: Wolfgang Maier (wolma) * Date: 2018-03-28 15:01
In addition, I took the opportunity to fix a bug in the original _randbelow in that it would only raise the advertised ValueError on n=0 in the getrandbits-dependent branch, but ZeroDivisionError in the pure random branch.
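The ZeroDivisionError comes from the random()-only path dividing by `n`. The helper below is hypothetical: its rejection loop on a 53-bit scaled float is an assumption about the general shape of such a branch, not a copy of the stdlib code. It shows how a guard at the top makes `n == 0` raise ValueError instead.

```python
import random

BPF = 53  # bits of precision in a float
MAXSIZE = 1 << BPF


def randbelow_without_getrandbits(random_func, n):
    """Hypothetical random()-only helper returning an int in [0, n)."""
    if n <= 0:
        # Without this guard, `MAXSIZE % n` below raises ZeroDivisionError for n == 0.
        raise ValueError("n must be a positive integer")
    if n > MAXSIZE:
        # Would make `limit` 0.0 and loop forever; refuse instead.
        raise ValueError("n is too large for a random()-only generator")
    rem = MAXSIZE % n
    limit = (MAXSIZE - rem) / MAXSIZE  # int(limit * MAXSIZE) is a multiple of n
    r = random_func()
    while r >= limit:
        r = random_func()
    return int(r * MAXSIZE) % n


rng = random.Random(12345)
print(randbelow_without_getrandbits(rng.random, 10) in range(10))  # True
```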
msg314788 - Author: Wolfgang Maier (wolma) * Date: 2018-04-01 20:16
ok, I've created issue 33203 to deal with raising ValueError in _randbelow consistently.
msg315397 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-04-17 15:16
New changeset ba3a87aca37cec5b1ee32cf68f4a254fa0bb2bec by Raymond Hettinger (Wolfgang Maier) in branch 'master':
bpo-33144: random.Random and subclasses: split _randbelow implementation (GH-6291)
https://github.com/python/cpython/commit/ba3a87aca37cec5b1ee32cf68f4a254fa0bb2bec
msg315398 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-04-17 15:19
Possibly, the switch from type checks to identity checks could be considered a bugfix that could be backported.  I've always had a lingering worry about that part of the code.
msg315570 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-04-21 14:45
PR 6291 didn't work properly with case 1. Rand2 uses getrandbits() since it is overridden in the parent, despite the fact that random() is defined later.

PR 6563 fixes this. It walks classes in method resolution order and finds the first class that defines random() or getrandbits().

PR 6563 also changes the tests so that they no longer use logging for testing purposes.
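The MRO walk described for PR 6563 can be sketched as below. `Base` is a hypothetical stand-in for `random.Random`, not the merged code; the placeholder `_randbelow_*` bodies just return the name of the chosen strategy. Checking `_randbelow` first lets a class inherit a decision already made for an ancestor (or keep its own override), and checking getrandbits() before random() means getrandbits() wins when both are defined in the same class.

```python
class Base:
    """Hypothetical stand-in for random.Random (not the stdlib class)."""

    def random(self):
        return 0.5

    def getrandbits(self, k):
        return 0

    # Placeholder bodies: the sketch only records which strategy was chosen.
    def _randbelow_with_getrandbits(self, n):
        return "getrandbits"

    def _randbelow_without_getrandbits(self, n):
        return "random"

    _randbelow = _randbelow_with_getrandbits

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Walk from the most derived class; the first relevant definition decides.
        for klass in cls.__mro__:
            attrs = vars(klass)
            if "_randbelow" in attrs:
                break  # already decided for this class or an ancestor: inherit it
            if "getrandbits" in attrs:
                cls._randbelow = cls._randbelow_with_getrandbits
                break
            if "random" in attrs:
                cls._randbelow = cls._randbelow_without_getrandbits
                break


class Rand1(Base):           # case 1, parent
    def random(self):
        return 0.25

class Rand2(Rand1):          # case 1, child
    def getrandbits(self, k):
        return 1

class Rand3(Base):           # case 2, parent
    def getrandbits(self, k):
        return 1

class Rand4(Rand3):          # case 2, child
    def random(self):
        return 0.25

class Plain(Rand1):          # defines nothing: inherits Rand1's choice
    pass


print([c()._randbelow(10) for c in (Rand1, Rand2, Rand3, Rand4, Plain)])
# ['random', 'getrandbits', 'getrandbits', 'random', 'random']
```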
msg316286 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-05-08 12:45
New changeset ec1622d56c80d15740f7f8459c9a79fd55f5d3c7 by Serhiy Storchaka in branch 'master':
bpo-33144: Fix choosing random.Random._randbelow implementation. (GH-6563)
https://github.com/python/cpython/commit/ec1622d56c80d15740f7f8459c9a79fd55f5d3c7
History
Date User Action Args
2018-05-08 12:47:14 serhiy.storchaka set status: open -> closed
    resolution: fixed
    stage: patch review -> resolved
2018-05-08 12:45:19 serhiy.storchaka set messages: + msg316286
2018-04-21 14:48:34 rhettinger set assignee: rhettinger -> serhiy.storchaka
2018-04-21 14:45:25 serhiy.storchaka set status: closed -> open
    resolution: fixed -> (no value)
    messages: + msg315570
    stage: resolved -> patch review
2018-04-21 14:35:54 serhiy.storchaka set pull_requests: + pull_request6258
2018-04-20 20:25:14 rhettinger set status: open -> closed
    resolution: fixed
    stage: patch review -> resolved
2018-04-17 15:19:35 rhettinger set messages: + msg315398
2018-04-17 15:16:20 rhettinger set messages: + msg315397
2018-04-01 20:16:38 wolma set messages: + msg314788
2018-03-28 15:01:18 wolma set messages: + msg314602
2018-03-28 14:57:43 wolma set messages: + msg314601
2018-03-28 14:49:06 wolma set stage: patch review
    pull_requests: + pull_request6015
2018-03-27 15:46:52 wolma set messages: + msg314537
2018-03-27 15:41:21 rhettinger set messages: + msg314536
2018-03-27 15:34:59 wolma set messages: + msg314534
2018-03-27 07:04:37 serhiy.storchaka set messages: + msg314502
2018-03-27 03:41:05 tim.peters set messages: + msg314498
2018-03-27 03:30:38 rhettinger set messages: + msg314496
2018-03-27 03:10:18 tim.peters set messages: + msg314494
2018-03-27 01:25:45 rhettinger set assignee: rhettinger
    messages: + msg314489
    nosy: + tim.peters, mark.dickinson, serhiy.storchaka
2018-03-26 15:00:28 wolma create