Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

random._randbelow optimization #77325

Closed
wm75 mannequin opened this issue Mar 26, 2018 · 16 comments
Closed

random._randbelow optimization #77325

wm75 mannequin opened this issue Mar 26, 2018 · 16 comments
Assignees
Labels
3.8 only security fixes performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@wm75
Copy link
Mannequin

wm75 mannequin commented Mar 26, 2018

BPO 33144
Nosy @tim-one, @rhettinger, @mdickinson, @serhiy-storchaka, @wm75
PRs
  • bpo-33144: random.Random and subclasses: split _randbelow implementation #6291
  • bpo-33144: Fix choosing random.Random._randbelow implementation. #6563
  • Files
  • randbelow.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2018-05-08.12:47:14.274>
    created_at = <Date 2018-03-26.15:00:28.215>
    labels = ['3.8', 'library', 'performance']
    title = 'random._randbelow optimization'
    updated_at = <Date 2018-05-08.12:47:14.273>
    user = 'https://github.com/wm75'

    bugs.python.org fields:

    activity = <Date 2018-05-08.12:47:14.273>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2018-05-08.12:47:14.274>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)']
    creation = <Date 2018-03-26.15:00:28.215>
    creator = 'wolma'
    dependencies = []
    files = ['47501']
    hgrepos = []
    issue_num = 33144
    keywords = ['patch']
    message_count = 16.0
    messages = ['314455', '314489', '314494', '314496', '314498', '314502', '314534', '314536', '314537', '314601', '314602', '314788', '315397', '315398', '315570', '316286']
    nosy_count = 5.0
    nosy_names = ['tim.peters', 'rhettinger', 'mark.dickinson', 'serhiy.storchaka', 'wolma']
    pr_nums = ['6291', '6563']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue33144'
    versions = ['Python 3.8']

    @wm75
    Copy link
    Mannequin Author

    wm75 mannequin commented Mar 26, 2018

    Given that the random module goes a long way to ensure optimal performance, I was wondering why the check for a match between the random and getrandbits methods is performed per call of Random._randbelow, when it could also be done at instantiation time (the attached patch uses __init_subclass__ for that purpose and, in my hands, gives 10-25% speedups for calls to methods relying on _randbelow).
    Is it really necessary to guard against someone monkey patching the methods rather than using inheritance?

    @wm75 wm75 mannequin added 3.8 only security fixes stdlib Python modules in the Lib dir performance Performance or resource usage labels Mar 26, 2018
    @rhettinger
    Copy link
    Contributor

    it could also be done at instantiation time (the attached patch
    uses __init_subclass__ for that purpose

    FWIW, a 10-25% speedup is only possible because the remaining code is already somewhat fast. All that is being proposed is removing couple of lines that elsewhere would be considered somewhat thin:

            random = self.random
            if type(random) is BuiltinMethod \
               or type(getrandbits) is Method:

    Overall, the idea of doing the check only once at instantiation time seems promising. That said, I have unspecific general worries about using __init_subclass__ and patching the subclass. Perhaps Serhiy, Tim, or Mark will have thoughts on whether this sort of self-patching is something we want to be doing in the standard library, whether it would benefit PyPy, and whether it has risks to existing code, to debugging and testing, and to future maintenance.

    If I were the one to go the route of making a single pre-check, my instinct would be to just set a flag in __init__, so that the above code would simplify to:

            if self._valid_getrandbits:
                ...

    @rhettinger rhettinger self-assigned this Mar 27, 2018
    @tim-one
    Copy link
    Member

    tim-one commented Mar 27, 2018

    I don't see anything objectionable about the class optimizing the implementation of a private method.

    I'll note that there's a speed benefit beyond just removing the two type checks in the common case: the optimized _randbelow() also avoids populating its locals with 5 unused formal arguments (which are just "a trick" to change what would otherwise have been global accesses into local accesses). So it actually makes the method implementation cleaner & clearer too.

    But it's really the speed that matters here.

    @rhettinger
    Copy link
    Contributor

    the optimized _randbelow() also avoids populating its locals
    with 5 unused formal arguments

    Yes, that clean-up would be nice as well :-)

    Any thoughts on having __init__ set a flag versus using __init__subclass__ to backpatch the subclass? To me, the former looks like plain python and latter doesn't seem like something that would normally be done in the standard library.

    @tim-one
    Copy link
    Member

    tim-one commented Mar 27, 2018

    I'm the wrong guy to ask about that. Since I worked at Zope Corp, my natural inclination is to monkey-patch everything - but knowing full well that will offend everyone else ;-)

    That said, this optimization seems straightforward to me: two distinct method implementations for two very different approaches that have nothing in common besides the method name & signature.

    @serhiy-storchaka
    Copy link
    Member

    I think this is excellent application of __init_subclass__. It is common to patch an instance method in __init__, but this can create a reference loop if patch it by other instance method. In this case the choice doesn't depend on arguments of __init__, and can be done at class creation time.

    I like the idea in general, but have comments about the implementation.

    init_subclass should take **kwargs and pass it to super().init_subclass(). type(cls.random) is not the same as type(self.random). I would use the condition cls.random is _random.Random.random instead, or check if the method is in cls.dict.

    This will break the case when random or getrandbits methods are patched after class creation or per instance, but I think we have no need to support this.

    We could support also the following cases:

    1. class Rand1(Random):
      def random(self): ...
      # _randbelow should use random()
        class Rand2(Rand1):
            def getrandbits(self): ...
            # _randbelow should use getrandbits()
            # this is broken in the current patch
    1. class Rand1(Random):
      def getrandbits(self): ...
      # _randbelow should use getrandbits()
        class Rand2(Rand1):
            def random(self): ...
            # _randbelow should use random()
            # this is broken in the current code

    @wm75
    Copy link
    Mannequin Author

    wm75 mannequin commented Mar 27, 2018

    Serhiy:

    I like the idea in general, but have comments about the implementation.

    init_subclass should take **kwargs and pass it to super().init_subclass(). type(cls.random) is not the same as type(self.random). I would use the condition cls.random is _random.Random.random instead, or check if the method is in cls.dict.

    This will break the case when random or getrandbits methods are patched after class creation or per instance, but I think we have no need to support this.

    My bad, sorry, and thanks for catching all these issues!

    You're absolutely right about the class type checks not being equivalent
    to the original ones at the instance level.
    Actually, this is due to the fact that I first moved the checks out of
    _randbelow and into init just as Raymond would have done and tested
    this, but then I realized that init_subclass looked just like the
    right place and moved them again - this time without testing on derived
    classes again.
    From a quick experiment it looks like types.MethodDescriptorType would
    be the correct type to check cls.random against and types.FunctionType
    would need to be checked against cls.getrandbits, but that starts to
    look rather esoteric to me - so you are probably right that something
    with a cls.dict check or the alternative suggestion of cls.random is _random.Random.random are better solutions, indeed.

    We could support also the following cases:

    1. class Rand1(Random):
      def random(self): ...
      # _randbelow should use random()

      class Rand2(Rand1):
      def getrandbits(self): ...
      # _randbelow should use getrandbits()
      # this is broken in the current patch

    Right, hadn't thought of this situation.

    1. class Rand1(Random):
      def getrandbits(self): ...
      # _randbelow should use getrandbits()

      class Rand2(Rand1):
      def random(self): ...
      # _randbelow should use random()
      # this is broken in the current code

    May be worth fixing, too.

    @rhettinger
    Copy link
    Contributor

    Wolfgang, can you submit this as a PR.

    @wm75
    Copy link
    Mannequin Author

    wm75 mannequin commented Mar 27, 2018

    Thanks, Raymond. I'll do that once I've addressed Serhiy's points.

    @wm75
    Copy link
    Mannequin Author

    wm75 mannequin commented Mar 28, 2018

    So, the PR implements the behaviour suggested by Serhiy as his cases 1 and 2.
    Case 2 changes *existing* behaviour because before it was sufficient to have a user-defined getrandbits anywhere in the inheritance tree, while with the PR it has to be more recent (or be defined within the same class) as the random method.
    I'm not 100% sold on this particular aspect so if you think the old behaviour is better, then that's fine with me. In most real situations it would not make a difference anyway (or do people build complex inheritance hierarchies on top of random.Random?).

    @wm75
    Copy link
    Mannequin Author

    wm75 mannequin commented Mar 28, 2018

    In addition, I took the opportunity to fix a bug in the original _randbelow in that it would only raise the advertised ValueError on n=0 in the getrandbits-dependent branch, but ZeroDivisionError in the pure random branch.

    @wm75
    Copy link
    Mannequin Author

    wm75 mannequin commented Apr 1, 2018

    ok, I've created bpo-33203 to deal with raising ValueError in _randbelow consistently.

    @rhettinger
    Copy link
    Contributor

    New changeset ba3a87a by Raymond Hettinger (Wolfgang Maier) in branch 'master':
    bpo-33144: random.Random and subclasses: split _randbelow implementation (GH-6291)
    ba3a87a

    @rhettinger
    Copy link
    Contributor

    Possibly, the switch from type checks to identity checks could be considered a bugfix that could be backported. I've always had a lingering worry about that part of the code.

    @serhiy-storchaka
    Copy link
    Member

    PR 6291 didn't work properly with case 1. Rand2 uses getrandbits() since it is overridden in the parent despites the fact that random() is defined later.

    PR 6563 fixes this. It walks classes in method resolution order and finds the first class that defines random() or getrandbits().

    PR 6563 also makes tests not using logging for testing purpose.

    @serhiy-storchaka
    Copy link
    Member

    New changeset ec1622d by Serhiy Storchaka in branch 'master':
    bpo-33144: Fix choosing random.Random._randbelow implementation. (GH-6563)
    ec1622d

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes performance Performance or resource usage stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants