classification
Title: enum.Flag instance creation is slow
Type: Stage: patch review
Components: Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Antony.Lee, ethan.furman, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2019-09-06 12:18 by Antony.Lee, last changed 2019-11-26 22:36 by ethan.furman.

Pull Requests
URL Status Linked Edit
PR 16291 closed python-dev, 2019-09-19 17:53
PR 16483 merged hongweipeng, 2019-09-30 04:25
Messages (4)
msg351244 - (view) Author: Antony Lee (Antony.Lee) * Date: 2019-09-06 12:18
Consider the following example

    from enum import Flag
    F = Flag("F", list("abcdefghijklm"))
    for idx in range(2**len(F) - 1):
        F(idx)

creating all possible combos of 13 flags, so 8192 instances (yes, I know the instances are cached later, but the initial cost is still significant).  Locally, this takes over 4.5s to execute; profiling shows that ~30% of that time is spent executing enum._is_power_of_two(!):

    def _power_of_two(value):
        if value < 1:
            return False
        return value == 2 ** _high_bit(value)

Indeed, replacing the implementation of _power_of_two by

    import math
    _powers_of_two = {
        2 ** i for i in range(math.ceil(math.log2(sys.maxsize)) + 1)}
    def _power_of_two(value):
        return (value in _powers_of_two if value <= sys.maxsize
                else value == 2 ** _high_bit(value))

shaves off ~30% of the runtime (obviously this won't help for huge (>sys.maxsize) flag values, but these are rarer I would think).

An even better fix, I think, would be for Flag to cache `flags_to_check` (updating it at the same time as `_value2member_map_`) instead of recomputing it each time in _decompose

    flags_to_check = [
            (m, v)
            for v, m in list(flag._value2member_map_.items())
            if m.name is not None or _power_of_two(v)
            ]

as this actually needlessly introduces quadratic complexity when iterating over all possible combos (because the size of _value2member_map_ is growing).  (Also, it's not actually clear to me how one can end up with unnamed power-of-two instances in _value2member_map?)
msg351246 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-09-06 13:08
The fast method to check if the value is a power of two:

    not value & (value - 1)
msg351265 - (view) Author: Antony Lee (Antony.Lee) * Date: 2019-09-06 18:08
Actually, microoptimizing _power_of_two is a red herring and the problem is the quadratic complexity of _decompose (_value2member_map_ becomes bigger and bigger at each iteration).  Replacing the implementation of _decompose by the following fixes the performance issue (the original snippet executes in ~150ms, which may be even more optimizable but already a 30x improvement :)), and still passes test_enum.py (from cpython 3.7.4):


    def _decompose(flag, value):
        """Extract all members from the value."""
        # _decompose is only called if the value is not named
        not_covered = value
        members = []
        for member in flag:
            member_value = member.value
            if member_value and member_value & value == member_value:
                members.append(member)
                not_covered &= ~member_value
        if issubclass(flag, IntFlag) and value > 0:
            while not_covered:
                new_value = 2 ** _high_bit(not_covered)
                if new_value not in flag._value2member_map_:
                    # construct singleton pseudo-members
                    pseudo_member = int.__new__(flag, value)
                    pseudo_member._name_ = None
                    pseudo_member._value_ = value
                    flag._value2member_map_.setdefault(value, pseudo_member)
                members.append(flag(new_value))
                not_covered &= ~new_value
        if not members and value in flag._value2member_map_:
            members.append(flag._value2member_map_[value])
        members.sort(key=lambda m: m._value_, reverse=True)
        if len(members) > 1 and members[0].value == value:
            # we have the breakdown, don't need the value member itself
            members.pop(0)
        return members, not_covered


Checking for issubclass(flag, IntFlag) is a bit ugly but that's because Flag and IntFlag really need two separate implementations of _decompose -- the former wants to error out on uncovered values, the latter should create them on-the-fly.
msg357535 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2019-11-26 22:36
New changeset 0b41a922f95f62b620d5a197b9f2ed8e4bb70730 by Ethan Furman (HongWeipeng) in branch 'master':
bpo-38045: Improve the performance of _decompose() in enum.py (GH-16483)
https://github.com/python/cpython/commit/0b41a922f95f62b620d5a197b9f2ed8e4bb70730
History
Date User Action Args
2019-11-26 22:36:06ethan.furmansetmessages: + msg357535
2019-09-30 04:25:33hongweipengsetpull_requests: + pull_request16068
2019-09-21 14:12:22xtreaksetnosy: + ethan.furman
2019-09-19 17:53:12python-devsetkeywords: + patch
stage: patch review
pull_requests: + pull_request15876
2019-09-06 18:08:17Antony.Leesetmessages: + msg351265
2019-09-06 13:20:41vstinnersettitle: Flag instance creation is slow -> enum.Flag instance creation is slow
2019-09-06 13:08:46serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg351246
2019-09-06 12:18:00Antony.Leecreate