Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enum classes cause slow startup time #82840

Closed
gpshead opened this issue Nov 1, 2019 · 23 comments
Closed

enum classes cause slow startup time #82840

gpshead opened this issue Nov 1, 2019 · 23 comments
Assignees
Labels
3.10 only security fixes performance Performance or resource usage release-blocker stdlib Python modules in the Lib dir

Comments

@gpshead
Copy link
Member

gpshead commented Nov 1, 2019

BPO 38659
Nosy @arigo, @gpshead, @vstinner, @ethanfurman, @ammaraskar, @pablogsal
PRs
  • bpo-38659: [Enum] add _simple_enum decorator #25285
  • Revert "bpo-38659: [Enum] add _simple_enum decorator" #25476
  • bpo-38659: [Enum] add _simple_enum decorator #25497
  • bpo-38659: Properly re-intialize module variables in test_enum #25516
  • bpo-38659: Revert "[Enum] add _simple_enum decorator (GH-25497)" #25560
  • bpo-38659: [Enum] do not check '_inverted_' during _test_simple_enum #25566
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/ethanfurman'
    closed_at = <Date 2021-04-24.21:43:38.457>
    created_at = <Date 2019-11-01.01:12:48.657>
    labels = ['release-blocker', 'library', '3.10', 'performance']
    title = 'enum classes cause slow startup time'
    updated_at = <Date 2021-10-12.19:31:41.821>
    user = 'https://github.com/gpshead'

    bugs.python.org fields:

    activity = <Date 2021-10-12.19:31:41.821>
    actor = 'ethan.furman'
    assignee = 'ethan.furman'
    closed = True
    closed_date = <Date 2021-04-24.21:43:38.457>
    closer = 'pablogsal'
    components = ['Library (Lib)']
    creation = <Date 2019-11-01.01:12:48.657>
    creator = 'gregory.p.smith'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 38659
    keywords = ['patch']
    message_count = 23.0
    messages = ['355777', '355815', '391420', '391421', '391532', '391565', '391567', '391568', '391569', '391570', '391571', '391604', '391621', '391678', '391679', '391711', '391713', '391716', '391753', '391787', '391807', '403517', '403753']
    nosy_count = 6.0
    nosy_names = ['arigo', 'gregory.p.smith', 'vstinner', 'ethan.furman', 'ammar2', 'pablogsal']
    pr_nums = ['25285', '25476', '25497', '25516', '25560', '25566']
    priority = 'release blocker'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue38659'
    versions = ['Python 3.10']

    @gpshead
    Copy link
    Member Author

    gpshead commented Nov 1, 2019

    Creating an enum subclass (ie: defining an enum) is slow. This dramatically impacts startup time of Python programs that import a bunch of potentially needed constant definitions at startup before any proper code executes.

    How slow? So slow that a module defining a ~300 enums takes nearly 100ms just to import from its pyc file.

    Example code: https://github.com/googleads/google-ads-python/blob/96fd08bb62435f1930df4871033ba8689333b67f/google/ads/google_ads/v2/services/enums.py

    We've known this, we should do something about it. (Even if it means implementing the guts of the magic enum machinery in C.) ie, it came up in https://bugs.python.org/issue28637 as a stdlib startup time regression and is likely to come up in similar contexts elsewhere.

    @gpshead gpshead added 3.9 only security fixes stdlib Python modules in the Lib dir performance Performance or resource usage labels Nov 1, 2019
    @ethanfurman
    Copy link
    Member

    I was just looking at this problem, and creating a bare-bones, no safety belts version for use in the stdlib (no patch yet) which decreases Enum creation from 14x slower to only 6x slower. (Comparing to a class with simple attributes.)

    Not sure if that's enough improvement, though.

    If it needs to be even faster, a C version of that simplified Enum shouldn't be too hard. Anyone that uses the _simple_enum, though, should have a test that uses the full Enum and compares the two to make sure nothing got lost in translation.

    @ethanfurman ethanfurman self-assigned this Nov 1, 2019
    @ethanfurman
    Copy link
    Member

    New changeset dbac8f4 by Ethan Furman in branch 'master':
    bpo-38659: [Enum] add _simple_enum decorator (GH-25285)
    dbac8f4

    @ethanfurman
    Copy link
    Member

    New changeset 503cdc7 by Ethan Furman in branch 'master':
    Revert "bpo-38659: [Enum] add _simple_enum decorator (GH-25285)" (GH-25476)
    503cdc7

    @ethanfurman ethanfurman added 3.10 only security fixes and removed 3.9 only security fixes labels Apr 21, 2021
    @ethanfurman
    Copy link
    Member

    New changeset a02cb47 by Ethan Furman in branch 'master':
    bpo-38659: [Enum] add _simple_enum decorator (GH-25497)
    a02cb47

    @pablogsal
    Copy link
    Member

    Commit a02cb47 fails to build in the refleak buildbots:

    https://buildbot.python.org/all/#/builders/75/builds/2/steps/5/logs/stdio

    Example failure:

    ❯ ./python  -m test test_enum -R :
    0:00:00 load avg: 1.81 Run tests sequentially
    0:00:00 load avg: 1.81 [1/1] test_enum
    beginning 9 repetitions
    123456789
    .test test_enum failed -- Traceback (most recent call last):
      File "/home/pablogsal/github/python/master/Lib/test/test_enum.py", line 3700, in test_convert_repr_and_str
        self.assertEqual(format(test_type.CONVERT_STRING_TEST_NAME_A), '5')
    AssertionError: 'CONVERT_STRING_TEST_NAME_A' != '5'
    - CONVERT_STRING_TEST_NAME_A
    + 5

    test_enum failed

    == Tests result: FAILURE ==

    1 test failed:
    test_enum

    Total duration: 586 ms
    Tests result: FAILURE

    @pablogsal
    Copy link
    Member

    Can someone take a look?

    As per the buildbot policy (https://discuss.python.org/t/policy-to-revert-commits-on-buildbot-failure/404) we may need to revert it if is not fixed in 24 because of the risk of masking errors.

    @ammaraskar
    Copy link
    Member

    Looks like this is the issue described in the comment here: https://github.com/python/cpython/blob/master/Lib/test/test_enum.py#L3691-L3692

    On the first run you have the correct

    ('CONVERT_STRING_TEST_NAME_A', 5)

    but later it turns into

    ('CONVERT_STRING_TEST_NAME_A', test.test_enum.CONVERT_STRING_TEST_NAME_A)

    causing double-conversions of the enum elements. This causes the format(x) test to fail. You can re-create the same issue outside of the refleak by adding a simple:

        def test_convert_repr_and_str_again(self):
            self.test_convert_repr_and_str()

    method.

    @ethanfurman
    Copy link
    Member

    format issue resolved, but not the refleak issue.

    @ethanfurman
    Copy link
    Member

    New changeset 37b173c by Ammar Askar in branch 'master':
    bpo-38659: Properly re-intialize module variables in test_enum (GH-25516)
    37b173c

    @ethanfurman
    Copy link
    Member

    Actually, I think that fixed the refleak issue as well.

    Thanks, Ammar!

    @vstinner
    Copy link
    Member

    def setUp(self):
            # Reset the module-level test variables to their original integer
            # values, otherwise the already created enum values get converted
            # instead.

    Why not doing that in a tearDown() method instead? What if you run explicitly a single test method?

    @pablogsal
    Copy link
    Member

    Thanks Ammar for the fix. Unfortunately there is still some failures related to this on test_socker:

    ======================================================================
    ERROR: test_msgflag_enum (test.test_socket.GeneralModuleTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-ppc64le.refleak/build/Lib/test/test_socket.py", line 1969, in test_msgflag_enum
        enum._test_simple_enum(CheckedMsgFlag, socket.MsgFlag)
      File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-ppc64le.refleak/build/Lib/enum.py", line 1664, in _test_simple_enum
        raise TypeError('enum mismatch:\n   %s' % '\n   '.join(failed))
    TypeError: enum mismatch:
       'MSG_TRUNC' member mismatch:
          extra key '_inverted_' in simple enum member 'MSG_TRUNC'
       'MSG_CTRUNC' member mismatch:
          extra key '_inverted_' in simple enum member 'MSG_CTRUNC'

    @pablogsal
    Copy link
    Member

    Gentle ping

    @pablogsal
    Copy link
    Member

    Commenting out the enum tests reveals that test_socket has additional problems:

    𓋹 ./python.exe -m test test_socket -R 3:3
    0:00:00 load avg: 2.32 Run tests sequentially
    0:00:00 load avg: 2.32 [1/1] test_socket
    beginning 6 repetitions
    123456
    /Users/pgalindo3/github/python/master/Lib/test/test_socket.py:2701: RuntimeWarning: received malformed or improperly-truncated ancillary data
    result = sock.recvmsg(bufsize, *args)
    /Users/pgalindo3/github/python/master/Lib/test/test_socket.py:2792: RuntimeWarning: received malformed or improperly-truncated ancillary data
    result = sock.recvmsg_into([buf], *args)
    ./Users/pgalindo3/github/python/master/Lib/test/test_socket.py:2701: RuntimeWarning: received malformed or improperly-truncated ancillary data
    result = sock.recvmsg(bufsize, *args)
    /Users/pgalindo3/github/python/master/Lib/test/test_socket.py:2792: RuntimeWarning: received malformed or improperly-truncated ancillary data
    result = sock.recvmsg_into([buf], *args)
    ./Users/pgalindo3/github/python/master/Lib/test/test_socket.py:2701: RuntimeWarning: received malformed or improperly-truncated ancillary data
    result = sock.recvmsg(bufsize, *args)
    /Users/pgalindo3/github/python/master/Lib/test/test_socket.py:2792: RuntimeWarning: received malformed or improperly-truncated ancillary data
    result = sock.recvmsg_into([buf], *args)
    ./Users/pgalindo3/github/python/master/Lib/test/test_socket.py:2701: RuntimeWarning: received malformed or improperly-truncated ancillary data
    result = sock.recvmsg(bufsize, *args)
    /Users/pgalindo3/github/python/master/Lib/test/test_socket.py:2792: RuntimeWarning: received malformed or improperly-truncated ancillary data
    result = sock.recvmsg_into([buf], *args)
    ./Users/pgalindo3/github/python/master/Lib/test/test_socket.py:2701: RuntimeWarning: received malformed or improperly-truncated ancillary data
    result = sock.recvmsg(bufsize, *args)
    /Users/pgalindo3/github/python/master/Lib/test/test_socket.py:2792: RuntimeWarning: received malformed or improperly-truncated ancillary data
    result = sock.recvmsg_into([buf], *args)
    ./Users/pgalindo3/github/python/master/Lib/test/test_socket.py:2701: RuntimeWarning: received malformed or improperly-truncated ancillary data
    result = sock.recvmsg(bufsize, *args)
    /Users/pgalindo3/github/python/master/Lib/test/test_socket.py:2792: RuntimeWarning: received malformed or improperly-truncated ancillary data
    result = sock.recvmsg_into([buf], *args)
    .
    test_socket leaked [20, 20, 20] file descriptors, sum=60
    test_socket failed in 2 min 25 sec

    == Tests result: FAILURE ==

    1 test failed:
    test_socket

    Total duration: 2 min 25 sec
    Tests result: FAILURE

    @pablogsal
    Copy link
    Member

    Unfortunately, I am being forced to revert commit a02cb47 due to have it failing of all refleak buildbots for more than two days.

    @ethanfurman
    Copy link
    Member

    My apologies, I was having hardware issues.

    Checking it out now.

    @pablogsal
    Copy link
    Member

    Thanks a lot Ethan. I will wait then for the investigation.

    @ethanfurman
    Copy link
    Member

    New changeset 6c681e1 by Ethan Furman in branch 'master':
    bpo-38659: [Enum] do not check '_inverted_' during simple test (GH-25566)
    6c681e1

    @ethanfurman
    Copy link
    Member

    Pablo, did my latest patch resolved the errors?

    @pablogsal
    Copy link
    Member

    Seems that the buildbots are going back to green so I will close the revert PR. THanks a lot, Ethan for the fix and the investigation!

    @arigo
    Copy link
    Mannequin

    arigo mannequin commented Oct 9, 2021

    Nobody seemed to mention it so I might as well: defining a regular Enum class takes an amount of time that is clearly quadratic in the number of attributes. That means that the problem is not Python-versus-C or small speed-ups or adding secret APIs to do the simple case faster. The problem is in the algorithm which needs to be fixed somewhere. My timings:

    number of attributes time
    1500 0.24s
    3000 0.94s
    6000 3.74s
    12000 15.57s

    @ethanfurman
    Copy link
    Member

    The reason for that quadratic behavior is that for each new member (aka attribute), all the previous members must be checked to see if the new member is a duplicate.

    In practice I wouldn't expect this to be a problem as most enums should be fairly small -- are there any real-world examples where there are more than, say, a hundred?

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.10 only security fixes performance Performance or resource usage release-blocker stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants