classification
Title: Enum member lookup is 20x slower than normal class attribute lookup
Type: performance Stage: resolved
Components: Library (Lib) Versions: Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: ethan.furman Nosy List: barry, craigh, eli.bendersky, ethan.furman, larry, python-dev, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2015-02-19 21:59 by craigh, last changed 2015-03-11 15:45 by ethan.furman. This issue is now closed.

Files
File name Uploaded Description Edit
test.py craigh, 2015-02-19 21:59
issue23486.stoneleaf.01.patch ethan.furman, 2015-02-20 09:29 review
Messages (18)
msg236234 - (view) Author: Craig Holmquist (craigh) Date: 2015-02-19 21:59
Running the attached test script:

$ time python test.py enum

real	0m6.546s
user	0m6.530s
sys	0m0.007s
$ time python test.py int

real	0m0.384s
user	0m0.377s
sys	0m0.000s

I encountered this with a script that yielded a sequence of objects (potentially a few hundred thousand of them) and categorized them with instances of an Enum subclass.  The consumer of that iteration processes each object with a switch-case-like comparison of the category, checking it sequentially against each instance of the Enum.  This seems like a fairly common use case.

From cProfile it looks like EnumMeta.__getattr__ and _is_dunder are the main bottlenecks:

[...]
      7/1    0.000    0.000    0.000    0.000 abc.py:194(__subclasscheck__)
        1    0.000    0.000    0.001    0.001 enum.py:1(<module>)
        3    0.000    0.000    0.000    0.000 enum.py:132(<genexpr>)
  2000021    0.988    0.000    0.988    0.000 enum.py:16(_is_dunder)
       19    0.000    0.000    0.000    0.000 enum.py:24(_is_sunder)
  2000002    1.825    0.000    2.813    0.000 enum.py:241(__getattr__)
       17    0.000    0.000    0.000    0.000 enum.py:282(__setattr__)
        3    0.000    0.000    0.000    0.000 enum.py:342(_get_mixins_)
        3    0.000    0.000    0.000    0.000 enum.py:387(_find_new_)
[...]
msg236236 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2015-02-19 22:12
Craig Holmquist wrote:
---------------------
> The consumer of that iteration processes each object with a switch-case-like
> comparison of the category, checking it sequentially against each instance
> of the Enum.

So for every object you compare against every Enum member?  Is there a reason you don't just use the lookup capability?

  class Category(Enum):
    tiny = 1
    medium = 2
    large = 3

  cat = Category(obj.category) # assumes obj.category is 1, 2, or 3
msg236237 - (view) Author: Craig Holmquist (craigh) Date: 2015-02-19 22:19
I may not have been clear before.  What I mean is, code like this:

for obj in get_objects():
    if obj.category == Cat.cat1:
        #do something
    elif obj.category == Cat.cat2:
        #do something else
    elif obj.category == Cat.cat3 or obj.category == Cat.cat4:
        #...

obj.category is already an instance of Cat, in other words.  The consumer is using it to determine what to do with each obj.
msg236239 - (view) Author: Craig Holmquist (craigh) Date: 2015-02-19 22:35
It seems like performance is drastically improved by doing this:

class Category(Enum):
    tiny = 1
    medium = 2
    large = 3

tiny = Category.tiny
medium = Category.medium

In other words, resolving Category.tiny and Category.medium is what's slow.  The comparison itself is very fast.
msg236243 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2015-02-19 23:41
Yup, you have it figured out.  It's the lookup that is the slowdown.

When performance is an issue one of the standard tricks is to create a local name, like you did with "tiny = Category.tiny".

For the curious (taken from the docstring for Enum.__getattr__):

    We use __getattr__ instead of descriptors or inserting into the enum
    class' __dict__ in order to support `name` and `value` being both
    properties for enum members (which live in the class' __dict__) and
    enum members themselves.

It is possible to store all the enum members /except/ for 'name' and 'value' in the class' __dict__, but I'm not sure it's worth the extra complication.
msg236258 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2015-02-20 09:29
Out of curiousity I tried:  took two new lines, one modified line, and one comment.  :)
msg236259 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2015-02-20 09:30
Oh, and the slowdown dropped from 20 to 3 (for non-DynamicClassAttributes -- which is probably more than 99% of them).
msg236260 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-02-20 09:39
I don't understand the patch, but 3x slower instead of 20x slower is a huge optimization :-) Do you plan to change Python 3.5 *and* Python 3.4?
msg236559 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2015-02-25 02:28
This isn't a change to the API or any visible user behavior (besides performance), so I don't see a reason to not add it to 3.4.
msg237816 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2015-03-10 22:06
Larry, I have a very small patch (~4 lines) that does change user behavior or the API, but does have a significant performance boost.

I'm still learning what is/is not okay to add to maintenance releases, so wanted to run this by you.
msg237844 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2015-03-11 05:35
My inclination is 3.5 only.  Barry, do you want to argue for this going into 3.4?
msg237846 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2015-03-11 06:17
Argh, sorry -- that was supposed to be *does not* change user behavior nor the API, it's *just* a performance increase.

Does that change your inclination?
msg237847 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2015-03-11 07:30
Oh, I read the code.  But it's a performance hack, and the rules say we only accept security fixes and bug fixes at this stage of the release, and they're the rules for good reasons.
msg237865 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2015-03-11 13:46
Poor performance could fall under the category of bug fixes, so for an in-maintenance mode release, a fix that does not in any way change user visible behavior could be acceptable.  It would probably be fine for 3.4 but I'm just +0 on it.  Larry's call.
msg237866 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-03-11 14:04
This is not a regression (there were no enums before 3.4), slow down is not critical (only constant factor, not increased computational complexity), there is a workaround, and the code that just use constants that were converted to IntEnum is not affected. I'm -0 on it.
msg237867 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2015-03-11 14:27
In getting everything fixed up and tested I realized there was one slight user-facing change:  with this patch it is now possible to say:

  SomeEnum.SomeMember = SomeMember

In other words, it is possible to set a value on the class as long as it is the same value that already existed.

3.5 sounds good.
msg237877 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-03-11 15:43
New changeset 2545bfe0d273 by Ethan Furman in branch 'default':
Close issue23486: performance boost for enum member lookup
https://hg.python.org/cpython/rev/2545bfe0d273
msg237878 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2015-03-11 15:45
Slight reordering of code removed the one user visible change.
History
Date User Action Args
2015-03-11 15:45:36ethan.furmansetmessages: + msg237878
2015-03-11 15:43:55python-devsetstatus: open -> closed

nosy: + python-dev
messages: + msg237877

resolution: fixed
stage: commit review -> resolved
2015-03-11 14:27:05ethan.furmansetmessages: + msg237867
versions: - Python 3.4
2015-03-11 14:04:02serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg237866
2015-03-11 13:46:44barrysetmessages: + msg237865
2015-03-11 07:30:02larrysetmessages: + msg237847
2015-03-11 06:17:24ethan.furmansetmessages: + msg237846
2015-03-11 05:35:00larrysetmessages: + msg237844
2015-03-10 22:07:02ethan.furmansetmessages: - msg237815
2015-03-10 22:06:10ethan.furmansetnosy: + larry
messages: + msg237816
2015-03-10 22:04:54ethan.furmansetassignee: ethan.furman
messages: + msg237815
stage: commit review
2015-02-25 02:28:29ethan.furmansetnosy: + barry, eli.bendersky
messages: + msg236559
2015-02-20 09:39:08vstinnersetnosy: + vstinner

messages: + msg236260
versions: + Python 3.5
2015-02-20 09:30:14ethan.furmansetmessages: + msg236259
2015-02-20 09:29:04ethan.furmansetfiles: + issue23486.stoneleaf.01.patch
keywords: + patch
messages: + msg236258
2015-02-19 23:41:15ethan.furmansetmessages: + msg236243
title: Enum comparisons are 20x slower than comparing equivalent ints -> Enum member lookup is 20x slower than normal class attribute lookup
2015-02-19 22:35:38craighsetmessages: + msg236239
2015-02-19 22:19:34craighsetmessages: + msg236237
2015-02-19 22:12:02ethan.furmansetmessages: + msg236236
2015-02-19 22:03:28ethan.furmansetnosy: + ethan.furman
2015-02-19 21:59:13craighcreate