This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Using multiple comparison operators can cause performance issues
Type: performance Stage: patch review
Components: Interpreter Core Versions: Python 3.11
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Dennis Sweeney, akuvfx, brandtbucher, serhiy.storchaka, steven.daprano, tim.peters
Priority: normal Keywords: patch

Created on 2021-10-20 18:25 by akuvfx, last changed 2022-04-11 14:59 by admin.

Pull Requests
URL Status Linked Edit
PR 29109 open akuvfx, 2021-10-20 18:29
Messages (7)
msg404508 - (view) Author: Maja (akuvfx) * Date: 2021-10-20 18:25
For example:

    def f(x):
        return 1 < x < 3

will be slower than:

    def f(x):
        return 1 < x and x < 3

The first function will generate following bytecode:

              0 LOAD_CONST               1 (1)
              2 LOAD_FAST                0 (x)
              4 DUP_TOP
              6 ROT_THREE
              8 COMPARE_OP               0 (<)
             10 JUMP_IF_FALSE_OR_POP    18
             12 LOAD_CONST               2 (3)
             14 COMPARE_OP               0 (<)
             16 RETURN_VALUE
        >>   18 ROT_TWO
             20 POP_TOP
             22 RETURN_VALUE

Performs unnecessary stack operations: duplicates x, rotates 3 items for no reason, then re-rotates 2 items and pops a value. This is fine if the value in the middle was more complex and needed to be duplicated rather than recalculated, which would be definitely slower, but for simpler values like names or constants, it's actually bad. The bytecode for the first function should look the same as second one which is:

              0 LOAD_CONST               1 (1)
              2 LOAD_FAST                0 (x)
              4 COMPARE_OP               0 (<)
              6 JUMP_IF_TRUE_OR_POP     14
              8 LOAD_FAST                0 (x)
             10 LOAD_CONST               2 (3)
             12 COMPARE_OP               0 (<)
        >>   14 RETURN_VALUE
msg404536 - (view) Author: Dennis Sweeney (Dennis Sweeney) * (Python committer) Date: 2021-10-20 23:02
The PR changes behavior slightly:

def f():
    class A:
        def __lt__(self, other):
            nonlocal x
            x += 100
            return True
    a = A()
    x = 1
    print(a < x < 10)
    x = 1
    print(a < x and x < 10)

### Before ###
>>> f()
True
False

### After ###
>>> f()
False
False


So strictly speaking, this would be backwards-incompatible. But morally, I am not totally sure.
msg404553 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2021-10-21 00:17
I think Dennis's example is fatal: from section 6.10 ("Comparisons"):

"""
Comparisons can be chained arbitrarily, e.g., `x < y <= z` is equivalent to `x < y and y <= z`, except that y is evaluated only once (but in both cases z is not evaluated at all when x < y is found to be false).
"""

So doing LOAD_FAST twice on x (in `1 < x < 3`) is prohibited by the language definition. Doesn't matter to this whether it's plain `x` or `f(x)` where `f()` is some arbitrary function: the object the middle comparand signifies is fixed at whatever it was when the the first comparison is evaluated.
msg404576 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-10-21 09:42
Agree with Tim.

The idea of optimizing stack manipulation operations for constants is interesting, but is it common enough case to justify the compiler complication?

See also rejected issue27236.
msg416706 - (view) Author: Dennis Sweeney (Dennis Sweeney) * (Python committer) Date: 2022-04-04 23:14
https://bugs.python.org/issue47221 was opened as a duplicate of this.

Unless there are any new ideas for getting around the concerns here, I think this can be closed.
msg416719 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2022-04-05 02:35
I came here from #47221.

If I am reading this correctly, it concerns me that stack operations (which should be fast) are apparently slow?

If we can't reduce the number of stack operations, can we speed them up?
msg416735 - (view) Author: Dennis Sweeney (Dennis Sweeney) * (Python committer) Date: 2022-04-05 03:14
For reference, chaining is about 1.18x slower in this microbenchmark on GCC:

./python -m pyperf timeit -s "x = 100" "if 10 < x < 30: print('no')" --duplicate=10
.....................
Mean +- std dev: 21.3 ns +- 0.2 ns
./python -m pyperf timeit -s "x = 100" "if 10 < x and x < 30: print('no')" --duplicate=10
.....................
Mean +- std dev: 18.0 ns +- 0.5 ns

For a related case, in GH-30970, the bytecode generate by "a, b = a0, b0" was changed.
   Before: [load_a0, load_b0, swap, store_a, store_b]
   After:  [load_a0, load_b0, store_b, store_a]
However, this was only changed when the stores were STORE_FASTs. STORE_GLOBAL/STORE_NAME/STORE_DEREF cases still have the SWAP.
In the STORE_GLOBAL cases, you can construct scenarios with custom __del__ methods where storing b and then a has different behavior than storing a and then b. No such cases can be constructed for STORE_FAST without resorting to frame hacking.

I wonder if the same argument applies here: maybe @akuvfx's PR could be altered to use LOAD_FAST twice for each variable *only* if everything in sight is the result of a LOAD_FAST or a LOAD_CONST. My example above uses a LOAD_DEREF, so its behavior could remain unchanged.

The argument that this would within the language spec is maybe a little bit more dubious than the "a, b = a0, b0" case though, since custom `__lt__` methods are a bit more well-specified than custom `__del__` methods.

Thoughts?
History
Date User Action Args
2022-04-11 14:59:51adminsetgithub: 89705
2022-04-07 18:29:37brandtbuchersetnosy: + brandtbucher
2022-04-05 03:14:58Dennis Sweeneysetmessages: + msg416735
2022-04-05 02:35:06steven.dapranosetstatus: pending -> open
nosy: + steven.daprano
messages: + msg416719

2022-04-04 23:14:14Dennis Sweeneysetstatus: open -> pending

messages: + msg416706
2022-04-04 23:10:33Dennis Sweeneylinkissue47221 superseder
2021-10-21 09:42:07serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg404576
2021-10-21 00:17:40tim.peterssetnosy: + tim.peters
messages: + msg404553
2021-10-20 23:02:09Dennis Sweeneysetnosy: + Dennis Sweeney
messages: + msg404536
2021-10-20 18:29:39akuvfxsetkeywords: + patch
stage: patch review
pull_requests: + pull_request27377
2021-10-20 18:25:33akuvfxcreate