classification
Title: Is it legal to eliminate tests of a value, when that test has no effect on control flow?
Type: behavior Stage:
Components: Versions: Python 3.10
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Mark.Shannon Nosy List: Mark.Shannon, Mohamed_Atef, ethan.furman, gregory.p.smith, gvanrossum, josh.r, pablogsal, serhiy.storchaka, stestagg
Priority: release blocker Keywords:

Created on 2021-01-11 22:26 by stestagg, last changed 2021-01-13 19:18 by gvanrossum.

Messages (25)
msg384867 - (view) Author: Steve Stagg (stestagg) Date: 2021-01-11 22:26
This was raised by Mats Wichmann <mats@python.org>  on the python-dev list.

Commit : c71581c7a4192e6ba9a79eccc583aaadab300efa
bpo-42615: Delete redundant jump instructions that only bypass empty blocks (GH-23733)

appears to have changed the behaviour of the following code:

class B:
     def __bool__(self):
         raise AttributeError("don't do that!")

b = B()
try:
   if b:
        pass
except AttributeError:
   print("HI")


Before the change, the output is:

bool(B)
GOT ERROR
Traceback (most recent call last):
  File "../test.py", line 8, in <module>
    if b:
  File "../test.py", line 4, in __bool__
    raise AttributeError("don't do that!")
AttributeError: don't do that!

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "../test.py", line 12, in <module>
    raise IndexError("Should GET THIS")
IndexError: Should GET THIS


After the change, just:

SHOULDN'T GET THIS

It seems like the entire branch is being eliminated prematurely, but maybe only when the statement is wrapped in a try-except?
msg384868 - (view) Author: Steve Stagg (stestagg) Date: 2021-01-11 22:35
Apologies, script should have read:
class B:
     def __bool__(self):
         print("bool(B)")
         raise AttributeError("don't do that!")

b = B()
try:
   if b:
        pass
except AttributeError:
   print("GOT ERROR")
   raise IndexError("Should GET THIS")

print("SHOULDN'T GET THIS")


---

The DIS before the change for this code is:
BEFORE:
  1           0 LOAD_BUILD_CLASS
              2 LOAD_CONST               0 (<code object B at 0x7ff45711e710, file "<dis>", line 1>)
              4 LOAD_CONST               1 ('B')
              6 MAKE_FUNCTION            0
              8 LOAD_CONST               1 ('B')
             10 CALL_FUNCTION            2
             12 STORE_NAME               0 (B)

  6          14 LOAD_NAME                0 (B)
             16 CALL_FUNCTION            0
             18 STORE_NAME               1 (b)

  7          20 SETUP_FINALLY            8 (to 30)

  8          22 LOAD_NAME                1 (b)
             24 POP_JUMP_IF_FALSE       26

  9     >>   26 POP_BLOCK
             28 JUMP_FORWARD            30 (to 60)

 10     >>   30 DUP_TOP
             32 LOAD_NAME                2 (AttributeError)
             34 JUMP_IF_NOT_EXC_MATCH    58
             36 POP_TOP
             38 POP_TOP
             40 POP_TOP

 11          42 LOAD_NAME                3 (print)
             44 LOAD_CONST               2 ('GOT ERROR')
             46 CALL_FUNCTION            1
             48 POP_TOP

 12          50 LOAD_NAME                4 (IndexError)
             52 LOAD_CONST               3 ('Should GET THIS')
             54 CALL_FUNCTION            1
             56 RAISE_VARARGS            1
        >>   58 RERAISE

 14     >>   60 LOAD_NAME                3 (print)
             62 LOAD_CONST               4 ("SHOULDN'T GET THIS")
             64 CALL_FUNCTION            1
             66 POP_TOP
             68 LOAD_CONST               5 (None)
             70 RETURN_VALUE

Disassembly of <code object B at 0x7ff45711e710, file "<dis>", line 1>:
  1           0 LOAD_NAME                0 (__name__)
              2 STORE_NAME               1 (__module__)
              4 LOAD_CONST               0 ('B')
              6 STORE_NAME               2 (__qualname__)

  2           8 LOAD_CONST               1 (<code object __bool__ at 0x7ff45711e660, file "<dis>", line 2>)
             10 LOAD_CONST               2 ('B.__bool__')
             12 MAKE_FUNCTION            0
             14 STORE_NAME               3 (__bool__)
             16 LOAD_CONST               3 (None)
             18 RETURN_VALUE

Disassembly of <code object __bool__ at 0x7ff45711e660, file "<dis>", line 2>:
  3           0 LOAD_GLOBAL              0 (print)
              2 LOAD_CONST               1 ('bool(B)')
              4 CALL_FUNCTION            1
              6 POP_TOP

  4           8 LOAD_GLOBAL              1 (AttributeError)
             10 LOAD_CONST               2 ("don't do that!")
             12 CALL_FUNCTION            1
             14 RAISE_VARARGS            1


----
Afterwards, tehre's a single change:

8 becomes:

8          22 LOAD_NAME                1 (b)
             24 POP_TOP
msg384906 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-01-12 10:00
I am not sure that it should be fixed.

We already cut corners in similar cases and did this for years, and it always was okay. In the following example bool(a) is only called once:

    if a and b:
        f()

  1           0 LOAD_NAME                0 (a)
              2 POP_JUMP_IF_FALSE       14
              4 LOAD_NAME                1 (b)
              6 POP_JUMP_IF_FALSE       14

  2           8 LOAD_NAME                2 (f)
             10 CALL_FUNCTION            0
             12 POP_TOP
        >>   14 LOAD_CONST               0 (None)
             16 RETURN_VALUE

It differs from the case of using temporary variable (optimization does not work here):

    t = a and b
    if t:
        f()

  1           0 LOAD_NAME                0 (a)
              2 JUMP_IF_FALSE_OR_POP     6
              4 LOAD_NAME                1 (b)
        >>    6 STORE_NAME               2 (t)

  2           8 LOAD_NAME                2 (t)
             10 POP_JUMP_IF_FALSE       18

  3          12 LOAD_NAME                3 (f)
             14 CALL_FUNCTION            0
             16 POP_TOP
        >>   18 LOAD_CONST               0 (None)
             20 RETURN_VALUE

(BTW, Python 3.10 produces less optimal code for that examples)
msg384919 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2021-01-12 11:22
The issue here is:

Is it legal to convert
   if x: pass
into
   pass
?

The explicit effect of the code is unchanged, BUT the implicit effect (of calling x.__bool__) is changed.

The examples Serhiy gives are similar. If `bool(a)` evaluates to False, then `bool(a and b)` must be False, so we skip testing `a and b`.
However, we do not know that `bool(a and b)` cannot have a side-effect, so the optimization could be incorrect w.r.t. side effects.
msg384921 - (view) Author: Steve Stagg (stestagg) Date: 2021-01-12 11:39
To be super pedantic, as per my understanding of:

"6.11 ... The expression x and y first evaluates x; if x is false, its value is returned; otherwise, y is evaluated and the resulting value is returned."

The only corner that was previously cut is that in this statement:

if a and b:
    ...


The evalution should be roughly equivalent to:

bool(a) if bool(a) else bool(b) # <- where bool(b) is never called

instead it's more like:

_x if _x := bool(a) else bool(b) # <- where bool(b) is never called

so, the runtime is eliding a repeated call to bool(a).

This obviously causes problems if bool(a) has per-call side-effects, but this seems to me like a reasonable corner to cut.

Totally eliding the if clause feels to me (subjectively) like a much more risky proposition, and perhaps one that should be documented if kept in?
msg384922 - (view) Author: Steve Stagg (stestagg) Date: 2021-01-12 11:41
I got my and/or logic inverted, but believe the point still stands
msg384924 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-01-12 12:23
Thank you Steve. Yes, this is what I meant. "if a and b" is so common that we sacrifice literal translation for the sake of performance.

"if" with an empty block looks pretty uncommon to me. It is not worth to optimize this case specially (especially if it changes semantic). What are real world examples for which it makes sense to replace POP_JUMP_IF_FALSE with POP_TOP?
msg384942 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2021-01-12 14:37
The question still stands.

Is converting `if x: pass` to `pass` legal?

And, if it is not, is converting 

if a and b:
    body

to

if a:
    if b:
        body

a legal transformation?
(ignoring line numbers)


If the first transformation is not allowed but the second is, why?


B.T.W it is trivial to change the behavior, once we've decided what's correct.
msg384946 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-01-12 14:46
For the latter, it was decided that it is legal a long time ago. It has a benefit and we did not have any complains for all these years. The absent of this optimization would encourage writing less readable code for performance.

For the former, what is the benefit?
msg384956 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2021-01-12 15:47
Can we translate 'if x: pass' into 'pass'? No, because calling its __bool__ method may have a side effect (as we saw at the start of this thread).

Can we eliminate a lone 'x'? Only if it's a local variable and we're *sure* (because of control flow analysis) that it's got a value. For globals and class variables we must execute the load because there could always be an exception (or the dict could have a trap for lookups).

Can we eliminate e.g. 'x.y'? Never, because it can have a side effect.

In general, eliminating this kind of thing seems silly -- in code that the user intends to be fast such things don't occur, and in test the user probably has a reason to write odd code.


On the other question, I don't see how there's any possible difference in evaluation and side effects between

if a and b: ...

and

if a:
    if b:
        ...

so I have no problem with that (in fact that is what it *means*).
msg384958 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2021-01-12 16:02
They aren't quite the same. If `a` is falsey, and bool(a) has a side-effect, then that side-effect should occur twice in:

if a and b:
    ...

but only once in
if a:
    if b:
        ...

It gets more interesting (silly), if `a.__bool__()` alternated between True and False.

If we say that such behavior is illegal, and can be ignored by the optimizer, then 3.10 is correct (as it stands).

My example was wrong though, as you've pointed out.
`if x: pass` it transformed to `x`. It is the test that is eliminated, not the evaluation of `x`.
msg384960 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2021-01-12 16:28
If an optimization changes semantics it's not an optimization.

In  `if x: pass` how do we know `x` is falsely without calling `bool()` on it?

---

On a slightly different note, in the code:

    if a and b:
       ...

why is `bool(a)` called twice?
msg384962 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2021-01-12 16:36
It's clearer if you rewrite

if a and b:
    ...

as

tmp = a and b
if tmp:
    ...

if a is falsey then bool(a) gets called in `tmp = a and b` and `a` is assigned to `tmp`. Then in `if tmp`, bool(a) is called again.

I agree with you about it not being an optimization if it changes the semantics. But only within agreed semantics.
Optimizations are allow to make fewer calls to __hash__() in a dictionary, or change race conditions, because those are outside of the language specification.
msg384964 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2021-01-12 16:39
> How do we know `x` is falsey without calling `bool()` on it?

We don't, but in `if x: pass`, it doesn't matter.
Discounting side-effects in __bool__, the code does nothing regardless of the value of `x`.
msg384968 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2021-01-12 17:52
If the body of a conditional does nothing, it seems fine to optimize the condition out to me.  But I see code from a low level compiled language perspective where that is clearly what would happen.  In reality, who ever meaningfully writes code where the body of a conditional does nothing?

 * Placeholder code with a # TODO perhaps.  [fine to optimize out]
 * Unit tests attempting to test the behavior of __bool__().  [an annoying behavior change]

Are there others?  Are we expecting this odd "not quite a no-op because we're so high level" pattern to ever appear in a performance critical situation?

The workaround for the latter would be to explicitly `if bool(x):` instead of `if x:` when the body is a no-op.  Or not make the body a no-op.  I expect unittest of __bool__() code owners would be fine with that so long as we call it out clearly in What's New docs, it's just that it could be an annoying change for them to make.

Ideally we'd also provide a lib2to3 fixer to detect and fixup code exhibiting that pattern.

The easiest answer is just not to optimize this out if it isn't actually providing us anything deemed important.
msg384970 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2021-01-12 18:09
Hm, I hadn't realized the issue of bool(a) being evaluated once or twice.

The most important side effect that bool(a) can have is raising (as e.g. numpy arrays do), not producing random results. Another important side effect might be loading some value into a cache.

So I think dropping an *extra* call is fine, while dropping the *only* call is not.
msg384994 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2021-01-13 05:07
Gregory: Even in a low-level compiled language (say, C++), pretty sure the compiler can't automatically optimize out:

    if (x) { }

unless it has sure knowledge of the implementation of operator bool; if operator bool's implementation isn't in the header file, and link time optimization isn't involved, it has to call it to ensure any side-effects it might have are invoked.

It can only bypass the call if it knows the implementation of operator bool and can verify it has no observable side-effects (as-if rule). There are exceptions to the as-if rule for optimizations in special cases (copy elision), but I'm pretty sure operator bool isn't one of them; if the optimizer doesn't know the implementation of operator bool, it must call it just in case it does something weird but critical to the program logic.

Point is, I agree that:

if x:
     pass

must evaluate non-constant-literal x for truthiness, no matter how silly that seems (not a huge loss, given very little code should ever actually do that).
msg385014 - (view) Author: Steve Stagg (stestagg) Date: 2021-01-13 10:55
I re-read the change that introduced this, and the situation is slightly more complex.

While the wording is understandably slightly ambiguous, the change talks about the following example:

if a:
  pass
else:
  <do stuff>

In this scenario, the compiler is trying to eliminate the <pass> block, because it's redundant, and this seems totally fine to me.  The condition is still evaluated, but it's changing how the empty branch is handled.

Taking the example given in the ticket:
```
while True:
  if a:
     pass
  else:
     break
```

Before the change, the DIS is:

  2           0 NOP

  3     >>    2 LOAD_NAME                0 (a)
              4 POP_JUMP_IF_FALSE       10

  4           6 JUMP_FORWARD             0 (to 8)

  2     >>    8 JUMP_ABSOLUTE            2

  6     >>   10 LOAD_CONST               1 (None)
             12 RETURN_VALUE

  2          14 LOAD_CONST               1 (None)
             16 RETURN_VALUE

After the change, the DIS is:

  2           0 NOP

  3     >>    2 LOAD_NAME                0 (a)
              4 POP_JUMP_IF_FALSE       10

  4           6 NOP

  2           8 JUMP_ABSOLUTE            2

  6     >>   10 LOAD_CONST               1 (None)
             12 RETURN_VALUE

  2          14 LOAD_CONST               1 (None)
             16 RETURN_VALUE

Point being, the POP_JUMP_IF_FALSE is still invoked in the intended scenario, just some jumps are removed (OmG asserts this is faster).


So, If we test the simple case on MASTER, the following code still does the 'right thing'(tm) by evaluating bool(a):

```
class A:
    def __bool__(self):
        print("BOOL")
        return False

a = A()
if a:
   pass
```
prints "BOOL"

It seems like the issue is encountered when the "if" statement is mixed with (some other control flow scenarios, for example) exception handling:

```
class A:
    def __bool__(self):
        print("BOOL")
        return False

a = A()

try:
  if a:
     pass
except:
    raise
``
prints nothing


So, ignoring the larger discussion about if it's allowed to elide bool calls etc, this feels like an unintended side-effect of an otherwise non-contentious branch optimization, and we should either really understand the situation in-depth (all possible situations where this might happen), or just treat this as a bug and fix ti
msg385038 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2021-01-13 15:29
Steve,
Please don't change the title of the issue.

Sure, the optimizer is "inconsistent".
Optimizations are applied in some cases, and not in others.
That's just how compilers work.

The issue here is whether the optimizer is allowed to skip the call to __bool__() if the result doesn't effect the observable effect of running the program.
msg385040 - (view) Author: Steve Stagg (stestagg) Date: 2021-01-13 15:42
Oops, sorry, didn't realise there were such rules.  

The reasoning for me making the change to the title is that that the original PR didn't mention skipping actual condition logic, but does mention skipping unreachable blocks, with the examples provided (either by accident or intent) showing cases where the condition was still included.

I thus assumed the change that has been implemented had bad unintended side-effects (a bug), so wanted to capture that and, if any consensus on allowing the optimizer to skip bool() calls is ever reached on the mailing list, an explicit issue would be raised to cover that change.
msg385045 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2021-01-13 15:56
Is anyone still in favor of eliminating the __bool__ call from ‘if p: pass’?
msg385048 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2021-01-13 16:47
The problem with using a specific syntax example, is that the optimizer doesn't work that way. It works on the CFG.

Any specification needs to be phrased in terms of general control flow, as other optimizations can enable this transformation.
e.g. 

if x or True:
    do_something()

is (in master) transformed to:

x
do_something()

I think your earlier suggestion of
"So I think dropping an *extra* call is fine, while dropping the *only* call is not."
is the best way to preserve 3.9 behavior.
It can be formalised as:
"The implementation is allowed to skip any boolean tests of a value, when it has effect on the flow of the program and at least one test has already been performed on that value."
msg385049 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2021-01-13 16:50
I missed a "no" in the above, which somewhat changed the meaning!

It should have read:

"The implementation is allowed to skip any boolean test of a value, when it has *no* effect on the flow of the program and at least one test has already been performed on that value."
msg385050 - (view) Author: Steve Stagg (stestagg) Date: 2021-01-13 17:33
Sounds great to me (with my approximately zero optimizer experience)

At risk of taking this too far, you /could/ add something like:

"skip any boolean test of a value _immediately_ following another boolean test, when it has no ..."

to this spec/guidance/whatever it is.

Just to prevent the risk of the `if` block being removed in future in ridiculous code like the following:

try:
  while True:
    a = x or y
    a.pop()
    if a:
      pass
except XIsEmptyError:
  ...

(I'm guessing we're pretty far from being able to rewrite enough for this to be a remotely credible optimization candidate anytime soon anyway)
msg385051 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2021-01-13 19:18
> "The implementation is allowed to skip any boolean test of a value, when it has *no* effect on the flow of the program and at least one test has already been performed on that value."

+1
History
Date User Action Args
2021-01-13 19:18:17gvanrossumsetmessages: + msg385051
2021-01-13 17:33:59stestaggsetmessages: + msg385050
2021-01-13 16:50:35Mark.Shannonsetmessages: + msg385049
2021-01-13 16:47:49Mark.Shannonsetmessages: + msg385048
2021-01-13 15:56:28gvanrossumsetmessages: + msg385045
2021-01-13 15:42:59stestaggsetmessages: + msg385040
2021-01-13 15:29:48Mark.Shannonsettitle: Is it legal to eliminate tests of a value, when that test has no effect on control flow -> Is it legal to eliminate tests of a value, when that test has no effect on control flow?
2021-01-13 15:29:21Mark.Shannonsetmessages: + msg385038
title: Inconsistent elimination of empty blocks by optimizer causes __bool__calls to be skipped in some exception handling scenarios -> Is it legal to eliminate tests of a value, when that test has no effect on control flow
2021-01-13 10:55:35stestaggsetmessages: + msg385014
title: Is it legal to eliminate tests of a value, when that test has no effect on control flow? -> Inconsistent elimination of empty blocks by optimizer causes __bool__calls to be skipped in some exception handling scenarios
2021-01-13 05:07:10josh.rsetnosy: + josh.r
messages: + msg384994
2021-01-12 18:09:13gvanrossumsetmessages: + msg384970
2021-01-12 17:52:56gregory.p.smithsetmessages: + msg384968
2021-01-12 16:39:25Mark.Shannonsetmessages: + msg384964
2021-01-12 16:36:23Mark.Shannonsetmessages: + msg384962
2021-01-12 16:28:22ethan.furmansetnosy: + ethan.furman
messages: + msg384960
2021-01-12 16:15:04Mark.Shannonsetpriority: normal -> release blocker
2021-01-12 16:02:15Mark.Shannonsetpriority: release blocker -> normal

messages: + msg384958
2021-01-12 15:47:39gvanrossumsetmessages: + msg384956
2021-01-12 14:46:05serhiy.storchakasetmessages: + msg384946
2021-01-12 14:37:24Mark.Shannonsetmessages: + msg384942
2021-01-12 12:23:48serhiy.storchakasetnosy: + gvanrossum
messages: + msg384924
2021-01-12 11:41:29stestaggsetmessages: + msg384922
2021-01-12 11:39:27stestaggsetmessages: + msg384921
2021-01-12 11:22:31Mark.Shannonsetmessages: + msg384919
title: Regression __bool__ if AttributeError is raised (Possible regression introduced by bpo-42615) -> Is it legal to eliminate tests of a value, when that test has no effect on control flow?
2021-01-12 10:00:46serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg384906
2021-01-12 09:26:04vstinnersettitle: Possible regression introduced by bpo-42615 -> Regression __bool__ if AttributeError is raised (Possible regression introduced by bpo-42615)
2021-01-12 02:19:47pablogsalsetpriority: high -> release blocker
2021-01-12 02:18:49pablogsalsetnosy: + pablogsal
2021-01-12 01:16:17gregory.p.smithsetnosy: + gregory.p.smith
2021-01-12 01:16:11gregory.p.smithsetpriority: normal -> high
2021-01-11 22:47:38Mark.Shannonsetassignee: Mark.Shannon
2021-01-11 22:35:19stestaggsetmessages: + msg384868
2021-01-11 22:26:53stestaggcreate