Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

3.8.0b2 no longer optimizes away "if 0:" ? #81681

Closed
nedbat opened this issue Jul 5, 2019 · 44 comments
Closed

3.8.0b2 no longer optimizes away "if 0:" ? #81681

nedbat opened this issue Jul 5, 2019 · 44 comments
Labels
3.8 only security fixes 3.9 only security fixes release-blocker

Comments

@nedbat
Copy link
Member

nedbat commented Jul 5, 2019

BPO 37500
Nosy @gvanrossum, @tim-one, @terryjreedy, @nedbat, @ambv, @serhiy-storchaka, @pganssle, @pablogsal, @miss-islington, @tirkarthi, @aldwinaldwin
PRs
  • [3.7] bpo-37500: Revert commit 85ed1712e428f93408f56fc684816f9a85b0ebc0 #14605
  • bpo-37500: Make sure dead code does not generate bytecode but also detect syntax errors #14612
  • [3.8] bpo-37500: Make sure dead code does not generate bytecode but also detect syntax errors (GH-14612) #14780
  • [3.8] bpo-37500: Make sure dead code does not generate bytecode but also detect syntax errors (GH-14612) #15002
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-07-29.15:17:11.699>
    created_at = <Date 2019-07-05.03:02:39.832>
    labels = ['3.8', '3.9', 'release-blocker']
    title = '3.8.0b2 no longer optimizes away "if 0:" ?'
    updated_at = <Date 2019-07-29.15:17:11.697>
    user = 'https://github.com/nedbat'

    bugs.python.org fields:

    activity = <Date 2019-07-29.15:17:11.697>
    actor = 'pablogsal'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-07-29.15:17:11.699>
    closer = 'pablogsal'
    components = []
    creation = <Date 2019-07-05.03:02:39.832>
    creator = 'nedbat'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 37500
    keywords = ['patch', '3.8regression']
    message_count = 44.0
    messages = ['347296', '347300', '347302', '347308', '347309', '347342', '347349', '347362', '347368', '347369', '347372', '347375', '347377', '347380', '347381', '347382', '347383', '347385', '347386', '347387', '347388', '347389', '347392', '347393', '347394', '347396', '347397', '347398', '347401', '347402', '347404', '347405', '347406', '347408', '347409', '347410', '347500', '347504', '347511', '347952', '348654', '348659', '348668', '348669']
    nosy_count = 11.0
    nosy_names = ['gvanrossum', 'tim.peters', 'terry.reedy', 'nedbat', 'lukasz.langa', 'serhiy.storchaka', 'p-ganssle', 'pablogsal', 'miss-islington', 'xtreak', 'aldwinaldwin']
    pr_nums = ['14605', '14612', '14780', '15002']
    priority = 'release blocker'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue37500'
    versions = ['Python 3.8', 'Python 3.9']

    @nedbat
    Copy link
    Member Author

    nedbat commented Jul 5, 2019

    CPython 3.8.0b2 behaves differently than previous releases: it no longer optimizes away "if 0:" branches. This is something that has been done since at least 2.4. It was done in 3.8.0b1, but no longer is.

    Was this a purposeful change? Why? It seems like 3.8 is adding more optimizations, why was this removed?

    Test code (/tmp/no0.py):

    import dis
    import sys
    
    print(sys.version)
    
    def with_if_0():
        print(1)
        if 0:
            print(2)
        print(3)
    
    dis.dis(with_if_0)
    $ /usr/local/pythonz/pythons/CPython-3.8.0b2/bin/python3.8 /tmp/no0.py
    3.8.0b2 (default, Jul  4 2019, 22:38:04)
    [Clang 10.0.0 (clang-1000.10.44.4)]
      7           0 LOAD_GLOBAL              0 (print)
                  2 LOAD_CONST               1 (1)
                  4 CALL_FUNCTION            1
                  6 POP_TOP

    8 8 LOAD_CONST 2 (0)
    10 POP_JUMP_IF_FALSE 20

    9 12 LOAD_GLOBAL 0 (print)
    14 LOAD_CONST 3 (2)
    16 CALL_FUNCTION 1
    18 POP_TOP

    10 >> 20 LOAD_GLOBAL 0 (print)
    22 LOAD_CONST 4 (3)
    24 CALL_FUNCTION 1
    26 POP_TOP
    28 LOAD_CONST 0 (None)
    30 RETURN_VALUE

    $ /usr/local/pythonz/pythons/CPython-3.8.0b1/bin/python3.8 /tmp/no0.py
    3.8.0b1 (default, Jun  4 2019, 21:21:14)
    [Clang 10.0.0 (clang-1000.10.44.4)]
      7           0 LOAD_GLOBAL              0 (print)
                  2 LOAD_CONST               1 (1)
                  4 CALL_FUNCTION            1
                  6 POP_TOP

    10 8 LOAD_GLOBAL 0 (print)
    10 LOAD_CONST 4 (3)
    12 CALL_FUNCTION 1
    14 POP_TOP
    16 LOAD_CONST 0 (None)
    18 RETURN_VALUE

    $ python3.7 /tmp/no0.py
    3.7.3 (default, Apr 10 2019, 10:27:53)
    [Clang 10.0.0 (clang-1000.10.44.4)]
      7           0 LOAD_GLOBAL              0 (print)
                  2 LOAD_CONST               1 (1)
                  4 CALL_FUNCTION            1
                  6 POP_TOP

    10 8 LOAD_GLOBAL 0 (print)
    10 LOAD_CONST 2 (3)
    12 CALL_FUNCTION 1
    14 POP_TOP
    16 LOAD_CONST 0 (None)
    18 RETURN_VALUE

    $ python2.7 /tmp/no0.py
    2.7.14 (default, Oct  4 2017, 09:45:53)
    [GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.37)]
      7           0 LOAD_CONST               1 (1)
                  3 PRINT_ITEM
                  4 PRINT_NEWLINE

    10 5 LOAD_CONST 2 (3)
    8 PRINT_ITEM
    9 PRINT_NEWLINE
    10 LOAD_CONST 0 (None)
    13 RETURN_VALUE

    $ python2.4 /tmp/no0.py
    2.4.6 (#1, Jun 18 2016, 17:49:14)
    [GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)]
      7           0 LOAD_CONST               1 (1)
                  3 PRINT_ITEM
                  4 PRINT_NEWLINE

    10 5 LOAD_CONST 2 (3)
    8 PRINT_ITEM
    9 PRINT_NEWLINE
    10 LOAD_CONST 0 (None)
    13 RETURN_VALUE

    @nedbat nedbat added the 3.8 only security fixes label Jul 5, 2019
    @nedbat
    Copy link
    Member Author

    nedbat commented Jul 5, 2019

    BTW, the same regression applies to "if not __debug__:" . Python3.8.0b1 optimized these away, but b2 does not. That optimization was new in 3.7.0a4

    @tirkarthi
    Copy link
    Member

    Not sure if related there were some changes done to __debug__ related checks with bpo-37269

    @aldwinaldwin
    Copy link
    Mannequin

    aldwinaldwin mannequin commented Jul 5, 2019

    FWIW: this is probably since PR14099

    @serhiy-storchaka
    Copy link
    Member

    The root cause is PR 13332 for bpo-1875.

    @nedbat
    Copy link
    Member Author

    nedbat commented Jul 5, 2019

    Since this was backported to 3.7 but not yet released, I'm adding 3.7regression.

    @ned-deily
    Copy link
    Member

    How serious a regression is this? We still have time to revert the 3.7 commit of bpo-1875 (85ed171) from 3.7.4 final if we act immediately.

    @ned-deily ned-deily added the 3.7 (EOL) end of life label Jul 5, 2019
    @nedbat
    Copy link
    Member Author

    nedbat commented Jul 5, 2019

    The real-word implications from my world are this: if your code has "if 0:" clauses in it, and you measure its coverage, then because the lines have not been optimized away, coverage.py will think the lines are a possible execution path, and will be considered a miss because they are not executed. This will reduce your coverage percentage.

    I can't estimate how many people this will affect.

    @pablogsal
    Copy link
    Member

    Ned, I have prepared PR14605 for the 3.7 branch to revert 85ed171. As this is somehow changing behavior in the 3.7 series, I think is the best course of action for 3.7.

    For 3.8 the proposed way to do this is now using an unconditional jump to skip the conditional (in progress in PR 14116). The reason is that we need to compile the blocks to detect syntax errors even if the block will be unreachable. This is also a deep implementation detail and it should not be rely upon, in my opinion.

    @miss-islington
    Copy link
    Contributor

    New changeset 7d93eff by Miss Islington (bot) (Pablo Galindo) in branch '3.7':
    [3.7] bpo-37500: Revert commit 85ed171 (GH-14605)
    7d93eff

    @nedbat
    Copy link
    Member Author

    nedbat commented Jul 5, 2019

    I don't know what you mean by, "it should not be rely upon." Are you saying that in 3.8 you want to keep the lines in the compiled bytecode? Do you understand the implications for coverage measurement?

    @scoder
    Copy link
    Contributor

    scoder commented Jul 5, 2019

    it should not be rely upon

    IMHO, the correct behaviour under coverage analysis is to keep the code and not discard it. After all, it is code that exists, and that is not being executed, so it should count as uncovered code. The fact that some Python versions can detect at compile time that it will never be executed, and thus discard it ahead of time, should not have an impact on the coverage analysis.

    @nedbat
    Copy link
    Member Author

    nedbat commented Jul 5, 2019

    I can see the logic of the argument that says the code exists, and should be measured. But this is changing behavior that has been in place for at least 15 years.

    Before this change, code could have had an unreported SyntaxError, but it was code that was being discarded by the optimizer anyway. How many people are benefiting from those SyntaxErrors? The only way the code would ever run is if they changed the "if 0:" to something else, at which point they would have seen the SyntaxErrors without this change.

    If we keep this change, I will hear from people unhappy with the drop in their coverage measurement. Have users of the language been complaining that they don't see SyntaxErrors in their optimized-away code?

    @nedbat
    Copy link
    Member Author

    nedbat commented Jul 5, 2019

    To add to the confusion, the treatment of "if __debug__:" and "if not __debug__:" is now asymmetric:

    Here is debug.py:

    import dis
    import sys
    print(sys.version)
    
    def f():
        if __debug__:
            print("debug")
        else:
            print("not debug")
        if not __debug__:
            print("NOT DEBUG")
        else:
            print("DEBUG")
    
    dis.dis(f)
    $ python3.8 /tmp/debug.py
    3.8.0b2 (default, Jul  4 2019, 22:38:04)
    [Clang 10.0.0 (clang-1000.10.44.4)]
      7           0 LOAD_GLOBAL              0 (print)
                  2 LOAD_CONST               1 ('debug')
                  4 CALL_FUNCTION            1
                  6 POP_TOP

    10 8 LOAD_CONST 2 (False)
    10 POP_JUMP_IF_FALSE 22

    11 12 LOAD_GLOBAL 0 (print)
    14 LOAD_CONST 3 ('NOT DEBUG')
    16 CALL_FUNCTION 1
    18 POP_TOP
    20 JUMP_FORWARD 8 (to 30)

    13 >> 22 LOAD_GLOBAL 0 (print)
    24 LOAD_CONST 4 ('DEBUG')
    26 CALL_FUNCTION 1
    28 POP_TOP
    >> 30 LOAD_CONST 0 (None)
    32 RETURN_VALUE

    $ python3.8 -O /tmp/debug.py
    3.8.0b2 (default, Jul  4 2019, 22:38:04)
    [Clang 10.0.0 (clang-1000.10.44.4)]
      6           0 LOAD_CONST               1 (False)
                  2 POP_JUMP_IF_FALSE       14

    7 4 LOAD_GLOBAL 0 (print)
    6 LOAD_CONST 2 ('debug')
    8 CALL_FUNCTION 1
    10 POP_TOP
    12 JUMP_FORWARD 8 (to 22)

    9 >> 14 LOAD_GLOBAL 0 (print)
    16 LOAD_CONST 3 ('not debug')
    18 CALL_FUNCTION 1
    20 POP_TOP

    11 >> 22 LOAD_GLOBAL 0 (print)
    24 LOAD_CONST 4 ('NOT DEBUG')
    26 CALL_FUNCTION 1
    28 POP_TOP
    30 LOAD_CONST 0 (None)
    32 RETURN_VALUE

    In 3.7 (and earlier) the behavior was balanced:

    $ python3.7 /tmp/debug.py
    3.7.3 (default, Apr 10 2019, 10:27:53)
    [Clang 10.0.0 (clang-1000.10.44.4)]
      7           0 LOAD_GLOBAL              0 (print)
                  2 LOAD_CONST               1 ('debug')
                  4 CALL_FUNCTION            1
                  6 POP_TOP

    13 8 LOAD_GLOBAL 0 (print)
    10 LOAD_CONST 2 ('DEBUG')
    12 CALL_FUNCTION 1
    14 POP_TOP
    16 LOAD_CONST 0 (None)
    18 RETURN_VALUE

    $ python3.7 -O /tmp/debug.py
    3.7.3 (default, Apr 10 2019, 10:27:53)
    [Clang 10.0.0 (clang-1000.10.44.4)]
      9           0 LOAD_GLOBAL              0 (print)
                  2 LOAD_CONST               1 ('not debug')
                  4 CALL_FUNCTION            1
                  6 POP_TOP

    11 8 LOAD_GLOBAL 0 (print)
    10 LOAD_CONST 2 ('NOT DEBUG')
    12 CALL_FUNCTION 1
    14 POP_TOP
    16 LOAD_CONST 0 (None)
    18 RETURN_VALUE

    Is this really the desired behavior?

    @pablogsal
    Copy link
    Member

    Before this change, code could have had an unreported SyntaxError, but it was code that was being discarded by the optimizer anyway

    if __debug__ can a valid form of if 0 and therefore any syntax error will not be reported until that branch becomes true.

    How many people are benefiting from those SyntaxErrors?

    I don't know how to give numbers, but is a matter of correctness. SyntaxErrors are reported without the need to execute any code (is a parse error) and should be reported independently of bytecode and what code runs or does not run. Is a property of the code being written, not of the code being executed.

    Have users of the language been complaining that they don't see SyntaxErrors in their optimized-away code?

    Yes, I have seen many people surprised by this. Paul Ganssle (added to the noisy list) was one of the latest ones. Also, is a matter of correctness, is a syntax error and should be reported.

    If we keep this change, I will hear from people unhappy with the drop in their coverage measurement. Have users of the language been complaining that they don't see SyntaxErrors in their optimized-away code?

    I am very sorry that this change affects your users in a negative way. But I think you were relying on an implementation detail of the interpreter that was never assured to have backwards compatibility.

    @pablogsal
    Copy link
    Member

    To add to the confusion, the treatment of "if __debug__:" and "if not __debug__:" is now asymmetric:

    After the code in PR 14116 this becomes:

    5 0 LOAD_GLOBAL 0 (print)
    2 LOAD_CONST 1 ('debug')
    4 CALL_FUNCTION 1
    6 POP_TOP
    8 JUMP_ABSOLUTE 30

    7 10 LOAD_GLOBAL 0 (print)
    12 LOAD_CONST 2 ('not debug')
    14 CALL_FUNCTION 1
    16 POP_TOP

    8 18 JUMP_ABSOLUTE 30

    9 20 LOAD_GLOBAL 0 (print)
    22 LOAD_CONST 3 ('NOT DEBUG')
    24 CALL_FUNCTION 1
    26 POP_TOP
    28 JUMP_FORWARD 8 (to 38)

    11 >> 30 LOAD_GLOBAL 0 (print)
    32 LOAD_CONST 4 ('DEBUG')
    34 CALL_FUNCTION 1
    36 POP_TOP
    >> 38 LOAD_CONST 0 (None)
    40 RETURN_VALUE

    and

    4 0 JUMP_ABSOLUTE 12

    5 2 LOAD_GLOBAL 0 (print)
    4 LOAD_CONST 1 ('debug')
    6 CALL_FUNCTION 1
    8 POP_TOP
    10 JUMP_FORWARD 8 (to 20)

    7 >> 12 LOAD_GLOBAL 0 (print)
    14 LOAD_CONST 2 ('not debug')
    16 CALL_FUNCTION 1
    18 POP_TOP

    9 >> 20 LOAD_GLOBAL 0 (print)
    22 LOAD_CONST 3 ('NOT DEBUG')
    24 CALL_FUNCTION 1
    26 POP_TOP
    28 JUMP_FORWARD 8 (to 38)

    11 30 LOAD_GLOBAL 0 (print)
    32 LOAD_CONST 4 ('DEBUG')
    34 CALL_FUNCTION 1
    36 POP_TOP
    >> 38 LOAD_CONST 0 (None)

    @tim-one
    Copy link
    Member

    tim-one commented Jul 5, 2019

    I hate this change :-( The code generated for something like this today:

    def f():
        if 0:
            x = 1
        elif 0:
            x = 2
        elif 1:
            x = 3
        elif 0:
            x = 4
        else:
            x = 5
        print(x)

    is the same as for:

    def f():
        x = 3
        print(x)

    No tests or jumps at all. That made the optimization an extremely efficient, and convenient, way to write code with the _possibility_ of using different algorithms by merely flipping a 0 and 1 or two in the source code, with no runtime costs at all (no cycles consumed, no bytecode bloat, no useless unreferenced co_consts members, ...). Also a zero-runtime-cost way to effectively comment out code blocks (e.g., I've often put expensive debug checking under an "if 1:" block).

    @pablogsal
    Copy link
    Member

    No tests or jumps at all. That made the optim...

    I understand that, but we would need a way of reporting syntax errors in blocks even if they will not be executed. Although I agree with your analysis, correctness should be prioritized over optimization, IMHO.

    @pablogsal
    Copy link
    Member

    CPython being the reference implementation of Python, people could interpret that this optimization is part of the language and its behavior should be mirror in every other implementation. That's why under my understanding, correctness should always be prioritized.

    Tim, do you have any other way that can remove the bytecode but reporting SyntaxErrors for these blocks?

    @markrwilliams
    Copy link
    Mannequin

    markrwilliams mannequin commented Jul 5, 2019

    As a user of Python who maintains at least one large code base, I rely on coverage to limit the damage my changes can inflict. Some of the code I maintain uses __debug__; I would not expect moving to 3.8 to be the cause of reduced line coverage in that project, and I would be baffled by the difference between a coverage report from 3.8 and < 3.8.

    Please don't break a package as fundamental as coverage. Can the compiler be changed to not emit bytecode under some circumstances?

    @pablogsal
    Copy link
    Member

    @tim-one
    Copy link
    Member

    tim-one commented Jul 5, 2019

    There's "correctness" that matters and "correctness" that's merely pedantic ;-) CPython has acted the current way for about 15 years (since 2.4 was released), and this is the first time anyone has raised an objection. That's just not enough harm to justify tossing out well over a decade of "facts on the ground". Practicality beats purity.

    I, of course, have no objection to detecting syntax errors in dead code. It's the disruptive _way_ that's being done that's objectionable. I don't have a better way in mind, nor the bandwidth to try to dream one up. But, at heart, I'd rather it never "get fixed" than get fixed at this cost.

    @pablogsal
    Copy link
    Member

    CPython has acted the current way for about 15 years (since 2.4 was released), and this is the first time anyone has raised an objection.

    Although I tend to agree with your words, I have to insist that correctness in the reference implementation is very important, is not just a "pragmatic" thing. For example, under this argument, we could say that it does not matter if

    def f():
      if 0:
        yield

    should be or not a generator. But that changes things massively. It happens that it will always be a generator due to how we check for that property, but under this argument is ok if it is undefined.

    This also makes the thing even more confusing as there are some syntax consequences of unreachable code, but no others. (Like the bytecode for that yield will be gone!!. Is impossible to know if that is a generator or not from the bytecode).

    I have seen many people confused already and it makes very difficult to guess what are the consequences of code that is unreachable at runtime. For these reasons I think consistency is key.

    I agree that the way this is done is not ideal, but we could not find a better way to do this :(

    @terryjreedy
    Copy link
    Member

    I agree with Tim. Detect SyntaxErrors in dead code if you can, but don't change the current code deletion, at least not without proper discussion (which Pablo just started on pydev). The latter is what I consider to be a release blocker needing release manager consideration.

    I see optimizing away 'if __debug__: ...' clauses, when __debug__ is False (and 0) as currently documented language behavior, not a 'deep implementation detail'.
    https://docs.python.org/3/reference/simple_stmts.html#the-assert-statement
    Assert statements are equivalent to 'if __debug__' clauses. They are optimized away when __debug__ is False, when python is run with -0.

    Given that False and True are 0 and 1 with extra behavior, it is not a stretch to infer that 'if 0:' clauses are optimized away, as they currently are (or were, until PR 13332).

    To my mind, PR 13332 is either an enhancement or a bugfix + additional regressive behavior change that I do not believe is absolutely necessary. In either case, I don't think it should have been applied as is in b2. But that is past. I think it should be either reverted or fixed before b3 to only detect SyntaxErrors without leaving code that should be removed. I don't know how close PR 14116 comes to this.

    @pganssle
    Copy link
    Member

    pganssle commented Jul 5, 2019

    CPython has acted the current way for about 15 years (since 2.4 was released), and this is the first time anyone has raised an objection.

    This is the expected result of fixing a bug that has been open since 2008 (11 years), which itself was noticed independently, also in 2008 (bpo-1875 and bpo-1920, respectively). I also independently discovered the same issue last year when writing some tests for IPython, and I suspect others have hit it as well and shrugged it off because it's not that hard to work around and there was an open issue for it.

    I will also note that this "not raising a SyntaxError" behavior is a CPython-specific implementation detail (and is quite fragile to boot), note:

        $ cat syntax_err.py
        if 0:
            return
    
        $ python syntax_err.py   # Works on 2.7 and <3.8
        $ pypy syntax_err.py     # Same behavior with pypy3.6
          File "syntax_err.py", line 2
            return
            ^
        SyntaxError: return outside function
        
    I am not sure if pypy emits the same bytecode, but I think that

    Please don't break a package as fundamental as coverage.

    I think it's a bit melodramatic to consider this "breaking" coverage, particularly since I would find it very weird if any code under an "if 0" were simply ignored by coverage, and I would probably report it as a bug. It changes the behavior, sure, but this was code that is legitimately not covered. I don't think it's a big problem to add # pragma: nocover to code blocks you don't want covered.

    @ned-deily
    Copy link
    Member

    For the record, this change in behavior was originally backported to the 3.7 branch and appeared in 3.7.4rc1 and rc2. Now that the compatibility issues are clearer, it will be reverted for 3.7.4 final and will not be considered for future 3.7.x releases.

    @ned-deily ned-deily added the 3.9 only security fixes label Jul 5, 2019
    @ned-deily ned-deily removed the 3.7 (EOL) end of life label Jul 5, 2019
    @pablogsal
    Copy link
    Member

    In either case, I don't think it should have been applied as is in b2

    Just to clarify: this was added before beta2.

    @pablogsal
    Copy link
    Member

    I see optimizing away 'if __debug__: ...' clauses, when __debug__ is False (and 0) as currently documented language behavior, not a 'deep implementation detail'.
    https://docs.python.org/3/reference/simple_stmts.html#the-assert-statement

    I don't think you can conclude that is documented with this argument. What's documented is what happens with the assert statement, not with arbitrary dead code. The link between the two is at least an interpretation, not an explicitly documented think.

    @tim-one
    Copy link
    Member

    tim-one commented Jul 5, 2019

    This is the expected result of fixing a bug that has been
    open since 2008

    It's the expected result of fixing a bug _by_ eliminating the optimization entirely. It's not an expected result of merely fixing the bug. It's quite obviously _possible_ to note that

    if 0:
        return

    is forbidden at module level, yet not emit any bytecode for it. Indeed, it's so obvious that everyone here did it in their head without even noticing they had done so ;-)

    @pablogsal
    Copy link
    Member

    Tim, you made very good points in you email to Python-dev. I am convinced that the better thing to do at this pointis to revert this until we can think this a bit more :)

    I have made a PR to revert the change, could you review it, please?

    @pablogsal
    Copy link
    Member

    It's the expected result of fixing a bug _by_ eliminating the optimization entirely.

    Using jumps is not removing the optimization entirely, is just a weaker and more incomplete way of doing the same. The optimization introduced the bug in the first place. If this was done today it would have been reverted. The only thing legitimating the "error" is time. But this is basically elevating a bug to a feature, making this even more relevant than ever:

    https://xkcd.com/1172/

    @pablogsal
    Copy link
    Member

    I think I have found a simple way to make everyone happy: we can add a new field to the compiler that tells it to not emit bytecode. In this way we can detect errors by walking the blocks but no bytecode will be emitted.

    I have updated PR 14612 to use this approach.

    @tim-one
    Copy link
    Member

    tim-one commented Jul 6, 2019

    Using jumps is not removing the optimization
    entirely, is just a weaker and more incomplete
    way of doing the same.

    Sorry, I'm afraid I have no idea what that means. The generated code before and after was wildly different, as shown in Ned's original report here. In what possible sense was his "if 0:" being "optimized" if it generated code to load 0 onto the stack, then POP_JUMP_IF_FALSE, and then jumped over all the code generated for the dead block?

    The generated code after is nearly identical if I replace his "if 0:" with "if x:" (which the compiler takes to mean a global or builtin about whose truthiness it knows nothing at all). Indeed, the only difference in the byte code is that instead of doing a LOAD_CONST to load 0, it does a LOAD_GLOBAL to load x.

    So, to my eyes, absolutely nothing of the optimization remained. At least not in the example Ned posted here.

    @pablogsal
    Copy link
    Member

    So, to my eyes, absolutely nothing of the optimization remained. At least not in the example Ned posted here.

    This is because his example did not include the changes in PR14116 that implemented that.

    @tim-one
    Copy link
    Member

    tim-one commented Jul 6, 2019

    we could say that it does not matter if

    def f():
    if 0:
    yield

    should be or not a generator

    Slippery slope arguments play better if they're made _before_ a decade has passed after the slope was fully greased.

    There's nothing accidental about how yield behaves here. I wrote the original generator PEP, and very deliberately added these to its doctests (in test_generators.py):

    """
    >>> def f():
    ...    if 0:
    ...        yield
    >>> type(f())
    <class 'generator'>
    
    
    >>> def f():
    ...     if 0:
    ...         yield 1
    >>> type(f())
    <class 'generator'>
    
    >>> def f():
    ...    if "":
    ...        yield None
    >>> type(f())
    <class 'generator'>
    """

    Any alternate implementation that decided that whether "it's a generator" depended on optimizations would likely fail at least one of those tests. It was intended to be solely a compile-time decision, based purely on syntactic analysis.

    So I've seen no reason to believe - or expect - that the damage here goes - or will ever go - deeper than that some dead code isn't raising compile-time errors in some rare cases (statements allowed only at function level being used in dead code outside function level).

    Which should be fixed, if possible. But, as damage goes - sorry! - it just seems minimal to me.

    @pablogsal
    Copy link
    Member

    Which should be fixed, if possible. But, as damage goes - sorry! - it just seems minimal to me.

    I think I failed to express myself correctly there :( I have absolutely no problem with the way that behaves currently, I just wanted to point out the possible differences and view points regarding optimization and how "implementation details" percolate to observable behavior.

    @gvanrossum
    Copy link
    Member

    Sure seems to me as if we need to have a separate pass that looks for out-of-place `return`, `yield`, `break` and `continue`, run before the optimization removes `if 0:` blocks etc.

    @pablogsal
    Copy link
    Member

    Yep, that is more or less the approach in PR14612. What I tried to achieve there is also minimize the amount of changes to make the pass for errors non intrusive.

    @ned-deily
    Copy link
    Member

    New changeset 4834c80 by Ned Deily (Pablo Galindo) in branch '3.7':
    [3.7] bpo-37500: Revert commit 85ed171 (GH-14605)
    4834c80

    New changeset 87a918a by Ned Deily in branch '3.7':
    [3.7] bpo-37500: update Misc/NEWS entries to mention revert
    87a918a

    @miss-islington
    Copy link
    Contributor

    New changeset 18c5f9d by Miss Islington (bot) (Pablo Galindo) in branch 'master':
    bpo-37500: Make sure dead code does not generate bytecode but also detect syntax errors (GH-14612)
    18c5f9d

    @ambv
    Copy link
    Contributor

    ambv commented Jul 29, 2019

    Victor closed the 3.8 backport, stating on GitHub:

    "I closed the 3.8 backport (without merging it), until we agree on what should be done."

    This is marked as release blocker. I will be releasing 3.8b3 as is, please decide what to do here before b4, I will block the last beta on this issue.

    @serhiy-storchaka
    Copy link
    Member

    I am fine with backporting the 3.9 solution to 3.8. Sorry for the delay.

    @miss-islington
    Copy link
    Contributor

    New changeset 9ea738e by Miss Islington (bot) in branch '3.8':
    bpo-37500: Make sure dead code does not generate bytecode but also detect syntax errors (GH-14612)
    9ea738e

    @pablogsal
    Copy link
    Member

    I will close this for now, we can revisit this if we find a better solution for the issue.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes 3.9 only security fixes release-blocker
    Projects
    None yet
    Development

    No branches or pull requests