Message 383586 - Python tracker

Author	nedbat
Recipients	Mark.Shannon, nedbat, rhettinger, serhiy.storchaka
Date	2020-12-22.12:21:48
Message-id	<1608639709.41.0.162468277729.issue42693@roundup.psfhosted.org>

Content

Mark said:

> An optimization (CS not math) is a change to the program such that it has the same effect, according to the language spec, but improves some aspect of the behavior, such as run time or memory use.
> 
> Any transformation that changes the effect of the program is not an optimization.
> 
> You shouldn't be able to tell, without timing the program (or measuring memory use, observing race conditions, etc.) whether optimizations are turned on or not.

It's not that simple. Many aspects of the program can be observed, and coverage.py observes them and reports on them.

Coverage.py reports on branch coverage by tracking pairs of line numbers: the trace function remembers the last line number, then pairs it with the current line number to record how execution moved from line to line. This is an observable behavior of the program. An optimization that removes jumps to jumps changes this observable behavior.

Here is a bug report against coverage.py that relates to this: https://github.com/nedbat/coveragepy/issues/1025

To reproduce this in the small, here is bug1025.py:

    nums = [1,2,3,4,5,6,7,8]        # line 1
    for num in nums:                # line 2
        if num % 2 == 0:            # line 3
            if num % 4 == 0:        # line 4
                print(num)          # line 5
            continue                # line 6
        print(-num)                 # line 7

Here is branch_trace.py:

    import sys

    pairs = set()
    last = -1

    def trace(frame, event, arg):
        global last
        if event == "line":
            this = frame.f_lineno
            pairs.add((last, this))
            last = this
        return trace

    with open(sys.argv[1]) as f:    # read the program to trace
        code = f.read()
    sys.settrace(trace)
    exec(code)
    sys.settrace(None)              # stop tracing before reporting
    print(sorted(pairs))

Running "python branch_trace.py bug1025.py" produces:

    -1
    -3
    4
    -5
    -7
    8
    [(-1, 1), (1, 2), (2, 3), (3, 4), (3, 7), (4, 2), (4, 5), (5, 6), (6, 2), (7, 2)]

Conceptually, executing bug1025.py should sometimes jump from line 4 to line 6: when the condition on line 4 is false, execution moves to the continue and then to the top of the for loop. But CPython optimizes away the jump to a jump, so the pair (4, 6) never appears in our trace output; the trace records (4, 2) instead. The result is that coverage.py thinks there is a branch that could have occurred but was never observed during the run, and it reports this as a missed branch.
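To see which jumps the compiler actually emitted, one option is to disassemble the same source and list each jump instruction with the line it is attributed to. This is only a sketch: the names SOURCE and jumps are mine, and the opcode names and jump targets differ between CPython versions, so no output is shown.

```python
import dis

# Same program as bug1025.py, inlined so the sketch is self-contained.
SOURCE = """\
nums = [1,2,3,4,5,6,7,8]
for num in nums:
    if num % 2 == 0:
        if num % 4 == 0:
            print(num)
        continue
    print(-num)
"""

code = compile(SOURCE, "bug1025.py", "exec")

# Map bytecode offsets to the source line that starts there.
line_starts = dict(dis.findlinestarts(code))

# Opcodes that take a jump target. On recent CPython versions all jumps
# are relative and dis.hasjabs is empty; the union works either way.
jump_opcodes = set(dis.hasjrel) | set(dis.hasjabs)

jumps = []          # (source line, opcode name, target offset)
current_line = None
for instr in dis.get_instructions(code):
    current_line = line_starts.get(instr.offset, current_line)
    if instr.opcode in jump_opcodes:
        jumps.append((current_line, instr.opname, instr.argval))

for line, opname, target in jumps:
    print(f"line {line}: {opname} -> offset {target}")
```

If the conditional jump attributed to line 4 targets the loop's FOR_ITER rather than an instruction on line 6, the jump to a jump has been collapsed.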

Coverage.py currently deals with these sorts of issues by understanding the kinds of optimizations that can occur, and taking them into account when figuring out "what could have happened during execution". It does not yet understand the jump-to-jump optimization, which is why bug 1025 happens.
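As a rough illustration of the comparison involved (not coverage.py's actual implementation), the missed branch falls out of a set difference between the arcs one would expect from reading the source and the arcs the trace recorded. The expected set is derived by hand for this example, and JUMP_ONLY_LINES and is_satisfied are hypothetical names showing one way a tool might account for the optimization.

```python
# Arcs a reader would expect from the source of bug1025.py.
# Hand-derived for illustration; not produced by coverage.py.
expected = {(-1, 1), (1, 2), (2, 3), (3, 4), (3, 7),
            (4, 5), (4, 6), (5, 6), (6, 2), (7, 2)}

# Arcs recorded by branch_trace.py (copied from the output above).
observed = {(-1, 1), (1, 2), (2, 3), (3, 4), (3, 7),
            (4, 2), (4, 5), (5, 6), (6, 2), (7, 2)}

missing = sorted(expected - observed)
unexpected = sorted(observed - expected)
print("missing:", missing)        # missing: [(4, 6)]
print("unexpected:", unexpected)  # unexpected: [(4, 2)]

# Hypothetical adjustment: line 6 holds only `continue`, which jumps to
# line 2, so an arc into line 6 may show up as an arc into line 2.
JUMP_ONLY_LINES = {6: 2}

def is_satisfied(arc, observed_arcs, jump_only=JUMP_ONLY_LINES):
    """Return True if the arc was seen directly or via a collapsed jump."""
    src, dst = arc
    if arc in observed_arcs:
        return True
    return dst in jump_only and (src, jump_only[dst]) in observed_arcs

still_missing = sorted(a for a in expected if not is_satisfied(a, observed))
print("still missing:", still_missing)  # still missing: []
```

A rule this loose could also hide a branch that really was missed, which is part of why these optimizations are awkward for analysis tools.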

This pairing of line numbers doesn't relate specifically to the "if 0:" optimizations that this issue is about, but this is where the observability point was raised, so I thought I would discuss it here. As I said earlier, this should probably be worked out in a better forum.

This is already long, so I'm not sure what else to say.  Optimizations complicate things for tools that want to analyze code and help people reason about code.  You can't simply say, "optimizations should not be observable."  They are observable.

History
Date	User	Action	Args
2020-12-22 12:21:49	nedbat	set	recipients: + nedbat, rhettinger, Mark.Shannon, serhiy.storchaka
2020-12-22 12:21:49	nedbat	set	messageid: <1608639709.41.0.162468277729.issue42693@roundup.psfhosted.org>
2020-12-22 12:21:49	nedbat	link	issue42693 messages
2020-12-22 12:21:48	nedbat	create