classification
Title: Generator `yield`s counted as primitive calls by cProfile
Type: behavior Stage:
Components: C API Versions: Python 3.10, Python 3.9, Python 3.8, Python 3.7, Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: matthew.suozzo
Priority: normal Keywords:

Created on 2020-12-03 04:29 by matthew.suozzo, last changed 2020-12-14 21:52 by matthew.suozzo.

Messages (2)
msg382373 - (view) Author: Matthew Suozzo (matthew.suozzo) * Date: 2020-12-03 04:29
# Issue

When profiling a generator function, the initial call and all subsequent yields are aggregated into the same "ncalls" metric by cProfile.

## Example

>>> cProfile.run("""
... def foo():
...   yield 1
...   yield 2
... assert tuple(foo()) == (1, 2)
... """)
   <snip>
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      ...
        3    0.000    0.000    0.000    0.000 <string>:2(foo)
      ...

This was unexpected behavior since it *looks* like a single call from the code. This also complicates basic analysis about the frequency of a codepath's execution where a generator might yield a variable number of times depending on the input.

The profile interface can and does differentiate between call types: Normal calls and "primitive" calls, i.e. those that are not induced via recursion, are displayed as "all_calls/primitive_calls" e.g. "3/1" for a single initial calls with 2 recursed calls (comparable to the example above).

This seems like an appropriate abstraction to apply in the generator case: Each yield is better modeled as an 'interior' call to the generator, not as a call on its own.

# Possible fix

I have two ideas that seem like they might address the problem:

* Add a new PyTrace_YIELD constant (and handle it in [3])
* Track some state from the `frame->f_gen` in the ProfilerEntry (injected in [3], used in [4]) to determine whether this is the first or a subsequent call.

I've not been poking around for long so I don't have an intuition about which would be less invasive (nor really whether either would work in practice).

As background, this seems to be the call chain from trace invocation to callcount increment:
[0]: https://github.com/python/cpython/blob/4e7a69bdb63a104587759d7784124492dcdd496e/Python/ceval.c#L4106-L4107
[1]: https://github.com/python/cpython/blob/4e7a69bdb63a104587759d7784124492dcdd496e/Python/ceval.c#L4937
[2]: https://github.com/python/cpython/blob/4e7a69bdb63a104587759d7784124492dcdd496e/Python/ceval.c#L4961
[3]: https://github.com/python/cpython/blob/4e7a69bdb63a104587759d7784124492dcdd496e/Modules/_lsprof.c#L419
[4]: https://github.com/python/cpython/blob/4e7a69bdb63a104587759d7784124492dcdd496e/Modules/_lsprof.c#L389
[5]: https://github.com/python/cpython/blob/4e7a69bdb63a104587759d7784124492dcdd496e/Modules/_lsprof.c#L311-L316
msg383007 - (view) Author: Matthew Suozzo (matthew.suozzo) * Date: 2020-12-14 21:52
One of the problems with my proposed solution that I glossed over was how and where to count the primitive call. If the primitive call is only registered on RETURN (i.e. after all yields), a generator that is incompletely exhausted would have 0 primitive calls. However if the primitive call is registered immediately when the call context is entered, the recorded call would be instantaneous (i.e. 0s duration) which could throw off the percall statistics.

Even with the incomplete generator case, I think registering the primitive call on RETURN is the best of the two options. Although seeing 2/0 for the call count when RETURN wasn't reached might seem odd, it lines up with a conceptual model of a coroutine: An iterator represents a function returning a sequence; Exhausting that iterator corresponds to the function call completing; Incompletely exhausting corresponds to an abbreviated/incomplete function call.
History
Date User Action Args
2020-12-14 21:52:21matthew.suozzosetmessages: + msg383007
2020-12-03 04:29:39matthew.suozzocreate