msg379874 - (view) |
Author: Fabio Zadrozny (fabioz) * |
Date: 2020-10-29 14:16 |
Right now, when a debugger is active, the number of local variables can affect the tracing speed quite a lot.
For instance, having tracing setup in a program such as the one below takes 4.64 seconds to run, yet, changing all the variables to have the same name -- i.e.: change all assignments to `a = 1` (such that there's only a single variable in the namespace), it takes 1.47 seconds (in my machine)... the higher the number of variables, the slower the tracing becomes.
```
import time
t = time.time()
def call():
a = 1
b = 1
c = 1
d = 1
e = 1
f = 1
def noop(frame, event, arg):
return noop
import sys
sys.settrace(noop)
for i in range(1_000_000):
call()
print('%.2fs' % (time.time() - t,))
```
This happens because `PyFrame_FastToLocalsWithError` and `PyFrame_LocalsToFast` are called inside the `call_trampoline` (https://github.com/python/cpython/blob/master/Python/sysmodule.c#L946).
So, I'd like to simply remove those calls.
Debuggers can call `PyFrame_LocalsToFast` when needed -- otherwise mutating non-current frames doesn't work anyways. As a note, pydevd already has such a call: https://github.com/fabioz/PyDev.Debugger/blob/0d4d210f01a1c0a8647178b2e665b53ab113509d/_pydevd_bundle/pydevd_save_locals.py#L57 and PyPy also has a counterpart.
As for `PyFrame_FastToLocalsWithError`, I don't really see any reason to call it at all.
i.e.: something as the code below prints the `a` variable from the `main()` frame regardless of that and I checked all pydevd tests and nothing seems to be affected (it seems that accessing f_locals already does this: https://github.com/python/cpython/blob/cb9879b948a19c9434316f8ab6aba9c4601a8173/Objects/frameobject.c#L35, so, I don't see much reason to call it at all).
```
def call():
import sys
frame = sys._getframe()
print(frame.f_back.f_locals)
def main():
a = 1
call()
if __name__ == '__main__':
main()
```
So, the idea is removing those lines (I expect that I'll be able to provide a pull request for this).
|
msg385320 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2021-01-20 09:03 |
Fabio created a thread on python-dev:
https://mail.python.org/archives/list/python-dev@python.org/thread/62WK3THUDNWZCDOMXXDZFG3O4LIOIP4W/
In 2017, Nick Coghlan wrote the PEP 558 "Defined semantics for locals()".
|
msg385323 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2021-01-20 09:23 |
Mark: do you think that it's an acceptable trade-off to ask debuggers and profilers to call explicitly PyFrame_FastToLocalsWithError() and PyFrame_LocalsToFast(), to optimize the common case, when locals are not needed?
|
msg385329 - (view) |
Author: Mark Shannon (Mark.Shannon) * |
Date: 2021-01-20 11:01 |
I don't think so.
Tracing is already somewhat fragile, see https://bugs.python.org/issue30744.
Making it more complex is likely to add more bugs.
PEP 558 should help, as f_locals becomes a proxy.
There are some minor issues with PEP 558 (I'm not convinced that it is completely robust), but it should fix this issue.
Ultimately, what we need is a decent debugger API, something along the lines of the JVMTI.
|
msg385336 - (view) |
Author: Fabio Zadrozny (fabioz) * |
Date: 2021-01-20 11:30 |
I agree that it can be made better, but I think most of the issues right now comes from CPython trying to automatically do something that's not bound to work (calling PyFrame_FastToLocals/PyFrame_LocalsToFast under the hood during the tracing call).
https://bugs.python.org/issue30744 is a great example of why that can never work properly (debuggers need to manage that much more carefully, just doing it on all calls is really bound to not work).
So, the current implementation besides being broken also makes things pretty slow.
I agree that PEP 558 may fix things here (but it's much more work).
As a disclaimer, pydevd actually uses a different tracer which doesn't do those calls and when possible uses the frame eval to modify the bytecode directly to minimize the overhead, so, in practice the current support given by CPython for debugger is pretty reasonable for doing a fast debugger (but there are a few caveats that it must work around -- such as the auto PyFrame_FastToLocals/PyFrame_LocalsToFast calls).
|
msg396563 - (view) |
Author: Alyssa Coghlan (ncoghlan) * |
Date: 2021-06-27 07:40 |
I've updated PEP 558 with a reference back to this ticket as additional motivation (the update also addresses several of the fragility concerns that Mark raised): https://github.com/python/peps/pull/1787
There's still work to be done to bring the reference implementation into line with the updated proposal, but I'm making it an explicit goal to get the proposal submitted for pronouncement in time for inclusion in 3.11.
|
msg396572 - (view) |
Author: Fabio Zadrozny (fabioz) * |
Date: 2021-06-27 10:57 |
@ncoghlan I took a quick look at the PEP...
I'm a bit worried about:
> On optimised frames, the Python level f_locals API will become a direct read/write proxy for the frame's local and closure variable storage, and hence no longer support storing additional data that doesn't correspond to a local or closure variable on the underyling frame object.
In particular, it's common for a debugger to evaluate expressions in a context that would change the locals to add new variables.
i.e.: stopping at a breakpoint and doing 2 `exec` calls with frame.f_locals with:
import some_module
v = some_module.compute(xxx)
So, it's expected that `some_module` and `v` would be in the locals at this point.
Right now after each line of the evaluation, a `PyFrame_FastToLocals` must be called so things work as they should, but given that the PEP explicitly says that it should be deprecated, and this being a common feature for a debugger, what'd be the proper way to support that?
p.s.: should I ask such questions regarding the PEP somewhere else instead of this issue or is this an appropriate place?
|
msg396574 - (view) |
Author: Mark Shannon (Mark.Shannon) * |
Date: 2021-06-27 11:59 |
> So, it's expected that `some_module` and `v` would be in the locals at this point.
If a function does not have the local variables `some_module` and `v`, then the change wouldn't be visible to the debugee.
So what difference does it make?
|
msg396576 - (view) |
Author: Fabio Zadrozny (fabioz) * |
Date: 2021-06-27 12:13 |
> > So, it's expected that `some_module` and `v` would be in the locals at this point.
> If a function does not have the local variables `some_module` and `v`, then the change wouldn't be visible to the debugee.
So what difference does it make?
Right now such changes are visible to the debugee in the locals frames if a user does the `exec` and calls `PyFrame_FastToLocals` right afterwards (even if they weren't initially there).
So, it's the difference between being able to import a module and creating/manipulating new variables in an `exec` in any frame (as it works right now) or not being able to make it at all (if that feature is deprecated as is being implied in the PEP).
|
msg411108 - (view) |
Author: Mark Shannon (Mark.Shannon) * |
Date: 2022-01-21 11:01 |
While the various frame and debugger PEPs that are open offer a better solution to this, they might not be accepted for 3.11.
So I'd like to revisit this.
Removing the calls to `PyFrame_FastToLocals` and friends cuts the overhead of tracing down a lot. A *lot*.
Using Fabio's example I get the following slowdowns:
No calls to PyFrame_FastToLocals`: x10
Main branch with one local variable: x16
Main branch with six locals variables: x50
This is quite a compelling reason to remove the calls to `PyFrame_FastToLocals`.
The questions is "what will we break doing this"?
All the tests pass, so that seems promising.
Ned, how would this impact coverage.py?
|
msg411111 - (view) |
Author: Ned Batchelder (nedbat) * |
Date: 2022-01-21 11:45 |
Off the top of my head, I'm not sure this affects coverage.py at all, but I could be missing something. Does PR 23028 have all the changes, or is there some other python/cpython branch I can test with?
|
msg411112 - (view) |
Author: Mark Shannon (Mark.Shannon) * |
Date: 2022-01-21 11:55 |
Yes the PR has all the changes.
It is just the changes sysmodule.c that you need:
https://github.com/python/cpython/pull/23028/files#diff-a3a5c73931235f7f344c072dc755d6508e13923db3f5d581c5e88652075871cb
|
msg411114 - (view) |
Author: Mark Shannon (Mark.Shannon) * |
Date: 2022-01-21 11:57 |
Or you can use this branch:
https://github.com/faster-cpython/cpython/tree/dont-fast-to-locals-in-trampoline
|
msg411145 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2022-01-21 16:40 |
As I already wrote previously, I'm a supporter of requiring debuggers and profilers to explicitly call PyFrame_FastToLocalsWithError() and PyFrame_LocalsToFast(). It should only impact a minority of users, whereas the majority will benefit of way better performance, no?
|
msg411177 - (view) |
Author: Ned Batchelder (nedbat) * |
Date: 2022-01-21 19:55 |
I ran the coverage.py test suite using https://github.com/faster-cpython/cpython/tree/dont-fast-to-locals-in-trampoline, and there were no failures.
|
msg411210 - (view) |
Author: Ned Batchelder (nedbat) * |
Date: 2022-01-21 23:51 |
And btw, the tests with that branch ran at least twice as fast as with 3.10....!
|
msg415801 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2022-03-22 18:09 |
See also bpo-47092: [C API] Add PyFrame_GetVar(frame, name) function.
|
msg415998 - (view) |
Author: Mark Shannon (Mark.Shannon) * |
Date: 2022-03-25 12:57 |
New changeset d7163bb35d1ed46bde9affcd4eb267dfd0b703dd by Mark Shannon in branch 'main':
bpo-42197: Don't create `f_locals` dictionary unless we actually need it. (GH-32055)
https://github.com/python/cpython/commit/d7163bb35d1ed46bde9affcd4eb267dfd0b703dd
|
msg416145 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2022-03-28 06:57 |
Mark: Please add the new PyFrame_GetLocals() function to the C API > New Features doc:
https://docs.python.org/dev/whatsnew/3.11.html#id1
|
|
Date |
User |
Action |
Args |
2022-04-11 14:59:37 | admin | set | github: 86363 |
2022-03-28 06:57:42 | vstinner | set | messages:
+ msg416145 |
2022-03-25 12:57:58 | Mark.Shannon | set | messages:
+ msg415998 |
2022-03-22 18:09:13 | vstinner | set | messages:
+ msg415801 |
2022-03-22 17:54:49 | Mark.Shannon | set | pull_requests:
+ pull_request30147 |
2022-01-21 23:51:32 | nedbat | set | messages:
+ msg411210 |
2022-01-21 19:55:07 | nedbat | set | messages:
+ msg411177 |
2022-01-21 16:40:54 | vstinner | set | messages:
+ msg411145 |
2022-01-21 11:57:31 | Mark.Shannon | set | messages:
+ msg411114 |
2022-01-21 11:55:52 | Mark.Shannon | set | messages:
+ msg411112 |
2022-01-21 11:45:21 | nedbat | set | messages:
+ msg411111 |
2022-01-21 11:01:39 | Mark.Shannon | set | nosy:
+ nedbat messages:
+ msg411108
|
2021-06-27 12:13:55 | fabioz | set | messages:
+ msg396576 |
2021-06-27 11:59:27 | Mark.Shannon | set | messages:
+ msg396574 |
2021-06-27 10:57:34 | fabioz | set | messages:
+ msg396572 |
2021-06-27 07:40:13 | ncoghlan | set | messages:
+ msg396563 versions:
+ Python 3.11, - Python 3.10 |
2021-01-20 11:30:22 | fabioz | set | messages:
+ msg385336 |
2021-01-20 11:01:17 | Mark.Shannon | set | messages:
+ msg385329 |
2021-01-20 09:23:26 | vstinner | set | nosy:
+ ncoghlan, Mark.Shannon messages:
+ msg385323
|
2021-01-20 09:03:58 | vstinner | set | messages:
+ msg385320 |
2020-10-29 21:01:29 | vstinner | set | nosy:
+ vstinner
|
2020-10-29 15:41:05 | python-dev | set | keywords:
+ patch nosy:
+ python-dev
pull_requests:
+ pull_request21947 stage: patch review |
2020-10-29 14:16:41 | fabioz | create | |