Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Local variable assignment is broken when combined with threads + tracing + closures #74929

Open
njsmith opened this issue Jun 24, 2017 · 68 comments
Assignees
Labels
3.10 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@njsmith
Copy link
Contributor

njsmith commented Jun 24, 2017

BPO 30744
Nosy @arigo, @ncoghlan, @abalkin, @benjaminp, @njsmith, @xdegaye, @markshannon, @1st1, @xgid
PRs
  • [PEP 558 - WIP] bpo-30744: Trace hooks no longer reset closure state #3640
  • bpo-35134: Migrate frameobject.h contents to cpython/frameobject.h #18052
  • Dependencies
  • bpo-17960: Clarify the required behaviour of locals()
  • Files
  • thread-closure-bug-demo.py
  • x.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/ncoghlan'
    closed_at = None
    created_at = <Date 2017-06-24.04:05:28.903>
    labels = ['interpreter-core', '3.10']
    title = 'Local variable assignment is broken when combined with threads + tracing + closures'
    updated_at = <Date 2021-01-29.16:12:19.866>
    user = 'https://github.com/njsmith'

    bugs.python.org fields:

    activity = <Date 2021-01-29.16:12:19.866>
    actor = 'terry.reedy'
    assignee = 'ncoghlan'
    closed = False
    closed_date = None
    closer = None
    components = ['Interpreter Core']
    creation = <Date 2017-06-24.04:05:28.903>
    creator = 'njs'
    dependencies = ['17960']
    files = ['46971', '46972']
    hgrepos = []
    issue_num = 30744
    keywords = ['patch']
    message_count = 48.0
    messages = ['296751', '296756', '296757', '296758', '296770', '296806', '296808', '296810', '296841', '296882', '297145', '297150', '297254', '297259', '297480', '297503', '297504', '301769', '301786', '301856', '302420', '302475', '302505', '304099', '304297', '304307', '304315', '304317', '304319', '304323', '304325', '304329', '304345', '304371', '304377', '304381', '304388', '304398', '304399', '304414', '304415', '304416', '304419', '304730', '304745', '304753', '304754', '305586']
    nosy_count = 10.0
    nosy_names = ['arigo', 'ncoghlan', 'belopolsky', 'benjamin.peterson', 'ionelmc', 'njs', 'xdegaye', 'Mark.Shannon', 'yselivanov', 'xgdomingo']
    pr_nums = ['3640', '18052']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue30744'
    versions = ['Python 3.10']

    Linked PRs

    @njsmith
    Copy link
    Contributor Author

    njsmith commented Jun 24, 2017

    The attached script looks innocent, but gives wildly incorrect results on all versions of CPython I've tested.

    It does two things:

    • spawns a thread which just loops, doing nothing

    • in the main thread, repeatedly increments a variable 'x'

    And most of the increments of the variable are lost!

    This requires two key things I haven't mentioned, but that you wouldn't expect to change anything. First, the thread function closes over the local variable 'x' (though it doesn't touch this variable in any way). And second, the thread function has a Python-level trace function registered (but this trace function is a no-op).

    Here's what's going on:

    When you use sys.settrace() to install a Python-level trace function, it installs the C-level trace function "call_trampoline". And then whenever a trace event happens, call_trampoline calls PyFrame_FastToLocalsWithError, then calls the Python-level trace function, then calls PyFrame_LocalsToFast (see: https://github.com/python/cpython/blob/master/Python/sysmodule.c#L384-L395). This save/call/restore sequence is safe if all the variables being saved/restored are only visible to the current thread, which used to be true back in the days when local variables were really local. But it turns out nowadays (since, uh... python 2.1), local variables can be closed over, and thus can visible from multiple threads simultaneously.

    Which means we get the following sequence of events:

    • In thread A, a trace event occurs (e.g., executing a line of code)

    • In thread A, call_trampoline does PyFrame_FastToLocalsWithError, which saves a snapshot of the current value of 'x'

    • In thread A, call_trampoline then starts calling the trace function

    • In thread B, we increment 'x'

    • In thread A, the trace function returns

    • In thread A, call_trampoline then does PyFrame_LocalsToFast, which restores the old value of 'x', overwriting thread B's modifications

    This means that merely installing a Python-level trace function – for example, by running with 'python -m trace' or under pdb – can cause otherwise correct code to give wrong answers, in an incredibly obscure fashion.

    In real life, I originally ran into this in a substantially more complicated situation involving PyPy and coverage.py and a nonlocal variable that was protected by a mutex, which you can read about here: https://bitbucket.org/pypy/pypy/issues/2591/
    It turns out that since this only affects *Python*-level trace functions, merely recording coverage information using coverage.py doesn't normally trigger it, since on CPython coverage.py tries to install an accelerator module that uses a *C*-level trace function. Which is lucky for my particular case. But probably it should be fixed anyway :-).

    CC'ing belopolsky as the interest area for the trace module, Mark Shannon because he seems to enjoy really pathological low-level bugs, and Benjamin and Yury as the "bytecode" interest area. I'm surprised that there's apparently no interest area for, like, "variables should retain the values they are assigned" -- apologies if I've CC'ed anyone incorrectly.

    @njsmith njsmith added 3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Jun 24, 2017
    @arigo
    Copy link
    Mannequin

    arigo mannequin commented Jun 24, 2017

    A version of the same problem without threads, using generators instead to get the bug deterministically. Prints 1, 1, 1, 1 on CPython and 1, 2, 3, 3 on PyPy; in both cases we would rather expect 1, 2, 3, 4.

    @arigo
    Copy link
    Mannequin

    arigo mannequin commented Jun 24, 2017

    (Note: x.py is for Python 2.7; for 3.x, of course, replace .next() with .__next__(). The result is the same)

    @njsmith
    Copy link
    Contributor Author

    njsmith commented Jun 24, 2017

    Some thoughts based on discussion with Armin in #pypy:

    It turns out if you simply delete the LocalsToFast and FastToLocals calls in call_trampoline, then the test suite still passes. I'm pretty sure that pdb relies on this as a way to set local variables, though, so I think this is probably more of a bug in the test suite than anything else.

    In some ways, it seems like the most attractive fix to this would be to have the Python-level locals() and frame.f_locals return a dict proxy object whose __setitem__ writes any mutations back to both the fast array and the real frame->f_locals object, so that LocalsToFast becomes unnecessary. As Armin points out, though, that would definitely be a semantic change: there may be code out there that relies on locals() returning a dict – technically it doesn't have to return a dict even now, but it almost always does – or that assumes it can mutate the locals() array without that affecting the function. I think this would arguably be a *good* thing; it would make the behavior of locals() and f_locals much less confusing in general. But he's right that it's something we can only consider for 3.7.

    The only more-compatible version I can think of is: before calling the trace function, do FastToLocals, and also make a "shadow copy" of all the cellvars and freevars. Then after the trace function returns, when copying back from LocalsToFast, check against these shadow copies, and only write back cellvars/freevars that the trace function actually modified (where by modified we mean 'old is not new', just a pointer comparison). Since most functions have no cellvars or freevars, and since we would only need to do this when there's a Python-level trace function active, this shouldn't add much overhead. And it should be compatible enough to backport even to 2.7, I think.

    @ncoghlan
    Copy link
    Contributor

    The writeback-only-if-changed approach sounds like a plausible improvement to me. I'd be hesitant to include such a change in 3.5.4 though, since we don't get a second go at that if something breaks unexpectedly.

    However, I don't think returning a write-through proxy from locals() would be a good idea, since you can still have race conditions between "read-update-writeback" operations that affect the cells directly, as well as with those that use the new write-through proxy. At the moment, you can only get yourself into trouble by way of operations that have access to LocalsToFast, and those are pretty rare now that exec is no longer a statement.

    If folks really want that write-through behaviour (at least for closure variables), 3.7 will let them write it themselves, since closure cells are becoming writeable from pure Python code:

    >>> def outer():
    ...     x = 1
    ...     def inner():
    ...         return x
    ...     return inner
    ... 
    >>> f = outer()
    >>> f.__closure__[0].cell_contents
    1
    >>> f.__closure__[0].cell_contents = 2
    >>> f()
    2
    >>> f.__closure__[0].cell_contents = "hello"
    >>> f()
    'hello'

    @njsmith
    Copy link
    Contributor Author

    njsmith commented Jun 25, 2017

    It isn't obvious to me whether the write-through proxy idea is a good one on net, but here's the rationale for why it might be.

    Currently, the user-visible semantics of locals() and f_locals are a bit complicated. AFAIK they aren't documented anywhere outside the CPython and PyPy source (PyPy is careful to match all these details), but let me take a stab at it:

    The mapping object you get from locals()/f_locals represents the relevant frame's local namespace. For many frames (e.g. module level, class level, anything at the REPL), it literally *is* the local namespace: changes made by executing bytecode are immediately represented in it, and changes made to it are immediately visible to executing bytecode. Except, for function frames, it acts more like a snapshot copy of the local namespace: it shows you the namespace at the moment you call locals(), but then future changes to either the code's namespace or the object don't affect each other. Except except, the snapshot might be automatically updated later to incorporate namespace changes, e.g. if I do 'ns = locals()' and then later on someone accesses the frame's f_locals attribute, then reading that attribute will cause my 'ns' object to be silently updated. But it's still a snapshot; modifications to the mapping aren't visible to the executing frame. Except**3, if you happen to modify the mapping object while you're inside a trace function callback, then *those* modifications are visible to the executing frame. (And also if a function is being traced then as a side-effect this means that now our 'ns' object above does stay constantly up to date.) Except**4, you don't actually have to be inside a trace function callback for your modifications to be visible to the executing frame – all that's necessary is that *some* thread somewhere is currently inside a trace callback (even if it doesn't modify or even look at the locals itself, as e.g. coverage.py doesn't).

    This causes a lot of confusion [1].

    On top of that, we have this bug here. The writeback-only-if-changed idea would make it so that we at least correctly implement the semantics I described in the long paragraph above. But I wonder if maybe we should consider this an opportunity to fix the underlying problem, which is that allowing skew between locals() and the actual execution namespace is this ongoing factory for bugs and despair. Specifically, I'm wondering if we could make the semantics be:

    "locals() and f_locals return a dict-like object representing the local namespace of the given frame. Modifying this object and modifying the corresponding local variables are equivalent operations in all cases."

    (So I guess this would mean a proxy object that on reads checks the fast array first and then falls back to the dict, and on writes updates the fast array as well as the dict.)

    you can still have race conditions between "read-update-writeback" operations that affect the cells directly, as well as with those that use the new write-through proxy.

    Sure, but that's just a standard concurrent access problem, no different from any other case where you have two different threads trying to mutate the same local variable or dictionary key at the same time. Everyone who uses threads knows that if you want to do that then you need a mutex, and if you don't use proper locking then it's widely understood how to recognize and debug the resulting failure modes. OTOH, the current situation where modifications to the locals object sometimes affect the namespace, and sometimes not, and sometimes they get overwritten, and sometimes they don't, and it sometimes depends on spooky unrelated things like "is some other thread currently being traced"? That's *way* more confusing that figuring out that there might be a race condition between 'x = 1' and 'locals()["x"] = 2'.

    Plus, pdb depends on write-through working, and there are lots of frames that don't even use the fast array and already have my proposed semantics. So realistically our choices are either "consistently write-through" or "inconsistently write-through".

    [1] https://www.google.com/search?q=python+modify+locals&ie=utf-8&oe=utf-8

    @ncoghlan
    Copy link
    Contributor

    To make the behaviour more consistent in 3.7, I'd be more inclined to go in the other direction: make locals() return a truly independent snapshot when used in a function, rather than sharing a single snapshot between all locals() calls.

    Shared snapshots that may potentially be written back to the frame locals would then be unique to trace functions, rather than being a feature of function level locals() calls in general.

    @njsmith
    Copy link
    Contributor Author

    njsmith commented Jun 25, 2017

    Interesting idea! I'm not sure I fully understand how it would work though.

    What would you do for the frames that don't use the fast array, and where locals() currently returns the "real" namespace?

    How are you imagining that the trace function writeback would be implemented? Some sort of thread-local flag saying "we're inside a trace function for frame XX" that causes locals() and f_locals to switch to returning a "real" namespace object?

    @ncoghlan
    Copy link
    Contributor

    Sorry, I wasn't clear: I don't see any problem for the cases that don't optimize local variable access, and don't think any of those should change.

    Instead, I think we should tighten up the formal specification of locals() to better match how it is actually used in practice: https://mail.python.org/pipermail/python-dev/2013-May/125917.html

    (https://bugs.python.org/issue17960 is the corresponding issue, although I clearly got distracted by other things and never followed up with a patch for the language reference. https://bugs.python.org/issue17546 is another issue lamenting the current underspecification in this area)

    However, function bodiess are already inherently different from other execution namespaces, and that stems from a particular special case assumption that we don't make anywhere else: we assume that at compile time, the compiler can see all of the names added to the local namespace of a function.

    That assumption wasn't quite valid in Python 2 (since unqualified exec statements and function level wildcard imports could mess with it), but it's much closer to being true in Python 3.

    Checking the 3.7 code, the only remaining ways to trigger it are:

    • via a tracing function (since LocalsToFast gets called after the tracing function runs)
    • by injecting an IMPORT_STAR opcode into a function code object (the compiler disallows that in Python 3 and emits a SyntaxWarning for it in Python 2, but the LocalsToFast call is still there in the eval loop)

    So I think an entirely valid way forward here would be to delete LocalsToFast in 3.7+, and say that if you want to write access to a function namespace from outside the function, you need to either implement an eval hook (not just a tracing hook), or else use a closure that closes over all the variables that you want write access to.

    However, the less drastic way forward would be to make it so that writing a tracing function is the only way to get access to the FastToLocals result, and have locals() on a frame running a code object compiled for fast locals return f->f_locals.copy() rather than a direct reference to the original.

    @ncoghlan
    Copy link
    Contributor

    I updated the old "we should clarify the semantics" issue with a more concrete update proposal: https://bugs.python.org/issue17960#msg296880

    Essentially nothing would change for module and class scopes, but the proposal for function scopes is that locals() be changed to return "frame.f_locals.copy()" rather than a direct reference to the original.

    Nothing would change for tracing functions (since they already access frame.f_locals directly), but the current odd side-effect that setting a trace function has on the result of normal locals() calls would go away.

    Folks that actually *wanted* the old behaviour would then need to do either "sys._getframe().f_locals" or "inspect.currentframe().f_locals".

    However, none of that would help with *this* issue: resolving the bug here would still require either a modification that allowed PyFrame_LocalsToFast to only write back those values that had been rebound since the preceding call to PyFrame_FastToLocals (ma_version_tag could help with doing that efficiently), or else a decision to disallow write-backs to frame locals even from tracing functions in 3.7+.

    @njsmith
    Copy link
    Contributor Author

    njsmith commented Jun 28, 2017

    Folks that actually *wanted* the old behaviour would then need to do either "sys._getframe().f_locals" or "inspect.currentframe().f_locals".

    So by making locals() and f_locals have different semantics, we'd be adding yet another user-visible special-case? That seems unfortunate to me.

    if you want to write access to a function namespace from outside the function, you need to either implement an eval hook (not just a tracing hook)
    [...]
    or else a decision to disallow write-backs to frame locals even from tracing functions in 3.7+.

    Disallowing writeback from tracing functions would completely break bdb/pdb, so unless you're planning to rewrite bdb in C as an eval hook, then I don't think this is going to happen :-). Writing back to locals is a useful and important feature!

    I think I'm missing some rationale here for why you prefer this approach – it seems much more complicated in terms of user-visible semantics, and possibly implementation-wise as well.

    @ncoghlan
    Copy link
    Contributor

    I like it because it categorically eliminates the "tracing or not?" global state dependence when it comes to manipulation of the return value of locals() - manipulating that will either always affect the original execution namespace immediately (modules, classes, exec, eval), or always be a completely independent snapshot that can never lead to changes in the original name bindings (functions, generators, coroutines).

    As a result, the locals() documentation updates for issue bpo-17960 wouldn't need to mention trace functions at all.

    Instead, the only folks that would need to worry about potentially unexpected updates to the internal state of functions, generators, and coroutines when tracing is in effect would be those accessing frame.f_locals directly. That state dependence can then be documented specifically as part of the f_locals documentation, and users of that attribute can make their own copy if they want to ensure that their subsequent mutations definitely can't affect the original namespace, even when tracing is in effect.

    @njsmith
    Copy link
    Contributor Author

    njsmith commented Jun 29, 2017

    I like it because it categorically eliminates the "tracing or not?" global state dependence when it comes to manipulation of the return value of locals() - manipulating that will either always affect the original execution namespace immediately (modules, classes, exec, eval), or always be a completely independent snapshot that can never lead to changes in the original name bindings (functions, generators, coroutines).

    Maybe I was unclear...? my question is why you prefer locals-making-a-copy over locals-and-f_locals-for-function-frames-return-proxy-objects. It seems to me that both of these proposals eliminate the "tracing or not?" global state dependence (right?), so this can't be a reason to prefer one over the other. And the latter additionally eliminates the distinction between modules/classes/exec/eval and functions/generators/coroutines, while also avoiding introducing a distinction between locals() and f_locals.

    @ncoghlan
    Copy link
    Contributor

    The problem I see with proxy objects for functions/coroutines/generators is that it *doesn't* match how locals() currently behaves in the absence of a tracing function - that gives you a "single shared snapshot" behaviour, where writes to the result of locals() *don't* affect the original namespace.

    I agree that replacing frame.f_locals with a write-through proxy would be a good way to get rid of PyFrame_LocalsToFast, though (and thus fix the bug this issue covers).

    The point where we disagree is that I think we should replace the tracing-or-not distinction with a locals()-or-frame.f_locals distinction, not get rid of the distinction entirely.

    @xdegaye
    Copy link
    Mannequin

    xdegaye mannequin commented Jul 1, 2017

    In msg296758 Nathaniel wrote:

    It turns out if you simply delete the LocalsToFast and FastToLocals calls in call_trampoline, then the test suite still passes. I'm pretty sure that pdb relies on this as a way to set local variables, though, so I think this is probably more of a bug in the test suite than anything else.

    Maybe a point to be taken in this issue is that pdb does a poor usage of this feature as shown in bpo-9633: changing the value of a local only works in limited cases.

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Jul 2, 2017

    As per the discussion in issue bpo-17960, I'm going to put together a short PEP about this topic for Python 3.7 that will better specify the expected behaviour of locals() and frame.f_locals.

    That will be a combination of making the status quo official, proposing some changes, and asking Guido for a design decision.

    Documenting the status quo: make the current behaviour at module scope, class scope, and the corresponding behaviours of exec and eval officially part of the status quo

    Proposing a change: replacing the current LocalsToFast approach with a custom write-through proxy type that updates the locals array and nonlocal cell variables immediately whenever frame.f_locals is modified (this is the part that should fix the bug reported in this issue)

    Requesting a design decision: whether locals() at function(/generator/coroutine) scope should return:

    • a direct reference to the write-through proxy
    • a live read-only dict-proxy (ala class __dict__ attributes)
    • a point-in-time snapshot (i.e. copying the namespace)

    @ncoghlan ncoghlan self-assigned this Jul 2, 2017
    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Jul 2, 2017

    Err, s/officially part of the status quo/officially part of the language specification/ :)

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Sep 9, 2017

    After drafting PEP-558, briefly chatting about it to Guido, and sleeping on the topic, I'm wondering if there might be answer that's simpler than any of the alternatives consider so far: what if PyFrame_FastToLocals added the *cell objects* to f_locals for any variables that the compiler would access using LOAD/STORE_DEREF (rather than LOAD/STORE_FAST), and then PyFrame_LocalsToFast only wrote back the entries for variables that used LOAD/STORE_FAST?

    That would be enough to fix the reported problems (since PyFrame_LocalsToFast would always leave closure variables alone in both the function defining the closure and the ones referencing it), and a trace hook that actually *wanted* to update the closure references could write to the "cell_contents" attribute on the cell of interest.

    @njsmith
    Copy link
    Contributor Author

    njsmith commented Sep 9, 2017

    How does a trace function or debugger tell the difference between a closure cell and a fast local whose value happens to be a cell object?

    @ncoghlan
    Copy link
    Contributor

    The same way the dis module does: by looking at the names listed in the various code object attributes.

    If it's listed in co_cellvars, then it's a local variable in the current frame that's in a cell because it's part of the closure for a nested function.

    If it's listed in co_freevars, then it's a nonlocal closure reference.

    Otherwise, it's a regular local variable that just happens to be holding a reference to a cell object.

    So if all we did was to put the cell objects in the frame.f_locals dict, then trace functions that supported setting attributes (including pdb) would need to be updated to be cell aware:

        def setlocal(frame, name, value):
            if name in frame.f_code.co_cellvars or name in frame.f_code.co_freevars:
                frame.f_locals[name].cell_contents = value
            else:
                frame.f_locals[name] = value

    However, to make this more backwards compatible, we could also make it so that *if* a cell entry was replaced with a different object, then PyFrame_LocalsToFast would write that replacement object back into the cell.

    Even with this more constrained change to the semantics frame.f_locals at function level, we'd probably still want to keep the old locals() semantics for the builtin itself - that has lots of string formatting and other use cases where having cell objects suddenly start turning up as values would be surprising.

    @ncoghlan
    Copy link
    Contributor

    https://github.com/python/cpython/pull/3640/files includes a more automated-tested-friendly version of Armin's deterministic reproducer (the generator loop uses range() to generator a separate loop counter, and then the test checks that the incremented closure variable matches that loop counter), together with a fix based on the idea of putting actual cell objects into frame.f_locals while a trace hook is running. It turned out a lot of the necessary machinery was already there, since CPython already uses a common pair of utility functions (map_to_dict/dict_to_map) to handle fast locals, cell variables, and nonlocal variables.

    PEP-558 still needs another editing pass before I send it to python-dev, and the PR needs some additional test cases to explicitly cover the expected locals() and frame.f_locals semantics at different scopes when a Python-level trace hook is installed, but I'm happy now that this is a reasonable and minimalistic way to resolve the original problem.

    @njsmith
    Copy link
    Contributor Author

    njsmith commented Sep 18, 2017

    Doesn't this proposal break every debugger, including pdb?

    @ncoghlan
    Copy link
    Contributor

    For write-backs: no, since the interpreter will still write those values back into the destination cell

    For locals display: no, since nothing changes for the handling of fast locals

    For closure display: yes as, by default, debuggers will now print the closure cell, not the value the cell references - they'd need to be updated to display obj.cell_contents for items listed in co_freevars and co_cellvars.

    That's why PEP-588 needs to be a PEP - there's a language design question around the trade-off between requiring all future Python implementations to implement a new write-through proxy type solely for this use case, or instead requiring trace functions to cope with co_freevars and co_cellvars being passed through as cell objects, rather than as direct references to their values.

    @ncoghlan
    Copy link
    Contributor

    I've been thinking further about the write-through proxy idea, and I think I've come up with a design for one that shouldn't be too hard to implement, while also avoiding all of the problems that we want to avoid.

    The core of the idea is that the proxy type would just be a wrapper around two dictionaries:

    • the existing f_locals dictionary
    • a new dictionary mapping cell & free variable names to their respective cells (note: this may not actually need to be a dict, as a direct reference from the proxy back to the frame may also suffice. However, I find it easier to think about the design by assuming this will be a lazily initialised dict in its own right)

    Most operations on the proxy would just be passed through to f_locals, but for keys in both dictionaries, set and delete operations would *also* affect the cell in the cell dictionary. (Fortunately dict views don't expose any mutation methods, or intercepting all changes to the mapping would be a lot trickier)

    Frames would gain a new lazily initialised "f_traceproxy" field that defaults to NULL/None.

    For code objects that don't define or reference any cells, nothing would change relative to today.

    For code objects that *do* define or reference cells though, tracing would change as follows:

    • before calling the trace function:
      • f_locals would be updated from the fast locals array and current cell values as usual
      • f_locals on the frame would be swapped out for f_traceproxy (creating the latter if needed)
    • after returning from the trace function:
      • f_locals on the frame would be reset back to bypassing the proxy (so writes to f_locals stop being written through to cells when the trace hook isn't running)
      • only the actual locals would be written from f_locals back to the fast locals array (cell updates are assumed to have already been handled via the proxy)

    This preserves all current behaviour *except* the unwanted one of resetting cells back to their pre-tracehook value after returning from a trace hook:

    • code outside trace hooks can't mutate the function level fast locals or cells via locals() or frame.f_locals (since their modifications will be overwritten immediately before the trace function runs), but *can* treat it as a regular namespace otherwise
    • code inside trace hooks can mutate function level fast locals and cells just by modifying frame.f_locals
    • all code can display the current value of function level fast locals and cells just by displaying locals() or frame.f_locals
    • there's still only one f_locals dictionary per frame, it may just have a proxy object intercepting writes to cell variables when a trace hook is running

    That way, we can avoid the problem with overwriting cells back to old values, *without* enabling arbitrary writes to function locals from outside trace functions, and without introducing any tricky new state synchronisation problems.

    @gvanrossum
    Copy link
    Member

    Wow, nothing here is simple. :-( Even though the examples show that there's obviously a bug, I would caution against fixing it rashly without understanding how the current behavior may be useful in other circumstances. I've lost what I recall from reading PEP-558 so I can't quite fathom the new design, but I wish you good luck trying to implement it, and hopefully it will work out as hoped for (simpler, solving the issue, keeping the useful behaviors).

    @ncoghlan
    Copy link
    Contributor

    Aye, what's in PEP-558 is the least invasive implementation change I've been able to come up that fixes the reported bug, but Nathaniel's right that it would break debuggers (and probably some other things) as currently written.

    So the above design comes from asking myself "How can I get the *effect* of PEP-558, while hiding what's going on internally even from trace function implementations?".

    While I can't be sure until I've actually implemented it successfully (no ETA on that, unfortunately), I think blending my idea with Nathaniel's original idea will let us enjoy the benefits of both approaches, without the downsides of either of them.

    @njsmith
    Copy link
    Contributor Author

    njsmith commented Oct 13, 2017

    I guess I should say that I'm still confused about why we're coming up with such elaborate schemes here, instead of declaring that f_locals and locals() shall return a dict proxy so that from the user's point of view, they Always Just Work The Way Everyone Expects.

    The arguments against that proposal I'm aware of are:

    1. Implementing a full dict-like mapping object in C is tiresome. But your proposal above also requires doing this, so presumably that's not an issue.

    2. We want to keep the current super-complicated and confusing locals() semantics, because we like making life difficult for alternative implementations (PyPy at least exactly copies all the weird details of how CPython's locals() works, which is why it inherited this bug), and by making the language more confusing we can encourage the use of linters and boost Python's Stackoverflow stats. ...I guess my bias against this argument is showing :-). But seriously, if we want to discourage writing to locals() then the way to do that is to formally deprecate it, not go out of our way to make it silently unreliable.

    3. According to the language spec, all Python implementations have to support locals(), but only some of them have to support frame introspection, f_locals, debugging, and mutation of locals. But... I think this is a place where the language spec is out of touch with reality. I did a quick survey and AFAICT in practice, Python implementations either support *both* locals() and f_locals (CPython, PyPy, Jython, IronPython), or else they support *neither* locals() nor f_locals (MicroPython -- in fact MicroPython defines locals() to unconditionally return an empty dict). We could certainly document that supporting writes through locals() is a quality-of-implementation thing CPython provides, similar to the prompt destruction guarantees provided by refcounting. But I don't think implementing this is much of a burden -- if you have enough introspection metadata to get the list of locals and figure out where their values are stored in memory (which is the absolute minimum to implement locals()), then you probably also have enough metadata to write back to those same locations. Plus debugger support is obviously a priority for any serious full-fledged implementation.

    So the original write-through proxy idea still seems like the best solution to me.

    @ncoghlan
    Copy link
    Contributor

    When I rejected that approach previously, it hadn't occurred to me yet that the write-through proxy could write through to both the frame state *and* the regular dynamic snapshot returned by locals().

    The current design came from asking myself "What if the proxied reads always came from the snapshot, just like they do now, but writes went to *both* places?".

    So far I haven't found any fundamental problems with the approach, but I also haven't implemented it yet - I've only read through all the places in the code where I think I'm going to have to make changes in order to get it to work.

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Nov 5, 2017

    Starting to make some progress on an implementation, and it occurs to me that if this approach does work out, it should make Python level trace functions *much* faster.

    Right now, the call to the Python function in call_trampoline is bracketed by PyFrame_FastToLocals() and PyFrame_LocalsToFast(), even if the trace function never accesses frame.f_locals.

    By contrast, with the proposed design, PyFrame_LocalsToFast() never gets called anywhere (I've actually replaced the entire body with a PyErr_SetString call), and PyFrame_FastTo_Locals will only be called in the frame.f_locals descriptor implementation.

    @terryjreedy terryjreedy added 3.10 only security fixes and removed 3.7 (EOL) end of life labels Jan 29, 2021
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @mattip
    Copy link
    Contributor

    mattip commented Dec 23, 2022

    This is tagged 3.10. It still fails on python3.10 but passes on 3.11 when using @arigo's minimal reproducer

    test_issue74929.py
    import sys
    
    def outer():
        x = 0
    
        def traced_looper():
            # Force w_locals to be instantiated (only matters on PyPy; on CPython
            # you can comment this line out and everything stays the same)
            locals()
            # Force x to be closed over (could use 'nonlocal' on py3)
            if False:
                x
    
        yield traced_looper
        while True:
            x += 1
            yield "value of x is: %d" % (x,)
    
    outer_generator = outer()
    traced_looper = next(outer_generator)
    
    
    def tracefunc(frame, event, arg):
        print(next(outer_generator))
        return tracefunc
    
    sys.settrace(tracefunc)
    traced_looper()
    sys.settrace(None)
    
    

    which prints this for python3.10:

    $ python3.10 /tmp/test_issue74929.py 
    value of x is: 1
    value of x is: 1
    value of x is: 1
    value of x is: 1
    
    

    and the correct values for python3.11

    $ python3.11 /tmp/test_issue74929.py 
    value of x is: 1
    value of x is: 2
    value of x is: 3
    value of x is: 4
    

    @markshannon
    Copy link
    Member

    The "fix" in 3.11 more by accident than design. I suspect that you could get the bug back by using sys.get_frame(1).f_locals in the tracer function.

    I don't know if or when we will fix this properly.
    My hope is that PEP 669 will kill off uses of sys.settrace and the problem will mostly go away.

    @gvanrossum
    Copy link
    Member

    Wouldn’t this also be fixed by different semantics for locals() per either Nick’s or Mark’s PEP for that?

    @iritkatriel
    Copy link
    Member

    Since 3.10 is in security-fix-only mode now, should this be closed?

    @iritkatriel iritkatriel added the pending The issue will be closed if no feedback is provided label Apr 5, 2023
    @njsmith
    Copy link
    Contributor Author

    njsmith commented Apr 6, 2023 via email

    @albertz
    Copy link
    Contributor

    albertz commented Feb 1, 2024

    Not combined with threads or trace functions or closures, here is another such issue about locals (as discussed in #113939):

    When an exception is caught in an except block, and then you access frame.f_locals, it will make a copy of the fast locals into the f_locals dict. When the except block goes out of scope, it would delete the exception object (e.g. exc), i.e. DELETE_FAST on exc. However, when f_locals was accessed, this would still keep a reference to exc alive, until the whole frame exits, or until f_locals is accessed next time again, when it resyncs the dict with the fast locals. This can lead to very unexpected behavior. In my case, the exc object had also stored the exception stacktrace with all frames, and there was lots of memory occupied by those frames (it was some PyTorch neural network training, all the activation tensors where there in locals of those frames). In the except block, I did some extended error reporting, and that involved to go through the frames and also access their f_locals. After the reporting, I tried to recover the training in some cases, e.g. for OOM errors. However, even after the exception goes out of scope, even after doing del exc explicitly in the code, it would not free the memory. Very unexpectedly, it frees the memory once I get the next exception (the reason: at that point, I access f_locals of the current frame again, and then the old exception is finally freed).

    Here is a test case which demonstrates this problem:

    class LocalsTest(unittest.TestCase):
        """
        Tests for locals.
        """
    
        def test_locals_cleared_after_exception_handled(self):
            # see gh-113939
            class C:
                pass
            wr = None
            def inner():
                nonlocal wr
                c = C()
                wr = weakref.ref(c)
                1/0
            try:
                inner()
            except ZeroDivisionError as exc:
                support.gc_collect()
                self.assertIsNotNone(wr())
                print(exc.__traceback__.tb_frame.f_locals)
            support.gc_collect()
            self.assertIsNone(wr())

    #113939 was additionally also about bad behavior of frame.clear(), which is fixed now (via #113940). But this problem as described here still persists. @iritkatriel suggested to add this example to the discussion here.

    I think PEP-667 or PEP-558 would solve this issue.

    @gvanrossum
    Copy link
    Member

    Yes, maybe PEP 667 needs a new champion so we can get this fixed in 3.13. It would help tremendously if there was an implementation. Can someone help?

    @gaogaotiantian
    Copy link
    Member

    Yes, maybe PEP 667 needs a new champion so we can get this fixed in 3.13. It would help tremendously if there was an implementation. Can someone help?

    I can take a try if no one is looking at this. pdb has issues blocked on this (#102865 for example). Even with PEP 669 and sys.settrace going away (which is a huge if), the inconsistency of frame.f_locals and the fast locals will still be an issue as the debuggers still need to deal with the local variables. I believe making it consistent will solve a lot of corner issues.

    However, I never worked on a PEP before. Normally an implementation comes up before the PEP is accepted? I should be able to make it before May so it can arrive at 3.13.

    @gvanrossum
    Copy link
    Member

    Not every PEP has a full implementation before the PEP is sent to the Steering Council, but it often helps to be able to show that the proposed solution actually works. In this case, things are even more complicated; there's a competing PEP 558, which already has a working implementation. It may need some work if it were to be merged into main, but it can be used to show the behavior relative to a somewhat recent version of Python. We don't have an implementation for PEP 667. In my experience, attempting an implementation may also find issues with the proposal.

    I do want to emphasize that given the draft status of both PEPs, the implementation may never be merged. You might want to try to come up with a prototype implementation that is enough to show the PEP can be made to work, and put off finishing it (or optimizing it, or whatever it needs) until it has been accepted.

    @gaogaotiantian
    Copy link
    Member

    Okay, I got it. I'll try an implementation for this and see if I can make it work.

    @gvanrossum
    Copy link
    Member

    Excellent. Don't hesitate to come back here and ask questions or link to a draft PR!

    @gaogaotiantian
    Copy link
    Member

    gaogaotiantian commented Feb 8, 2024

    Hi @gvanrossum , I finished the first draft of the implementation of PEP 667 in #115153 . The code is far from being ready to be merged yet, but all the essential parts are implemented so it should be enough as a prototype.

    Good news - all tests passed except for 3 that are expected to fail.

    Basically a new real-time proxy object is implemented for the local variables on the frame. There might be corner cases that I did not cover, but at least for all the test cases we have, it worked well.

    FastToLocals and LocalsToFast are not removed at this point - I expect some tests fail due to the removal, but not a lot. I can try to remove it, which might require some corresponding stdlib change to work.

    Let me know how to proceed next. Is the current prototype enough for the SC to decide whether to accept the PEP, or we need more evidence.

    @gvanrossum
    Copy link
    Member

    Thanks Tian, quick work! I have put it in my review queue -- unfortunately it's a long queue. :-( In the meantime, @markshannon (PEP 667's author) might also be interested.

    @gvanrossum
    Copy link
    Member

    With Mark and Carl reviewing I hope you don't need my feedback. Process-wise, I think we now are ready for a discussion on Discourse about the relative merits of PEP 667 and PEP 558, and then it's off to the Steering Council. I hope that the discussion leads to a clear favorite, so the SC doesn't have to break ties.

    @gaogaotiantian
    Copy link
    Member

    Yes we can have a discussion. I've read PEP 558 as well and I think PEP 667 at least provides a better interface to debuggers. But of course we should hear as many opinions from all the developers as possible.

    @gvanrossum
    Copy link
    Member

    Traditionally such a discussion is started by the PEP author. But I suspect @markshannon and @ncoghlan won't mind if you start one -- look for other recent discussions about PEPs for examples. There is no thread yet for PEP 667, and PEP 558 still points to the obsolete python-dev mailing list.

    @gaogaotiantian
    Copy link
    Member

    I would ask Mark and Alyssa first whether they want to start the discussion - they definitely have a better idea of the PEPs - the background, the proposed outcome, and the pros and cons. If they are currently focusing on something else and do not mind me starting the discussion, I can start one.

    @markshannon
    Copy link
    Member

    I'm happy to restart the discussion

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Apr 26, 2024

    The Steering Council has Accepted PEP 667, so this issue is now expected to be resolved in Python 3.13: https://discuss.python.org/t/pep-667-consistent-views-of-namespaces/46631/25

    @gaogaotiantian If you're amenable, I'm happy to cover the docs updates needed for PEP 667.

    • Python locals() docs update (gh-74929: locals() documentation update for PEP 667 #118265)
    • C API additions/deprecations/changes (TODO, I will probably make this a separate PR from the Python API one)
    • Scan docs for other references to locals() & f_locals and update if necessary (TODO, I'd likely tackle this while working on the C API docs PR)
    • First draft of a "What's New in 3.13" entry (TODO, probably in its own PR once the Python and C API docs are settled)

    @gaogaotiantian
    Copy link
    Member

    That would be a great help @ncoghlan ! Thanks for these work. I'm kind of buried in the implementation now :)

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.10 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs)
    Projects
    Status: No status
    Development

    No branches or pull requests

    9 participants