This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author njs
Recipients Mark.Shannon, belopolsky, benjamin.peterson, njs, yselivanov
Date 2017-06-24.04:05:25
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1498277129.07.0.940069687434.issue30744@psf.upfronthosting.co.za>
In-reply-to
Content
The attached script looks innocent, but gives wildly incorrect results on all versions of CPython I've tested.

It does two things:

- spawns a thread which just loops, doing nothing

- in the main thread, repeatedly increments a variable 'x'

And most of the increments of the variable are lost!

This requires two key things I haven't mentioned, but that you wouldn't expect to change anything. First, the thread function closes over the local variable 'x' (though it doesn't touch this variable in any way). And second, the thread function has a Python-level trace function registered (but this trace function is a no-op).

Here's what's going on:

When you use sys.settrace() to install a Python-level trace function, it installs the C-level trace function "call_trampoline". And then whenever a trace event happens, call_trampoline calls PyFrame_FastToLocalsWithError, then calls the Python-level trace function, then calls PyFrame_LocalsToFast (see: https://github.com/python/cpython/blob/master/Python/sysmodule.c#L384-L395). This save/call/restore sequence is safe if all the variables being saved/restored are only visible to the current thread, which used to be true back in the days when local variables were really local. But it turns out nowadays (since, uh... python 2.1), local variables can be closed over, and thus can visible from multiple threads simultaneously.

Which means we get the following sequence of events:

- In thread A, a trace event occurs (e.g., executing a line of code)

- In thread A, call_trampoline does PyFrame_FastToLocalsWithError, which saves a snapshot of the current value of 'x'

- In thread A, call_trampoline then starts calling the trace function

- In thread B, we increment 'x'

- In thread A, the trace function returns

- In thread A, call_trampoline then does PyFrame_LocalsToFast, which restores the old value of 'x', overwriting thread B's modifications

This means that merely installing a Python-level trace function – for example, by running with 'python -m trace' or under pdb – can cause otherwise correct code to give wrong answers, in an incredibly obscure fashion.

In real life, I originally ran into this in a substantially more complicated situation involving PyPy and coverage.py and a nonlocal variable that was protected by a mutex, which you can read about here: https://bitbucket.org/pypy/pypy/issues/2591/
It turns out that since this only affects *Python*-level trace functions, merely recording coverage information using coverage.py doesn't normally trigger it, since on CPython coverage.py tries to install an accelerator module that uses a *C*-level trace function. Which is lucky for my particular case. But probably it should be fixed anyway :-).

CC'ing belopolsky as the interest area for the trace module, Mark Shannon because he seems to enjoy really pathological low-level bugs, and Benjamin and Yury as the "bytecode" interest area. I'm surprised that there's apparently no interest area for, like, "variables should retain the values they are assigned" -- apologies if I've CC'ed anyone incorrectly.
History
Date User Action Args
2017-06-24 04:05:29njssetrecipients: + njs, belopolsky, benjamin.peterson, Mark.Shannon, yselivanov
2017-06-24 04:05:29njssetmessageid: <1498277129.07.0.940069687434.issue30744@psf.upfronthosting.co.za>
2017-06-24 04:05:28njslinkissue30744 messages
2017-06-24 04:05:26njscreate