classification
Title: If threading is not imported from the main thread it sees the wrong thread as the main thread.
Type: behavior Stage: resolved
Components: Interpreter Core Versions: Python 3.9, Python 3.8
process
Status: closed Resolution: duplicate
Dependencies: Superseder: MainThread association logic is fragile (#31517)
Assigned To: Nosy List: aldwinaldwin, eric.snow, fabioz, int19h, pitrou, vstinner
Priority: normal Keywords:

Created on 2019-06-26 18:04 by fabioz, last changed 2019-07-03 21:41 by pitrou. This issue is now closed.

Files
File name: snippet.py
Uploaded: fabioz, 2019-06-26 18:04
Messages (9)
msg346653 - Author: Fabio Zadrozny (fabioz) Date: 2019-06-26 18:04
I'm attaching a snippet which shows the issue (i.e., in the main thread, threading.main_thread() and threading.current_thread() should be the same, and they aren't).

A possible solution would be to store the initial thread ident when the interpreter is initialized. Then, when threading is imported for the first time, it would use that ident to initialize the main thread instead of calling `threading._MainThread._set_ident` in the wrong thread.

I'm not sure if this is possible if CPython is embedded in some other C++ program, but it seems to be the correct approach when Python is called from the command line.

As a note, I found this when doing an attach to pid for the `pydevd` debugger where a thread is created to initialize the debugger (the issue on the debugger is reported at: https://github.com/microsoft/ptvsd/issues/1542).
msg346720 - Author: Aldwin Pollefeyt (aldwinaldwin) * Date: 2019-06-27 09:23
The second import actually doesn't happen. You need to reload it. This can be tested by putting print('loading threading') in threading.py.

Your first method-thread will still think it's the main thread, so I have no idea whether this is a bug or incorrect use. Should 'import threading' be one of the first lines in your main code/thread?


import _thread
import time
import importlib
barrier = 0

def method():
    import threading  # Makes threading treat this thread as the main thread.
    global barrier
    print(threading.main_thread())
    print(threading.current_thread())
    barrier = 1

_thread.start_new_thread(method, ())

while barrier != 1:
    time.sleep(.1)

import threading
importlib.reload(threading)
print(threading.main_thread())
print(threading.current_thread())
msg346723 - Author: Pavel Minaev (int19h) Date: 2019-06-27 10:06
This is a bit tricky to explain... There's no easy way to achieve this effect "normally". It manifests due to the way some Python debuggers (specifically, pydevd and ptvsd - as used by PyCharm, PyDev, and VSCode) implement non-cooperative attaching to a running Python process by PID.

The TL;DR is that those debuggers have to inject a new thread into a running process from the outside, and then run some Python code on that thread. There are OS APIs for such thread injection, e.g. CreateRemoteThread on Windows. There are various tricks that they then have to use to safely acquire the GIL and invoke PyEval_InitThreads, but ultimately it comes down to running Python code.

That is the point where this can manifest. Basically, as soon as that injected code (i.e. the actual debugger) imports threading, things break. And advanced debuggers do need background threads for some functionality...

Here are the technical details - i.e. how thread injection happens exactly, and what kind of code it might run - if you're interested.
https://github.com/microsoft/ptvsd/issues/1542

I think that a similar problem can also occur in an embedded Python scenario with multithreading. Consider what happens if the hosted interpreter is initialized from the main thread of the host app - but some Python code is then run from the background thread, and that code happens to be the first in the process to import threading. Then that background thread becomes the "main thread" for threading, with the same results as described above.

The high-level problem, I think, is that there's an inconsistency between what Python itself considers "main thread" (i.e. main_thread in ceval.c, as set by PyEval_InitThreads), and what threading module considers "main thread" (i.e. _main_thread in threading.py). Logically, these should be in sync.

If PyEval_InitThreads is the source of truth, then the existing thread injection technique will "just work" as implemented already in all the aforementioned debuggers. It makes sure to invoke PyEval_InitThreads via Py_AddPendingCall, rather than directly from the background thread, precisely so that the interpreter doesn't get confused. 

Furthermore, on 3.7+, PyEval_InitThreads is already automatically invoked by Py_Initialize, and hence when used by python.exe, will mark the actual first thread in the process as the main thread. So, using it as the source of truth would guarantee that attach by thread injection works correctly in all non-embedded Python scenarios.

Apps hosting Python would still need to ensure that they always call Py_Initialize on what they want to be the main thread, as they already have to do; but they wouldn't need to worry about "import threading" anymore.
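
The disagreement between the two notions of "main thread" described above can be probed from Python itself. The following is a minimal sketch and not part of the original report. It relies on the documented restriction that signal.signal() may only be called from the main thread of the main interpreter, and treats the resulting ValueError as a sign that the interpreter does not regard the calling thread as its main thread. The helper names interpreter_sees_main_thread and probe_from_raw_thread are hypothetical, and the probe is only a heuristic.

```python
import _thread
import signal


def interpreter_sees_main_thread():
    """Heuristic: True if the interpreter treats the caller as its main thread.

    Hypothetical helper. Returns None when the check cannot be performed.
    """
    handler = signal.getsignal(signal.SIGINT)
    if handler is None:
        # The handler was not installed from Python, so it cannot be re-installed.
        return None
    try:
        # Re-installing the current handler changes nothing, but signal.signal()
        # raises ValueError when called outside the interpreter's main thread.
        signal.signal(signal.SIGINT, handler)
    except ValueError:
        return False
    return True


def probe_from_raw_thread():
    """Run the probe on a thread started via _thread and return its result."""
    done = _thread.allocate_lock()
    done.acquire()
    result = []

    def worker():
        try:
            result.append(interpreter_sees_main_thread())
        finally:
            done.release()

    _thread.start_new_thread(worker, ())
    done.acquire()  # Block until the worker releases the lock.
    return result[0]


if __name__ == "__main__":
    print("main thread:", interpreter_sees_main_thread())
    print("raw thread:", probe_from_raw_thread())
```

Comparing this result with threading.current_thread() is threading.main_thread() on the same thread is one way to check whether the interpreter and the threading module agree.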
msg346724 - Author: Pavel Minaev (int19h) Date: 2019-06-27 10:24
It's also possible to hit this when native code starts a background thread without going via threading, and runs Python code on that background thread. In that case, if that Python code then does "import threading", and threading hasn't been imported yet, then you have this same problem.

Here's a pure Python repro using ctypes on Win32:

#--------------------------
import sys, time
from ctypes import *

ThreadProc = WINFUNCTYPE(c_uint32, c_void_p)

@ThreadProc
def thread_proc(_):
    import threading
    print(threading.current_thread() is threading.main_thread())
    return 0

assert "threading" not in sys.modules
#import threading  # uncomment to fix

windll.kernel32.CreateThread(None, c_size_t(0), thread_proc, None, c_uint32(0), None)
time.sleep(1)

assert "threading" in sys.modules
import threading
print(threading.current_thread() is threading.main_thread())
#--------------------------

Here's the output:

>py -3 main.py
True
False
Exception ignored in: <module 'threading' from 'C:\\Python\\3.7-64\\lib\\threading.py'>
Traceback (most recent call last):
  File "C:\Python\3.7-64\lib\threading.py", line 1276, in _shutdown
    assert tlock.locked()
AssertionError:
msg346726 - Author: Aldwin Pollefeyt (aldwinaldwin) * Date: 2019-06-27 10:46
Understood. Thank you for the extra info. I'll read up on all the suggestions.
msg346746 - Author: Eric Snow (eric.snow) * (Python committer) Date: 2019-06-27 15:48
Note that in Python 3.7 we consolidated a bunch of the global runtime state. The "main_thread" static in ceval.c moved to the internal struct _PyRuntimeState, and in 3.7 it is accessible as _PyRuntime.ceval.pending.main_thread (but only if you build with Py_BUILD_CORE, which extensions shouldn't). In 3.8 it was moved to the top of the struct as _PyRuntimeState.main_thread. All that said, that thread ID is still not exposed directly in Python.

So perhaps it would make sense to add something like sys.get_main_thread_id, which Lib/threading.py could then use (rather than assuming the current (during import) thread is the main thread).

Unfortunately, this wouldn't help with anything earlier than 3.9.  Currently 3.8 is already in beta1, where we apply a feature freeze, so it's barely too late for 3.8.  I suppose you could ask for a special exception from the release manager, given the change would be relatively small, with real benefits, but don't expect it. :)
msg346747 - Author: Eric Snow (eric.snow) * (Python committer) Date: 2019-06-27 15:49
FWIW, this might also have implications for thread shutdown during runtime/interpreter finalization.
msg346750 - Author: Pavel Minaev (int19h) Date: 2019-06-27 15:58
Debuggers will have to work around this in past Python versions that they support (which still includes Python 2 for pretty much all of them), so this is solely about resolving the inconsistency for the future. No point rushing it for 3.8 specifically.

(The most likely immediate workaround will be that, instead of invoking PyEval_InitThreads on the main thread via Py_AddPendingCall, we will simply use the same facility to exec "import threading" on the main thread.)
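
The idea behind that workaround can be illustrated in pure Python. The sketch below is only an analogue under stated assumptions: it does not use Py_AddPendingCall or any debugger machinery, and simply shows that when threading is imported on the real main thread before any raw thread touches it, a thread started via _thread is not mistaken for the main thread. The names results and worker are illustrative.

```python
import _thread
import threading  # Imported on the real main thread, before any raw thread runs.

done = _thread.allocate_lock()
done.acquire()
results = {}


def worker():
    # This thread was not created via threading.Thread, so threading
    # represents it with a dummy thread object rather than the main thread.
    try:
        results["worker_is_main"] = (
            threading.current_thread() is threading.main_thread()
        )
    finally:
        done.release()


_thread.start_new_thread(worker, ())
done.acquire()  # Block until the worker has recorded its result.

# Back on the thread that imported threading.
results["main_is_main"] = threading.current_thread() is threading.main_thread()
print(results)
```

Because the import happens up front, the later "import threading" inside any injected or native thread finds the module already in sys.modules and does not re-run the main-thread setup.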
msg347241 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2019-07-03 21:41
This is a duplicate of #31517.
History
Date User Action Args
2019-07-03 21:41:08 pitrou set status: open -> closed; superseder: MainThread association logic is fragile; nosy: + pitrou; messages: + msg347241; resolution: duplicate; stage: resolved
2019-06-27 22:01:45 brett.cannon set nosy: - brett.cannon
2019-06-27 15:58:12 int19h set messages: + msg346750
2019-06-27 15:49:05 eric.snow set nosy: + vstinner; messages: + msg346747
2019-06-27 15:48:41 eric.snow set nosy: + eric.snow, brett.cannon; messages: + msg346746; versions: + Python 3.8, Python 3.9, - Python 3.7
2019-06-27 10:46:30 aldwinaldwin set messages: + msg346726
2019-06-27 10:24:24 int19h set messages: + msg346724
2019-06-27 10:06:32 int19h set messages: + msg346723
2019-06-27 09:23:38 aldwinaldwin set nosy: + aldwinaldwin; messages: + msg346720
2019-06-27 07:30:36 int19h set nosy: + int19h
2019-06-26 18:04:42 fabioz create