classification
Title: Avoid daemon threads in concurrent.futures
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.9
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: aeros Nosy List: aeros, jack__d, janfrederik.konopka, pitrou, tomMoral
Priority: normal Keywords: patch

Created on 2020-03-01 14:03 by pitrou, last changed 2021-07-26 04:59 by jack__d. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 19149 closed aeros, 2020-03-25 00:30
Messages (15)
msg363059 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2020-03-01 14:03
Since issue37266 (which forbid daemon threads in subinterpreters), we probably want to forego daemon threads in concurrent.futures.  This means we also need a way to run an atexit-like hook before non-daemon threads are joined on (sub)interpreter shutdown.

See discussion below:
https://bugs.python.org/issue37266#msg362890
msg363619 - (view) Author: Kyle Stanley (aeros) * (Python committer) Date: 2020-03-08 00:53
I spent some further time considering the solution to the problem, and I still think something like a `threading.register_atexit()` (see https://bugs.python.org/issue37266#msg362960) would be the most suitable.

However, I'm not certain regarding the exact details. The simplest pure Python implementation I can think of would be through a global list in `Lib/threading.py`, which is only initialized when `threading.register_atexit()` is called for the first time (to avoid unneeded overhead when threading exit functions aren't needed). In order to preserve keyword arguments, each registered function would be appended to the list as a partial. Then, in the private `_shutdown()`, each registered atexit function is called just after the main thread is finished, but just before the non-daemon threads are joined:

```
_threading_atexits = None

def register_atexit(func, *args, **kwargs):
    global _threading_atexits
    if _threading_atexits is None:
        _threading_atexits = []
    call = functools.partial(func, *args, **kwargs)
    _threading_atexits.append(call)

# [snip]

def _shutdown():
    if _main_thread._is_stopped:
        # _shutdown() was already called
        return

    # Main thread
    tlock = _main_thread._tstate_lock
    # The main thread isn't finished yet, so its thread state lock can't have
    # been released.
    assert tlock is not None
    assert tlock.locked()
    tlock.release()
    _main_thread._stop()
    
    # Call registered threading atexit functions
    for atexit_call in _threading_atexits:
        atexit_call()

    # Join all non-deamon threads
    # [snip]
```

Could something like the above pure Python implementation be adequate for our purposes? It seems like it would be to me, but I could very well be missing something. I'll of course have to test if it works as intended for replacing the daemon threads in concurrent.futures.

Another factor to consider is whether or not something like this would be widely useful enough to consider adding `threading.register_atexit()` to the public API for the threading module, or if it should just be an internal _function with a docstring. I could see it being useful in similar cases where daemon threads are no longer a viable option (due to subinterepter compatibility or any other reason). But, I suspect the demand for it won't be overly high from users until PEP 554 is completed.

Thoughts?
msg365189 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2020-03-27 19:31
New changeset b61b818d916942aad1f8f3e33181801c4a1ed14b by Kyle Stanley in branch 'master':
bpo-39812: Remove daemon threads in concurrent.futures (GH-19149)
https://github.com/python/cpython/commit/b61b818d916942aad1f8f3e33181801c4a1ed14b
msg365334 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-30 16:55
> bpo-39812: Remove daemon threads in concurrent.futures (GH-19149)

This change introduced a leak in test_asyncio: bpo-40115.
msg365353 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-30 21:26
I reopen the issue since it introduced a regression on buildbots. If no fix is found soon (let's say in 1 or 2 days), I will revert the change to get more time to investigate. If buildbots remain red, we will be other regressions which would make the situation even worse.
msg365354 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-30 21:26
(Oops typo) If buildbots remain red, we will *miss* other regressions which would make the situation even worse.
msg365358 - (view) Author: Kyle Stanley (aeros) * (Python committer) Date: 2020-03-30 23:21
> This change introduced a leak in test_asyncio: bpo-40115.

Thanks for bringing attention to it Victor. It seems like a rather odd side effect, considering that PR-19149 had no C code and was internal to concurrent.futures and threading. I did not expect asyncio to be dependent on the executors using atexit and daemon threads.

I'll see if I can figure it out, but this one might be a bit tricky to troubleshoot.
msg365362 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-30 23:51
> I'll see if I can figure it out, but this one might be a bit tricky to troubleshoot.

Oh yes, I know that fixing such issue can be very tricky.

For example, there is no  threading._unregister_atexit() function. Maybe the callback stays alive after the test completes.
msg365434 - (view) Author: Kyle Stanley (aeros) * (Python committer) Date: 2020-03-31 23:01
I'm currently busy today, but I'll have some time to explore a few different potential solutions tomorrow. If I can't find anything to fix it by the end of the day, I'll prepare a PR to revert PR-19149, and re-apply it later (after more thoroughly testing to ensure it doesn't cause refleaks).

Since I authored the PR, I'd like to take responsibility for addressing the refleak, whether that's reverting the PR or adding a fix on top of it.

Would that be acceptable Victor, or will it need to be reverted sooner than tomorrow? If it needs to be done sooner, I can open a PR to revert it later tonight (in ~4 hours or so, when I'm available to do so). This the first time a patch of mine has introduced a regression, so I wanted to double check that tomorrow was okay.
msg365508 - (view) Author: Kyle Stanley (aeros) * (Python committer) Date: 2020-04-01 20:44
I attached a PR to bpo-40115 to address the refleak in test_asyncio: PR-19282.
msg365563 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-02 02:48
Kyle fixed bpo-40115, I close again this issue. Thanks Kyle for the test_asyncio fix!
msg365564 - (view) Author: Kyle Stanley (aeros) * (Python committer) Date: 2020-04-02 02:53
> Thanks Kyle for the test_asyncio fix!

No problem! Thanks for the review. :-)
msg391996 - (view) Author: Jan Konopka (janfrederik.konopka) Date: 2021-04-26 23:08
Hi all!

While browsing StackOverflow I came across this question: https://stackoverflow.com/q/67273533/2111778

The user created a ThreadPoolExecutor which started a Process using multiprocessing. The Process produces an exitcode of 0 in Python 3.8 but an exitcode of 1 in Python 3.9.

I'm really not familiar with Python internals, but through monkey-patching Lib/concurrent/futures/thread.py I was able to pin this regression down to the change of 
> atexit.register(_python_exit)
to
> threading._register_atexit(_python_exit)
which led me to this issue! (:

I know that multiprocessing in Python is a little funky, since I worked with it on my Master's thesis. I think the entire process gets forked (i.e. copied), so the child process also gets a copy of the active ThreadPoolExecutor, which I think causes some sort of problem there. Note that this behavior seems to differ based on OS. I can confirm the issue on Linux with the 'fork' method and disconfirm it with the 'spawn' and 'forkserver' methods.
https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods

Could someone with more insight kindly take a look at this?

Greetings Jan <3
msg392041 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2021-04-27 08:24
@Jan, without taking a look, I'd answer that indeed you should avoid using the "fork" method if you're doing any kind of multithreading in the parent process.  "forkserver" is a good choice nowadays on Linux and will result in more robust code.
msg398211 - (view) Author: Jack DeVries (jack__d) * Date: 2021-07-26 04:59
The regression that @janfrederik.konopka points out also has it's own open issue: bpo-43944. 

I'm trying to work on a fix for this regression. Slowly but surely. Now I've finally found these threads, this information will be a big help! Any tips are appreciated.
History
Date User Action Args
2021-07-26 04:59:12jack__dsetnosy: + jack__d
messages: + msg398211
2021-04-27 08:32:30vstinnersetnosy: - vstinner
2021-04-27 08:24:36pitrousetmessages: + msg392041
2021-04-26 23:08:53janfrederik.konopkasetnosy: + janfrederik.konopka
messages: + msg391996
2020-04-02 02:53:50aerossetmessages: + msg365564
2020-04-02 02:48:44vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg365563
2020-04-01 20:44:40aerossetmessages: + msg365508
2020-03-31 23:01:07aerossetmessages: + msg365434
2020-03-30 23:51:07vstinnersetmessages: + msg365362
2020-03-30 23:21:51aerossetmessages: + msg365358
2020-03-30 21:26:46vstinnersetmessages: + msg365354
2020-03-30 21:26:03vstinnersetstatus: closed -> open
resolution: fixed -> (no value)
messages: + msg365353
2020-03-30 16:55:17vstinnersetnosy: + vstinner
messages: + msg365334
2020-03-27 19:31:42pitrousetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2020-03-27 19:31:29pitrousetmessages: + msg365189
2020-03-25 00:30:40aerossetkeywords: + patch
stage: needs patch -> patch review
pull_requests: + pull_request18510
2020-03-08 00:55:46aerossetassignee: aeros
2020-03-08 00:53:33aerossetmessages: + msg363619
2020-03-01 14:03:30pitroucreate