classification
Title: Avoid daemon threads in concurrent.futures
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.9
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: aeros Nosy List: aeros, pitrou, tomMoral, vstinner
Priority: normal Keywords: patch

Created on 2020-03-01 14:03 by pitrou, last changed 2020-04-02 02:53 by aeros. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 19149 merged aeros, 2020-03-25 00:30
Messages (12)
msg363059 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2020-03-01 14:03
Since issue37266 (which forbid daemon threads in subinterpreters), we probably want to forego daemon threads in concurrent.futures.  This means we also need a way to run an atexit-like hook before non-daemon threads are joined on (sub)interpreter shutdown.

See discussion below:
https://bugs.python.org/issue37266#msg362890
msg363619 - (view) Author: Kyle Stanley (aeros) * (Python triager) Date: 2020-03-08 00:53
I spent some further time considering the solution to the problem, and I still think something like a `threading.register_atexit()` (see https://bugs.python.org/issue37266#msg362960) would be the most suitable.

However, I'm not certain regarding the exact details. The simplest pure Python implementation I can think of would be through a global list in `Lib/threading.py`, which is only initialized when `threading.register_atexit()` is called for the first time (to avoid unneeded overhead when threading exit functions aren't needed). In order to preserve keyword arguments, each registered function would be appended to the list as a partial. Then, in the private `_shutdown()`, each registered atexit function is called just after the main thread is finished, but just before the non-daemon threads are joined:

```
_threading_atexits = None

def register_atexit(func, *args, **kwargs):
    global _threading_atexits
    if _threading_atexits is None:
        _threading_atexits = []
    call = functools.partial(func, *args, **kwargs)
    _threading_atexits.append(call)

# [snip]

def _shutdown():
    if _main_thread._is_stopped:
        # _shutdown() was already called
        return

    # Main thread
    tlock = _main_thread._tstate_lock
    # The main thread isn't finished yet, so its thread state lock can't have
    # been released.
    assert tlock is not None
    assert tlock.locked()
    tlock.release()
    _main_thread._stop()
    
    # Call registered threading atexit functions
    for atexit_call in _threading_atexits:
        atexit_call()

    # Join all non-deamon threads
    # [snip]
```

Could something like the above pure Python implementation be adequate for our purposes? It seems like it would be to me, but I could very well be missing something. I'll of course have to test if it works as intended for replacing the daemon threads in concurrent.futures.

Another factor to consider is whether or not something like this would be widely useful enough to consider adding `threading.register_atexit()` to the public API for the threading module, or if it should just be an internal _function with a docstring. I could see it being useful in similar cases where daemon threads are no longer a viable option (due to subinterepter compatibility or any other reason). But, I suspect the demand for it won't be overly high from users until PEP 554 is completed.

Thoughts?
msg365189 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2020-03-27 19:31
New changeset b61b818d916942aad1f8f3e33181801c4a1ed14b by Kyle Stanley in branch 'master':
bpo-39812: Remove daemon threads in concurrent.futures (GH-19149)
https://github.com/python/cpython/commit/b61b818d916942aad1f8f3e33181801c4a1ed14b
msg365334 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-30 16:55
> bpo-39812: Remove daemon threads in concurrent.futures (GH-19149)

This change introduced a leak in test_asyncio: bpo-40115.
msg365353 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-30 21:26
I reopen the issue since it introduced a regression on buildbots. If no fix is found soon (let's say in 1 or 2 days), I will revert the change to get more time to investigate. If buildbots remain red, we will be other regressions which would make the situation even worse.
msg365354 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-30 21:26
(Oops typo) If buildbots remain red, we will *miss* other regressions which would make the situation even worse.
msg365358 - (view) Author: Kyle Stanley (aeros) * (Python triager) Date: 2020-03-30 23:21
> This change introduced a leak in test_asyncio: bpo-40115.

Thanks for bringing attention to it Victor. It seems like a rather odd side effect, considering that PR-19149 had no C code and was internal to concurrent.futures and threading. I did not expect asyncio to be dependent on the executors using atexit and daemon threads.

I'll see if I can figure it out, but this one might be a bit tricky to troubleshoot.
msg365362 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-30 23:51
> I'll see if I can figure it out, but this one might be a bit tricky to troubleshoot.

Oh yes, I know that fixing such issue can be very tricky.

For example, there is no  threading._unregister_atexit() function. Maybe the callback stays alive after the test completes.
msg365434 - (view) Author: Kyle Stanley (aeros) * (Python triager) Date: 2020-03-31 23:01
I'm currently busy today, but I'll have some time to explore a few different potential solutions tomorrow. If I can't find anything to fix it by the end of the day, I'll prepare a PR to revert PR-19149, and re-apply it later (after more thoroughly testing to ensure it doesn't cause refleaks).

Since I authored the PR, I'd like to take responsibility for addressing the refleak, whether that's reverting the PR or adding a fix on top of it.

Would that be acceptable Victor, or will it need to be reverted sooner than tomorrow? If it needs to be done sooner, I can open a PR to revert it later tonight (in ~4 hours or so, when I'm available to do so). This the first time a patch of mine has introduced a regression, so I wanted to double check that tomorrow was okay.
msg365508 - (view) Author: Kyle Stanley (aeros) * (Python triager) Date: 2020-04-01 20:44
I attached a PR to bpo-40115 to address the refleak in test_asyncio: PR-19282.
msg365563 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-02 02:48
Kyle fixed bpo-40115, I close again this issue. Thanks Kyle for the test_asyncio fix!
msg365564 - (view) Author: Kyle Stanley (aeros) * (Python triager) Date: 2020-04-02 02:53
> Thanks Kyle for the test_asyncio fix!

No problem! Thanks for the review. :-)
History
Date User Action Args
2020-04-02 02:53:50aerossetmessages: + msg365564
2020-04-02 02:48:44vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg365563
2020-04-01 20:44:40aerossetmessages: + msg365508
2020-03-31 23:01:07aerossetmessages: + msg365434
2020-03-30 23:51:07vstinnersetmessages: + msg365362
2020-03-30 23:21:51aerossetmessages: + msg365358
2020-03-30 21:26:46vstinnersetmessages: + msg365354
2020-03-30 21:26:03vstinnersetstatus: closed -> open
resolution: fixed -> (no value)
messages: + msg365353
2020-03-30 16:55:17vstinnersetnosy: + vstinner
messages: + msg365334
2020-03-27 19:31:42pitrousetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2020-03-27 19:31:29pitrousetmessages: + msg365189
2020-03-25 00:30:40aerossetkeywords: + patch
stage: needs patch -> patch review
pull_requests: + pull_request18510
2020-03-08 00:55:46aerossetassignee: aeros
2020-03-08 00:53:33aerossetmessages: + msg363619
2020-03-01 14:03:30pitroucreate