Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid daemon threads in concurrent.futures #83993

Closed
pitrou opened this issue Mar 1, 2020 · 15 comments
Closed

Avoid daemon threads in concurrent.futures #83993

pitrou opened this issue Mar 1, 2020 · 15 comments
Assignees
Labels
3.9 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@pitrou
Copy link
Member

pitrou commented Mar 1, 2020

BPO 39812
Nosy @pitrou, @MojoVampire, @tomMoral, @aeros, @jdevries3133
PRs
  • bpo-39812: Remove daemon threads in concurrent.futures #19149
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/aeros'
    closed_at = <Date 2020-04-02.02:48:44.826>
    created_at = <Date 2020-03-01.14:03:30.927>
    labels = ['type-bug', 'library', '3.9']
    title = 'Avoid daemon threads in concurrent.futures'
    updated_at = <Date 2022-04-06.15:15:35.785>
    user = 'https://github.com/pitrou'

    bugs.python.org fields:

    activity = <Date 2022-04-06.15:15:35.785>
    actor = 'josh.r'
    assignee = 'aeros'
    closed = True
    closed_date = <Date 2020-04-02.02:48:44.826>
    closer = 'vstinner'
    components = ['Library (Lib)']
    creation = <Date 2020-03-01.14:03:30.927>
    creator = 'pitrou'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 39812
    keywords = ['patch']
    message_count = 15.0
    messages = ['363059', '363619', '365189', '365334', '365353', '365354', '365358', '365362', '365434', '365508', '365563', '365564', '391996', '392041', '398211']
    nosy_count = 6.0
    nosy_names = ['pitrou', 'josh.r', 'tomMoral', 'aeros', 'janfrederik.konopka', 'jack__d']
    pr_nums = ['19149']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue39812'
    versions = ['Python 3.9']

    @pitrou
    Copy link
    Member Author

    pitrou commented Mar 1, 2020

    Since bpo-37266 (which forbid daemon threads in subinterpreters), we probably want to forego daemon threads in concurrent.futures. This means we also need a way to run an atexit-like hook before non-daemon threads are joined on (sub)interpreter shutdown.

    See discussion below:
    https://bugs.python.org/issue37266#msg362890

    @pitrou pitrou added 3.9 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Mar 1, 2020
    @aeros
    Copy link
    Contributor

    aeros commented Mar 8, 2020

    I spent some further time considering the solution to the problem, and I still think something like a threading.register_atexit() (see https://bugs.python.org/issue37266#msg362960) would be the most suitable.

    However, I'm not certain regarding the exact details. The simplest pure Python implementation I can think of would be through a global list in [Lib/threading.py](https://github.com/python/cpython/blob/main/Lib/threading.py), which is only initialized when threading.register_atexit() is called for the first time (to avoid unneeded overhead when threading exit functions aren't needed). In order to preserve keyword arguments, each registered function would be appended to the list as a partial. Then, in the private _shutdown(), each registered atexit function is called just after the main thread is finished, but just before the non-daemon threads are joined:

    _threading_atexits = None
    
    def register_atexit(func, *args, **kwargs):
        global _threading_atexits
        if _threading_atexits is None:
            _threading_atexits = []
        call = functools.partial(func, *args, **kwargs)
        _threading_atexits.append(call)
    
    # [snip]
    
    def _shutdown():
        if _main_thread._is_stopped:
            # _shutdown() was already called
            return
    
        # Main thread
        tlock = _main_thread._tstate_lock
        # The main thread isn't finished yet, so its thread state lock can't have
        # been released.
        assert tlock is not None
        assert tlock.locked()
        tlock.release()
        _main_thread._stop()
        
        # Call registered threading atexit functions
        for atexit_call in _threading_atexits:
            atexit_call()
    
        # Join all non-deamon threads
        # [snip]
    

    Could something like the above pure Python implementation be adequate for our purposes? It seems like it would be to me, but I could very well be missing something. I'll of course have to test if it works as intended for replacing the daemon threads in concurrent.futures.

    Another factor to consider is whether or not something like this would be widely useful enough to consider adding threading.register_atexit() to the public API for the threading module, or if it should just be an internal _function with a docstring. I could see it being useful in similar cases where daemon threads are no longer a viable option (due to subinterepter compatibility or any other reason). But, I suspect the demand for it won't be overly high from users until PEP-554 is completed.

    Thoughts?

    @pitrou
    Copy link
    Member Author

    pitrou commented Mar 27, 2020

    New changeset b61b818 by Kyle Stanley in branch 'master':
    bpo-39812: Remove daemon threads in concurrent.futures (GH-19149)
    b61b818

    @pitrou pitrou closed this as completed Mar 27, 2020
    @pitrou pitrou closed this as completed Mar 27, 2020
    @vstinner
    Copy link
    Member

    bpo-39812: Remove daemon threads in concurrent.futures (GH-19149)

    This change introduced a leak in test_asyncio: bpo-40115.

    @vstinner
    Copy link
    Member

    I reopen the issue since it introduced a regression on buildbots. If no fix is found soon (let's say in 1 or 2 days), I will revert the change to get more time to investigate. If buildbots remain red, we will be other regressions which would make the situation even worse.

    @vstinner vstinner reopened this Mar 30, 2020
    @vstinner vstinner reopened this Mar 30, 2020
    @vstinner
    Copy link
    Member

    (Oops typo) If buildbots remain red, we will *miss* other regressions which would make the situation even worse.

    @aeros
    Copy link
    Contributor

    aeros commented Mar 30, 2020

    This change introduced a leak in test_asyncio: bpo-40115.

    Thanks for bringing attention to it Victor. It seems like a rather odd side effect, considering that PR-19149 had no C code and was internal to concurrent.futures and threading. I did not expect asyncio to be dependent on the executors using atexit and daemon threads.

    I'll see if I can figure it out, but this one might be a bit tricky to troubleshoot.

    @vstinner
    Copy link
    Member

    I'll see if I can figure it out, but this one might be a bit tricky to troubleshoot.

    Oh yes, I know that fixing such issue can be very tricky.

    For example, there is no threading._unregister_atexit() function. Maybe the callback stays alive after the test completes.

    @aeros
    Copy link
    Contributor

    aeros commented Mar 31, 2020

    I'm currently busy today, but I'll have some time to explore a few different potential solutions tomorrow. If I can't find anything to fix it by the end of the day, I'll prepare a PR to revert PR-19149, and re-apply it later (after more thoroughly testing to ensure it doesn't cause refleaks).

    Since I authored the PR, I'd like to take responsibility for addressing the refleak, whether that's reverting the PR or adding a fix on top of it.

    Would that be acceptable Victor, or will it need to be reverted sooner than tomorrow? If it needs to be done sooner, I can open a PR to revert it later tonight (in ~4 hours or so, when I'm available to do so). This the first time a patch of mine has introduced a regression, so I wanted to double check that tomorrow was okay.

    @aeros
    Copy link
    Contributor

    aeros commented Apr 1, 2020

    I attached a PR to bpo-40115 to address the refleak in test_asyncio: PR-19282.

    @vstinner
    Copy link
    Member

    vstinner commented Apr 2, 2020

    Kyle fixed bpo-40115, I close again this issue. Thanks Kyle for the test_asyncio fix!

    @aeros
    Copy link
    Contributor

    aeros commented Apr 2, 2020

    Thanks Kyle for the test_asyncio fix!

    No problem! Thanks for the review. :-)

    @janfrederikkonopka
    Copy link
    Mannequin

    janfrederikkonopka mannequin commented Apr 26, 2021

    Hi all!

    While browsing StackOverflow I came across this question: https://stackoverflow.com/q/67273533/2111778

    The user created a ThreadPoolExecutor which started a Process using multiprocessing. The Process produces an exitcode of 0 in Python 3.8 but an exitcode of 1 in Python 3.9.

    I'm really not familiar with Python internals, but through monkey-patching Lib/concurrent/futures/thread.py I was able to pin this regression down to the change of

    atexit.register(_python_exit)
    to
    threading._register_atexit(_python_exit)
    which led me to this issue! (:

    I know that multiprocessing in Python is a little funky, since I worked with it on my Master's thesis. I think the entire process gets forked (i.e. copied), so the child process also gets a copy of the active ThreadPoolExecutor, which I think causes some sort of problem there. Note that this behavior seems to differ based on OS. I can confirm the issue on Linux with the 'fork' method and disconfirm it with the 'spawn' and 'forkserver' methods.
    https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods

    Could someone with more insight kindly take a look at this?

    Greetings Jan <3

    @pitrou
    Copy link
    Member Author

    pitrou commented Apr 27, 2021

    @jan, without taking a look, I'd answer that indeed you should avoid using the "fork" method if you're doing any kind of multithreading in the parent process. "forkserver" is a good choice nowadays on Linux and will result in more robust code.

    @jdevries3133
    Copy link
    Mannequin

    jdevries3133 mannequin commented Jul 26, 2021

    The regression that @janfrederik.konopka points out also has it's own open issue: bpo-43944.

    I'm trying to work on a fix for this regression. Slowly but surely. Now I've finally found these threads, this information will be a big help! Any tips are appreciated.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants