Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

concurrent.futures.thread deadlock due to Queue in weakref callback #76757

Closed
gpshead opened this issue Jan 17, 2018 · 10 comments
Closed

concurrent.futures.thread deadlock due to Queue in weakref callback #76757

gpshead opened this issue Jan 17, 2018 · 10 comments
Labels
3.7 (EOL) end of life type-bug An unexpected behavior, bug, or error

Comments

@gpshead
Copy link
Member

gpshead commented Jan 17, 2018

BPO 32576
Nosy @gpshead, @mdickinson, @pitrou, @ned-deily, @mgorny
PRs
  • bpo-32576: use queue.SimpleQueue in critical places #5216
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2018-01-18.09:38:23.076>
    created_at = <Date 2018-01-17.01:04:08.760>
    labels = ['type-bug', '3.7']
    title = 'concurrent.futures.thread deadlock due to Queue in weakref callback'
    updated_at = <Date 2018-04-10.05:47:15.247>
    user = 'https://github.com/gpshead'

    bugs.python.org fields:

    activity = <Date 2018-04-10.05:47:15.247>
    actor = 'devurandom'
    assignee = 'none'
    closed = True
    closed_date = <Date 2018-01-18.09:38:23.076>
    closer = 'pitrou'
    components = []
    creation = <Date 2018-01-17.01:04:08.760>
    creator = 'gregory.p.smith'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 32576
    keywords = ['patch']
    message_count = 10.0
    messages = ['310124', '310156', '310203', '310210', '310228', '313476', '313479', '313490', '313592', '313595']
    nosy_count = 6.0
    nosy_names = ['gregory.p.smith', 'mark.dickinson', 'pitrou', 'ned.deily', 'devurandom', 'mgorny']
    pr_nums = ['5216']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue32576'
    versions = ['Python 3.7']

    @gpshead
    Copy link
    Member Author

    gpshead commented Jan 17, 2018

    As a follow up to https://bugs.python.org/issue14976 which just introduced queue.SimpleQueue:

    concurrent.futures.thread currently uses a queue.Queue() from weakref callbacks which could in theory lead to a deadlock when periodic gc triggers a cleanup invalidating some weakrefed objects at an inopportune time while Python code has the queue's lock held.

    I don't have a test case proving this (deadlocks are hard).

    Switching it to use a SimpleQueue should avoid the problem?

    ...

    This and the C extension module based SimpleQueue would be good to backport to https://pypi.python.org/pypi/futures as well.

    @gpshead gpshead added 3.7 (EOL) end of life type-bug An unexpected behavior, bug, or error labels Jan 17, 2018
    @pitrou
    Copy link
    Member

    pitrou commented Jan 17, 2018

    We could also switch multiprocessing.Pool. Unfortunately some code in multiprocessing.Pool relies on internal details of queue.Queue (!).

    @mdickinson
    Copy link
    Member

    mdickinson commented Jan 17, 2018

    See also https://bugs.python.org/issue21009 (#65208)

    @gpshead gpshead changed the title concurrent.futures.thread potential deadlock due to Queue in weakref callback concurrent.futures.thread deadlock due to Queue in weakref callback Jan 17, 2018
    @pitrou
    Copy link
    Member

    pitrou commented Jan 17, 2018

    Thanks for the heads up Mark. Unfortunately the reproducer script https://bugs.python.org/issue21009 needs to hack into Queue.get(), which isn't possible for the C-implemented SimpleQueue.

    @pitrou
    Copy link
    Member

    pitrou commented Jan 18, 2018

    New changeset ab74504 by Antoine Pitrou in branch 'master':
    bpo-32576: use queue.SimpleQueue in critical places (bpo-5216)
    ab74504

    @pitrou pitrou closed this as completed Jan 18, 2018
    @mgorny
    Copy link
    Mannequin

    mgorny mannequin commented Mar 9, 2018

    Could you get this fixed in earlier versions of CPython? Given that 3.7 is not yet released, having this broken in 3.5 and 3.6 is highly undesirable. This apparently seems to affect our tooling 1 and telling our users to use 3.7 beta is not an option.

    @pitrou
    Copy link
    Member

    pitrou commented Mar 9, 2018

    Michał, sorry, I doubt it. The fix is highly non-trivial as it first requires backporting a new feature (see bpo-14976). I'm cc'ing the 3.6 branch release manager just in case.

    If apparently you're witnessing this in controlled situations (the "gemato" utility, IIUC?), one workaround would be to disable the cyclic GC (call gc.disable()). If your program doesn't create any cyclic references, or if those references don't keep too much memory alive, that would probably work.

    @ned-deily
    Copy link
    Member

    Yes, backporting all of the required code to earlier releases would be out of scope for a maintenance release, particularly at this late stage in the 3.6 life cycle. Let's see whether disabling the GC is a sufficient workaround until 3.7 is available.

    @mgorny
    Copy link
    Mannequin

    mgorny mannequin commented Mar 11, 2018

    Well, according to the reporters disabling GC doesn't help at all. Maybe it's another issue.

    @pitrou
    Copy link
    Member

    pitrou commented Mar 11, 2018

    Perhaps the gemato issue has nothing to do with multiprocessing indeed. I would suggest add some progress logging to your program to see whether/where it actually hangs.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants