classification
Title: concurrent.futures.thread deadlock due to Queue in weakref callback
Type: behavior Stage: resolved
Components: Versions: Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: devurandom, gregory.p.smith, mark.dickinson, mgorny, ned.deily, pitrou
Priority: normal Keywords: patch

Created on 2018-01-17 01:04 by gregory.p.smith, last changed 2018-04-10 05:47 by devurandom. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 5216 merged pitrou, 2018-01-17 12:10
Messages (10)
msg310124 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2018-01-17 01:04
As a follow up to https://bugs.python.org/issue14976 which just introduced queue.SimpleQueue:

concurrent.futures.thread currently uses a queue.Queue() from weakref callbacks which could in theory lead to a deadlock when periodic gc triggers a cleanup invalidating some weakrefed objects at an inopportune time while Python code has the queue's lock held.

I don't have a test case proving this (deadlocks are hard).

Switching it to use a SimpleQueue should avoid the problem?

...

This and the C extension module based SimpleQueue would be good to backport to https://pypi.python.org/pypi/futures as well.
msg310156 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-01-17 11:55
We could also switch multiprocessing.Pool.  Unfortunately some code in multiprocessing.Pool relies on internal details of queue.Queue (!).
msg310203 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2018-01-17 18:23
See also https://bugs.python.org/issue21009
msg310210 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-01-17 22:10
Thanks for the heads up Mark.  Unfortunately the reproducer script https://bugs.python.org/issue21009 needs to hack into Queue.get(), which isn't possible for the C-implemented SimpleQueue.
msg310228 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-01-18 09:38
New changeset ab74504346a6e2569b3255b7b621c589716888c4 by Antoine Pitrou in branch 'master':
bpo-32576: use queue.SimpleQueue in critical places (#5216)
https://github.com/python/cpython/commit/ab74504346a6e2569b3255b7b621c589716888c4
msg313476 - (view) Author: Michał Górny (mgorny) * Date: 2018-03-09 08:25
Could you get this fixed in earlier versions of CPython? Given that 3.7 is not yet released, having this broken in 3.5 and 3.6 is highly undesirable. This apparently seems to affect our tooling [1] and telling our users to use 3.7 beta is not an option.

[1]:https://bugs.gentoo.org/647964
msg313479 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-03-09 09:52
Michał, sorry, I doubt it.  The fix is highly non-trivial as it first requires backporting a new feature (see issue14976).  I'm cc'ing the 3.6 branch release manager just in case.

If apparently you're witnessing this in controlled situations (the "gemato" utility, IIUC?), one workaround would be to disable the cyclic GC (call gc.disable()).  If your program doesn't create any cyclic references, or if those references don't keep too much memory alive, that would probably work.
msg313490 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2018-03-09 17:39
Yes, backporting all of the required code to earlier releases would be out of scope for a maintenance release, particularly at this late stage in the 3.6 life cycle.  Let's see whether disabling the GC is a sufficient workaround until 3.7 is available.
msg313592 - (view) Author: Michał Górny (mgorny) * Date: 2018-03-11 10:14
Well, according to the reporters disabling GC doesn't help at all. Maybe it's another issue.
msg313595 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-03-11 10:51
Perhaps the gemato issue has nothing to do with multiprocessing indeed.  I would suggest add some progress logging to your program to see whether/where it actually hangs.
History
Date User Action Args
2018-04-10 05:47:15devurandomsetnosy: + devurandom
2018-03-11 10:51:55pitrousetmessages: + msg313595
2018-03-11 10:14:06mgornysetmessages: + msg313592
2018-03-09 17:39:00ned.deilysetmessages: + msg313490
2018-03-09 09:52:23pitrousetnosy: + ned.deily
messages: + msg313479
2018-03-09 08:25:30mgornysetnosy: + mgorny
messages: + msg313476
2018-01-18 09:38:23pitrousetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2018-01-18 09:38:10pitrousetmessages: + msg310228
2018-01-17 22:10:27pitrousetmessages: + msg310210
2018-01-17 22:04:37gregory.p.smithsettitle: concurrent.futures.thread potential deadlock due to Queue in weakref callback -> concurrent.futures.thread deadlock due to Queue in weakref callback
2018-01-17 22:03:30gregory.p.smithlinkissue21009 superseder
2018-01-17 18:23:36mark.dickinsonsetnosy: + mark.dickinson
messages: + msg310203
2018-01-17 12:10:52pitrousetkeywords: + patch
stage: needs patch -> patch review
pull_requests: + pull_request5069
2018-01-17 11:55:57pitrousetmessages: + msg310156
2018-01-17 01:04:08gregory.p.smithcreate