I have tracked down the exact cause of a sizable performance issue in concurrent.futures.ProcessPoolExecutor, especially visible when large amounts of data are copied back as results.
The line number causing the bad behavior and several remediation paths are included below. Since this affects core behavior of the module, I'm reluctant to attempt a patch myself until someone weighs in on the approach.
---Bug Symptoms:
ProcessPoolExecutor.submit() hangs non-deterministically for long periods of time (over 20 seconds in my job). See the cause section below for the exact mechanism.
This hanging makes it impossible to submit multiprocess jobs from a real-time-constrained main thread when the results are large objects.
---Ideal behavior:
submit() should not block on the results of other jobs; a non-blocking wake signal should be used instead of a blocking put() call.
---Bug Cause:
In ProcessPoolExecutor.submit(), line 473, a wake signal is sent to the management thread by posting a message to the result queue, waking that thread if it was blocked in recv().
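The blocking comes down to the finite OS pipe buffer behind that queue: if workers are pushing large results through the same pipe, a writer stalls until a reader drains it. A minimal demonstration of that mechanism with a plain os.pipe (not the actual executor internals):

```python
import os
import threading
import time

r, w = os.pipe()
done = threading.Event()
TOTAL = 4096 * 1000  # ~4 MB, far beyond the typical 64 KiB pipe buffer


def writer():
    chunk = b"x" * 4096
    for _ in range(1000):
        os.write(w, chunk)  # stalls once the OS pipe buffer fills
    done.set()


t = threading.Thread(target=writer)
t.start()
time.sleep(0.2)
writer_stuck = not done.is_set()  # the writer is blocked in os.write()

received = 0
while received < TOTAL:           # draining the pipe unblocks the writer
    received += len(os.read(r, 1 << 16))
t.join()
writer_finished = done.is_set()
```

The same thing happens to submit()'s wake-up put(): it shares the pipe with in-flight 50 MB results and waits its turn behind them.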
I'm not even sure this wake-up is necessary, as removing it seems to work fine for my use case on macOS. However, let's presume that it is for the time being.
The fact that submit() blocks until the result_queue is serviced is unnecessary, and it hinders large results from being sent back via Future.result().
---Possible remediations:
If a more fully-fledged queue implementation were used, this signal could be sent with a non-blocking put. Alternatively, the multiprocessing queue implementation used here could be extended to implement a non-blocking put().
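One shape such a non-blocking wake signal could take is a dedicated self-pipe whose write end never blocks: if the pipe is already full, a wake-up is already pending, so the byte can simply be dropped. This is only a sketch; the class name and methods are illustrative, not an existing stdlib API:

```python
import os
import select


class ThreadWakeup:
    """Illustrative self-pipe wake-up channel (not the stdlib's internals)."""

    def __init__(self):
        self._reader, self._writer = os.pipe()
        os.set_blocking(self._writer, False)  # wakeup() must never stall

    def wakeup(self):
        try:
            os.write(self._writer, b"\0")
        except BlockingIOError:
            pass  # pipe already full: a wake-up is already pending

    def clear(self):
        # drain all pending wake-up bytes without blocking
        while select.select([self._reader], [], [], 0)[0]:
            os.read(self._reader, 4096)

    def wait(self, timeout=None):
        # returns True if a wake-up is pending
        return bool(select.select([self._reader], [], [], timeout)[0])
```

The management thread could then multiplex this reader with the result pipe in a single select() call, so submit() only ever performs a non-blocking write regardless of how much result data is in flight.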
---Reproduction Details:
I'm using concurrent.futures.ProcessPoolExecutor for a complicated data-processing use case where each result is a large object sent back across the result() channel. Create any such setup where the results are on the order of 50 MB strings, submit 5-10 jobs at a time, and measure how long each call to submit() takes.