Title: concurrent.futures ProcessPoolExecutor submit() blocks on results being written
I have tracked down the exact cause of a sizable performance issue when using concurrent.futures.ProcessPoolExecutor. It is especially visible when large amounts of data are sent back as job results.

The line causing the bad behavior and several remediation paths are included below. Since this affects core behavior of the module, I'm reluctant to try a patch myself unless someone weighs in on the approach.

---Bug Symptoms:
  ProcessPoolExecutor.submit() hangs non-deterministically for long periods (over 20 seconds in my job). See the Bug Cause section below for the exact cause.
   This hanging makes multiprocess job submission impossible from a main thread with real-time constraints when the results are large objects.

---Ideal behavior:
   submit() should not block on the results of other jobs; a non-blocking wake signal should be used instead of a blocking put() call.

---Bug Cause:
In ProcessPoolExecutor.submit() line 473, a wake signal is sent to the management thread by posting a message to the result queue, which wakes the thread if it is blocked in recv().

I'm not even sure this wake-up is necessary, as removing it seems to work fine for my use case on OSX. However, let's presume that it is for the time being.

Having submit() block until the result_queue is serviced is unnecessary, and it penalizes workloads that send large results back through Future.result().
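A simplified model of the blocking described above, under the assumption that the 3.6 result queue behaves like a single pipe whose writers share one write lock (as multiprocessing.SimpleQueue does on POSIX). This is not CPython code: threads stand in for a worker process and the management thread, and `measure_wakeup_latency`, `worker` and `manager` are hypothetical names. While a large write that the reader has not drained yet holds the lock, a tiny wake-up write has to wait for it.

```python
import threading
import time
from multiprocessing import Pipe


def measure_wakeup_latency(result_size: int = 5 * 1024 * 1024,
                           reader_delay: float = 0.5) -> float:
    """Time a tiny locked write issued while a large locked write is pending.

    Simplified model (an assumption): one pipe plus one shared write lock.
    """
    reader, writer = Pipe(duplex=False)
    write_lock = threading.Lock()
    lock_taken = threading.Event()
    payload = b"x" * result_size

    def worker() -> None:
        # Plays a worker process sending a large result.
        with write_lock:
            lock_taken.set()
            # Blocks once the OS pipe buffer is full, until the reader drains it.
            writer.send_bytes(payload)

    def manager() -> None:
        # Plays the busy management thread: it only reads after a delay.
        time.sleep(reader_delay)
        reader.recv_bytes()  # the large result
        reader.recv_bytes()  # the wake-up message

    threads = [threading.Thread(target=worker), threading.Thread(target=manager)]
    for t in threads:
        t.start()
    lock_taken.wait()  # make sure the large write owns the lock first

    start = time.perf_counter()
    with write_lock:  # analogous to submit()'s result_queue.put(None)
        writer.send_bytes(b"")
    latency = time.perf_counter() - start

    for t in threads:
        t.join()
    reader.close()
    writer.close()
    return latency


if __name__ == "__main__":
    print(f"wake-up write waited {measure_wakeup_latency():.2f} s")
```

In this model the measured latency tracks `reader_delay`: the wake-up cannot start until the reader has drained the large write, which mirrors submit() waiting on results it does not care about.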

---Possible remediations:

If a more fully-fledged Queue implementation were used, this signal could be replaced by its non-blocking version. Alternatively, the multiprocessing queue implementation could be extended to support a non-blocking put().
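One way to read the first suggestion, sketched under the assumption that the result queue could be swapped for multiprocessing.Queue: its put() only appends the object to an in-process buffer, and a background feeder thread pickles it and writes it to the pipe, so the caller does not wait for the reader. The helper `time_queue_puts` is hypothetical.

```python
import time
from multiprocessing import Queue


def time_queue_puts(result_size: int = 5 * 1024 * 1024) -> tuple:
    """Time put() of a large object and then of a None wake-up signal.

    Nothing reads the queue while the puts are timed, mimicking a busy
    management thread.
    """
    q = Queue()
    payload = b"x" * result_size  # stand-in for a large result

    start = time.perf_counter()
    q.put(payload)  # buffered in-process; the feeder thread writes it later
    big_put = time.perf_counter() - start

    start = time.perf_counter()
    q.put(None)  # stand-in for the wake-up signal
    wake_put = time.perf_counter() - start

    # Drain the queue so the feeder thread can flush both items and exit.
    q.get()
    q.get()
    q.close()
    q.join_thread()
    return big_put, wake_put


if __name__ == "__main__":
    big, wake = time_queue_puts()
    print(f"large put: {big:.4f} s, wake-up put: {wake:.4f} s")
```

The feeder thread still contends for the pipe; only the caller of put() is shielded from it, which is the property the wake-up in submit() would need.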

--- Reproduction Details
  I'm using concurrent.futures.ProcessPoolExecutor for a complicated data-processing use case where each result is a large object returned through result(). Create any similar setup where the results are strings on the order of 50MB, submit 5-10 jobs at a time, and measure how long each submit() call takes.
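A minimal reproduction sketch of the setup described above, assuming a trivial worker that just builds a large string stands in for the real data processing. `make_large_result`, `time_submits` and `main` are hypothetical names; only the concurrent.futures calls are real API.

```python
import time
from concurrent.futures import Executor, ProcessPoolExecutor, wait


def make_large_result(size: int) -> str:
    """Worker task: return a string of `size` characters (~50 MB in the report)."""
    return "x" * size


def time_submits(executor: Executor, n_jobs: int, size: int) -> list:
    """Submit n_jobs tasks back to back and record how long each submit() takes."""
    futures = []
    latencies = []
    for _ in range(n_jobs):
        start = time.perf_counter()
        futures.append(executor.submit(make_large_result, size))
        latencies.append(time.perf_counter() - start)
    wait(futures)
    for fut in futures:
        fut.result()  # re-raise any exception from a worker
    return latencies


def main(n_jobs: int = 10, size: int = 50 * 1024 * 1024) -> None:
    with ProcessPoolExecutor() as pool:
        latencies = time_submits(pool, n_jobs, size)
    print(f"slowest submit(): {max(latencies):.3f} s")
```

Call `main()` from under an `if __name__ == "__main__":` guard, because worker processes created with the spawn start method (the default on Windows, and on macOS since Python 3.8) re-import the main module. Whether a long stall shows up depends on how the submissions interleave with results still in flight, so running several batches in a row makes the overlap more likely.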
The line number above was incorrect due to local edits.

The correct line is "self._result_queue.put(None)".
Adding the concurrent.futures experts bquinlan and pitrou to the nosy list, as per the bug tracker directions.
I'm not sure what happens exactly in your workload, but waiting 20 seconds when posting some data on an unbounded queue sounds enormous.
This behavior results from the fact that in 3.6, the result_queue is used to pass messages to the queue_manager_thread. This has been changed in 3.7, which relies on a _ThreadWakeup object instead.

In 3.6, when the result_queue is filled with many large objects, the call to result_queue.put(None) hangs while the previous objects are being handled by the queue_manager_thread, which adds latency to submit().
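A sketch of the 3.7-style approach mentioned above, assuming it amounts to a dedicated wake-up pipe that the management thread waits on alongside the result pipe. `WakeupChannel` and `demo` are hypothetical names and this is not the CPython source. Because the wake-up pipe never carries results, signalling it does not queue behind a large result that is still being written.

```python
import threading
import time
from multiprocessing import Pipe
from multiprocessing.connection import wait


class WakeupChannel:
    """Hypothetical dedicated wake-up pipe, loosely modeled on the idea
    behind 3.7's _ThreadWakeup (not its actual source)."""

    def __init__(self) -> None:
        self.reader, self._writer = Pipe(duplex=False)

    def wakeup(self) -> None:
        # A tiny message on a pipe that never carries results.
        self._writer.send_bytes(b"")

    def clear(self) -> None:
        # Drain pending wake-ups so repeated signals cannot fill the pipe.
        while self.reader.poll():
            self.reader.recv_bytes()

    def close(self) -> None:
        self.reader.close()
        self._writer.close()


def demo(result_size: int = 5 * 1024 * 1024) -> tuple:
    """Signal the wake-up pipe while a large result write is still blocked."""
    result_reader, result_writer = Pipe(duplex=False)
    wake = WakeupChannel()
    payload = b"x" * result_size
    # Plays a worker blocked on writing a large result nobody has read yet.
    worker = threading.Thread(target=result_writer.send_bytes, args=(payload,))
    worker.start()
    time.sleep(0.1)  # let the worker fill the result pipe and block

    start = time.perf_counter()
    wake.wakeup()  # analogous to submit() signalling the management thread
    wake_latency = time.perf_counter() - start

    # The management thread waits on both pipes and notices the wake-up.
    ready = wait([result_reader, wake.reader], timeout=5)
    woke = wake.reader in ready
    wake.clear()

    result_reader.recv_bytes()  # now drain the large result
    worker.join()
    result_reader.close()
    result_writer.close()
    wake.close()
    return wake_latency, woke


if __name__ == "__main__":
    latency, woke = demo()
    print(f"woke={woke}, wakeup() took {latency:.4f} s")
```

Clearing the wake-up pipe after each wait keeps it from filling up. If it did fill, wakeup() itself could block, although only on other wake-ups, never on results.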
Just picked up the Python 3.7 release. I can confirm that this is fixed in Python 3.7 for my workload.

Nice job! Thanks for changing the mechanism of thread-sync. I'm grateful.
