classification
Title: Process not exiting on unhandled exception when using multiprocessing module
Type: behavior Stage: resolved
Components: Extension Modules Versions: Python 3.6
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: akhi singhania, pablogsal
Priority: normal Keywords:

Created on 2018-10-31 11:16 by akhi singhania, last changed 2018-12-09 22:58 by pablogsal. This issue is now closed.

Files
File name Uploaded Description
example.py akhi singhania, 2018-10-31 11:16
Messages (3)
msg328983 - Author: akhi singhania (akhi singhania) Date: 2018-10-31 11:16
I am not sure whether this is an implementation bug or just a documentation bug. When using the multiprocessing module, I have come across a scenario where the process fails to exit after raising an unhandled exception, because it waits forever to join the queue's feeder thread. Sending SIGINT does not make the process exit either, but sending SIGTERM does.

I have attached a simple reproducer.

When main() raises the unhandled exception, the process does not exit. However, if the size of the enqueued data is reduced, or if the child process closes the queue on exiting, the process exits fine.

In the scenario where the process exits successfully, I see the following output:

**** creating queue
[DEBUG/MainProcess] created semlock with handle 140197742751744
[DEBUG/MainProcess] created semlock with handle 140197742747648
[DEBUG/MainProcess] created semlock with handle 140197742743552
[DEBUG/MainProcess] Queue._after_fork()
**** created queue
**** creating process
**** starting process
**** started process
**** starting enqueue
[DEBUG/MainProcess] Queue._start_thread()
[DEBUG/MainProcess] doing self._thread.start()
[DEBUG/Process-1] Queue._after_fork()
[INFO/Process-1] child process calling self.run()
[DEBUG/MainProcess] starting thread to feed data to pipe
[DEBUG/MainProcess] ... done self._thread.start()
**** done enqueue
**** starting sleep
**** done sleep
Traceback (most recent call last):
  File "example.py", line 58, in <module>
    main()
  File "example.py", line 54, in main
    raise Exception('foo')
Exception: foo
[INFO/MainProcess] process shutting down
[DEBUG/MainProcess] running all "atexit" finalizers with priority >= 0
[DEBUG/MainProcess] telling queue thread to quit
[DEBUG/MainProcess] running the remaining "atexit" finalizers
[DEBUG/MainProcess] joining queue thread
[DEBUG/MainProcess] feeder thread got sentinel -- exiting
[DEBUG/MainProcess] ... queue thread joined
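
The [DEBUG/MainProcess] and [INFO/Process-1] lines in this output match the format used by multiprocessing's own logger. The attached example.py is not reproduced on this page, so the following is only a minimal sketch of how such output can be enabled, assuming the reproducer used multiprocessing.log_to_stderr(); the helper name enable_mp_debug_logging is hypothetical.

```python
import logging
import multiprocessing


def enable_mp_debug_logging():
    """Send multiprocessing's internal log messages to stderr at DEBUG level."""
    # log_to_stderr() attaches a stderr handler that formats records as
    # "[LEVEL/ProcessName] message" and returns the module's logger.
    logger = multiprocessing.log_to_stderr(logging.DEBUG)
    return logger


if __name__ == "__main__":
    enable_mp_debug_logging()
    # Creating a queue now logs the "created semlock ..." debug lines.
    queue = multiprocessing.Queue()
```

With this enabled, the shutdown messages ("telling queue thread to quit", "joining queue thread") show where the exit sequence stops.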


In the scenario where the process does not exit successfully, I see the following output:

**** creating queue
[DEBUG/MainProcess] created semlock with handle 139683574689792
[DEBUG/MainProcess] created semlock with handle 139683574685696
[DEBUG/MainProcess] created semlock with handle 139683574681600
[DEBUG/MainProcess] Queue._after_fork()
**** created queue
**** creating process
**** starting process
**** started process
**** starting enqueue
[DEBUG/MainProcess] Queue._start_thread()
[DEBUG/MainProcess] doing self._thread.start()
[DEBUG/Process-1] Queue._after_fork()
[INFO/Process-1] child process calling self.run()
[DEBUG/MainProcess] starting thread to feed data to pipe
[DEBUG/MainProcess] ... done self._thread.start()
**** done enqueue
**** starting sleep
**** done sleep
Traceback (most recent call last):
  File "example.py", line 58, in <module>
    main()
  File "example.py", line 54, in main
    raise Exception('foo')
Exception: foo
[INFO/MainProcess] process shutting down
[DEBUG/MainProcess] running all "atexit" finalizers with priority >= 0
[DEBUG/MainProcess] telling queue thread to quit
[DEBUG/MainProcess] running the remaining "atexit" finalizers
[DEBUG/MainProcess] joining queue thread
<<<< Process hangs here >>>>



I found the "solution" of closing the queue in the child by trial and error and by reading the code. The current documentation suggests that multiprocessing.Queue.close() and multiprocessing.Queue.join_thread() are "usually unnecessary for most code". I am not sure whether the attached code counts as normal code. At the very least, I believe the documentation should be updated; it may also be worth investigating whether a code change could address this.
msg329012 - Author: Pablo Galindo Salgado (pablogsal) (Python committer) Date: 2018-10-31 20:11
Unless I am misunderstanding the issue, this is documented here:

https://docs.python.org/3/library/multiprocessing.html#multiprocessing-programming

Bear in mind that a process that has put items in a queue will wait before terminating until all the buffered items are fed by the “feeder” thread to the underlying pipe. (The child process can call the Queue.cancel_join_thread method of the queue to avoid this behaviour.)

This means that whenever you use a queue you need to make sure that all items which have been put on the queue will eventually be removed before the process is joined. Otherwise you cannot be sure that processes which have put items on the queue will terminate. Remember also that non-daemonic processes will be joined automatically.


In your example, if you add:

ch._queue.get()

before raising the exception, the program does not hang anymore once the item is taken out of the queue.
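
As a rough, single-process illustration of the same idea (not the attached example.py, which is not shown here): the hypothetical helper below puts one item on a queue and reads it back before returning. The 1 MiB default is an assumption chosen to exceed a typical pipe buffer.

```python
import multiprocessing


def put_then_drain(size=1 << 20):
    """Put one large item on a queue, then take it back out before returning."""
    queue = multiprocessing.Queue()
    # The feeder thread starts on the first put() and blocks in the pipe
    # write once the pipe buffer is full.
    queue.put(b"x" * size)
    # Reading the item lets the feeder thread finish its write, so the
    # exit-time join of that thread has nothing left to wait for.
    item = queue.get()
    return len(item)


if __name__ == "__main__":
    print(put_then_drain())
```

Without the get() call, nothing would read from the pipe and the feeder thread would remain blocked while the process tries to join it at exit.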
msg329054 - Author: akhi singhania (akhi singhania) Date: 2018-11-01 11:15
Thank you very much for the reply and the link. It seems I missed that bit of the documentation, my apologies. I can confirm that using cancel_join_thread() removes the need to call queue.close() explicitly.

May I please ask for some more clarification, if you do not mind? My understanding now is that there are two scenarios to consider when a process using queues tries to exit:

- The default behaviour seems to be that the process must flush the queue before it exits. This ensures that none of the queued data is lost, which can be very useful in some circumstances.

- The alternate behaviour (which can be enabled by calling cancel_join_thread()) is that you don't care about losing the data in the queue and just want to exit. Again, this can be useful in some circumstances, for example when emptying the queue might take a long time.


Does the above sound about right?  Thank you very much for your explanation and sorry again for the noise.
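
The second scenario can be sketched as follows. This is a minimal single-process illustration, not the attached reproducer: the helper name enqueue_and_abandon and the 1 MiB payload size are assumptions, and any data still buffered when the process exits is expected to be lost.

```python
import multiprocessing


def enqueue_and_abandon(size=1 << 20):
    """Put a large item on a queue nobody reads, then opt out of the exit-time join."""
    queue = multiprocessing.Queue()
    # Nothing reads from this queue, so the feeder thread blocks while
    # writing the item to the pipe.
    queue.put(b"x" * size)
    # Tell multiprocessing not to wait for the feeder thread at exit.
    # The process can then terminate, but the buffered data may be lost.
    queue.cancel_join_thread()
    return queue


if __name__ == "__main__":
    enqueue_and_abandon()
    raise Exception("foo")  # the process should still exit after the traceback
```

Removing the cancel_join_thread() call gives the default behaviour from the first scenario, where the process waits for the feeder thread to flush the data.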
History
Date User Action Args
2018-12-09 22:58:26 pablogsal set status: open -> closed
resolution: not a bug
stage: resolved
2018-11-01 11:15:13 akhi singhania set messages: + msg329054
2018-10-31 20:11:34 pablogsal set nosy: + pablogsal
messages: + msg329012
2018-10-31 11:16:38 akhi singhania create