classification
Title: IO streams locking can be broken after fork() with threads
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.9, Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Delgan, pitrou
Priority: normal Keywords:

Created on 2020-04-26 16:58 by Delgan, last changed 2020-04-29 16:43 by pitrou.

Files
File name Uploaded Description Edit
bug.py Delgan, 2020-04-26 16:58
Messages (8)
msg367317 - (view) Author: Delgan (Delgan) * Date: 2020-04-26 16:58
Hi.

I have a very basic program:
- one multiprocessing SimpleQueue
- one consumer thread
- one loop:
  - add one item to the queue
  - create a new process with itself add one item to the queue
  - wait for the process to end

For some unknown reason, it will hangs after some time. 

I know the docs said:

> This means that if you try joining that process you may get a deadlock unless you are sure that all items which have been put on the queue have been consumed. Similarly, if the child process is non-daemonic then the parent process may hang on exit when it tries to join all its non-daemonic children.

That's why I added "time.sleep(1)" inside the process, to make sure all items added by the child process are consumed. You can remove it and the hang will happen faster.


I'm using Python 3.8.2 on Linux. Forcing program to terminate with Ctrl+C (twice):

^CTraceback (most recent call last):
  File "bug.py", line 23, in <module>
    p.join()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 149, in join
    res = self._popen.wait(timeout)
  File "/usr/lib/python3.8/multiprocessing/popen_fork.py", line 47, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/usr/lib/python3.8/multiprocessing/popen_fork.py", line 27, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/popen_fork.py", line 27, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
msg367509 - (view) Author: Delgan (Delgan) * Date: 2020-04-28 08:33
I noticed the bug is reproducible even if the child process does not put object in the queue:




    import multiprocessing 
    import threading 
    import time 

    if __name__ == "__main__": 
      queue = multiprocessing.SimpleQueue() 

      def consume(queue):
        while True:
          print("Consumed:", queue.get())

      thread = threading.Thread(target=consume, args=(queue,))
      thread.start()
      
      for i in range(10000):
        queue.put(i)
        p = multiprocessing.Process(target=lambda: None) 
        p.start() 
        p.join()

        print("Not hanging yet", i)
msg367510 - (view) Author: Delgan (Delgan) * Date: 2020-04-28 08:58
Another curiosity: if 'print("Consumed:", queue.get())' is replaced by either 'print("Consumed")' or 'queue.get()', then the program keeps running without stopping. Only a combination of both makes the program to hang.

Also reproducible on 3.5.2.
msg367516 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2020-04-28 10:43
I can reproduce on Ubuntu 18.04 with git master.

Here is a better example which clearly shows the issue:
https://gist.github.com/pitrou/d9784d5ec679059cd02fce4b38ea2fa6

After a few runs, you'll see that the child Process hangs when trying to flush the standard streams:
Timeout (0:00:01)!
Thread 0x00007efbff6c0080 (most recent call first):
  File "/home/antoine/cpython/default/Lib/multiprocessing/util.py", line 435 in _flush_std_streams
  File "/home/antoine/cpython/default/Lib/multiprocessing/process.py", line 335 in _bootstrap
  File "/home/antoine/cpython/default/Lib/multiprocessing/popen_fork.py", line 71 in _launch
  File "/home/antoine/cpython/default/Lib/multiprocessing/popen_fork.py", line 19 in __init__
  File "/home/antoine/cpython/default/Lib/multiprocessing/context.py", line 276 in _Popen
  File "/home/antoine/cpython/default/Lib/multiprocessing/context.py", line 224 in _Popen
  File "/home/antoine/cpython/default/Lib/multiprocessing/process.py", line 121 in start
  File "/home/antoine/cpython/default/bpo40399.py", line 25 in <module>
Child process failed!


@Delgan, mixing processes and threads is problematic with the default settings.  See here:
https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods

"""Note that safely forking a multithreaded process is problematic."""

If you call `multiprocessing.set_start_method("forkserver")` at the start of your program, the problem will disappear.
msg367522 - (view) Author: Delgan (Delgan) * Date: 2020-04-28 12:23
Thank you for having looked into the problem.

To be more specific, I don't generally mix threads with multiprocessing, but it's a situation where there is one global and hidden consumer thread listening to a queue for non-blocking logging.

Actually, I think the problem is reproducible using the QueueListener provided in "logging.handlers". The following snippet is inspired by this part of the documentation: https://docs.python.org/3/howto/logging-cookbook.html#dealing-with-handlers-that-block

----------------------

import logging
import multiprocessing
import queue
from logging.handlers import QueueHandler, QueueListener


if __name__ == "__main__":
    que = multiprocessing.Queue()
    
    queue_handler = QueueHandler(que)
    handler = logging.StreamHandler()
    listener = QueueListener(que, handler)
    root = logging.getLogger()
    root.addHandler(queue_handler)
    listener.start()

    for i in range(10000):
        root.warning('Look out!')
        p = multiprocessing.Process(target=lambda: None)
        p.start()
        p.join()
        print("Not hanging yet", i)


    listener.stop()
msg367525 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2020-04-28 12:35
Well, as the documentation states, `QueueListener.start` """starts up a background thread to monitor the queue for LogRecords to process""" :-)
msg367674 - (view) Author: Delgan (Delgan) * Date: 2020-04-29 16:37
Yeah, I just wanted to illustrate the issue with a more realistic example. The thread is often abstracted away by a class or a library. Conclusion: do not abstract it away. :)

I've noticed that the mere fact of using "sys.stderr.write()", without even involving a queue, could cause the problem.

Out of curiosity: my understanding of "sys.stderr" being a "TextIOWrapper" implies it's not thread-safe. Therefore, do you have any idea of which lock is involved in this issue?
msg367675 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2020-04-29 16:43
The TextIOWrapper uses an underlying BufferedWriter, which is thread-safe (and therefore has an internal lock).
History
Date User Action Args
2020-04-29 16:43:20pitrousetmessages: + msg367675
2020-04-29 16:37:49Delgansetmessages: + msg367674
2020-04-28 12:38:13vstinnersetnosy: - vstinner
2020-04-28 12:35:41pitrousetmessages: + msg367525
2020-04-28 12:23:22Delgansetmessages: + msg367522
2020-04-28 10:43:10pitrousetnosy: + pitrou, vstinner
title: Program hangs if process created right after adding object to a Queue -> IO streams locking can be broken after fork() with threads
messages: + msg367516

versions: + Python 3.9
2020-04-28 08:58:50Delgansetmessages: + msg367510
2020-04-28 08:33:23Delgansetmessages: + msg367509
2020-04-26 16:58:22Delgancreate