classification
Title: print blocks with multiprocessing and buffered output
Type: behavior Stage: resolved
Components: Versions: Python 3.11, Python 3.10, Python 3.9, Python 3.8
process
Status: closed Resolution: duplicate
Dependencies: Superseder: Locks in the standard library should be sanitized on fork (issue6721)
Assigned To: Nosy List: davin, iritkatriel, moi90, pitrou
Priority: normal Keywords:

Created on 2020-08-13 09:42 by moi90, last changed 2021-06-26 11:08 by moi90. This issue is now closed.

Files
File name Uploaded Description
mp_problem.py moi90, 2020-08-13 09:42
Messages (10)
msg375298 - Author: Martin (moi90) * Date: 2020-08-13 09:42
I am experiencing a problem when combining multiprocessing and print().

I tried to make a minimal working example, please see the attached file.

WITHOUT the offending print statement in the queue filler thread, everything works:
- pytest experiments/mp_problem.py
- pytest experiments/mp_problem.py -s
- python experiments/mp_problem.py

WITH the offending print statement, not so much:
- pytest experiments/mp_problem.py WORKS (probably because pytest captures fd 1)
- pytest experiments/mp_problem.py -s FAILS eventually (after a couple of workers have been started)
- python experiments/mp_problem.py FAILS eventually (same).

WITH the offending print statement AND PYTHONUNBUFFERED=1, everything works again:
- pytest experiments/mp_problem.py
- pytest experiments/mp_problem.py -s
- python experiments/mp_problem.py
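One plausible explanation for the PYTHONUNBUFFERED=1 result: with the variable set, CPython opens the binary layer of stdout unbuffered, so `sys.stdout.buffer` is a raw `io.FileIO` instead of an `io.BufferedWriter`, and the BufferedWriter's internal lock is simply not there to be inherited in a held state by a forked child. The sketch below uses only the standard library (the helper name `stdout_layer` is hypothetical); it starts a child interpreter with and without the variable and reports `sys.flags.unbuffered` together with the type of the child's binary stdout layer.

```python
import os
import subprocess
import sys

# Child program: print the unbuffered flag and the type of stdout's binary layer.
CHILD = "import sys; print(int(sys.flags.unbuffered), type(sys.stdout.buffer).__name__)"


def stdout_layer(unbuffered: bool) -> str:
    """Run CHILD in a fresh interpreter, optionally with PYTHONUNBUFFERED=1."""
    # Start from the current environment but make sure the variable is
    # only present when requested.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONUNBUFFERED"}
    if unbuffered:
        env["PYTHONUNBUFFERED"] = "1"
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print("default:          ", stdout_layer(False))
    print("PYTHONUNBUFFERED=1:", stdout_layer(True))
```

On CPython 3.7 or later the two results should read `0 BufferedWriter` and `1 FileIO`. This only shows the difference in stream layering; it does not by itself prove that the missing buffer lock is the whole story.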

Environment:
Ubuntu 18.04.5 LTS
Python 3.8.5 (hcff3b4d_1) on conda 4.8.3
msg375299 - Author: Martin (moi90) * Date: 2020-08-13 09:57
python experiments/mp_problem.py also fails for:
- 3.8.3, 3.8.2, 3.8.1, 3.8.0
- 3.7.7
- 3.6.10
msg375300 - Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2020-08-13 10:30
Try putting a lock on the print statement. See this: https://stackoverflow.com/questions/40356200/python-printing-in-multiple-threads/
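The lock suggestion can be sketched as follows (standard library only; `locked_print` and `demo` are hypothetical names, not part of any library): one `threading.Lock` shared by every thread that prints, so whole lines from different threads never interleave. This addresses scrambled output within one process, which is what the linked answer is about; it does not by itself stop another thread from forking while a print is in progress.

```python
import contextlib
import io
import threading

# One lock shared by every thread that prints (hypothetical helper).
_print_lock = threading.Lock()


def locked_print(*args, **kwargs):
    """Call print() while holding _print_lock so lines never interleave."""
    with _print_lock:
        print(*args, **kwargs)


def demo(n_threads=4, n_lines=50):
    """Print from several threads into a StringIO and return the lines."""
    buf = io.StringIO()

    def worker(n):
        for i in range(n_lines):
            locked_print(f"thread {n} line {i}")

    # redirect_stdout replaces sys.stdout process-wide, so every thread
    # writes into buf while the block is active.
    with contextlib.redirect_stdout(buf):
        threads = [threading.Thread(target=worker, args=(n,)) for n in range(n_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return buf.getvalue().splitlines()
```

Calling `demo()` should return 200 intact lines of the form `thread N line I`, in whatever order the threads happened to run.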
msg375350 - Author: Martin (moi90) * Date: 2020-08-13 21:35
While I appreciate your suggestion, it does not help me much.
The problem that people usually have is that the output is scrambled. That is not the problem I'm dealing with.

I'm experiencing a deadlock caused by the print statement, which seems like a Python bug to me.

Furthermore, the problem appears in a library that is supposed to be used by other people and I have no control over their use of IO.

The particular behavior seems to be specific to using threading and multiprocessing together:
- If I use multiprocessing.dummy (the multiprocessing API implemented with threads, so only a single process is involved), my example works fine.
- If I use a process instead of a thread to fill the queue (filler = multiprocessing.Process(...); so no threads are involved), my example also works fine.
- The example fails only when the main process has two threads and additional processes are started.
msg376581 - Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2020-09-08 16:43
You're right, this is a different issue. I debugged it a bit and I think the race may be between your print statement and the util._flush_std_streams() in Popen.__init__() of Lib/multiprocessing/popen_fork.py.
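Why such a race can end in a deadlock rather than merely scrambled output: `os.fork()` copies the process memory, including the state of every lock, but only the calling thread exists in the child. If the queue filler thread is inside `print()` (holding the internal lock of the `BufferedWriter` behind `sys.stdout`) at the moment the main thread forks, the child starts with that lock held and no thread left to release it, so the child's next write to or flush of stdout would block forever. The sketch below is a simplified model under that assumption: a plain `threading.Lock` stands in for the buffer's lock, and the child uses a timeout instead of hanging. It needs a POSIX system; `child_sees_held_lock` is a hypothetical name. On Python 3.12 and later `os.fork()` emits a `DeprecationWarning` when it detects other threads, which the sketch silences locally.

```python
import os
import threading
import warnings


def child_sees_held_lock(timeout=1.0):
    """Fork while another thread holds a lock; report what the child sees.

    Simplified model: `lock` stands in for the internal lock of the
    BufferedWriter behind sys.stdout. POSIX only (uses os.fork).
    Returns True if the child could NOT acquire the lock within `timeout`.
    """
    lock = threading.Lock()
    holding = threading.Event()
    release = threading.Event()

    def filler():
        # Plays the queue filler thread that is in the middle of print().
        with lock:
            holding.set()
            release.wait()

    t = threading.Thread(target=filler)
    t.start()
    holding.wait()  # guarantee the lock is held at the moment of fork

    with warnings.catch_warnings():
        # Python 3.12+ warns when forking a process that has several threads.
        warnings.simplefilter("ignore", DeprecationWarning)
        pid = os.fork()

    if pid == 0:
        # Child: the filler thread does not exist here, so nothing will
        # ever release the inherited lock. Use a timeout instead of a hang.
        acquired = lock.acquire(timeout=timeout)
        os._exit(0 if acquired else 1)

    # Parent: let the filler thread finish, then collect the child.
    release.set()
    t.join()
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 1


if __name__ == "__main__":
    print("child found the lock held:", child_sees_held_lock())
```

`child_sees_held_lock()` should return `True` after roughly one second (the child's timeout), even though the parent's filler thread releases the lock almost immediately: the parent's release does not reach the child's copy.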
msg376583 - Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2020-09-08 17:01
This may be relevant: https://stackoverflow.com/questions/9337711/subprocess-popen-not-thread-safe

It points to print() not being thread-safe and suggests using sys.stdout.write() instead. That worked for me with your script.
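One concrete difference between `print()` and a single `sys.stdout.write()` call: CPython's `print()` calls the file's `write()` method once per argument, once per separator and once for `end`. The sketch below (`WriteRecorder`, `print_calls` and `single_write_calls` are hypothetical helpers) records those calls. Formatting the line first and writing it in one call reduces the number of separate writes, which plausibly narrows the timing window; it does not remove the fork hazard itself, which later messages in this thread identify as the root cause.

```python
class WriteRecorder:
    """Minimal file-like object that records every write() call it receives."""

    def __init__(self):
        self.calls = []

    def write(self, s):
        self.calls.append(s)
        return len(s)


def print_calls(*args):
    """Return the write() calls that print(*args) makes (default sep and end)."""
    rec = WriteRecorder()
    print(*args, file=rec)
    return rec.calls


def single_write_calls(*args):
    """Return the write() calls made when the line is formatted first."""
    rec = WriteRecorder()
    rec.write(" ".join(map(str, args)) + "\n")
    return rec.calls


if __name__ == "__main__":
    print(print_calls("worker", 3, "started"))
    # ['worker', ' ', '3', ' ', 'started', '\n']
    print(single_write_calls("worker", 3, "started"))
    # ['worker 3 started\n']
```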
msg396528 - Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2021-06-25 22:35
Is there anything left here?

I've seen it mentioned on other issues of this sort that mixing multiprocessing and threading leads to problems. Should we document that?
msg396542 - Author: Martin (moi90) * Date: 2021-06-26 06:59
Yes, I think it should at least be documented.

But then it practically says: "Do not use print in your library because it might be used in a threading context." This sounds unacceptable to me.

It would be great to "just make it work".

> I debugged it a bit and I think the race may be between your print statement and the util._flush_std_streams()

Why would print deadlock with a flush?
msg396544 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2021-06-26 08:10
This is just issue6721 again.

The workaround is easy: just add `multiprocessing.set_start_method("forkserver")` at the start of your program.
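For library code such as the one Martin describes, a variant of this workaround avoids touching the program-wide default: `multiprocessing.set_start_method()` can be set only once per program (a second call raises `RuntimeError` unless `force=True` is passed), whereas `multiprocessing.get_context()` returns a context object with the same `Pool`, `Process` and `Queue` API bound to a chosen start method. According to the multiprocessing documentation, the forkserver process is single-threaded (unless system libraries or preloaded imports start threads), so forking from it is generally safe. The following is a sketch under those assumptions; `get_process_context` is a hypothetical helper name.

```python
import multiprocessing


def get_process_context():
    """Hypothetical library helper: choose a start method that does not
    fork the (possibly multi-threaded) calling process itself.

    Unlike multiprocessing.set_start_method(), get_context() leaves the
    program-wide default alone, so the application's own choice is kept.
    """
    if "forkserver" in multiprocessing.get_all_start_methods():
        return multiprocessing.get_context("forkserver")
    # forkserver is unavailable on Windows; spawn starts a fresh interpreter.
    return multiprocessing.get_context("spawn")


if __name__ == "__main__":
    ctx = get_process_context()
    # Builtins such as abs() are pickled by reference, so the workers can
    # find them without re-importing this script's own functions.
    with ctx.Pool(processes=2) as pool:
        print(ctx.get_start_method(), pool.map(abs, [-2, -1, 0, 1, 2]))
```

With forkserver or spawn, worker functions sent to the pool must be importable by the child (defined at module top level), and the `if __name__ == "__main__":` guard is needed so that the main module can be imported safely in the child.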

Also, this is more or less documented, though quite tersely:
"""Note that safely forking a multithreaded process is problematic."""
https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods
msg396549 - Author: Martin (moi90) * Date: 2021-06-26 11:08
Thanks for the pointer, @pitrou!
History
Date                 User         Action  Args
2021-06-26 11:08:01  moi90        set     messages: + msg396549
2021-06-26 08:10:31  pitrou       set     status: open -> closed
                                          versions: + Python 3.9, Python 3.10, Python 3.11
                                          superseder: Locks in the standard library should be sanitized on fork
                                          messages: + msg396544
                                          resolution: duplicate
                                          stage: resolved
2021-06-26 06:59:56  moi90        set     messages: + msg396542
2021-06-25 22:35:18  iritkatriel  set     messages: + msg396528
2020-09-08 17:01:23  iritkatriel  set     messages: + msg376583
2020-09-08 16:43:01  iritkatriel  set     messages: + msg376581
2020-08-14 08:51:51  ned.deily    set     nosy: + pitrou, davin
2020-08-13 21:35:36  moi90        set     messages: + msg375350
2020-08-13 10:30:41  iritkatriel  set     nosy: + iritkatriel
                                          messages: + msg375300
2020-08-13 09:57:27  moi90        set     messages: + msg375299
2020-08-13 09:42:10  moi90        create