msg269593 |
Author: Martin Ritter (Martin Ritter) |
Date: 2016-06-30 16:23 |
When creating a multiprocessing.Process in a threaded environment I get deadlocks, I guess while waiting for the lock used to flush the output.
I attached a minimal example of the problem which hangs for me starting with 4 threads.
|
msg269594 |
Author: Martin Ritter (Martin Ritter) |
Date: 2016-06-30 16:25 |
I attached a gdb backtrace of one of the child processes.
|
msg269603 |
Author: R. David Murray (r.david.murray) * |
Date: 2016-06-30 17:11 |
Mixing multiprocessing and threading is problem-prone in general. Hopefully one of the multiprocessing experts can say if this is a known problem or not...
|
msg269613 |
Author: Raymond Hettinger (rhettinger) * |
Date: 2016-06-30 18:25 |
It is in fact problem-prone (and not just in Python). The rule is "thread after you fork, not before". Otherwise, the locks used by the thread executor will get duplicated across processes. If one of those processes dies while it holds the lock, all of the other processes using that lock will deadlock.
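A minimal sketch of that failure mode, using an explicit user-level lock (this is illustrative only, not the attached example; it assumes the default 'fork' start method on Linux, and any given run may or may not hang):

import multiprocessing
import threading

lock = threading.Lock()   # stands in for any lock shared with library code
stop = threading.Event()

def hammer_lock():
    # keeps taking and releasing the lock, so there is a good chance it is
    # held at the exact moment another thread forks
    while not stop.is_set():
        with lock:
            pass

def child():
    # the forked child gets a copy of the lock in whatever state it had at
    # fork time; if it was held, no thread exists in the child to release it
    with lock:
        pass

def spawn():
    p = multiprocessing.Process(target=child)
    p.start()
    p.join()              # hangs if the child inherited a held lock

if __name__ == "__main__":
    hammer = threading.Thread(target=hammer_lock)
    hammer.start()
    spawners = [threading.Thread(target=spawn) for _ in range(4)]
    for t in spawners:
        t.start()
    for t in spawners:
        t.join()
    stop.set()
    hammer.join()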
|
msg269727 |
Author: Davin Potts (davin) * |
Date: 2016-07-02 18:54 |
It would be nice to find an appropriate place to document the solid general guidance Raymond provided; though merely mentioning it somewhere in the docs will not translate into it being noticed. Not sure where to put it just yet...
Martin: Is there a specific situation that prompted your discovering this behavior? Mixing the spinning up of threads with the forking of processes requires appropriate planning to avoid problems and achieve desired performance. If you have a thoughtful design to your code but are still triggering problems, can you share more of the motivation?
As a side note, this is more appropriately labeled as a 'behavior' rather than a 'crash' -- the Python executable does not crash in any way but merely hangs in an apparent lock contention.
|
msg269734 |
Author: Raymond Hettinger (rhettinger) * |
Date: 2016-07-02 20:18 |
FWIW, this isn't even a Python-specific behavior. It is just how threads, locks, and processes work (or in this case don't work). The code is doing what it is told to do, which happens not to be what you want (i.e. a user bug rather than a Python bug).
I think a FAQ entry would be a reasonable place to mention this (it comes up more often than one would hope).
|
msg269785 |
Author: Martin Ritter (Martin Ritter) |
Date: 2016-07-04 13:18 |
I agree that this is error-prone and cannot be fixed reliably on the Python side. However, Python makes it very easy to mix these two: a user might not even notice it if a function he calls uses fork, and might just use a ThreadPoolExecutor() because it's the simplest thing to do.
What would be a nice solution, in my opinion, is if the multiprocessing module could check whether there are already multiple threads active on process creation and issue a warning if so. This warning could of course be optional, but it would make the issue more obvious.
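A rough sketch of such a check in user code (the class name and warning text are made up for illustration; as pointed out later in this thread, multiprocessing itself uses helper threads, so a check like this can only be a heuristic):

import multiprocessing
import threading
import warnings

class WarningProcess(multiprocessing.Process):
    """A Process subclass that warns when started from a threaded parent."""

    def start(self):
        # active_count() includes the main thread, so anything above 1 means
        # other threads are alive at the moment we are about to fork
        others = threading.active_count() - 1
        if others:
            warnings.warn(
                "starting a Process while %d other thread(s) are running; "
                "locks they hold will be inherited by the child and may "
                "deadlock it" % others,
                RuntimeWarning,
                stacklevel=2,
            )
        super().start()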
In my case we have a large C++ code base which still includes a lot of Fortran 77 code with common blocks all over the place (yay science). Everything is interfaced in Python, so to make sure that I do not have any side effects I run some of the functions in a fork using multiprocessing.Process(). In this case I just wanted to run some tests in parallel. I have now switched to a ProcessPoolExecutor, which works fine for me.
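For reference, the switch described above might look roughly like this (the test function and worker count are placeholders):

from concurrent.futures import ProcessPoolExecutor

def run_test(name):
    # placeholder for a test that calls into the C++/Fortran layer; running
    # it in a separate process isolates any global state it modifies
    return "%s: ok" % name

if __name__ == "__main__":
    tests = ["test_a", "test_b", "test_c"]        # placeholder test names
    with ProcessPoolExecutor(max_workers=4) as pool:
        for result in pool.map(run_test, tests):
            print(result)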
|
msg269787 |
Author: Davin Potts (davin) * |
Date: 2016-07-04 15:19 |
While I believe I understand the motivation behind the suggestion to detect when the code is doing something potentially dangerous, I'll point out a few things:
* any time you ask for a layer of convenience, you must choose something to sacrifice to get it (usually performance is sacrificed), and this sacrifice will affect all code (including non-problematic code)
* behind the scenes, multiprocessing itself employs multiple threads in the creation of and coordination between processes -- "checking to see if there are multiple threads active on process creation" is therefore a more complicated request than it may first appear
* regarding "Python makes it very easy to mix these two", I'd say it's nearly as easy to mix the two in C code -- the common pattern across different languages is to learn the pros+cons+gotchas of working with processes and threads
I too come from the world of scientific software and the mixing of Fortran, C/C++, and Python (yay science and yay Fortran) so I'll make another point (apologies if you already knew this):
There's a lot of computationally intensive code in scientific code/applications, and being able to perform those computations in parallel is a wonderful thing. I am unsure if the tests you're trying to speed up exercise compute-intensive functions, but let's assume they do. For reasons not described here, using the CPython implementation, there is a constraint on the use of threads that restricts them to all run on a single core of your multi-core cpu (and on only one cpu if you have an SMP system). Hence spinning up threads to perform compute-intensive tasks will likely result in no better throughput (no speedup) because they're all fighting over the same maxed-out core. To spread out onto and take advantage of multiple cores (and multiple cpus on an SMP system) you will want to switch to creating processes (as you say you now have). I'd make the distinction that you are likely much more interested in 'parallel computing' than 'concurrent execution'. Since you're already using multiprocessing you might also simply use `multiprocessing.Pool`.
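A sketch of the multiprocessing.Pool suggestion (the compute function is a stand-in for whatever the tests actually exercise):

import multiprocessing

def heavy_computation(n):
    # stand-in for a CPU-bound task; each call runs in its own process, so
    # the work spreads across cores instead of serializing on the GIL
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(heavy_computation, [10**6] * 8)
    print(results)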
|
msg269794 |
Author: Martin Ritter (Martin Ritter) |
Date: 2016-07-04 16:37 |
Dear Davin,
Thanks for the input. I was perfectly aware that the "solution" I proposed is not realistic. But the feedback that multiprocessing is using threads internally is useful, as I can quickly abandon the idea of adding something like the check I proposed to our code base without spending more time on it.
I was aware of the GIL, I just did not anticipate that big a problem when mixing threads and processes with rather simple Python code. My bad, sorry for the noise.
Cheers,
Martin
|
msg269798 |
Author: R. David Murray (r.david.murray) * |
Date: 2016-07-04 16:55 |
To clarify the GIL issue (for davin, I guess? :): if the library you are using to interface with the FORTRAN code drops the GIL before calling the FORTRAN, then you *can* take advantage of multiple cores. It is only the Python code (and some of the code interacting with the Python objects) that is limited to executing on one core at a time. (As far as I know it isn't restricted to be the *same* core unless you set CPU affinity somehow, and I have no idea whether using CPU affinity improves performance or not.)
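To illustrate what dropping the GIL around the compiled code buys you, here is a rough sketch that uses NumPy as the stand-in extension (whether and when NumPy/BLAS actually releases the GIL is a property of that library and its build, not something established in this issue):

import threading
import numpy as np

def worker(a, b, out, i):
    # np.dot on large arrays typically hands the work to BLAS, which releases
    # the GIL for the duration of the call (library- and build-dependent)
    out[i] = np.dot(a, b)

if __name__ == "__main__":
    a = np.random.rand(2000, 2000)
    b = np.random.rand(2000, 2000)
    out = [None] * 4
    threads = [threading.Thread(target=worker, args=(a, b, out, i))
               for i in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # with a GIL-releasing BLAS, this can keep several cores busy at once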
|
msg269803 |
Author: Davin Potts (davin) * |
Date: 2016-07-04 19:29 |
@r.david.murray: Oh man, I was not going to go so far as to advocate dropping the GIL. :)
At least not in situations like this where the exploitable parallelism is meant to be at the Python level and not inside the Fortran code (or that was my understanding of the setup). Martin had already mentioned the motivation to fork to avoid side effects possibly arising somewhere in that code.
In practice, after dropping the GIL the threads will likely use several of the cores -- though that's up to the OS kernel scheduler, it's what I've observed happening after temporarily dropping the GIL on both Windows and Linux systems.
As to the benefit of CPU affinity, it depends -- it depends upon what my code was and what the OS and other system processes were busily doing at the time my code ran -- but I've never seen it hurt performance (even if the help was diminishingly small at times). For certain situations, it has been worth doing.
Correction: I have seen cpu affinity hurt performance when I make a bone-headed mistake and constrain too many things onto too few cores. But that's a PEBCAK root cause.
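For completeness, pinning a process to particular cores can be done from Python on Linux (os.sched_setaffinity is Linux-only; the core numbers here are arbitrary):

import os

# restrict the current process (pid 0 means "this process") to cores 0 and 1;
# constraining too much work onto too few cores is exactly the mistake
# described above
os.sched_setaffinity(0, {0, 1})
print("now allowed on cores:", os.sched_getaffinity(0))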
|
msg269817 |
Author: R. David Murray (r.david.murray) * |
Date: 2016-07-05 11:25 |
Heh, yeah. What I was really trying to do with that comment was clarify, for any *other* readers that stumble on this issue later, that it is just the Python code that *has* to be constrained by the GIL. I have no idea how much of the scipy stack drops the GIL at strategic spots. I do seem to remember that Jupyter uses multiple processes for its parallelism, though. Anyway, this is pretty off topic now :)
|
Date | User | Action | Args |
2022-04-11 14:58:33 | admin | set | github: 71609 |
2016-07-05 11:25:28 | r.david.murray | set | messages: + msg269817 |
2016-07-04 19:32:46 | davin | set | status: open -> closed; stage: resolved |
2016-07-04 19:29:36 | davin | set | messages: + msg269803 |
2016-07-04 16:55:21 | r.david.murray | set | messages: + msg269798 |
2016-07-04 16:37:13 | Martin Ritter | set | messages: + msg269794 |
2016-07-04 15:19:42 | davin | set | messages: + msg269787 |
2016-07-04 13:18:08 | Martin Ritter | set | messages: + msg269785 |
2016-07-02 20:18:43 | rhettinger | set | nosy: + docs@python; messages: + msg269734; assignee: docs@python; components: + Documentation; resolution: not a bug |
2016-07-02 18:54:05 | davin | set | type: crash -> behavior; messages: + msg269727; nosy: + davin |
2016-06-30 18:25:20 | rhettinger | set | nosy: + rhettinger; messages: + msg269613 |
2016-06-30 17:11:16 | r.david.murray | set | nosy: + r.david.murray, devin, sbt; messages: + msg269603 |
2016-06-30 16:25:04 | Martin Ritter | set | files: + test_threadfork_backtrace.txt; messages: + msg269594 |
2016-06-30 16:23:23 | Martin Ritter | create | |