make faulthandler dump traceback of child processes #56622
Comments
As noted in issue bpo-11870, making faulthandler capable of dumping child processes' tracebacks could be a great aid in debugging tricky deadlocks involving, for example, multiprocessing and subprocess.
I'm not sure how this would work out on Windows, but I don't even know if Windows has a notion of child processes or process groups...
Oh oh. I already thought about this feature, but its implementation is not trivial.
You mean that the tracebacks of children should be dumped on a timeout of the parent? Or do you also want them on a segfault of the parent? In my experience, the most common problem with the multiprocessing and subprocess modules is a hang. The timeout is implemented using a (C) thread in faulthandler. You can do more in a thread than in a signal handler ;-) A hook may be added to faulthandler to execute code specific to multiprocessing / subprocess.
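For reference, the parent-side timeout mentioned above is exposed as `faulthandler.dump_traceback_later()`. The following is a minimal sketch of that existing mechanism only, not of the proposed child-process feature; the helper name `dump_after_timeout` and the use of a temporary file as the dump target are assumptions made for illustration.

```python
import faulthandler
import tempfile
import time


def dump_after_timeout(timeout, work_seconds):
    """Run a sleep standing in for a hung task and return whatever
    faulthandler's watchdog thread dumped during it (hypothetical helper)."""
    with tempfile.TemporaryFile(mode="w+") as log:
        # The watchdog thread writes the tracebacks straight to log's
        # file descriptor once `timeout` seconds have elapsed.
        faulthandler.dump_traceback_later(timeout, file=log)
        try:
            time.sleep(work_seconds)  # stands in for the hanging code
        finally:
            faulthandler.cancel_dump_traceback_later()
        log.seek(0)
        return log.read()
```

Calling `dump_after_timeout(0.1, 0.5)` should return a dump of the calling thread's stack, while a call whose work finishes before the timeout should return an empty string.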
In which case is Python the leader of the group? Is it the case by default? Can we do something to ensure that in regrtest, in the multiprocessing tests or in the multiprocessing module? See also bpo-5115 (for the subprocess module).
subprocess maintains a list of the created subprocesses, subprocess.alive, but you need a reference to this list (or you can access it through the Python namespace, but that requires the GIL, and you cannot trust the GIL on a crash). subprocess can execute any program, not only Python, and sending an arbitrary signal to a child process can cause issues. Does multiprocessing maintain a list of child processes?
By the way, which signal do you want to send to the child processes? A test may replace the handler of your signal (most tests use SIGALRM and SIGUSR1). faulthandler.register() is not available on Windows.
crier ( https://gist.github.com/737056 ) is a tool similar to faulthandler, but it is implemented in Python and so is less reliable. It uses a different trigger: it checks whether a file (e.g. /tmp/crier-<pid>) exists. A file (e.g. a pipe) could be used, with a thread watching the file, to send the "please dump your traceback" request to the child processes.
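The file-based trigger described above can be sketched roughly as follows. This is not crier's actual code: the function name `watch_trigger`, the polling interval and the choice to delete the trigger file before dumping are all assumptions for illustration. Only `faulthandler.dump_traceback()` is a real API here.

```python
import faulthandler
import os
import threading


def watch_trigger(path, out_file, stop_event, interval=0.1):
    """Start a daemon thread that dumps all thread tracebacks to out_file
    whenever the trigger file `path` appears (hypothetical helper)."""

    def run():
        # Event.wait() returns False on timeout, so this polls every
        # `interval` seconds until stop_event is set.
        while not stop_event.wait(interval):
            if os.path.exists(path):
                os.unlink(path)  # consume the request
                faulthandler.dump_traceback(file=out_file, all_threads=True)

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread
```

A parent that knows its children's pids could then request a dump by creating the per-pid trigger file, without sending any signal. As the comment notes, a watcher written in Python depends on the interpreter still being able to run threads, so it is less reliable than a C-level handler.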
If we have the pid list of the children, we can use an arbitrary sleep (e.g. 1 second) before sending a signal to the next pid. Anyway, a sleep is the most reliable synchronization code after a crash/timeout.
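A rough sketch of that idea, assuming a POSIX system, children that are Python processes, and children that have opted in with `faulthandler.register()`. The choice of SIGUSR1, the helper names `spawn_child` and `dump_children`, and the "ready" handshake are assumptions for illustration, not something the thread settled on.

```python
import signal
import subprocess
import sys
import time

# Child program: registers a dump handler for SIGUSR1, then pretends to hang.
CHILD_CODE = """
import faulthandler, signal, sys, time
faulthandler.register(signal.SIGUSR1, file=sys.stderr, all_threads=True)
print("ready", flush=True)
time.sleep(30)
"""


def spawn_child():
    """Start one child and wait until its handler is installed."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True,
    )
    proc.stdout.readline()  # blocks until the child prints "ready"
    return proc


def dump_children(procs, delay=1.0):
    """Ask each live child for a traceback, sleeping between signals
    so that the dumps are less likely to interleave."""
    for proc in procs:
        if proc.poll() is None:  # skip children that already exited
            proc.send_signal(signal.SIGUSR1)
            time.sleep(delay)
```

The sleep is the only synchronization here, as suggested above. A child that never registered a handler would be terminated by SIGUSR1's default action, which is one of the risks of signalling arbitrary subprocesses.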
Well, a segfault is due to the current process (or sometimes to
Yes, but when the timeout expires, there's no guarantee about the
Yes, it's the case by default when you launch a process through a shell.
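Whether a given process is the leader of its process group can be checked by comparing its pid with its process group id. The sketch below (POSIX only) illustrates that a child started directly by `subprocess`, without a shell, inherits the parent's group, and that passing `start_new_session=True` makes it the leader of a new group. The helper name `child_is_group_leader` is hypothetical.

```python
import subprocess
import sys

# One-liner run in the child: is my pid equal to my process group id?
CHECK = "import os; print(os.getpgrp() == os.getpid())"


def child_is_group_leader(new_session):
    """Spawn a Python child and report whether it leads its own group."""
    result = subprocess.run(
        [sys.executable, "-c", CHECK],
        start_new_session=new_session,  # calls setsid() in the child if True
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip() == "True"
```

A signal sent with `os.killpg()` reaches every process in the group, including the sender if it is a member, so the group-based approach only works cleanly when group membership is known.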
Yes, we don't have any guarantee about the interpreter's state, and
Well, faulthandler is disabled by default, no?
Hum, SIGTERM maybe? Don't you register some fatal signals by default?
subprocess doesn't use a shell by default, and I don't think that
I don't think that we can have a reliable, generic and portable solution
I agree that interpreter state can be inconsistent, but faulthandler
To simplify the implementation, I propose to patch multiprocessing
It would be better if these modules unregister pid when a subprocess
Yes, but I prefer not to interfere with unrelated processes if it's possible.
faulthandler.enable() installs a signal handler for SIGSEGV, SIGBUS,
Well, it doesn't really matter. If one child process doesn't print the
No, but we precisely want subprocess/multiprocessing-created processes
It'll be intrusive and error-prone: for example, you'll have to reset
Well, those processes are started by subprocess, and this would be
We could use one of these signals.
There has been no activity for 10 years. I consider that this feature is not really needed. I reject this feature request.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.