New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
signal.pause() and signal handlers don't react to SIGCHLD in non-main thread #66094
Comments
Reproducer attached. To describe the problem in words, one needs to do this to reproduce this issue:
This happens because:
This happens on all Python versions I've tried, using pthread threading. |
First of all, signals and threads usually don't play together. Most signal functions are *not* thread safe. The behaviour of signal functions are not well defined for threads. Example with pause: "pause() causes the calling process (or thread) to sleep until a signal is delivered that either terminates the process or causes the invocation of a signal-catching function." What does it mean "or thread"? Sometimes the function waits for a signal from any thread, something only from the caller thread? :-p I understood that pause() only waits for signals received in the caller thread, main thread in your case. Depending on the platform, a signal may be delivered to a different thread :-/ Especially when a signal is send to the process, ex: "kill -USR1 pid" command on UNIX. This issue is more a documentation issue: we should mention that pause() is limited to a thread. Python signal handlers are only called from the main thread, even if signals can be received from any thread. For your example, you can workaround pause() issue by calling "signal.pthread_sigmask(signal.SIG_BLOCK, [signal.SIGCHLD])" in your thread. |
I think that the problem here is that the documentation sets an expectation that's not met, at least from my POV. Running the reproducer, I was expecting the main thread to unpause and execute the signal handler (citing: "Python signal handlers are always executed in the main Python thread"). I understand that this may be insanely hard to fix in code, so documentation should explicitly point out that there are some (perhaps unfixable) problems with Python's approach to threading+signals. |
It turns out that this bug is more general than signal.pause, and has caused problems for a few different people: The basic problem is that CPython's signal handling strategy on Unix-likes assumes that if a signal is delivered to the process at a time when the main thread is blocked in a syscall, then that syscall will promptly exit with EINTR. So for example, time.sleep has some clever code on Windows to make it interruptible with control-C, but on Unix we assume that the kernel will break the sleep for us. The problem with this is that POSIX only guarantees that *some* thread will receive the signal -- not necessarily the main thread. In practice it seems like most implementations do deliver most signals to the main thread or CPython wouldn't have gotten away with this for as long as it has, but in particular it seems like Linux is happy to deliver SIGCHLD to random other threads. So the C-level signal handler runs in whatever thread, it sets the flag for the main thread to run the Python-level signal handler... and then the main thread sits there blocked in sleep() or select() or pause() or whatever, and never checks the flag. A simple solution would be to make sure signals are always delivered to the main thread, by adjusting the C-level signal handler to do something like: if (current_thread != main_thread) {
pthread_kill(main_thread, sig_num);
return;
} |
If the main thread waits on select() and uses a pipe with Asyncio requires to have an event loop running in the main thread to spawn |
I don't really have a specific use case personally -- for trio, I haven't found a way to make use of set_wakeup_fd because of various issues[1], but I'm also not planning to use SIGCHLD, so this isn't very urgent. In general set_wakeup_fd can be a workaround, but I wanted to note this upstream because it's a bug in Python's signal handler logic. Note that Python already goes to great lengths to make e.g. signal handlers run during time.sleep on Windows; they ought to just work on Unix too. -- [1] (Off-topic but in case you're curious: I register actual signal handlers anyway because I follow the philosophy that the wakeup pipe should only be used for wakeups rather than transmitting information, so as long as signal handlers work I can do my own wakeups, + for some reason set_wakeup_fd doesn't work for me on Windows (no idea why, can't reproduce in simplified tests, might be my fault, need to debug), + set_wakeup_fd's handling of buffer overflow is broken for my purposes.) |
2017-03-22 1:55 GMT+01:00 Nathaniel Smith <report@bugs.python.org>:
I modified signal.set_wakeup_fd() to accept a socket handle on |
@Haypo: It's a socketpair. It works fine when I set up a toy test case using set_wakeup_fd + select, and it works fine in my real code when I use CFFI cleverness to register a signal handler that manually writes a byte to my wakeup socket, but when I pass that exact same socket to set_wakeup_fd in my real code, it doesn't work. It's pretty mysterious, and I have no particular reason to think that the problem is in CPython as opposed to some stupid mistake I'm making. |
@Haypo: okay, looked over things over for a third time and this time I found my very silly error :-). So I'm now able to use set_wakeup_fd on Windows (python-trio/trio#108), but not on Unix (python-trio/trio#109). In any case, the issue here remains that one shouldn't have to use set_wakeup_fd for a signal to interrupt time.sleep etc. |
Nathaniel Smith added the comment:
CPython implements a C signal handler which sets a flag and then exits So "interrupting sleep" is only reliable if:
When you implement an event loop, raising an exception may be the best |
Attached signal_pause_doesnt_wake_up.py is a little bit complex and not sure that it's reliable. I wrote thread_signal.py which should have a more deterministic behaviour. Output on Linux: As expected, the Python signal handler is called in the main thread. time.sleep() in the thread *is* interrupted by SIGUSR1 (select() fails with EINTR), but PyErr_CheckSignals() does nothing since it's not the main thread. Then the sleep is automatically restarted (PEP-475). The code works as expected, but I understand that the behaviour is surprising. To be honest, I never understood the rationale behind "only execute signal handlers in the main thread". I was also surprised to no see a pthread_kill() in the C signal handler. I dislike the idea of transfering the signal to the main thread from another thread in the C signal handler using pthread_kill(). Most code behave very badly when getting a signal, so getting a signal twice is likely to double the pain. Moreover, pthread_sigmask() can block signals in the main thread for deliberate reasons. If we should change something, I simply suggest to remove the arbitrary limitation from the C signal handler. I don't know the consequences yet. |
Hum, maybe I found the root issue: the C signal handler calls Py_AddPendingCall() which uses a lock to protect a global static pendingcalls array (of 32 items). The function tries 100 times in a row to acquire to lock, or does nothing (returns -1) if it fails to acquire the lock. If we start to allow signals from any thread, this shared pendingcalls array can quickly become a source of race conditions like deadlocks or ignored callbacks. To avoid deadlocks, IMHO the best is to have a per-thread array which consumes 512 bytes (on 64-bit, 32 items made of 2 pointers). -- The _thread module has a strange _thread.interrupt_main() function. -- From the point of view of the Python signal handler, the current "if (PyThread_get_thread_ident() != main_thread) return 0;" code in the C signal handler is somehow an implicit pthread_sigmask(signal.SIG_BLOCK, range(1, signal.NSIG)) on all threads except of the main thread, whereas Unix gives a fine control on these masks with the pthread_sigmask() function. -- The Windows part is more tricky. A Windows Event object (created by CreateEvent() and retrieved by _PyOS_SigintEvent()) is used to interrupt a few blocking functions:
The event is set by the SIGINT signal handler set by Python. Extract of pysleep() comment: /* Allow sleep(0) to maintain win32 semantics, and as decreed
|
thread_signal.py: Oops, I forgot to remove the "raise Exception" line from sighandler() (I used it for a quick test.) |
I created a WIP patch #768 to experiment allowing signal handlers from any threads. I expect many race conditions since I only removed sanity checks without adding new code to handle parallelism. |
Letting Python-level signal handlers run in arbitrary threads is an interesting idea, but it's a non-trivial change to Python semantics that may well break some programs (previously it was guaranteed that signal handlers couldn't race with main thread code), and it doesn't fully fix the bug here (because it's common for Python programs to contain threads which don't run Python code, e.g. if they use zmq). I can imagine some potential benefits, but I feel like this needs some substantial rationale? OTOH the pthread_kill-based fix I suggested is like 3 lines and AFAICT fixes the problem exactly.
This doesn't make much sense to me... CPython's pretty good at handling EINTR these days, and, well, the only case that's different is when you have a signal handler that needs to be run and the main thread is blocked in a syscall, which is exactly the case that's currently broken?
Sure, and with pthread_kill() this would have the effect that the signal would be queued by the kernel and delivered after signal is unmasked. That seems like pretty sensible semantics to me; in fact it's exactly what you get in single-threaded code. Compare that to right now where AFAICT pthread_sigmask() is pretty useless in the presence of threads, because of the thing where if one thread has a signal blocked then the kernel will pick another thread to deliver it to.
That code isn't in the C signal handler, it's in the code that the interpreter runs occasionally to check if the C signal handler has been called, right? |
Are you using a pipe or a socket to set_wakeup_fd? Pipes have rather small buffers. In any case, since, as you mention, Tornado and Twisted use it, and they never complained, perhaps your example is too contrived or artificial. |
That said, the pthread_kill() solution deserves testing as well. |
When I removed myself from the nosy list I did mean to stop working on this bug, just that I am unlikely to be of any help. :-) |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: