classification
Title: Please provide a way to disable the warning printed if the signal module's wakeup fd overflows
Type: enhancement Stage: resolved
Components: Interpreter Core Versions: Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: njs, pitrou, serhiy.storchaka, vstinner, yselivanov
Priority: normal Keywords: patch

Created on 2017-04-12 08:54 by njs, last changed 2017-12-18 04:13 by yselivanov. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 4792 merged njs, 2017-12-11 07:49
Messages (13)
msg291532 - (view) Author: Nathaniel Smith (njs) * (Python committer) Date: 2017-04-12 08:54
When a wakeup fd is registered via signal.set_wakeup_fd, then the C level signal handler writes a byte to the wakeup fd on each signal received. If this write fails, then it prints an error message to the console.

Some projects use the wakeup fd as a way to see which signals have occurred (asyncio, curio). Others use it only as a toggle for "a wake up is needed", and transmit the actual data out-of-line (twisted, tornado, trio – I guess it has something to do with the letter "t").

One way that writing to the wakeup fd can fail is if the pipe or socket's buffer is already full, in which case we get EWOULDBLOCK or WSAEWOULDBLOCK. For asyncio/curio, this is a problem: it indicates a lost signal! Printing to the console isn't a great solution, but it's better than letting the error pass silently.

For twisted/tornado/trio, this is a normal and expected thing – the semantics we want are that after a signal is received then the fd will be readable, and if its buffer is full then it's certainly readable! So for them, EWOULDBLOCK/WSAEWOULDBLOCK are *success* conditions. Yet currently, the signal module insists on printing a scary message to the console whenever we succeed in this way.

It would be nice if there were a way to disable this; perhaps something like: signal.set_wakeup_fd(fd, warn_on_full_buffer=False)

This is particularly annoying for trio, because I try to minimize the size of the wakeup fd's send buffer to avoid wasting non-swappable kernel memory on what's essentially an overgrown bool. This ends up meaning that on Linux the buffer is 6 bytes, and on MacOS it's 1 byte. So currently I don't use the wakeup fd on Linux/MacOS, which is *mostly*  OK but it would be better if we could use it. Trio bug with a few more details: https://github.com/python-trio/trio/issues/109
msg291533 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-04-12 09:12
> One way that writing to the wakeup fd can fail is if the pipe or socket's buffer is already full, in which case we get EWOULDBLOCK or WSAEWOULDBLOCK.

Is it a theorical question or did you notice the error in the wild? I mean: without sending a signal in a loop.

Right now, we write the signal number into the pipe each time a signal is received. Do you need to get N bytes if the signal is received N times? Maybe we can use a flag and only write the byte once by signal number?


> Printing to the console isn't a great solution, but it's better than letting the error pass silently.

If the pipe becomes full, you loose signals: it's an hard error. I decided to log an error into fd 2 because in a signal handler, the number of allowed functions is very limited.

I suggest to have a flag to remind if the pipe overflowed, and in that case: schedule a Python callback. The callback can be a new parameter to set_wakeup_fd()?

A pipe seems limited to 64 KB, limit hardcoded in Linux kernel code. Is it the case if we use a socket pair, as we already do on Windows? Or is it possible to increase the socket buffer size?


> This is particularly annoying for trio, because I try to minimize the size of the wakeup fd's send buffer to avoid wasting non-swappable kernel memory on what's essentially an overgrown bool.

asyncio uses wakeup fd to get signal numbers: it's not only a bool. See the issue #21645 for the rationale of using the FD to get signal numbers in asyncio.

It was a bool on Python 2 which only writes NUL bytes into the FD.


Related question: does asyncio support signals on Windows? I added support for set_wakeup_fd() to Windows for asyncio, but then I forgot this TODO item :-)
msg291536 - (view) Author: Nathaniel Smith (njs) * (Python committer) Date: 2017-04-12 09:28
I haven't noticed the error in the wild because I don't use set_wakeup_fd on Linux/MacOS, because of this issue :-). But on MacOS literally all it would take is to receive two signals in quick succession, or to receive one signal at a moment when someone has recently scheduled a callback from a thread (which also writes to the wakeup fd). In principle it can also happen on Windows, but on Windows apparently the smallest buffer you get for a socketpair is like 500 kilobytes, so it doesn't come up except in rather artificial situations.

And yes, a possible workaround is to use an artificially large buffer on Linux/MacOS as well. I'm not saying this is a huge showstopper. But it's silly for downstream libraries to carry around workarounds for issues in CPython when we could just fix CPython.

> If the pipe becomes full, you loose signals: it's an hard error. I decided to log an error into fd 2 because in a signal handler, the number of allowed functions is very limited.

This makes sense for asyncio, but not for most other async libraries in Python, because they *don't* lose signals when the pipe becomes full. That's why I'm suggesting that the signal module should have an option to let users decide whether they want the warning or not.

(It might also make sense to switch asyncio to stop using the fd as a communications channel. Theoretically this would be the ideal solution, but I understand why it's not a priority.)

> Related question: does asyncio support signals on Windows? I added support for set_wakeup_fd() to Windows for asyncio, but then I forgot this TODO item :-)

No, asyncio doesn't use set_wakeup_fd on Windows, so code like 'loop.run_until_complete(asyncio.sleep(99999))' can't be interrupted. I'd appreciate if you could wait a few days to fix this though, because the blog post that I'm almost finished writing about signal handling currently uses this as an example of how trio is more awesome than asyncio ;-).

(This is a joke. Well, I mean, mostly. My draft does literally have that sleep example in it, but I won't be grumpy if you fix it first. Or at least, not *very* grumpy :-).)
msg295824 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-06-12 20:39
> It would be nice if there were a way to disable this; perhaps something like: signal.set_wakeup_fd(fd, warn_on_full_buffer=False)

That's a reasonable idea.  Nathaniel, would you like to submit a PR for this?
msg308064 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-12-11 18:18
I'm now well acquainted with with the signal and asyncio modules, but is this the best solution? Wouldn't be better to specify a custom handler? If this is possible.
msg308065 - (view) Author: Nathaniel Smith (njs) * (Python committer) Date: 2017-12-11 18:37
Serhiy: I don't know what "specify a custom handler" means in this context. Can you elaborate? The fd buffer overflow happens in a very delicate context where we definitely cannot call python code or even safely touch PyObject* variables.
msg308066 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-12-11 18:38
A custom handler would be overkill IMO.  set_wakeup_fd() is really useful for a small category of library (most notably event loop frameworks).
msg308067 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-12-11 18:54
I meant a callback that will be called after writing a byte to the wakeup fd is failed. By default it would write a warning to stderr, but the user could specify other callback for silencing a warning or for using other way for logging it. If this is impossible or overkill I have no other questions.
msg308069 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-12-11 19:36
As I wrote, I never saw this warning, even while debugging signal issues in
asyncio on FreeBSD. I don't think that this is an use case for a callback.
Nathaniel use case is just to ignore the warning, I consider that it's a
reasonable compromise for the issue.

If someone wants a more advanced signal handler, it's still possible to use
register a custom signal handler at the C level.
msg308089 - (view) Author: Nathaniel Smith (njs) * (Python committer) Date: 2017-12-12 06:14
Yeah, I agree with Antoine and Victor that a callback would be overkill, and it would be extremely difficult to implement since at the point where the error occurs, we don't have and can't take the GIL.
msg308099 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-12-12 09:05
> it would be extremely difficult to implement since at the point where the error occurs, we don't have and can't take the GIL.

The signal handler uses Py_AddPendingCall() which calls a C function "later", but it can be anytime, between two Python instruction. If the callback fails, it can be annoying to get an error "anywhere". It can be even worse if the callback only fails if we are calling a specific function.

Handling signals is very tricky, it's hard to "get it right".

For example, Py_AddPendingCall() has an hardcoded queue of 32 callbacks. If the queue is full... the callback is lost...

Py_AddPendingCall() also tries to acquire a lock... from a signal handler... Probably the worst idea... But that's another idea :-)

    /* try a few times for the lock.  Since this mechanism is used
     * for signal handling (on the main thread), there is a (slim)
     * chance that a signal is delivered on the same thread while we
     * hold the lock during the Py_MakePendingCalls() function.
     * This avoids a deadlock in that case.
     * Note that signals can be delivered on any thread.  In particular,
     * on Windows, a SIGINT is delivered on a system-created worker
     * thread.
     * We also check for lock being NULL, in the unlikely case that
     * this function is called before any bytecode evaluation takes place.
     */
    if (lock != NULL) {
        for (i = 0; i<100; i++) {
            if (PyThread_acquire_lock(lock, NOWAIT_LOCK))
                break;
        }
        if (i == 100)
            return -1;
    }

We may also get a new signal while handling a signal...

Currently, the Windows implementation of the signal handler remembers if there is pending call to report_wakeup_send_error(), so it doesn't fill the pending call callback queue. But the Linux implementation doesn't.

With Nathaniel's PR, the queue is always filled. I'm not sure that it's ideal, but it's also hard to handle synchronization here so I didn't complain :-) Always calling Py_AddPendingCall() does work for the signal module, but it may prevent other functions to schedule a pending call later. In practice, in CPython, only the signal handler calls Py_AddPendingCall() :-)
msg308511 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2017-12-18 04:10
New changeset 902ab80b590e474bb2077b1fae8aac497b856d66 by Yury Selivanov (Nathaniel J. Smith) in branch 'master':
bpo-30050: Allow disabling full buffer warnings in signal.set_wakeup_fd (#4792)
https://github.com/python/cpython/commit/902ab80b590e474bb2077b1fae8aac497b856d66
msg308512 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2017-12-18 04:13
I think that what Nathaniel proposes is very reasonable.  The PR is LGTM and I've just merged it.  Closing the issue.
History
Date User Action Args
2017-12-18 04:13:58yselivanovsetstatus: open -> closed
type: enhancement
messages: + msg308512

components: + Interpreter Core
resolution: fixed
stage: patch review -> resolved
2017-12-18 04:10:20yselivanovsetnosy: + yselivanov
messages: + msg308511
2017-12-12 09:05:51vstinnersetmessages: + msg308099
2017-12-12 06:14:49njssetmessages: + msg308089
2017-12-11 19:36:53vstinnersetmessages: + msg308069
2017-12-11 18:54:08serhiy.storchakasetmessages: + msg308067
2017-12-11 18:38:18pitrousetmessages: + msg308066
2017-12-11 18:37:12njssetmessages: + msg308065
2017-12-11 18:18:53serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg308064
2017-12-11 07:49:38njssetkeywords: + patch
stage: patch review
pull_requests: + pull_request4691
2017-06-12 20:39:04pitrousetnosy: + pitrou
messages: + msg295824
2017-04-12 09:28:27njssetmessages: + msg291536
2017-04-12 09:12:54vstinnersetmessages: + msg291533
2017-04-12 08:54:33njscreate