classification
Title: signals not always delivered to main thread, since other threads have the signal unmasked
Type: behavior
Stage: patch review
Components: Documentation, Interpreter Core
Versions: Python 3.2, Python 3.1, Python 2.7, Python 2.6
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: georg.brandl
Nosy List: Rhamphoryncus, bamby, exarkun, georg.brandl, gvanrossum, laca, movement, mstepnicki, pitrou, ross (10)
Priority: normal
Keywords: patch

Created on 2008-01-30 16:38 by bamby, last changed 2009-12-15 16:59 by pitrou.

Files
File name Uploaded Description
pthread_sig.diff bamby, 2008-01-30 16:38
Messages (34)
msg61870 - (view) Author: Andriy Pylypenko (bamby) Date: 2008-01-30 16:38
Hello,

This issue is actually a follow-up to issue 960406. The patch applied
there introduced a problem, at least under FreeBSD. The problem is
extremely annoying, so I made an effort to uncover its source.

Here is some example code that shows the problem:

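    import select
    import time
    from threading import Thread

    some_time = 6000000  # seconds

    class MyThread(Thread):
        def run(self):
            while True:
                time.sleep(some_time)

    t = MyThread()
    t.start()
    while True:
        # select() requires sequences, so pass empty lists rather than None
        select.select([], [], [], some_time)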

Start this script, then try to interrupt it by hitting Control-C. If you
run this code under Linux you won't see any problem. But under FreeBSD
the script won't stop until the timeout in the main thread expires or
some activity takes place on the descriptors passed to select().

My investigation showed that the problem lies in how signals are
delivered. FreeBSD picks the receiving thread in the opposite order from
Linux. For example, suppose we have a program that starts one user
thread and allows both the main and user threads to receive signals.
Under Linux the signal handler always fires in the context of the main
thread, but under FreeBSD it runs in the context of the user thread.
POSIX doesn't state which behavior is correct, so both should be assumed
to be correct and Python should cope with both. Before the patch from
960406, Python made an effort to prevent signal handling in user
threads, but the patch dropped this code and now all threads are allowed
to handle signals.

Let's return to the script. When running the script under Linux, the
select() call is the one that gets interrupted by the signal, and this
allows the script to shut down quickly. But under FreeBSD the sleep()
call is interrupted by the signal, leaving the main thread waiting in
select() until the timeout.

The description of issue 960406 states:

"This is a patch which will correct the issues some people have with
python's handling of signal handling in threads. It allows any thread to
initially catch the signal mark it as triggered, allowing the main
thread to later process it."

And yes, it behaves exactly as described, but this behavior is
inconsistent between different OSes.

To make things predictable I've restored the code that ensures that the
signal handler runs in the context of the main thread only:

long
PyThread_start_new_thread(void (*func)(void *), void *arg)
{
    ...
+   sigset_t set, oset;
    ...
+   sigfillset(&set);
+   SET_THREAD_SIGMASK(SIG_BLOCK, &set, &oset);
    pthread_create(...)
+   SET_THREAD_SIGMASK(SIG_SETMASK, &oset, NULL);
    ...

and this works perfectly for me under FreeBSD and Linux, at least for my
needs. It doesn't bring any visible changes to readline behavior either.
I'm using Python 2.5.1. The attached patch is against the trunk.
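
For readers who want to experiment from Python rather than C, the same
block, create, restore sequence can be sketched as follows. This is only
an illustration under stated assumptions: it relies on
signal.pthread_sigmask() (added in Python 3.3) and signal.valid_signals()
(added in Python 3.8), neither of which exists in the versions discussed
here, and start_with_signals_blocked is a hypothetical helper name, not
part of the patch.

```python
import signal
import threading


def start_with_signals_blocked(thread, signals=None):
    """Start `thread` so that it inherits a mask with `signals` blocked.

    Hypothetical helper mirroring the C patch: block in the creating
    thread, create the new thread (which inherits the mask), then
    restore the creating thread's original mask.
    """
    if signals is None:
        # Assumption: block everything the platform allows, like sigfillset().
        signals = signal.valid_signals()
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, signals)
    try:
        thread.start()
    finally:
        # The creating thread goes back to its previous mask.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return thread
```

Because a new thread inherits the mask of its creator at creation time,
the restore step affects only the creating thread.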

I'm not a Python guru, but let me try to explain how I see the situation.

As I understand it, my change does nothing for applications written in
pure Python and running under Linux (without user modules written in C
that do their own thread and signal handling). Under Linux, signals have
no chance of being caught in user threads, because Python doesn't
provide any way to alter the signal mask, and with the default signal
mask signals always arrive at the main thread. So explicitly forbidding
signal handling in user threads doesn't change anything there. On the
other hand, this change ensures that under FreeBSD things work exactly
as they do under Linux.

Of course, this change could break a module written in C that relies on
signal handling in the context of a user thread (as the signal mask of
the user thread must now be modified explicitly). But this is how things
are meant to work in order to be portable anyway, so I consider this
possibility highly unlikely.

I suppose Mac OS X is also affected, as it's based on FreeBSD.
msg61960 - (view) Author: Guido van Rossum (gvanrossum) Date: 2008-02-01 15:58
Actually I see the same behavior under Linux and OSX: the first ^C
interrupts the select() call, after that ^C is ignored.
msg62038 - (view) Author: Andriy Pylypenko (bamby) Date: 2008-02-04 09:54
I'm sorry, I forgot to add one important thing to the script: the
t.setDaemon(True) call. Without it the main thread will wait for the
user thread to stop explicitly. So the correct script is:

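    import select
    import time
    from threading import Thread

    some_time = 6000000  # seconds

    class MyThread(Thread):
        def run(self):
            while True:
                time.sleep(some_time)

    t = MyThread()
    t.setDaemon(True)
    t.start()
    while True:
        # select() requires sequences, so pass empty lists rather than None
        select.select([], [], [], some_time)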
msg62044 - (view) Author: Guido van Rossum (gvanrossum) Date: 2008-02-04 16:43
Well okay, then I can confirm that OSX is *not* affected by this OS
bugginess.
msg79780 - (view) Author: John Levon (movement) Date: 2009-01-13 22:29
This issue also affects Solaris (and in particular xend is broken). Is
there a reason bamby's fix isn't yet applied?
msg79783 - (view) Author: Antoine Pitrou (pitrou) Date: 2009-01-13 22:38
Is there any reason not to simply catch KeyboardInterrupt in the user
thread, and then notify the main thread?
msg79801 - (view) Author: John Levon (movement) Date: 2009-01-13 23:37
Yes, Python guarantees the behaviour under discussion:

http://docs.python.org/library/signal.html
msg82868 - (view) Author: Adam Olsen (Rhamphoryncus) Date: 2009-02-27 20:33
The readline API just sucks.  It's not at all designed to be used
simultaneously from multiple threads, so we shouldn't even try.  Ban
using it in non-main threads, restore the blocking of signals, and go on
with our merry lives.
msg82878 - (view) Author: Guido van Rossum (gvanrossum) Date: 2009-02-27 22:03
Agreed.  Multiple threads trying to read interactive input from a
keyboard sounds like a bad idea anyway.
msg82914 - (view) Author: John Levon (movement) Date: 2009-02-28 15:06
Surely readline is irrelevant anyway. The Python spec guarantees
behaviour, and that guarantee is currently broken.
msg82918 - (view) Author: Guido van Rossum (gvanrossum) Date: 2009-02-28 16:05
Hm, I'm not sure why Adam brought up readline.  The behavior is
certainly guaranteed (I put that guarantee in myself long ago :-) and it
should be fixed.  I have no opinion about the proposed patch, since I
cannot test this and have long lost sufficient understanding of this
part of CPython to understand all the ramifications, sorry.
msg83107 - (view) Author: Adam Olsen (Rhamphoryncus) Date: 2009-03-03 21:57
issue 960406 broke this as part of a fix for readline.  I believe that
was motivated by fixing ctrl-C in the main thread, but non-main threads
were thrown in as a "why not" measure.

msg 46078 is the mention of this.  You can go into readlingsigs7.patch
and search for SET_THREAD_SIGMASK.
msg91070 - (view) Author: John Levon (movement) Date: 2009-07-29 21:48
Any progress on this regression? A patch is available... thanks.
msg92328 - (view) Author: Marcin Stepnicki (mstepnicki) Date: 2009-09-06 20:08
I have just been bitten by this bug. I usually run my software under
Linux/Windows; this was the first time that a customer specifically
requested the FreeBSD platform, and I was *really* surprised. Not to
mention that a bug in Python was the last thing that came to my mind -
I was blaming my code of course :-).

Anyway, I can confirm the patch works for me and I'd like to see it
included in future versions. Can I do something to make it happen? 

Regards,
Marcin
msg96377 - (view) Author: Antoine Pitrou (pitrou) Date: 2009-12-14 15:03
I'm not sure there's any real issue here. The signal *does* get
propagated to the main thread, it only takes some time to do so. If you
want the program to be interruptible more quickly, just lower the
timeout you give to select().
msg96379 - (view) Author: Jean-Paul Calderone (exarkun) Date: 2009-12-14 15:08
I don't like the suggestion to lower the timeout to select().  Lots of
the rest of the world is pushing towards removing this kind of periodic
polling (generally with the goal of improving power consumption). 
Python should be going this way too (the recent introduction of
signal.set_wakeup_fd suggests that at least some Python developers are
convinced of this).
msg96380 - (view) Author: Antoine Pitrou (pitrou) Date: 2009-12-14 15:11
> I don't like the suggestion to lower the timeout to select().  Lots of
> the rest of the world is pushing towards removing this kind of periodic
> polling (generally with the goal of improving power consumption). 

Yes, I'm aware of this. I was only suggesting the easiest solution to
the problem at hand :-)
msg96386 - (view) Author: Adam Olsen (Rhamphoryncus) Date: 2009-12-14 18:07
The real, OS signal does not get propagated to the main thread.  Only
the python-level signal handler runs from the main thread.

Correctly written programs are supposed to let select block
indefinitely.  This allows them to have exactly 0 CPU usage, especially
important on laptops and other limited power devices.
msg96388 - (view) Author: Antoine Pitrou (pitrou) Date: 2009-12-14 18:17
> The real, OS signal does not get propagated to the main thread.  Only
> the python-level signal handler runs from the main thread.

Well, the signals /are/ delivered as far as Python code is concerned. I
don't think Python makes any promise as to the delivery of signals at
the C level.
(actually, the best promise we may make is not to block signal delivery
at all, so that third-party libs or extensions relying on threaded
signal delivery don't break)
msg96390 - (view) Author: Adam Olsen (Rhamphoryncus) Date: 2009-12-14 18:34
You forget that the original report is about ctrl-C.  Should we abandon
support of it for threaded programs?  Close as won't-fix?

We could also just block SIGINT, but why?  That means we don't support
python signal handlers in threaded programs (signals sent to the
process, not ones sent direct to threads), and IMO threads expecting a
specific signal should explicitly unblock it anyway.
msg96391 - (view) Author: Antoine Pitrou (pitrou) Date: 2009-12-14 18:48
> You forget that the original report is about ctrl-C.  Should we abandon
> support of it for threaded programs?

We haven't abandoned support, have we? Where is the spec that is
currently broken?

Besides, as Jean-Paul pointed out, the user can now set up a file
descriptor on which a byte will be written out as soon as a signal gets
caught.
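
A minimal sketch of that approach, assuming a POSIX system and a modern
Python 3 (os.set_blocking() needs 3.5+): the main thread blocks in
select() on a pipe, and signal.set_wakeup_fd() makes the interpreter
write the signal number to that pipe whichever thread the OS picks for
delivery. The names wait_for_signal and send_later are hypothetical
helpers for illustration only.

```python
import os
import select
import signal
import threading


def send_later(sig, delay):
    """Send `sig` to this process after `delay` seconds (demo helper)."""
    timer = threading.Timer(delay, os.kill, args=(os.getpid(), sig))
    timer.start()
    return timer


def wait_for_signal(sig, timeout=5.0):
    """Block in select() until a signal wakes the pipe; call from the main thread.

    Returns the signal number read from the wakeup pipe, or None on timeout.
    """
    read_fd, write_fd = os.pipe()
    os.set_blocking(read_fd, False)
    os.set_blocking(write_fd, False)  # the wakeup fd must be non-blocking
    # A Python-level handler must be installed, otherwise the default
    # action (terminating the process) would apply.
    previous_handler = signal.signal(sig, lambda signum, frame: None)
    previous_fd = signal.set_wakeup_fd(write_fd)
    try:
        readable, _, _ = select.select([read_fd], [], [], timeout)
        return os.read(read_fd, 1)[0] if readable else None
    finally:
        signal.set_wakeup_fd(previous_fd)
        signal.signal(sig, previous_handler)
        os.close(read_fd)
        os.close(write_fd)
```

With this in place select() can use an arbitrarily long timeout, since
the wakeup pipe becomes readable as soon as the signal is caught.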

> Close as won't-fix?

It is one possibility indeed.
We could also add an API (or an optional argument to the existing APIs)
to block signals in threads created by Python.
msg96393 - (view) Author: John Levon (movement) Date: 2009-12-14 18:52
The spec broken is here:

http://docs.python.org/library/signal.html

Namely:

# Some care must be taken if both signals and threads are used in the
same program. The fundamental thing to remember in using signals and
threads simultaneously is: always perform signal() operations in the
main thread of execution. Any thread can perform an alarm(),
getsignal(), pause(), setitimer() or getitimer(); only the main thread
can set a new signal handler, and the main thread will be the only one
to receive signals (this is enforced by the Python signal module, even
if the underlying thread implementation supports sending signals to
individual threads). This means that signals can’t be used as a means of
inter-thread communication. Use locks instead.
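
The Python-level half of that statement can be checked with a small
sketch like the one below, assuming a POSIX system and Python 3. It
records which thread runs the Python handler while a second thread
exists; it says nothing about which thread the OS-level signal was
delivered to. handler_thread_name is a hypothetical helper name.

```python
import os
import signal
import threading
import time


def handler_thread_name(sig=signal.SIGUSR1, timeout=5.0):
    """Return the name of the thread that ran the Python handler for `sig`."""
    seen = []

    def handler(signum, frame):
        seen.append(threading.current_thread().name)

    previous = signal.signal(sig, handler)  # must be called in the main thread
    stop = threading.Event()
    worker = threading.Thread(target=stop.wait, daemon=True)
    worker.start()
    try:
        os.kill(os.getpid(), sig)  # process-directed signal
        deadline = time.monotonic() + timeout
        while not seen and time.monotonic() < deadline:
            time.sleep(0.01)  # give the main thread a chance to run the handler
    finally:
        stop.set()
        worker.join()
        signal.signal(sig, previous)
    return seen[0] if seen else None
```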
msg96395 - (view) Author: Antoine Pitrou (pitrou) Date: 2009-12-14 19:04
> The spec broken is here:
> 
> http://docs.python.org/library/signal.html

I would argue it is not broken. This documentation page is about a
module of the standard library, it doesn't specify the underlying C
implementation. That "the main thread will be the only one to receive
signals" is true if you consider it from the Python code's point of
view: signal handlers are always called in the main thread, even if the
OS-level signal was delivered to (and caught by) another thread.

I don't have any strong view over whether the interpreter should,
theoretically, block signals in non-main threads. But, practically,
blocking signals apparently produced issues with readline (and possibly
other libs relying on signals), which is why they are not blocked today.
msg96396 - (view) Author: Marcin Stepnicki (mstepnicki) Date: 2009-12-14 19:10
> I don't have any strong view over whether the interpreter should,
> theoretically, block signals in non-main threads. But, practically,
> blocking signals apparently produced issues with readline (and possibly
> other libs relying on signals), which is why they are not blocked today.

I see your point of view, but the problem is that the current behaviour
is inconsistent between different operating systems. As there are many
people who brought up this issue, I think it should be at least
documented somewhere.

Regards,
Marcin
msg96397 - (view) Author: Jean-Paul Calderone (exarkun) Date: 2009-12-14 19:14
> > http://docs.python.org/library/signal.html

> I would argue it is not broken.

If it's not broken, then the docs are at least confusing.  They should
make clear whether they are talking about the underlying signal or the
Python signal handler.  This makes a difference for many applications
which deal with signals.  I would even say that due to the very tricky
nature of signals, the documentation *should* be discussing the way it
is implemented.  Without that information, it's very difficult to handle
some situations correctly.  This wouldn't necessarily mean that the
implementation would have to stay the same, either - just that the
implementation be documented for each version (of course, keeping it the
same would be preferable, for all the normal reasons).
msg96406 - (view) Author: Antoine Pitrou (pitrou) Date: 2009-12-14 21:51
As I said, a flexible solution would be for thread creation functions to
take an optional argument specifying whether to block signals or not.
(I don't mind the default value of the argument :-))
msg96415 - (view) Author: Adam Olsen (Rhamphoryncus) Date: 2009-12-14 22:20
A better solution would be to block all signals by default, then unblock
specific ones you expect.  This avoids races (as undeliverable signals
are simply deferred.)
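
The "deferred" part can be illustrated with a short sketch, assuming a
POSIX system and Python 3.3+ (for pthread_sigmask, pthread_kill,
sigpending and sigwait). For simplicity it sends a thread-directed
signal to the calling thread rather than a process-directed one;
deferred_signal_demo is a hypothetical name.

```python
import signal
import threading


def deferred_signal_demo(sig=signal.SIGUSR1):
    """Show that a blocked signal stays pending until it is accepted."""
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {sig})
    try:
        # Send the signal to this very thread while it is blocked.
        signal.pthread_kill(threading.get_ident(), sig)
        was_pending = sig in signal.sigpending()
        # Accept the pending signal synchronously; no handler runs.
        received = signal.sigwait({sig})
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return was_pending, received
```

The signal is not lost while blocked; it is reported by sigpending() and
consumed by sigwait() before the original mask is restored.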

Note that readline is not threadsafe anyway, so it doesn't necessarily
need to allow calls from the non-main thread.  Maybe somebody is using
it that way, dunno.
msg96428 - (view) Author: Andriy Pylypenko (bamby) Date: 2009-12-15 07:42
Let me add my 2 cents. I understand the distinction between Python-level
signal handling and OS-level signal delivery.

What I cannot understand is why signal handling should be preserved in
user threads on OSes like FreeBSD and Solaris. This functionality isn't
used on Linux and Windows at all, as signals there are always delivered
to the main thread. The patch simply ensures the same behavior on
FreeBSD and Solaris, so why keep things unpredictable when there is a
way to solve the problem? Can anyone state what exactly is the purpose
of not making OS signal handling in Python predictable?

This bug report was created mainly because there is no easy pure-Python
solution to this problem. The Python documentation clearly states that
there are no user-accessible Python functions that can modify the
per-thread signal mask, so it is currently impossible to solve the
problem with just Python code. Modifying timeouts isn't a viable
solution in far too many real-life situations.

BTW this patch has officially been in the FreeBSD ports tree since Feb
27 2009 and there have been no complaints about it since then.
msg96429 - (view) Author: Antoine Pitrou (pitrou) Date: 2009-12-15 10:54
Well, the history on this looks a bit complicated and I don't really
know the details, but witness the first sentences of the initial message
in issue960406:

“This is a patch which will correct the issues some people 
have with python's handling of signal handling in threads. It 
allows any thread to initially catch the signal mark it as 
triggered, allowing the main thread to later process it. (This 
is actually just restoring access to the functionality that was 
in Python 2.1)”

Apparently Python has been hesitating between both behaviours.

> The Python documentation clearly states that
> there are no user-accessible Python functions that can modify the
> per-thread signal mask, so it is currently impossible to solve the
> problem with just Python code.

Well as I already said we could introduce this missing feature. Ideas
and patches welcome.
msg96433 - (view) Author: Andriy Pylypenko (bamby) Date: 2009-12-15 11:34
> Well as I already said we could introduce this missing feature. Ideas
> and patches welcome.

Well, this would definitely be a working solution.
msg96444 - (view) Author: John Levon (movement) Date: 2009-12-15 16:35
I still do not understand the objection you have to the simple patch
which restores old behaviour, works the same across all OSes, and
doesn't require new APIs. What is the objection?
msg96445 - (view) Author: Antoine Pitrou (pitrou) Date: 2009-12-15 16:39
The objection is that the "old behaviour" was changed to solve another
problem. We don't gain anything by switching back and forth between two
different behaviours.
msg96446 - (view) Author: John Levon (movement) Date: 2009-12-15 16:49
To quote Andriy in the first comment:

"It doesn't bring any visible changes to readline behavior either."

Are you saying this is not the case?
msg96447 - (view) Author: Antoine Pitrou (pitrou) Date: 2009-12-15 16:59
I'm just saying that I don't know, and I don't think an observation from
one user is enough since these issues are notoriously platform-specific.

If you want to revert the change made in issue960406, you should IMO
demonstrate that somehow this change wasn't needed.
But I don't know how this would be better than a more flexible API,
except of course that the patch for that API doesn't exist (but wouldn't
be difficult to produce by someone motivated).
History
Date User Action Args
2009-12-15 16:59:37 pitrou set messages: + msg96447
2009-12-15 16:49:36 movement set messages: + msg96446
2009-12-15 16:39:42 pitrou set messages: + msg96445
2009-12-15 16:35:58 movement set messages: + msg96444
2009-12-15 11:34:16 bamby set messages: + msg96433
2009-12-15 10:54:23 pitrou set messages: + msg96429
2009-12-15 07:42:17 bamby set messages: + msg96428
2009-12-14 22:20:38 Rhamphoryncus set messages: + msg96415
2009-12-14 21:51:54 pitrou set messages: + msg96406
2009-12-14 19:14:14 exarkun set messages: + msg96397
2009-12-14 19:12:23 pitrou set assignee: georg.brandl; nosy: + georg.brandl; components: + Documentation; versions: + Python 3.2, - Python 2.5, Python 2.4, Python 3.0
2009-12-14 19:10:53 mstepnicki set messages: + msg96396
2009-12-14 19:04:51 pitrou set messages: + msg96395
2009-12-14 18:52:18 movement set messages: + msg96393
2009-12-14 18:48:18 pitrou set messages: + msg96391
2009-12-14 18:34:58 Rhamphoryncus set messages: + msg96390
2009-12-14 18:17:24 pitrou set messages: + msg96388
2009-12-14 18:07:48 Rhamphoryncus set messages: + msg96386
2009-12-14 15:11:17 pitrou set messages: + msg96380
2009-12-14 15:08:36 exarkun set nosy: + exarkun; messages: + msg96379
2009-12-14 15:03:33 pitrou set messages: + msg96377
2009-12-14 09:41:04 laca set nosy: + laca
2009-09-06 20:08:37 mstepnicki set nosy: + mstepnicki; messages: + msg92328
2009-07-29 21:48:59 movement set messages: + msg91070
2009-03-03 21:57:51 Rhamphoryncus set messages: + msg83107
2009-02-28 16:05:08 gvanrossum set messages: + msg82918; stage: patch review
2009-02-28 15:06:52 movement set messages: + msg82914
2009-02-28 00:56:16 ross set nosy: + ross
2009-02-27 22:03:33 gvanrossum set messages: + msg82878
2009-02-27 20:34:03 Rhamphoryncus set versions: + Python 2.6, Python 3.0, Python 3.1, Python 2.7
2009-02-27 20:33:38 Rhamphoryncus set nosy: + Rhamphoryncus; messages: + msg82868
2009-01-13 23:37:56 movement set messages: + msg79801
2009-01-13 22:38:20 pitrou set nosy: + pitrou; messages: + msg79783
2009-01-13 22:29:56 movement set nosy: + movement; messages: + msg79780; title: signals in thread problem -> signals not always delivered to main thread, since other threads have the signal unmasked
2008-03-18 17:29:48 jafo set priority: normal
2008-02-04 16:43:34 gvanrossum set keywords: + patch; messages: + msg62044
2008-02-04 09:54:23 bamby set messages: + msg62038
2008-02-01 15:58:20 gvanrossum set nosy: + gvanrossum; messages: + msg61960
2008-01-30 16:38:05 bamby create