Author duncf
Recipients BreamoreBoy, Rhamphoryncus, bamby, duncf, exarkun, georg.brandl, gregory.p.smith, laca, movement, mstepnicki, nh2, pitrou, ross
Date 2011-12-20.06:13:13
Message-id <1324361595.69.0.251062830248.issue1975@psf.upfronthosting.co.za>
In-reply-to
Content
I've been digging into this quite a bit, and I've been able to turn up a little more information.

* In Python 2.1, the behavior was very similar to what we have now -- signals were not blocked. http://bugs.python.org/issue465673 was filed reporting issues with readline on Solaris. The issue was basically that readline used setjmp/longjmp to handle SIGINT, and the signal handler was being executed by the wrong thread. The fix that was implemented (for Python 2.2b1) was to block signals in all threads except the main thread. There was some discussion at the time about this being the "proper" way according to POSIX. (http://groups.google.com/group/comp.lang.python/browse_frm/thread/61da54186fbeebf9)

* Python 2.2 and 2.3 had the opposite of the current behavior (i.e. all signals were blocked in threads). Several bugs were reported. Most of the problems seemed to be about blocked synchronous signals (e.g. SIGSEGV) leading to bad things, and about the unkillable subprocesses you get when you fork/exec with signals blocked.
  - http://bugs.python.org/issue756924
  - http://bugs.python.org/issue949332
  - http://mail.python.org/pipermail/python-dev/2003-December/041138.html

  * The patch to fix these bugs was submitted as http://bugs.python.org/issue960406. Unfortunately, it was not well described, so its links to the above issues were not clear. The discussion in the tracker for 960406 revolves mostly around readline, and for good reason -- reverting to the 2.1 behavior required a fix to readline so as to not regress. However, I believe the main impetus behind the patch was to fix the handling of synchronous signals and unkillable subprocesses. It was implemented in Python 2.4.

* Since Python 2.4, everything's been working fine on Linux, because Linux will send signals only to the main thread. Unfortunately, the problem remains that on FreeBSD signals are generally handled by a user thread instead. This causes two problems.

  1. On FreeBSD, we must assume that every blocking system call, in *every thread*, can be interrupted, and we need to catch EINTR.

  2. On FreeBSD, we cannot block indefinitely in the main thread and expect to handle signals. This means that indefinite selects are not possible if we want to handle signals, and, perhaps more perversely, signal.pause() cannot be reliably used in the main thread.

  * Current attempts to fix this in the FreeBSD ports revert to the pre-2.4 behavior of blocking all signals. This leads to the same unkillable subprocesses and (presumably) issues with synchronous signals.

  * Attempts to fix this properly in Python are stalled because we've rightly detected that we're just oscillating between two behaviors, both having issues, and nobody has proposed a suitable middle ground.
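
To make problem 1 above concrete, here is a minimal sketch of the retry wrapper that every blocking call in every thread would need if signals can land anywhere. The name retry_on_eintr is hypothetical and only for illustration. (Later Python versions, 3.5 and up, retry most interrupted system calls inside the interpreter per PEP 475, so this mainly illustrates the burden on application code at the time.)

```python
import errno


def retry_on_eintr(func, *args, **kwargs):
    """Call func, retrying for as long as it fails with EINTR.

    Hypothetical helper: it only shows the pattern that code has to
    follow when a signal may interrupt a system call in any thread.
    """
    while True:
        try:
            return func(*args, **kwargs)
        except OSError as exc:
            # Any error other than "interrupted by a signal" is real.
            if exc.errno != errno.EINTR:
                raise
```

It would be used as, for example, retry_on_eintr(os.read, fd, 4096). Note that it does nothing for problem 2: a call that blocks indefinitely in the main thread is never interrupted if the signal is delivered to another thread.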


I think I've found a suitable solution that should resolve all of the issues:

* Block all *asynchronous* signals in user threads. The synchronous signals, such as SIGSEGV, should not be blocked. (This was actually the original fix proposed for http://bugs.python.org/issue949332)

* Unblock all signals after a fork() in a thread, since in the child process that thread is now the main thread. This will solve the unkillable subprocesses.

* Readline should not be impacted by this change, for two reasons. First, as part of the 2.4 patch the readline integration was changed to not install readline's signal handlers (unless you're using a really old version of readline). Second, the original readline problems were only present when signals were unblocked, and we're going to start blocking them.

* As bamby points out in his first post here, this is unlikely to change the behavior of much code. Anything portable should still work; it will just be more predictable. I suppose if you were developing for FreeBSD (specifically for a stock unmodified Python compiled from source, not the version distributed through ports), this change could subtly change the behavior of an application. For most FreeBSD developers (i.e. the ones using ports), this change should simply result in killable subprocesses.
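
The first two points above can be sketched at the Python level. This is only an illustration, not the proposed patch (which would live in the interpreter's C thread start-up code). It assumes a Unix platform and a Python new enough to provide signal.pthread_sigmask and signal.valid_signals (3.8+). The helper names and the exact set of "synchronous" signals are my assumptions; only SIGSEGV is named above.

```python
import os
import signal
import threading

# Assumption: signals raised synchronously by a faulting instruction.
# Only SIGSEGV is named in the discussion; the rest are illustrative.
SYNCHRONOUS = {signal.SIGSEGV, signal.SIGBUS, signal.SIGFPE, signal.SIGILL}


def async_signals():
    """Every valid signal except the synchronous ones."""
    return set(signal.valid_signals()) - SYNCHRONOUS


def start_thread_with_blocked_signals(target, *args):
    """Start a thread that has all asynchronous signals blocked.

    A new thread inherits the signal mask of its creator, so the mask
    is set before start() and restored afterwards. This avoids a window
    in which the new thread could still receive a signal.
    """
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, async_signals())
    try:
        thread = threading.Thread(target=target, args=args)
        thread.start()
    finally:
        # The creating thread goes back to handling signals as before.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return thread


def fork_and_unblock():
    """fork(), then clear the signal mask in the child.

    In the child the forking thread is the only (and so the main)
    thread, and anything it later exec()s would inherit the mask.
    """
    pid = os.fork()
    if pid == 0:
        signal.pthread_sigmask(signal.SIG_SETMASK, set())
    return pid
```

With something like this in place, asynchronous signals such as SIGINT and SIGTERM can only be delivered to the main thread, while a SIGSEGV raised inside a worker thread is still delivered there instead of being left pending.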


I will put together a patch, though I would like to see some consensus around this approach before I spend too much (more) time on this.

Thanks.