Issue 13697: python RLock implementation unsafe with signals

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/57906

classification

Title:	python RLock implementation unsafe with signals
Type:	behavior	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 2.7

process

Status:	closed	Resolution:	rejected
Dependencies:		Superseder:
Assigned To:		Nosy List:	Lothsahn, Mike.Meyer, alex, benjamin.peterson, cvrebert, jcea, maizy, meador.inge, neologix, pitrou, rbcollins, rpetrov, vstinner
Priority:	normal	Keywords:

Created on 2012-01-02 21:13 by rbcollins, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
hang.py	rbcollins, 2012-01-02 21:13	Demonstration script
hang2.py	vstinner, 2015-03-03 14:51

Messages (22)
msg150477 - (view)	Author: Robert Collins (rbcollins) *	Date: 2012-01-02 21:13
This affects the python implementation of RLock only. If a signal occurs during RLock.acquire() or release() and then operates on the same lock to acquire() or release() it, process hangs or assertions can be triggered. The attached test script demonstrates the issue on Python 2.6 and 3.2, and code inspection suggests this is still valid for 2.7 and 3.4. To use it, run the script and then 'kill -SIGUSR2' the process id it prints out. Keep sending SIGUSR2 until you get bored or: - a traceback occurs - the process stops printing out '.'s We found this debugging application server hangs during log rotation (https://bugs.launchpad.net/launchpad/+bug/861742) where if the thread that the signal is received in was about to log a message the deadlock case would crop up from time to time. Of course, this wasn't ever safe code, and we're changing it (to have the signal handler merely set a integer flag that the logging handler can consult without locking). However, I wanted to raise this upstream, and I note per issue #3618 that the 3.x IO module apparently includes a minimal version of the Python RLock implementation (or just uses RLock itself). Either way this bug probably exists for applications simply doing IO, either when there is no C RLock implementation available, or all the time (depending on exactly what the IO module is doing nowadays).
msg150483 - (view)	Author: Jesús Cea Avión (jcea) *	Date: 2012-01-02 23:10
Adding myself to the nosy list because I am interested in the issue, but I can't take care of this myself for a few weeks, at least. Robert, would you consider to try to write a patch?
msg150488 - (view)	Author: Robert Collins (rbcollins) *	Date: 2012-01-03 00:52
I'm not sure it is sensibly implementable in pure python: the semantics of signal handling (AIUI) are that the vm is interrupted, sets a flag to say 'when the GIL is released or the next bytecode interpretation happens, please process signal X' [the former for C extensions, the latter for regular code], and then the OS level signal handler returns. The current C or bytecode execution completes and then we dispatch to the python signal handler. Now, what we would want here is that attempts to acquire/release the RLock in a signal handler would behave as though the RLock.acquire/release methods were atomic. The general way to enforce such atomicity is via a lock or mutex. Now, because the call stack looks like this: <signal>thread.acquire() <non-signal-code>thread.acquire() The other acquire, if it already holds this supposed mutex, would block the python signal handler acquire call from obtaining it; The inner acquire would have to error whether or not a timeout was passed, because the non-signal-code won't progress and complete until the python signal handler code returns. One could, in principle, use stack inspection to determine how far through a nested acquire/release the outer code is, but that seems, uhm, pathological to me :). The crux of the problem is detecting whether the non-reentrant thread.lock is held because (a) another thread holds it or (b) this thread holds it. If we can inspect its internals, we could determine that this thread holds it - and thus we must be reentered into the acquire/release methods. With that tricky thing established, we could look at the remaining logic to make sure that each line is either atomic or conditional on reentrancy. Simpler I think to require CRLocks always, and provide a portable C implementation which will always complete before returning to execution in the calling thread, avoiding this issue entirely.
msg150499 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2012-01-03 14:40
Yes, using synchronization primitives or doing I/O in Python signal handlers isn't a good idea. Perhaps the signal docs should be clearer about that. > Of course, this wasn't ever safe code, and we're changing it (to have the signal handler merely set a integer flag that the logging handler can consult without locking) Indeed, setting a variable (or using set_wakeup_fd()) is the right approach.
msg150517 - (view)	Author: Charles-François Natali (neologix) *	Date: 2012-01-03 17:20
The core of the problem is that we don't just want those methods to be atomic or thread-safe, but reentrant (or rather async-safe). As such, protecting by a lock isn't enough (and it's not really feasible in Python). Note that the RLock implementation already checks whether the lock is already acquire by the current thread, but there's a "race": if self._owner == me: [...] self._count = self._count + 1 return 1 rc = self._block.acquire(blocking, timeout) if rc: self._owner = me If the signal handler is called after _block has been acquired, but before self._owner is set, the next call to acquire from the signal handler won't "realize" that the block has already been acquired by the current thread, and will deadlock. Now, the fun part: this affects not only RLock, but every Python code performing "atomic" actions: condition variables, barriers, etc. There are some constraints on what can be done from a signal handler, and it should probably be documented. Note that another solution would be to use a dedicated thread for signal management (like Java does), but that's another story. Also, this shouldn't be a problem for the buffered I/O code, since the code already accounts for this possibility (if the lock is already acquired by the current thread, an exception is raised). Now, there's something I don't understand: I've just had a quick look, but AFAICT, there's no reason why the C version of RLock could not be available: am I missing something? Why do we even have a Python implementation?
msg150518 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2012-01-03 17:29
> Note that another solution would be to use a dedicated thread for > signal management (like Java does), but that's another story. That sounds like a good solution in the middle-term. Are there any drawbacks? (apart from launching a thread) > Also, this shouldn't be a problem for the buffered I/O code, since the > code already accounts for this possibility (if the lock is already > acquired by the current thread, an exception is raised). Yes, but raising an exception is less helpful than doing what the user asked for :) > Now, there's something I don't understand: I've just had a quick look, > but AFAICT, there's no reason why the C version of RLock could not be > available: am I missing something? Why do we even have a Python > implementation? The C version is quite recent, and there's a school of thought that we should always provide fallback Python implementations. (also, arguably a Python implementation makes things easier to prototype, although I don't think it's the case for an RLock)
msg150576 - (view)	Author: Charles-François Natali (neologix) *	Date: 2012-01-04 02:51
> That sounds like a good solution in the middle-term. Are there any > drawbacks? (apart from launching a thread) Just to be clear: the approach I was suggesting is to have a resident thread dedicated to signal management, not to spawn a new one when needed. Another advantage is that we could mask signals in all threads except this one, and have a consistent cross-platform behavior with respect to signals+threads. However I see two drawbacks: - it seems that we want to allow building Python without threads support. In that case, this wouldn't work, or we would need the current implementation as a fallback, but this would complicate the code somewhat. - the thread would have to be respawned after a fork() > The C version is quite recent, and there's a school of thought that we > should always provide fallback Python implementations. > (also, arguably a Python implementation makes things easier to > prototype, although I don't think it's the case for an RLock) Hmmm, I'm not convinced by this argument in this specific case: since the C version is now the default, is faster and more reliable, can't we drop the Python version?
msg150578 - (view)	Author: STINNER Victor (vstinner) *	Date: 2012-01-04 03:15
> This affects the python implementation of RLock only. In the issue #13550, it was discussed to remove completly the logging machinery from the threading module. If we remove it, we don't need the Python implementation of RLock. We already removed the Python implementation of subprocess.Popen._execute_child() in Python 3.3 because of issues with fork, signals and threads.
msg150579 - (view)	Author: Robert Collins (rbcollins) *	Date: 2012-01-04 03:25
Normally I advocate very strongly for Python implementation of C accelerated modules, but when the two implementations are not equivalent, having a simpler Python one around does not help anyone (not users, other language implementors etc). True reentrancy is possible but quite hard to achieve. So, FWIW, +1 on just having a C implementation. The dedicated signal thread sounds useful too - it would ease the issues for folk using other parts of the stdlib, and would in principle permit a pure-python RLock to hang around.
msg150600 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2012-01-04 10:52
> > That sounds like a good solution in the middle-term. Are there any > > drawbacks? (apart from launching a thread) > > Just to be clear: the approach I was suggesting is to have a resident > thread dedicated to signal management, not to spawn a new one when > needed. Another advantage is that we could mask signals in all threads > except this one, and have a consistent cross-platform behavior with > respect to signals+threads. Hmm, but that would break single-threaded programs which expect their select() (or other) to return EINTR when a signal is received (which is a perfectly valid expectation in that case). > However I see two drawbacks: > - it seems that we want to allow building Python without threads > support. In that case, this wouldn't work, or we would need the > current implementation as a fallback, but this would complicate the > code somewhat. I don't know if that's still useful to build Python without threads. I would expect most platforms to have a compatible threads implementation (and Python probably can't run on very small embedded platforms). Perhaps you can ask on python-dev.
msg150676 - (view)	Author: Charles-François Natali (neologix) *	Date: 2012-01-05 17:56
> Hmm, but that would break single-threaded programs which expect their > select() (or other) to return EINTR when a signal is received (which is > a perfectly valid expectation in that case). Yes, that's why I said "that"s another story" ;-) EINTR is really a pain, and relying on it to return from a syscall upon signal reception is a bad idea (certain OS restart syscalls by default - not select() though - and if the signal is received before you call the syscall, you'll deadlock). This would IMHO be the best way to go, but I know we can't reasonably change this now. > I don't know if that's still useful to build Python without threads. I > would expect most platforms to have a compatible threads implementation > (and Python probably can't run on very small embedded platforms). > Perhaps you can ask on python-dev. There are another problems, for example it's very well known that signals and threads don't mix well (so for example you'd have to block all signals before spawning the new thread, and reenable them afterwards). I'm not sure it's worth the extra complication. I can still try to write a quick patch to see if it gets somewhere (and doesn't break test_threading and test_signals). The first and most simple thing we could do would be to nuke the Python version (and also the loggin hack at the same time): does that sound reasonable ?
msg150688 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2012-01-05 20:54
> The first and most simple thing we could do would be to nuke the > Python version (and also the loggin hack at the same time): does that > sound reasonable ? To me, yes, but I think it's better to ask on python-dev as for nuking the Python version.
msg160131 - (view)	Author: Mike Meyer (Mike.Meyer)	Date: 2012-05-07 10:26
I just ran into this issue in the logging module using 2.7. Here's the TB in case it sheds any light on the issue Traceback (most recent call last): File "./crawler.py", line 531, in <module> main(argv[1:]1:) File "./crawler.py", line 522, in main MCP(config).run() File "./crawler.py", line 332, in run self.reaper() File "./crawler.py", line 359, in reaper logging.debug('MCP process alive: %s', state) File "/usr/local/lib/python2.7/logging/__init__.py", line 1600, in debug root.debug(msg, args, kwargs) File "/usr/local/lib/python2.7/logging/__init__.py", line 1120, in debug self._log(DEBUG, msg, args, *kwargs) File "/usr/local/lib/python2.7/logging/__init__.py", line 1250, in _log self.handle(record) File "/usr/local/lib/python2.7/logging/__init__.py", line 1260, in handle self.callHandlers(record) File "/usr/local/lib/python2.7/logging/__init__.py", line 1300, in callHandlers hdlr.handle(record) File "/usr/local/lib/python2.7/logging/__init__.py", line 746, in handle self.release() File "/usr/local/lib/python2.7/logging/__init__.py", line 700, in release self.lock.release() File "/usr/local/lib/python2.7/threading.py", line 147, in release self.__block.release() thread.error: release unlocked lock Since I'm not using threads, getting thread errors was a little bit of a surprise. I guess trying to make the logging module thread safe added a potential bug.
msg234926 - (view)	Author: Lothsahn (Lothsahn)	Date: 2015-01-28 23:36
I am using Python 2.6.5 (we will be upgrading to Python 2.7.9 soon) and I recently ran into this bug. If I do any locking in a signal handler with RLocks, the entire system can deadlock. I'm using this to serialize my IO so we don't have mismatched lines in our logs. It looks like RLock was implemented in C in 3.2. While I don't care about the performance benefits of that rewrite, having a non-deadlocking RLock implementation would be nice. Not sure if this issue can be fixed in 2.7, but it would be nice. C RLock implementation here: http://bugs.python.org/issue3001
msg234928 - (view)	Author: STINNER Victor (vstinner) *	Date: 2015-01-28 23:44
FYI I proposed a fix for eventlet to fix eventlet with Python 3 when monkey-patching is used: https://github.com/eventlet/eventlet/pull/187 The change forces the Python implementation of RLock, which is compatible with eventlet monkey-patching. The Python implementation of RLock gets the thread identifier from the monkey-patched threading module, whereas the C implementation calls directly a C function to get the identifier which is incompatible with eventlet. If the Python implementation is simply dropped, eventlet may uses its own implementation (copy/paste code from older Python version).
msg237134 - (view)	Author: STINNER Victor (vstinner) *	Date: 2015-03-03 13:25
> The attached test script demonstrates the issue on Python 2.6 and 3.2, and code inspection suggests this is still valid for 2.7 and 3.4. I disagree that Python 3.4 is affected: RLock has been reimplemented in C in Python 3.2. Only the Python implementation of RLock has the bug, but it's not used by fault, you have to explicitly use a private attribute of the threading module to get it. Python 2.6 only accept security fixes, so in fact only Python 2.7 may get a fix. The question is if it worth to backport 300 lines of C code from Python 3 to Python 2 in the _thread module. @Benjamin, release manager of Pythn 2.7: What do you think? I may help to backport the C implementation of RLock.
msg237135 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2015-03-03 13:30
This bug has existed for ages. People who want sane behaviour should just switch to Python 3. Closing.
msg237138 - (view)	Author: Roumen Petrov (rpetrov) *	Date: 2015-03-03 14:18
hmm issue still exist in master branch. Lets wait python 4 for sane behaviour.
msg237140 - (view)	Author: STINNER Victor (vstinner) *	Date: 2015-03-03 14:28
> hmm issue still exist in master branch. For the third time, only the Python implementation has the bug, and it's not used by default. So the bug was fixed in Python 3 since 3.2. It's time to upgrade guys ;-)
msg237141 - (view)	Author: Roumen Petrov (rpetrov) *	Date: 2015-03-03 14:31
STINNER Victor wrote: > For the third time, only the Python implementation has the bug, and > it's not used by default. So the bug was fixed in Python 3 since 3.2. > It's time to upgrade guys ;-) Did you mean to downgrade? Tested with Python 3.5.0a1+ (default, Feb 28 2015, 09:49:09)
msg237143 - (view)	Author: STINNER Victor (vstinner) *	Date: 2015-03-03 14:51
> Did you mean to downgrade? Tested with Python 3.5.0a1+ (default, Feb 28 2015, 09:49:09) Please make some effort to try to understand the issue. I attach hang2.py which doesn't force the Python implementation of RLock. You might try hang2.py on Python 2 and Python 3 if you are not convinced yet that Python 3 has not the bug...
msg237145 - (view)	Author: Roumen Petrov (rpetrov) *	Date: 2015-03-03 15:33
STINNER Victor wrote: > [SNIP]I attach hang2.py which doesn't force the Python implementation > of RLock.[SNIP] Ok. Fine with me.

History
Date	User	Action	Args
2022-04-11 14:57:25	admin	set	github: 57906
2015-03-03 15:33:06	rpetrov	set	messages: + msg237145
2015-03-03 14:51:08	vstinner	set	files: + hang2.py messages: + msg237143
2015-03-03 14:31:21	rpetrov	set	messages: + msg237141
2015-03-03 14:28:22	vstinner	set	messages: + msg237140
2015-03-03 14:18:53	rpetrov	set	nosy: + rpetrov messages: + msg237138
2015-03-03 13:30:55	pitrou	set	status: open -> closed resolution: rejected messages: + msg237135 stage: needs patch -> resolved
2015-03-03 13:25:46	vstinner	set	nosy: + benjamin.peterson messages: + msg237134 versions: - Python 3.2, Python 3.3
2015-03-03 12:56:48	maizy	set	nosy: + maizy
2015-01-28 23:44:03	vstinner	set	messages: + msg234928
2015-01-28 23:36:56	Lothsahn	set	nosy: + Lothsahn messages: + msg234926
2014-03-18 16:52:41	cvrebert	set	nosy: + cvrebert
2012-05-07 10:26:51	Mike.Meyer	set	nosy: + Mike.Meyer messages: + msg160131
2012-01-05 20:54:39	pitrou	set	messages: + msg150688
2012-01-05 17:56:59	neologix	set	messages: + msg150676
2012-01-04 10:52:14	pitrou	set	messages: + msg150600
2012-01-04 03:40:26	alex	set	nosy: + alex
2012-01-04 03:25:32	rbcollins	set	messages: + msg150579
2012-01-04 03:15:52	vstinner	set	nosy: + vstinner messages: + msg150578
2012-01-04 02:51:11	neologix	set	messages: + msg150576
2012-01-03 17:29:28	pitrou	set	messages: + msg150518
2012-01-03 17:20:57	neologix	set	nosy: + neologix messages: + msg150517
2012-01-03 14:40:35	pitrou	set	messages: + msg150499
2012-01-03 03:43:02	meador.inge	set	nosy: + meador.inge
2012-01-03 00:52:43	rbcollins	set	messages: + msg150488
2012-01-03 00:30:08	pitrou	set	nosy: + pitrou
2012-01-03 00:30:04	pitrou	set	stage: needs patch type: behavior versions: + Python 3.2
2012-01-02 23:10:16	jcea	set	nosy: + jcea messages: + msg150483 versions: + Python 3.3, - Python 3.4
2012-01-02 21:13:12	rbcollins	create