Issue 25668: Deadlock in logging caused by a possible race condition with "format"

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/69854

classification

Title:	Deadlock in logging caused by a possible race condition with "format"
Type:	crash	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 3.6, Python 3.5, Python 2.7

process

Status:	closed	Resolution:	wont fix
Dependencies:		Superseder:
Assigned To:		Nosy List:	fviard, iritkatriel, urnenfeld, vinay.sajip, yselivanov
Priority:	normal	Keywords:

Created on 2015-11-19 14:22 by fviard, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
testcase-bug-python-deadlock-logging.py	fviard, 2015-11-19 14:22	testcase causing a deadlock with logging

Messages (13)
msg254896 - (view)	Author: Florent Viard (fviard)	Date: 2015-11-19 14:22
When an user, use logging, to try to display an object that uses some threading locks, there could be a race condition and the logging module will deadlock. So, any thread that will try to use logging, will then be stuck forever. Please see the following test case that is a simplification of a case that I have encountered. I'm able to reproduce the issue on python 3.4.3, and python 2.7.9. But, based on the code, I think that all versions are affected. Basically, I try to log a variable that is like a property or the __unicode__ function of an object. But this value will not be calculated before entering the "logging.warning" code but in the "self.format" of the logging handler. This self.format is called under the umbrella of the big lock of "logging". So, if function to get my variable value involve taking another lock and then performing a logging call. There is great chances that another thread also took the variable lock between the time that I took the logging lock and before it take the variable lock. So, the first thread will wait for the variable lock but will have the logging lock. But the other thread will have the variable lock but will wait the logging lock. Traceback of the current situation: # ThreadID: 140410437482240 File: "/usr/lib/python2.7/threading.py", line 783, in __bootstrap self.__bootstrap_inner() File: "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner self.run() File: "/usr/lib/python2.7/threading.py", line 763, in run self.__target(self.__args, self.__kwargs) File: "testcase-bug-python-deadlock-logging.py", line 37, in second_thread my_db.log_pid() File: "testcase-bug-python-deadlock-logging.py", line 24, in log_pid logging.warning(u"my_db pid is: %s", u"1234") File: "/usr/lib/python2.7/logging/__init__.py", line 1604, in warning root.warning(msg, args, kwargs) File: "/usr/lib/python2.7/logging/__init__.py", line 1164, in warning self._log(WARNING, msg, args, kwargs) File: "/usr/lib/python2.7/logging/__init__.py", line 1271, in _log self.handle(record) File: "/usr/lib/python2.7/logging/__init__.py", line 1281, in handle self.callHandlers(record) File: "/usr/lib/python2.7/logging/__init__.py", line 1321, in callHandlers hdlr.handle(record) File: "/usr/lib/python2.7/logging/__init__.py", line 747, in handle self.acquire() File: "/usr/lib/python2.7/logging/__init__.py", line 698, in acquire self.lock.acquire() File: "/usr/lib/python2.7/threading.py", line 173, in acquire rc = self.__block.acquire(blocking) # ThreadID: 140410445874944 File: "/usr/lib/python2.7/threading.py", line 783, in __bootstrap self.__bootstrap_inner() File: "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner self.run() File: "/usr/lib/python2.7/threading.py", line 763, in run self.__target(self.__args, self.__kwargs) File: "testcase-bug-python-deadlock-logging.py", line 31, in first_thread logging.warning(u"My slow pid is: %s", my_db) File: "/usr/lib/python2.7/logging/__init__.py", line 1604, in warning root.warning(msg, args, kwargs) File: "/usr/lib/python2.7/logging/__init__.py", line 1164, in warning self._log(WARNING, msg, args, kwargs) File: "/usr/lib/python2.7/logging/__init__.py", line 1271, in _log self.handle(record) File: "/usr/lib/python2.7/logging/__init__.py", line 1281, in handle self.callHandlers(record) File: "/usr/lib/python2.7/logging/__init__.py", line 1321, in callHandlers hdlr.handle(record) File: "/usr/lib/python2.7/logging/__init__.py", line 749, in handle self.emit(record) File: "/usr/lib/python2.7/logging/__init__.py", line 851, in emit msg = self.format(record) File: "/usr/lib/python2.7/logging/__init__.py", line 724, in format return fmt.format(record) File: "/usr/lib/python2.7/logging/__init__.py", line 464, in format record.message = record.getMessage() File: "/usr/lib/python2.7/logging/__init__.py", line 328, in getMessage msg = msg % self.args File: "testcase-bug-python-deadlock-logging.py", line 17, in __unicode__ with db_lock: File: "/usr/lib/python2.7/threading.py", line 173, in acquire rc = self.__block.acquire(blocking)
msg254897 - (view)	Author: Florent Viard (fviard)	Date: 2015-11-19 14:29
Looking at the code, this issue makes sense in logging/__init__.py (+738): def handle(self, record): """ Conditionally emit the specified logging record. Emission depends on filters which may have been added to the handler. Wrap the actual emission of the record with acquisition/release of the I/O thread lock. Returns whether the filter passed the record for emission. """ rv = self.filter(record) if rv: self.acquire() try: self.emit(record) finally: self.release() return rv Than, in the "emit()" of whatever handler used, there is first: msg = self.format(record) In my opinion, a possible fix would be to change this code to something like: rv = self.filter(record) if rv: record.formatted_msg = self.format(record) self.acquire() try: self.emit(record) finally: self.release() return rv And then, change all the "emit()" of log handlers from: msg = self.format(record) to msg = record.processed_msg By not modifying the "msg" and "args" parameters of the record, all the code that would have reimplemented the standard log handler will still be working fine (even if they will still be at risk of encountering the deadlock)
msg254901 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2015-11-19 14:58
But not doing the format until the last moment is part of the design of logging (as low overhead as possible unless a message is actually emitted). I suspect you are just going to have to pre-calculate that variable if you want to log it. Let's see what Vinay thinks, though :)
msg254905 - (view)	Author: Florent Viard (fviard)	Date: 2015-11-19 15:21
I understand that it should have been the reason of this. But in my opinion, it is very bad to possibly have calculations and user space arbitrary operations inside the logging lock. If you look at my proposition, you can do the format after the filter, but just before acquiring the lock. As most handlers start by calling the self.format() soon in the "emit()", I don't expect a big difference. I'm not entirely sure that it is a good solution, but I think that the issue is pretty bad.
msg254907 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2015-11-19 15:30
Using the lock to cover as little as possible makes sense, but there may be a reason format is covered by the lock. I don't know the code well enough to say, we'll have to wait for Vinay.
msg255299 - (view)	Author: Vinay Sajip (vinay.sajip) *	Date: 2015-11-24 23:03
handle() and emit() are high level methods of a handler, and format() is at a lower level. Not all emit() methods will call format(). For example, socket-based and queue-based handlers don't. So it is not in general possible to separate format() out - you will need to have a customised handler to deal with your somewhat unusual use case, and do whatever you need in there. Threading is a complex area and while logging is one specific case you may have come across, it's entirely possible to have a situation with any other lock (e.g. in your application) where acquiring the lock and calling __unicode__() on one of your objects will also result in a deadlock. So your proposal wouldn't fix the problem in a general way - just move it so that you might avoid the problem, but a more esoteric use of locks wouldn't necessarily work. In general, to avoid deadlocks, you have to acquire locks in a fixed order, and only you know what those locks are - so you can implement the appropriate acquisition and release code in your handle().
msg258145 - (view)	Author: Florent Viard (fviard)	Date: 2016-01-13 14:59
Sorry to reply after a so long time. Please don't close this issue too fast, there is really a big issue that is not related to my specific case. (Note that the attached test case is not the real case, but a small piece of code that is able to reproduce the issue consistently) Also, I don't pretend that my fix proposal is the solution, but the issue definitively exists. To reply to your comment: <<< handle() and emit() are high level methods of a handler, and format() is at a lower level. Not all emit() methods will call format(). For example, socket-based and queue-based handlers don't. So it is not in general possible to separate format() out - you will need to have a customised handler to deal with your somewhat unusual use case, and do whatever you need in there. >>> That is not true even if it is a little hidden in the code: - SocketHandler: emit first calls makePickle that almost starts with "record.getMessages()" that does the interesting thing. (self.format mainly does getMessages(), that is the function that gets the messages arguments, eventually from properties) - QueueHandler: emit first calls "prepare(record)", that starts by calling self.format(record). But after having looked deeper at the code, maybe my fix proposal could be modified to be something like: new function in "LogRecord" class: def stringifyMessage(self): # CODE THAT WAS IN getMessages() self.processed_msg = str(self.msg) if self.args: self.processed_msg = self.processed_msg % self.args modify the getMessages function to: def getMessages(self): if not self.processed_msg: self.stringifyMessage() return self.processed_msg Then, in the "__init__.py" handle function, add the following before "self.acquire()": record.stringifyMessage() Thus, no need to change anything, any handler, but the "message" arguments will be processed before getting the lock of "logging". Regarding the second part of your comment, just saying that "generally" threading locks are a complex topic, doesn't remove the fact that in this specific logging case, there is a real bug regarding how python works. Here, you assume that an user of "logging" is supposed to know how the inside thread locking of "logging" works because the section under locking of logging leaks into "user space code". I'm pretty sure that almost everyone using logging today would not think that such an issue could arise. I could agree with your point if the problem that was solvable, but i'm pretty sure that there is at least a solution that is working with backward compatibility and at worst minor disturbance to some very specific users of some specific handlers.
msg258146 - (view)	Author: Florent Viard (fviard)	Date: 2016-01-13 15:04
Sorry, typo in my last sentence: "I would have agreed with your point if the problem was NOT solvable,..."
msg258169 - (view)	Author: Vinay Sajip (vinay.sajip) *	Date: 2016-01-13 21:16
Logging has been in the Python stdlib for over a dozen years, and in all that time, no one else has had a problem with the way handler locks and formatting work in logging. Your problem arises because your use case is very unusual, and this is why I don't propose to make changes to the stdlib to cater for it. You can, as I've suggested earlier, subclass the handlers you need and implement their emit methods as per your needs. Any changes to the stdlib code along the lines you've proposed could well break existing code for other people, so making these changes is not warranted.
msg258177 - (view)	Author: Florent Viard (fviard)	Date: 2016-01-13 22:46
Come on, please stop trying to close the issue so fast without deeply thinking about it. A lot of years without a "detailed" bug report indicating the root cause is not a "proof" that there is no bug. Otherwise, you can say that there is no more bug in python, because people used it for 15 years... The issue looks easy with the explanation and the test case, but it took me a lot of hours of investigation to figure out the origin of the freeze of python that I was lucky to be able to reproduce because of good specific conditions. And so I was able to try to have stack traces to find the origin of the issue. But race conditions like that are hard to catch, and to investigate but can bit anyone. Please tell me what is unusual for you with this testcase? The sleeps are here to put you in the "good"/bad condition, but you can remove it and try to run the testcase 1 million times to be in the standard race condition situation...
msg260523 - (view)	Author: urnenfed (urnenfeld)	Date: 2016-02-19 16:44
I am logging from a different thread than the thread which actually created the object. Though I am getting its reference in a very odd way I can reproduce the following while is not a deadlock, I believe is related: Traceback (most recent call last): File "/usr/lib/python2.7/logging/__init__.py", line 851, in emit msg = self.format(record) File "/usr/lib/python2.7/logging/__init__.py", line 724, in format return fmt.format(record) File "/usr/lib/python2.7/logging/__init__.py", line 464, in format record.message = record.getMessage() File "/usr/lib/python2.7/logging/__init__.py", line 328, in getMessage msg = msg % self.args TypeError: not all arguments converted during string formatting
msg260530 - (view)	Author: Vinay Sajip (vinay.sajip) *	Date: 2016-02-19 22:06
@Oliver Ruiz Dorantes I don't see how yuou reach that conclusion from what you're showing - it just looks like s mismatch bertween format string and arguments. If you can shrink it down to a small standalone script that demonstrates the problem, it would be helpful for diagnosis; the stack trace you provide isn't enough for this, as I have seen similar traces many a time due to mismatched format string and arguments.
msg407753 - (view)	Author: Irit Katriel (iritkatriel) *	Date: 2021-12-05 22:46
> Please tell me what is unusual for you with this testcase? Grabbing a lock in __str__/__repr__ strikes me as unusual and a recipe for problems. You don't really know when those functions are called - from the debugger, from exception handlers, etc.. Even if the logging module protected you from the deadlock you found, you won't be safe from similar situations with other libraries that use locks.

History
Date	User	Action	Args
2022-04-11 14:58:24	admin	set	github: 69854
2021-12-12 12:03:33	iritkatriel	set	status: pending -> closed resolution: wont fix stage: resolved
2021-12-05 22:46:55	iritkatriel	set	status: open -> pending nosy: + iritkatriel messages: + msg407753
2016-02-19 22:06:40	vinay.sajip	set	messages: + msg260530
2016-02-19 16:44:59	urnenfeld	set	nosy: + urnenfeld messages: + msg260523
2016-01-15 23:51:24	terry.reedy	set	versions: - Python 3.2, Python 3.3, Python 3.4
2016-01-14 20:13:00	yselivanov	set	nosy: + yselivanov
2016-01-13 22:46:42	fviard	set	status: closed -> open resolution: not a bug -> (no value) messages: + msg258177
2016-01-13 21:16:40	vinay.sajip	set	status: open -> closed resolution: not a bug messages: + msg258169
2016-01-13 19:20:24	r.david.murray	set	nosy: - r.david.murray
2016-01-13 15:04:36	fviard	set	messages: + msg258146
2016-01-13 14:59:54	fviard	set	status: closed -> open resolution: not a bug -> (no value) messages: + msg258145
2015-11-24 23:03:12	vinay.sajip	set	status: open -> closed resolution: not a bug messages: + msg255299
2015-11-19 15:30:30	r.david.murray	set	messages: + msg254907
2015-11-19 15:21:36	fviard	set	messages: + msg254905
2015-11-19 14:58:40	r.david.murray	set	nosy: + vinay.sajip, r.david.murray messages: + msg254901
2015-11-19 14:29:12	fviard	set	messages: + msg254897
2015-11-19 14:22:53	fviard	create