classification
Title: Deadlock in logging caused by a possible race condition with "format"
Type: crash Stage:
Components: Library (Lib) Versions: Python 3.6, Python 3.5, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: fviard, urnenfeld, vinay.sajip, yselivanov
Priority: normal Keywords:

Created on 2015-11-19 14:22 by fviard, last changed 2016-02-19 22:06 by vinay.sajip.

Files
File name Uploaded Description Edit
testcase-bug-python-deadlock-logging.py fviard, 2015-11-19 14:22 testcase causing a deadlock with logging
Messages (12)
msg254896 - (view) Author: Florent Viard (fviard) Date: 2015-11-19 14:22
When an user, use logging, to try to display an object that uses some threading locks, there could be a race condition and the logging module will deadlock. So, any thread that will try to use logging, will then be stuck forever.

Please see the following test case that is a simplification of a case that I have encountered. I'm able to reproduce the issue on python 3.4.3, and python 2.7.9. But, based on the code, I think that all versions are affected.

Basically, I try to log a variable that is like a property or the __unicode__ function of an object. But this value will not be calculated before entering the "logging.warning" code but in the "self.format" of the logging handler. This self.format is called under the umbrella of the big lock of "logging".

So, if function to get my variable value involve taking another lock and then performing a logging call. There is great chances that another thread also took the variable lock between the time that I took the logging lock and before it take the variable lock.

So, the first thread will wait for the variable lock but will have the logging lock.
But the other thread will have the variable lock but will wait the logging lock.


Traceback of the current situation:
# ThreadID: 140410437482240
File: "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
  self.run()
File: "/usr/lib/python2.7/threading.py", line 763, in run
  self.__target(*self.__args, **self.__kwargs)
File: "testcase-bug-python-deadlock-logging.py", line 37, in second_thread
  my_db.log_pid()
File: "testcase-bug-python-deadlock-logging.py", line 24, in log_pid
  logging.warning(u"my_db pid is: %s", u"1234")
File: "/usr/lib/python2.7/logging/__init__.py", line 1604, in warning
  root.warning(msg, *args, **kwargs)
File: "/usr/lib/python2.7/logging/__init__.py", line 1164, in warning
  self._log(WARNING, msg, args, **kwargs)
File: "/usr/lib/python2.7/logging/__init__.py", line 1271, in _log
  self.handle(record)
File: "/usr/lib/python2.7/logging/__init__.py", line 1281, in handle
  self.callHandlers(record)
File: "/usr/lib/python2.7/logging/__init__.py", line 1321, in callHandlers
  hdlr.handle(record)
File: "/usr/lib/python2.7/logging/__init__.py", line 747, in handle
  self.acquire()
File: "/usr/lib/python2.7/logging/__init__.py", line 698, in acquire
  self.lock.acquire()
File: "/usr/lib/python2.7/threading.py", line 173, in acquire
  rc = self.__block.acquire(blocking)

# ThreadID: 140410445874944
File: "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
  self.run()
File: "/usr/lib/python2.7/threading.py", line 763, in run
  self.__target(*self.__args, **self.__kwargs)
File: "testcase-bug-python-deadlock-logging.py", line 31, in first_thread
  logging.warning(u"My slow pid is: %s", my_db)
File: "/usr/lib/python2.7/logging/__init__.py", line 1604, in warning
  root.warning(msg, *args, **kwargs)
File: "/usr/lib/python2.7/logging/__init__.py", line 1164, in warning
  self._log(WARNING, msg, args, **kwargs)
File: "/usr/lib/python2.7/logging/__init__.py", line 1271, in _log
  self.handle(record)
File: "/usr/lib/python2.7/logging/__init__.py", line 1281, in handle
  self.callHandlers(record)
File: "/usr/lib/python2.7/logging/__init__.py", line 1321, in callHandlers
  hdlr.handle(record)
File: "/usr/lib/python2.7/logging/__init__.py", line 749, in handle
  self.emit(record)
File: "/usr/lib/python2.7/logging/__init__.py", line 851, in emit
  msg = self.format(record)
File: "/usr/lib/python2.7/logging/__init__.py", line 724, in format
  return fmt.format(record)
File: "/usr/lib/python2.7/logging/__init__.py", line 464, in format
  record.message = record.getMessage()
File: "/usr/lib/python2.7/logging/__init__.py", line 328, in getMessage
  msg = msg % self.args
File: "testcase-bug-python-deadlock-logging.py", line 17, in __unicode__
  with db_lock:
File: "/usr/lib/python2.7/threading.py", line 173, in acquire
  rc = self.__block.acquire(blocking)
msg254897 - (view) Author: Florent Viard (fviard) Date: 2015-11-19 14:29
Looking at the code, this issue makes sense

in logging/__init__.py (+738):
    def handle(self, record):
        """
        Conditionally emit the specified logging record.
        Emission depends on filters which may have been added to the handler.
        Wrap the actual emission of the record with acquisition/release of
        the I/O thread lock. Returns whether the filter passed the record for
        emission.
        """
        rv = self.filter(record)
        if rv:
            self.acquire()
            try:
                self.emit(record)
            finally:
                self.release()
        return rv

Than, in the "emit()" of whatever handler used, there is first:
msg = self.format(record)

In my opinion, a possible fix would be to change this code to something like:
        rv = self.filter(record)
        if rv:
            record.formatted_msg = self.format(record)
            self.acquire()
            try:
                self.emit(record)
            finally:
                self.release()
        return rv

And then, change all the "emit()" of log handlers from:
     msg = self.format(record)
to
     msg = record.processed_msg

By not modifying the "msg" and "args" parameters of the record, all the code that would have reimplemented the standard log handler will still be working fine (even if they will still be at risk of encountering the deadlock)
msg254901 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-11-19 14:58
But not doing the format until the last moment is part of the design of logging (as low overhead as possible unless a message is actually emitted).

I suspect you are just going to have to pre-calculate that variable if you want to log it.  Let's see what Vinay thinks, though :)
msg254905 - (view) Author: Florent Viard (fviard) Date: 2015-11-19 15:21
I understand that it should have been the reason of this.

But in my opinion, it is very bad to possibly have calculations and user space arbitrary operations inside the logging lock.

If you look at my proposition, you can do the format after the filter, but just before acquiring the lock. As most handlers start by calling the self.format() soon in the "emit()", I don't expect a big difference.

I'm not entirely sure that it is a good solution, but I think that the issue is pretty bad.
msg254907 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-11-19 15:30
Using the lock to cover as little as possible makes sense, but there may be a reason format is covered by the lock.  I don't know the code well enough to say, we'll have to wait for Vinay.
msg255299 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2015-11-24 23:03
handle() and emit() are high level methods of a handler, and format() is at a lower level. Not all emit() methods will call format(). For example, socket-based and queue-based handlers don't. So it is not in general possible to separate format() out - you will need to have a customised handler to deal with your somewhat unusual use case, and do whatever you need in there.

Threading is a complex area and while logging is one specific case you may have come across, it's entirely possible to have a situation with any other lock (e.g. in your application) where acquiring the lock and calling __unicode__() on one of your objects will also result in a deadlock. So your proposal wouldn't fix the problem in a general way - just move it so that you might avoid the problem, but a more esoteric use of locks wouldn't necessarily work. In general, to avoid deadlocks, you have to acquire locks in a fixed order, and only you know what those locks are - so you can implement the appropriate acquisition and release code in your handle().
msg258145 - (view) Author: Florent Viard (fviard) Date: 2016-01-13 14:59
Sorry to reply after a so long time.

Please don't close this issue too fast, there is really a big issue that is not related to my specific case.
(Note that the attached test case is not the real case, but a small piece of code that is able to reproduce the issue consistently)

Also, I don't pretend that my fix proposal is the solution, but the issue definitively exists.

To reply to your comment:
<<<
handle() and emit() are high level methods of a handler, and format() is at a lower level. Not all emit() methods will call format(). For example, socket-based and queue-based handlers don't. So it is not in general possible to separate format() out - you will need to have a customised handler to deal with your somewhat unusual use case, and do whatever you need in there.
>>>
That is not true even if it is a little hidden in the code:
- SocketHandler: emit first calls makePickle that almost starts with "record.getMessages()" that does the interesting thing.
(self.format mainly does getMessages(), that is the function that gets the messages arguments, eventually from properties)
- QueueHandler: emit first calls "prepare(record)", that starts by calling self.format(record).

But after having looked deeper at the code, maybe my fix proposal could be modified to be something like:
new function in "LogRecord" class:
 def stringifyMessage(self):
     # CODE THAT WAS IN getMessages()
     self.processed_msg = str(self.msg)
     if self.args:
         self.processed_msg = self.processed_msg % self.args

modify the getMessages function to:
 def getMessages(self):
     if not self.processed_msg:
         self.stringifyMessage()
     return self.processed_msg

Then, in the "__init__.py" handle function, add the following before "self.acquire()":
record.stringifyMessage()

Thus, no need to change anything, any handler, but the "message" arguments will be processed before getting the lock of "logging".

Regarding the second part of your comment, just saying that "generally" threading locks are a complex topic, doesn't remove the fact that in this specific logging case, there is a real bug regarding how python works. Here, you assume that an user of "logging" is supposed to know how the inside thread locking of "logging" works because the section under locking of logging leaks into "user space code". I'm pretty sure that almost everyone using logging today would not think that such an issue could arise.

I could agree with your point if the problem that was solvable, but i'm pretty sure that there is at least a solution that is working with backward compatibility and at worst minor disturbance to some very specific users of some specific handlers.
msg258146 - (view) Author: Florent Viard (fviard) Date: 2016-01-13 15:04
Sorry, typo in my last sentence:
"I would have agreed with your point if the problem was NOT solvable,..."
msg258169 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2016-01-13 21:16
Logging has been in the Python stdlib for over a dozen years, and in all that time, no one else has had a problem with the way handler locks and formatting work in logging. Your problem arises because your use case is very unusual, and this is why I don't propose to make changes to the stdlib to cater for it. You can, as I've suggested earlier, subclass the handlers you need and implement their emit methods as per your needs. Any changes to the stdlib code along the lines you've proposed could well break existing code for other people, so making these changes is not warranted.
msg258177 - (view) Author: Florent Viard (fviard) Date: 2016-01-13 22:46
Come on, please stop trying to close the issue so fast without deeply thinking about it.

A lot of years without a "detailed" bug report indicating the root cause is not a "proof" that there is no bug. Otherwise, you can say that there is no more bug in python, because people used it for 15 years...

The issue looks easy with the explanation and the test case, but it took me a lot of hours of investigation to figure out the origin of the freeze of python that I was lucky to be able to reproduce because of good specific conditions. And so I was able to try to have stack traces to find the origin of the issue. But race conditions like that are hard to catch, and to investigate but can bit anyone.

Please tell me what is unusual for you with this testcase? The sleeps are here to put you in the "good"/bad condition, but you can remove it and try to run the testcase 1 million times to be in the standard race condition situation...
msg260523 - (view) Author: urnenfed (urnenfeld) Date: 2016-02-19 16:44
I am logging from a different thread than the thread which actually created the object. Though I am getting its reference in a very odd way

I can reproduce the following while is not a deadlock, I believe is related:

Traceback (most recent call last):
  File "/usr/lib/python2.7/logging/__init__.py", line 851, in emit
    msg = self.format(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 724, in format
    return fmt.format(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 464, in format
    record.message = record.getMessage()
  File "/usr/lib/python2.7/logging/__init__.py", line 328, in getMessage
    msg = msg % self.args
TypeError: not all arguments converted during string formatting
msg260530 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2016-02-19 22:06
@Oliver Ruiz Dorantes I don't see how yuou reach that conclusion from what you're showing - it just looks like s mismatch bertween format string and arguments. If you can shrink it down to a small standalone script that demonstrates the problem, it would be helpful for diagnosis; the stack trace you provide isn't enough for this, as I have seen similar traces many a time due to mismatched format string and arguments.
History
Date User Action Args
2016-02-19 22:06:40vinay.sajipsetmessages: + msg260530
2016-02-19 16:44:59urnenfeldsetnosy: + urnenfeld
messages: + msg260523
2016-01-15 23:51:24terry.reedysetversions: - Python 3.2, Python 3.3, Python 3.4
2016-01-14 20:13:00yselivanovsetnosy: + yselivanov
2016-01-13 22:46:42fviardsetstatus: closed -> open
resolution: not a bug ->
messages: + msg258177
2016-01-13 21:16:40vinay.sajipsetstatus: open -> closed
resolution: not a bug
messages: + msg258169
2016-01-13 19:20:24r.david.murraysetnosy: - r.david.murray
2016-01-13 15:04:36fviardsetmessages: + msg258146
2016-01-13 14:59:54fviardsetstatus: closed -> open
resolution: not a bug -> (no value)
messages: + msg258145
2015-11-24 23:03:12vinay.sajipsetstatus: open -> closed
resolution: not a bug
messages: + msg255299
2015-11-19 15:30:30r.david.murraysetmessages: + msg254907
2015-11-19 15:21:36fviardsetmessages: + msg254905
2015-11-19 14:58:40r.david.murraysetnosy: + vinay.sajip, r.david.murray
messages: + msg254901
2015-11-19 14:29:12fviardsetmessages: + msg254897
2015-11-19 14:22:53fviardcreate