classification
Title: PyImport_ImportModuleNoBlock() may solve problems but causes others.
Type: behavior Stage: resolved
Components: Interpreter Core Versions: Python 3.1, Python 3.2, Python 2.7
process
Status: closed Resolution: duplicate
Dependencies: Superseder: A finer grained import lock
View: 9260
Assigned To: Nosy List: belopolsky, brett.cannon, christian.heimes, eric.snow, grahamd, gvanrossum, jcea, mdengler, ncoghlan, pennersr, pitrou, ruseel
Priority: normal Keywords:

Created on 2010-03-09 10:47 by grahamd, last changed 2012-05-18 11:56 by pitrou. This issue is now closed.

Messages (11)
msg100713 - (view) Author: Graham Dumpleton (grahamd) Date: 2010-03-09 10:47
Back in time, the function PyImport_ImportModuleNoBlock() was introduced and used in modules such as the time module to supposedly avoid deadlocks when using threads. It may well have solved that problem, but only served to cause other problems.

To illustrate the problem consider the test code:


import imp
import thread
import time

def run1():
   print 'acquire'
   imp.acquire_lock()
   time.sleep(5)
   imp.release_lock()
   print 'release'

thread.start_new_thread(run1, ())

time.sleep(2)

print 'strptime'
time.strptime("", "")
print 'exit'


The output of running this is


grumpy:~ grahamd$ python noblock.py 
acquire
strptime
Traceback (most recent call last):
  File "noblock.py", line 17, in <module>
    time.strptime("", "")
ImportError: Failed to import _strptime because the import lockis held by another thread.


It is bit silly that code executing in one thread could fail because at the time that it tries to call time.strptime() a different thread has the global import lock.

This problem may not arise in applications which preload all modules, but it will where importing of modules is deferred until later within execution of a thread and where there may be concurrent threads running doing work that requires modules imported by that new C function.

Based on old discussion at:

http://groups.google.com/group/comp.lang.python/browse_frm/thread/dad73ac47b81a744

my expectation is that this issue will be rejected as not being a problem with any remedy being pushed to the application developer. Personally I don't agree with that and believe that the real solution is to come up with an alternate fix for the original deadlock that doesn't introduce this new detrimental behaviour. This may entail structural changes to modules such as the time module to avoid issue.

Unfortunately since the PyImport_ImportModuleNoBlock() function has been introduced, it is starting to be sprinkled like fairy dust across modules in the standard library and in third party modules. This is only going to set up future problems in multithreaded applications, especially where third party module developers don't appreciate what problems they are potentially introducing by using this function.

Anyway, not optimistic from what I have seen that this will be changed, so view this as a protest against this behaviour. :-)

FWIW, issue in mod_wsgi issue tracker about this is:

http://code.google.com/p/modwsgi/issues/detail?id=177

I have known about this issue since early last year though.
msg110093 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2010-07-12 15:19
Interesting. Your example code doesn't run afoul of any of the "don't do this" flags relating to import that I put into the threading docs, so it seems like a fair complaint to me.

As Christian pointed out on c.l.p, the idea with this API is to turn deadlocks triggered due to violations of the first rule noted in http://docs.python.org/library/threading.html#importing-in-threaded-code into exceptions instead. This is *much* friendlier than the old behaviour.

However, as your example shows, there are cases involving the main module where the new approach will throw an exception but a deadlock would not have occurred if it had just waited for the import lock to be released.

Note the difference in behaviour if the invocation style is changed to hold the import lock when the module is executed:

$ ./python -c "import noblock"
acquire
strptime
exit
Unhandled exception in thread started by 
Error in sys.excepthook:

Original exception was:

(The bizarre exception noise at the end is due to the violation of rule 2 at the page linked above - since the example code used thread rather than threading, we tried to release an import lock that didn't exist any more. If the code had used threading instead it would have worked fine and printed "released" as expected)

My instinct says that allowing *_NoBlock() to block(!) when invoked from the main Python thread will "do the right thing", but I'm going to want to ponder that for a while. Inadvertently recreating the deadlocks that this whole mechanism is designed to eliminate would be dumb.
msg110095 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-12 15:26
Composed before reading Nick's comment:

"""
This issue is broader in scope than just the time module example that is given, so I am not taking this over.  With respect to datetime and time modules, there are several related issues I am working on which aim to reduce or eliminate the use of PyImport_ImportModuleNoBlock(): issue 9012 aims to eliminate the need to import time module from datetime; issue 7989 while not directly related, may lead to python dependencies being imported from python rather than C code. 
"""
msg110096 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2010-07-12 15:28
I'm curious - do the mod_wsgi problems go away if you arrange for the main module to run with the import lock held? Or do they turn into deadlocks?
msg110154 - (view) Author: Graham Dumpleton (grahamd) Date: 2010-07-13 02:26
Nick, there is no main module in the same way there is when using the Python command line. Ie., there is no module that has __name__ being __main__. The closest thing is the WSGI script file which holds the application object for the WSGI application. There could technically be many of these for the one (sub)interpreter. These WSGI script files are already imported as a module with magic name based on file system path to the WSGI script file and so the module import lock would already be held while doing so.

The specific situation which is coming up is not related to the import of those script files but after the WSGI script file has already been loaded. That is, when threads from an external thread pool are later calling into the WSGI application to handle individual HTTP requests to the WSGI application. Being multithreaded, two such HTTP requests could be handled at the same time. If one of those threads lazily imports a Python module and thus acquires the module import lock while doing so, and at the same time another thread calls into any of the various Python library functions which internally use PyImport_ImportModuleNoBlock() then you will get the exception described.

Because it relates to what multiple threads are doing at the same, then it is totally unpredictable as whether one can get hit by this problem. This is why the problem is so frustrating. Your code could work fine most of the time then all of a sudden you could have one request which hits this problem if it is the first time to call these functions and another thread just happens to be importing a totally unrelate module at the same time.

So, that is the more in depth problem that is occurring in practice. I boiled it down to the simple example so people could see it and reproduce it independent of mod_wsgi.
msg110178 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2010-07-13 12:21
OK, finally tracked down some of the history here. Issue 1567 is the one where Christian developed the relevant patch which was then committed as r59678.

Issue 7980 is also potentially related (although that's apparently a Mac OS X crash involving time.strptime rather than an ImportError exception).

I think what it boils down to is that this approach to the import deadlock problem is far too pessimistic. If any other import is going on when an implicit import is triggered using this function makes the assumption that the other thread is going to be waiting for the current thread at some point, so this thread can't afford to wait for the import lock and should throw an exception instead. That's going to disallow a lot of perfectly legitimate imports, particularly when the Python interpreter is embedded in a multi-threaded application rather than running standalone.
msg110183 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-07-13 12:43
Is there some documentation somewhere of which global variables the import lock is meant to protect? Perhaps it wouldn't need to be held during the whole module initialization process.
msg110186 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2010-07-13 13:14
Pretty much anything mentioned in PEP 302 is what is being protected.

The one which causes most of the grief is the fact that we need to stick a partially initialised module into sys.modules (so the module can find itself if it needs to), then take it out again if execution of the module level code fails. That means we need to hold the import lock while the module code is executed.
msg110187 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2010-07-13 13:16
Added Guido as well, to see how much he remembers from back when Christian's patch went in.

We may want to take this over to python-dev...
msg110189 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-07-13 13:36
> Pretty much anything mentioned in PEP 302 is what is being protected.

Alright, my question was more along the lines of "what *needs* to be
protected". The import lock is currently like the GIL: a very simple
(and therefore suboptimal) answer to a complex problem.

I am wondering whether one could revive Marc-André's idea here:
http://mail.python.org/pipermail/python-dev/2003-February/033445.html

If the sole purpose is to prevent multi-threaded import of the same
module, then we can have an internal dict of temporary locks for each
module being imported. The "global import lock" itself can be reduced to
protecting accesses to this dict.

If custom (third-party) import hooks are commonly non-thread safe, then
the "global import lock" can still be taken when such an import hook is
called. But built-in import mechanisms can be more tolerant.
msg161049 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-05-18 11:56
Python 3.3 now has per-module import locks, and PyImport_ImportModuleNoBlock simply redirects to PyImport_ImportModule. See issue9260.
History
Date User Action Args
2012-05-18 11:56:27pitrousetstatus: open -> closed
superseder: A finer grained import lock
messages: + msg161049

resolution: duplicate
stage: resolved
2012-05-18 06:03:14mdenglersetnosy: + mdengler
2011-12-21 03:39:10eric.snowsetnosy: + eric.snow
2011-12-21 03:17:32jceasetnosy: + jcea
2011-02-10 16:00:46pennersrsetnosy: + pennersr
2010-09-28 12:28:02ruseelsetnosy: + ruseel
2010-07-13 13:36:39pitrousetmessages: + msg110189
2010-07-13 13:16:44ncoghlansetnosy: + gvanrossum
messages: + msg110187
2010-07-13 13:14:27ncoghlansetmessages: + msg110186
2010-07-13 12:43:31pitrousetnosy: + pitrou
messages: + msg110183
2010-07-13 12:21:42ncoghlansetmessages: + msg110178
2010-07-13 02:26:24grahamdsetmessages: + msg110154
2010-07-12 15:28:38ncoghlansetmessages: + msg110096
2010-07-12 15:26:45belopolskysetmessages: + msg110095
2010-07-12 15:19:27ncoghlansetnosy: + christian.heimes
messages: + msg110093
2010-07-12 14:01:40BreamoreBoysetnosy: + ncoghlan
2010-07-12 13:57:16pitrousetnosy: + brett.cannon, belopolsky

versions: + Python 3.2, - Python 2.6
2010-03-09 10:47:23grahamdcreate