classification
Title: A finer grained import lock
Type: enhancement Stage: resolved
Components: Interpreter Core Versions: Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Arfrever, asvetlov, belopolsky, brett.cannon, christian.heimes, eric.snow, grahamd, gvanrossum, haypo, loewis, ncoghlan, neologix, pitrou, python-dev
Priority: normal Keywords: patch

Created on 2010-07-14 14:44 by pitrou, last changed 2012-05-18 14:53 by pitrou. This issue is now closed.

Files
File name Uploaded Description Edit
implock.patch pitrou, 2010-07-14 14:48
implock3.patch pitrou, 2011-12-28 23:17 review
implock5.patch pitrou, 2011-12-30 18:08 review
module_locks.patch pitrou, 2012-04-28 20:57
module_locks2.patch pitrou, 2012-04-28 23:16
module_locks3.patch pitrou, 2012-04-29 00:59
module_locks4.patch pitrou, 2012-05-05 18:21
module_locks5.patch pitrou, 2012-05-05 19:03
module_locks6.patch pitrou, 2012-05-05 19:16
module_locks7.patch pitrou, 2012-05-05 19:50 review
module_locks8.patch pitrou, 2012-05-08 13:39 review
module_locks9.patch pitrou, 2012-05-10 16:01 review
Messages (35)
msg110287 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-07-14 14:44
This is an implementation of the idea suggested in:
http://mail.python.org/pipermail/python-dev/2003-February/033445.html

The patch creates a dictionary of reentrant locks keyed by module full name. Trying to import a module or package will first get the lock for that module (or, if necessary, create it) and then acquire it. This is done for any module type.

The global import lock is still there, but only used for two things:
- serializing first time creation of module-specific locks
- protection of imports based on import hooks, since we don't know whether third-party import hooks are themselves thread-safe

Semantics of the public C API are unchanged, because it is not clear whether they should be or not (concerns of usefulness vs. compatibility). For example, PyImport_ImportModuleNoBlock() still uses the global import lock but this could be relaxed in a later patch.
msg110316 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2010-07-14 19:12
So I say we don't worry about loaders being thread-safe. If __import__ handles the locking for a specific module then it will hold the lock on behalf of the loader. Now if someone decides to call load_module on their own, that's there business, but they should be aware of what could happen if they do that without acquiring the lock themselves. Otherwise we just make sure to provide a context manager that takes the name of the module and people can use that when they make their call to loader.load_module.
msg110318 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-07-14 19:34
> So I say we don't worry about loaders being thread-safe. If __import__
> handles the locking for a specific module then it will hold the lock
> on behalf of the loader.

Yes but what happens if two different modules are imported from two
different threads, and handled by the same loader? The loader could have
global structures which rely on serialization of imports for
consistency.
msg110323 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2010-07-14 19:59
On Wed, Jul 14, 2010 at 12:34, Antoine Pitrou <report@bugs.python.org>wrote:

>
> Antoine Pitrou <pitrou@free.fr> added the comment:
>
> > So I say we don't worry about loaders being thread-safe. If __import__
> > handles the locking for a specific module then it will hold the lock
> > on behalf of the loader.
>
> Yes but what happens if two different modules are imported from two
> different threads, and handled by the same loader? The loader could have
> global structures which rely on serialization of imports for
> consistency.
>

That's why I said we should supply a context decorator (or function) which
will handle the lock appropriately, taking the name of the module to import
as an argument so the locking is fine-grained.
msg110324 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-07-14 20:03
> That's why I said we should supply a context decorator (or function) which
> will handle the lock appropriately, taking the name of the module to import
> as an argument so the locking is fine-grained.

Ok, so what are you saying is that we can break compatibility for non
thread-safe import loaders which currently work fine?
(I have nothing against it, just trying to be sure we agree on the
implications)
msg110331 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2010-07-14 21:29
What I'm saying is that loaders are quite possibly not thread-safe already, so we don't need to do any special for them. If you look at PEP 302 you will notice not a single mention of loaders needing to care about the import lock because there is no mention of the import lock! So changing the locking mechanism most likely won't break loaders because they are not using the current import lock anyway and so already have their own issues.

As long as __import__ does the proper locking on behalf of loaders and we provide a way for people to use the lock if they want to then I am not worried about the impact on loaders. For example, this will change the logic in importlib where the current import lock is grabbed, but otherwise won't change a thing in terms of the code for the various loaders it implements.
msg110332 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-07-14 21:47
> So changing the locking mechanism most likely won't break loaders
> because they are not using the current import lock anyway and so
> already have their own issues.

Are you sure they aren't using it implicitly? 
In vanilla py3k, PyImport_ImportModuleLevel() takes the import lock
therefore it protects any inner code, including the various hooks.
msg110336 - (view) Author: Graham Dumpleton (grahamd) Date: 2010-07-14 22:36
How is this going to deal with cyclical imports where different threads could import at the same time different modules within that cycle? I need to look through the proposed patch and work out exactly what it does, but am concerned about whether this approach would cause the classic deadlock problem if not done right?

FWIW, this concept of a lock per module is what I used in the mod_python module importer when it was rewritten. I would have to go look back over that code and see how the way the concept is being implemented differs, but there was one remaining potential race condition in the mod_python code which could in rare instances cause a problem. I never did get around to fixing it. Anyway, what I did learn was that this approach isn't necessarily as simple as it may seem so it will need some really good analysis on whatever solution is developed to ensure subtle problems don't come up.
msg110340 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2010-07-14 23:40
That's my point; loaders are using the lock implicitly so that's why we don't need to worry about the global import lock just for path hooks. It seems like you are suggesting using the global import lock purely for compatibility, and what I am saying is that loaders don't care so there is no compatibility to care about. I say only use the global import lock for serializing creation.
msg110356 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-07-15 10:49
> Graham Dumpleton <Graham.Dumpleton@gmail.com> added the comment:
> 
> How is this going to deal with cyclical imports where different
> threads could import at the same time different modules within that
> cycle? I need to look through the proposed patch and work out exactly
> what it does, but am concerned about whether this approach would cause
> the classic deadlock problem if not done right?

You're right, I hadn't thought about that. Additional machinery will be
needed to detect potential deadlocks (and break them).
msg110357 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-07-15 11:32
> That's my point; loaders are using the lock implicitly so that's why
> we don't need to worry about the global import lock just for path
> hooks. It seems like you are suggesting using the global import lock
> purely for compatibility, and what I am saying is that loaders don't
> care so there is no compatibility to care about. I say only use the
> global import lock for serializing creation.

What is your take on the threadimp2.patch in issue9251?
msg110379 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2010-07-15 16:14
I'll have a look when I can (hopefully EuroPython).
msg150322 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-12-28 23:17
New prototype with per-module import locks and deadlock avoidance.
When a deadlock due to threaded circular imports is detected, the offending import returns the partially constructed module object (as would happen in single-threaded mode).

Probably lacks a test and some cleanup.
msg150330 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-12-29 13:45
IIUC, the deadlock avoidance code just checks that acquiring a
per-module lock won't create a cycle.
However, I think there's a race, because the cycle detection and the
lock acquisition is not atomic.

For example, let's say we have a thread exactly here in in
acquire_import_lock():
        PyThread_acquire_lock(lock->lock, 1);
        /* thread inside PyEval_RestoreThread(), waiting for the GIL */
        PyEval_RestoreThread(saved);
        lock->waiters--;
    }
    assert(lock->level == 0);
    lock->tstate = tstate;

It owns the lock, but hasn't yet updated the lock's owner
(lock->tstate), so another thread calling detect_circularity() will
think that this lock is available, and will proceed, which can
eventually lead to a deadlock.

Also, I think that locks will use POSIX semaphores on systems that
support only a limited number of them (such as FreeBSD 7), and this
might fail in case of nested imports (the infamous ENFILE). I'd have
to double check this, though.
msg150332 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-12-29 15:49
> It owns the lock, but hasn't yet updated the lock's owner
> (lock->tstate), so another thread calling detect_circularity() will
> think that this lock is available, and will proceed, which can
> eventually lead to a deadlock.

That's true. Do you think temptatively acquiring the lock (without
blocking) would solve the issue?

> Also, I think that locks will use POSIX semaphores on systems that
> support only a limited number of them (such as FreeBSD 7), and this
> might fail in case of nested imports (the infamous ENFILE). I'd have
> to double check this, though.

Isn't this limit only about named semaphores? Or does it apply to
anonymous semaphores as well?
msg150371 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-12-30 10:55
> That's true. Do you think temptatively acquiring the lock (without
> blocking) would solve the issue?

I think it should work. Something along those lines:

while True:
    if lock.acquire(0):
        lock.tstate = tstate
        return True
    else:
        if detect_circularity():
            return False
        global_lock.release()
        saved = save_tstate()
        yield()
        restore_tstate(saved)
        global_lock.acquire()

However, I find this whole mechanism somewhat complicated, so the
question really is: what are we trying to solve?
If we just wan't to avoid deadlocks, a trylock with the global import
lock will do the trick.
If, on the other hand, we really want to reduce the number of cases
where a deadlock would occur by increasing the locking granularity,
then it's the way to go. But I'm not sure it's worth the extra
complexity (increasing the locking granularity is usually a proven
recipe to introduce deadlocks).

> Isn't this limit only about named semaphores? Or does it apply to
> anonymous semaphores as well?

I'm no FreeBSD expert, but AFAICT, POSIX SEM_NSEMS_MAX limit doesn't
seem to make a distinction between named and anonymous semaphores.
From POSIX sem_init() man page:
"""
[ENOSPC]
A resource required to initialise the semaphore has been exhausted, or
the limit on semaphores (SEM_NSEMS_MAX) has been reached.
"""

Also, a quick search returned those links:
http://ftp.es.freebsd.org/pub/FreeBSD/development/FreeBSD-CVS/src/sys/sys/ksem.h,v
http://translate.google.fr/translate?hl=fr&sl=ru&tl=en&u=http%3A%2F%2Fforum.nginx.org%2Fread.php%3F21%2C202865%2C202865
So it seems that sem_init() can fail when the max number of semaphores
is reached.
msg150377 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-12-30 11:42
> If, on the other hand, we really want to reduce the number of cases
> where a deadlock would occur by increasing the locking granularity,
> then it's the way to go.

Yes, that's the point. Today you have to be careful when mixing imports
and threads. The problems is that imports can be implicit, inside a
library call for example (and putting imports inside functions is a way
of avoiding circular imports, or can also allow to reduce startup time).
Some C functions try to circumvent the problem by calling
PyImport_ModuleNoBlock, which is ugly and fragile in its own way (it
will fail if the import lock is taken, even if there wouldn't be a
deadlock: a very primitive kind of deadlock avoidance indeed).

> Also, a quick search returned those links:
> http://ftp.es.freebsd.org/pub/FreeBSD/development/FreeBSD-CVS/src/sys/sys/ksem.h,v
> http://translate.google.fr/translate?hl=fr&sl=ru&tl=en&u=http%3A%2F%2Fforum.nginx.org%2Fread.php%3F21%2C202865%2C202865
> So it seems that sem_init() can fail when the max number of semaphores
> is reached.

As they say, "Kritirii choice of the number 30 in the XXI century is
unclear." :-)

File objects also have a per-object lock, so I guess opening 30 files
under FreeBSD would fail. Perhaps we need to fallback on the other lock
implementation on certain platforms?

(our FreeBSD 8.2 buildbot looks pretty much stable, was the number of
semaphores tweaked on it?)
msg150385 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-12-30 18:08
I believe this new patch should be much more robust than the previous one.
It also adds a test for the improvement (it freezes an unpatched interpreter).
msg159540 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-28 20:57
Ok, here is a draft patch for the new importlib.
Several issues with this patch:
- introduces a pure Python function (_lock_unlock_module) on the fast import path
- synchronization issues due to interruptibility of pure Python code (see _ModuleLock.acquire)
- afterfork fix-up necessary
- relies on _thread.RLock for bootstrapping reasons
- module locks are immortal
msg159548 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-28 23:16
New patch gets rid of the reliance on _thread.RLock (uses non-recursive locks instead), and should solve the synchronization issue. Other issues remain.
msg159557 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-29 00:54
Updated patch fixes the performance issue and disposes of module locks when they aren't used anymore.
Only the afterfork question remains. Should I hook in threading's own facility? Should we wait for an atfork module? Something else.
msg160014 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-05-05 18:08
Updated patch also makes PyImport_ImportModuleNoBlock a simple alias of PyImport_ImportModule.
msg160021 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-05-05 19:03
Updated patch also adds unit tests for the module locks and the deadlock avoidance algorithm.
msg160023 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-05-05 19:16
Updated patch with a couple new tests.
msg160024 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-05-05 19:19
I still wonder whether Graham Dumpleton's observation has merits.

Suppose we have these modules

# a.py
time.sleep(10)
import b

# b.py
time.sleep(10)
import a

# main.py
def x():
  import a
def y():
  import b

Now, if x and y are executed in separate threads - won't it deadlock?
msg160026 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-05-05 19:21
> Now, if x and y are executed in separate threads - won't it deadlock?

Well, the patch has a deadlock avoidance mechanism, and it includes unit
tests for precisely this situation.
I cannot promise the algorithm is perfect (although there *are* a bunch
of tests), but it looks correct from here. :)
msg160029 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-05-05 19:33
Can you please elaborate in the patch what the deadlock avoidance does? AFAICT, the comment explains that it is able to detect deadlocks, but nowhere says what it does when it has detected a deadlock.

Also, please submit patches against default's head, or stop using git-style diffs, to enable Rietveld review.
msg160034 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-05-05 19:50
Updated patch against tip, and with a comment of what deadlock avoidance does (in _ModuleLock.acquire's docstring).
msg160037 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-05-05 20:12
The patch parser of Rietveld actually choked on the git binary diff. It now skips over these chunks.
msg160205 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-05-08 13:39
Updated patch against tip. I also changed the internal API of module locks a bit (acquire() raises _DeadlockError instead of returning False, and deadlock detection is not optional anymore).
msg160354 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-05-10 16:01
I had forgotten to tackle threadless builds, this patch fixes it.
msg160570 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-05-13 20:31
Does anyone else want to review this patch?
msg160577 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2012-05-13 20:52
I don't feel the need to, but I can in a few days if you want me to (just let me know if you do).
msg160983 - (view) Author: Roundup Robot (python-dev) Date: 2012-05-17 17:03
New changeset edb9ce3a6c2e by Antoine Pitrou in branch 'default':
Issue #9260: A finer-grained import lock.
http://hg.python.org/cpython/rev/edb9ce3a6c2e
msg160984 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-05-17 17:10
I have now pushed the patch.
History
Date User Action Args
2012-05-18 14:53:42pitrousetstatus: pending -> closed
2012-05-18 11:56:27pitroulinkissue8098 superseder
2012-05-17 17:10:20pitrousetstatus: open -> pending
2012-05-17 17:10:11pitrousetstatus: pending -> open
stage: patch review -> resolved
2012-05-17 17:10:06pitrousetstatus: open -> pending
resolution: fixed
messages: + msg160984
2012-05-17 17:07:06Arfreversetnosy: + Arfrever
2012-05-17 17:03:05python-devsetnosy: + python-dev
messages: + msg160983
2012-05-13 20:52:53brett.cannonsetmessages: + msg160577
2012-05-13 20:31:51pitrousetmessages: + msg160570
2012-05-10 16:02:06pitrousetfiles: + module_locks9.patch

messages: + msg160354
2012-05-08 13:39:29pitrousetfiles: + module_locks8.patch

messages: + msg160205
2012-05-08 13:21:29asvetlovsetnosy: + asvetlov
2012-05-05 20:12:45loewissetmessages: + msg160037
2012-05-05 19:50:31pitrousetfiles: + module_locks7.patch

messages: + msg160034
2012-05-05 19:33:33loewissetmessages: + msg160029
2012-05-05 19:21:00pitrousetmessages: + msg160026
2012-05-05 19:19:12loewissetnosy: + loewis
messages: + msg160024
2012-05-05 19:16:09pitrousetfiles: + module_locks6.patch

messages: + msg160023
stage: patch review
2012-05-05 19:03:32pitrousetfiles: + module_locks5.patch

messages: + msg160021
2012-05-05 18:36:06eric.snowsetnosy: + eric.snow
2012-05-05 18:21:07pitrousetfiles: + module_locks4.patch
2012-05-05 18:20:57pitrousetfiles: - module_locks4.patch
2012-05-05 18:08:40pitrousetfiles: + module_locks4.patch

messages: + msg160014
2012-04-29 01:00:06pitrousetfiles: + module_locks3.patch
2012-04-29 00:57:35pitrousetfiles: - module_locks3.patch
2012-04-29 00:55:10pitrousetfiles: + module_locks3.patch

messages: + msg159557
2012-04-28 23:17:01pitrousetfiles: + module_locks2.patch

messages: + msg159548
2012-04-28 20:57:39pitrousetfiles: + module_locks.patch

messages: + msg159540
2011-12-30 18:08:03pitrousetfiles: + implock5.patch

messages: + msg150385
2011-12-30 11:42:31pitrousetmessages: + msg150377
2011-12-30 10:55:58neologixsetmessages: + msg150371
2011-12-29 15:49:35pitrousetmessages: + msg150332
2011-12-29 13:45:48neologixsetmessages: + msg150330
2011-12-28 23:18:01pitrousetfiles: + implock3.patch
versions: + Python 3.3, - Python 3.2
nosy: + neologix

messages: + msg150322
2011-01-20 13:46:11hayposetnosy: + haypo
2010-07-15 16:14:14brett.cannonsetmessages: + msg110379
2010-07-15 11:32:46pitrousetmessages: + msg110357
2010-07-15 10:49:06pitrousetmessages: + msg110356
2010-07-14 23:40:13brett.cannonsetmessages: + msg110340
2010-07-14 22:36:04grahamdsetmessages: + msg110336
2010-07-14 22:23:57belopolskysetfiles: - unnamed
2010-07-14 21:47:29pitrousetmessages: + msg110332
2010-07-14 21:29:03brett.cannonsetmessages: + msg110331
2010-07-14 20:03:38pitrousetmessages: + msg110324
2010-07-14 19:59:36brett.cannonsetfiles: + unnamed

messages: + msg110323
2010-07-14 19:34:33pitrousetmessages: + msg110318
2010-07-14 19:12:30brett.cannonsetmessages: + msg110316
2010-07-14 15:04:48belopolskysetnosy: + belopolsky
2010-07-14 14:48:55pitrousetfiles: + implock.patch
2010-07-14 14:47:00pitrousetfiles: - implock.patch
2010-07-14 14:44:20pitroucreate