msg110287 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2010-07-14 14:44 |
This is an implementation of the idea suggested in:
http://mail.python.org/pipermail/python-dev/2003-February/033445.html
The patch creates a dictionary of reentrant locks keyed by module full name. Trying to import a module or package will first get the lock for that module (or, if necessary, create it) and then acquire it. This is done for any module type.
The global import lock is still there, but only used for two things:
- serializing first time creation of module-specific locks
- protection of imports based on import hooks, since we don't know whether third-party import hooks are themselves thread-safe
Semantics of the public C API are unchanged, because it is not clear whether they should be or not (concerns of usefulness vs. compatibility). For example, PyImport_ImportModuleNoBlock() still uses the global import lock but this could be relaxed in a later patch.
|
msg110316 - (view) |
Author: Brett Cannon (brett.cannon) * |
Date: 2010-07-14 19:12 |
So I say we don't worry about loaders being thread-safe. If __import__ handles the locking for a specific module then it will hold the lock on behalf of the loader. Now if someone decides to call load_module on their own, that's there business, but they should be aware of what could happen if they do that without acquiring the lock themselves. Otherwise we just make sure to provide a context manager that takes the name of the module and people can use that when they make their call to loader.load_module.
|
msg110318 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2010-07-14 19:34 |
> So I say we don't worry about loaders being thread-safe. If __import__
> handles the locking for a specific module then it will hold the lock
> on behalf of the loader.
Yes but what happens if two different modules are imported from two
different threads, and handled by the same loader? The loader could have
global structures which rely on serialization of imports for
consistency.
|
msg110323 - (view) |
Author: Brett Cannon (brett.cannon) * |
Date: 2010-07-14 19:59 |
On Wed, Jul 14, 2010 at 12:34, Antoine Pitrou <report@bugs.python.org>wrote:
>
> Antoine Pitrou <pitrou@free.fr> added the comment:
>
> > So I say we don't worry about loaders being thread-safe. If __import__
> > handles the locking for a specific module then it will hold the lock
> > on behalf of the loader.
>
> Yes but what happens if two different modules are imported from two
> different threads, and handled by the same loader? The loader could have
> global structures which rely on serialization of imports for
> consistency.
>
That's why I said we should supply a context decorator (or function) which
will handle the lock appropriately, taking the name of the module to import
as an argument so the locking is fine-grained.
|
msg110324 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2010-07-14 20:03 |
> That's why I said we should supply a context decorator (or function) which
> will handle the lock appropriately, taking the name of the module to import
> as an argument so the locking is fine-grained.
Ok, so what are you saying is that we can break compatibility for non
thread-safe import loaders which currently work fine?
(I have nothing against it, just trying to be sure we agree on the
implications)
|
msg110331 - (view) |
Author: Brett Cannon (brett.cannon) * |
Date: 2010-07-14 21:29 |
What I'm saying is that loaders are quite possibly not thread-safe already, so we don't need to do any special for them. If you look at PEP 302 you will notice not a single mention of loaders needing to care about the import lock because there is no mention of the import lock! So changing the locking mechanism most likely won't break loaders because they are not using the current import lock anyway and so already have their own issues.
As long as __import__ does the proper locking on behalf of loaders and we provide a way for people to use the lock if they want to then I am not worried about the impact on loaders. For example, this will change the logic in importlib where the current import lock is grabbed, but otherwise won't change a thing in terms of the code for the various loaders it implements.
|
msg110332 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2010-07-14 21:47 |
> So changing the locking mechanism most likely won't break loaders
> because they are not using the current import lock anyway and so
> already have their own issues.
Are you sure they aren't using it implicitly?
In vanilla py3k, PyImport_ImportModuleLevel() takes the import lock
therefore it protects any inner code, including the various hooks.
|
msg110336 - (view) |
Author: Graham Dumpleton (grahamd) |
Date: 2010-07-14 22:36 |
How is this going to deal with cyclical imports where different threads could import at the same time different modules within that cycle? I need to look through the proposed patch and work out exactly what it does, but am concerned about whether this approach would cause the classic deadlock problem if not done right?
FWIW, this concept of a lock per module is what I used in the mod_python module importer when it was rewritten. I would have to go look back over that code and see how the way the concept is being implemented differs, but there was one remaining potential race condition in the mod_python code which could in rare instances cause a problem. I never did get around to fixing it. Anyway, what I did learn was that this approach isn't necessarily as simple as it may seem so it will need some really good analysis on whatever solution is developed to ensure subtle problems don't come up.
|
msg110340 - (view) |
Author: Brett Cannon (brett.cannon) * |
Date: 2010-07-14 23:40 |
That's my point; loaders are using the lock implicitly so that's why we don't need to worry about the global import lock just for path hooks. It seems like you are suggesting using the global import lock purely for compatibility, and what I am saying is that loaders don't care so there is no compatibility to care about. I say only use the global import lock for serializing creation.
|
msg110356 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2010-07-15 10:49 |
> Graham Dumpleton <Graham.Dumpleton@gmail.com> added the comment:
>
> How is this going to deal with cyclical imports where different
> threads could import at the same time different modules within that
> cycle? I need to look through the proposed patch and work out exactly
> what it does, but am concerned about whether this approach would cause
> the classic deadlock problem if not done right?
You're right, I hadn't thought about that. Additional machinery will be
needed to detect potential deadlocks (and break them).
|
msg110357 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2010-07-15 11:32 |
> That's my point; loaders are using the lock implicitly so that's why
> we don't need to worry about the global import lock just for path
> hooks. It seems like you are suggesting using the global import lock
> purely for compatibility, and what I am saying is that loaders don't
> care so there is no compatibility to care about. I say only use the
> global import lock for serializing creation.
What is your take on the threadimp2.patch in issue9251?
|
msg110379 - (view) |
Author: Brett Cannon (brett.cannon) * |
Date: 2010-07-15 16:14 |
I'll have a look when I can (hopefully EuroPython).
|
msg150322 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2011-12-28 23:17 |
New prototype with per-module import locks and deadlock avoidance.
When a deadlock due to threaded circular imports is detected, the offending import returns the partially constructed module object (as would happen in single-threaded mode).
Probably lacks a test and some cleanup.
|
msg150330 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2011-12-29 13:45 |
IIUC, the deadlock avoidance code just checks that acquiring a
per-module lock won't create a cycle.
However, I think there's a race, because the cycle detection and the
lock acquisition is not atomic.
For example, let's say we have a thread exactly here in in
acquire_import_lock():
PyThread_acquire_lock(lock->lock, 1);
/* thread inside PyEval_RestoreThread(), waiting for the GIL */
PyEval_RestoreThread(saved);
lock->waiters--;
}
assert(lock->level == 0);
lock->tstate = tstate;
It owns the lock, but hasn't yet updated the lock's owner
(lock->tstate), so another thread calling detect_circularity() will
think that this lock is available, and will proceed, which can
eventually lead to a deadlock.
Also, I think that locks will use POSIX semaphores on systems that
support only a limited number of them (such as FreeBSD 7), and this
might fail in case of nested imports (the infamous ENFILE). I'd have
to double check this, though.
|
msg150332 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2011-12-29 15:49 |
> It owns the lock, but hasn't yet updated the lock's owner
> (lock->tstate), so another thread calling detect_circularity() will
> think that this lock is available, and will proceed, which can
> eventually lead to a deadlock.
That's true. Do you think temptatively acquiring the lock (without
blocking) would solve the issue?
> Also, I think that locks will use POSIX semaphores on systems that
> support only a limited number of them (such as FreeBSD 7), and this
> might fail in case of nested imports (the infamous ENFILE). I'd have
> to double check this, though.
Isn't this limit only about named semaphores? Or does it apply to
anonymous semaphores as well?
|
msg150371 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2011-12-30 10:55 |
> That's true. Do you think temptatively acquiring the lock (without
> blocking) would solve the issue?
I think it should work. Something along those lines:
while True:
if lock.acquire(0):
lock.tstate = tstate
return True
else:
if detect_circularity():
return False
global_lock.release()
saved = save_tstate()
yield()
restore_tstate(saved)
global_lock.acquire()
However, I find this whole mechanism somewhat complicated, so the
question really is: what are we trying to solve?
If we just wan't to avoid deadlocks, a trylock with the global import
lock will do the trick.
If, on the other hand, we really want to reduce the number of cases
where a deadlock would occur by increasing the locking granularity,
then it's the way to go. But I'm not sure it's worth the extra
complexity (increasing the locking granularity is usually a proven
recipe to introduce deadlocks).
> Isn't this limit only about named semaphores? Or does it apply to
> anonymous semaphores as well?
I'm no FreeBSD expert, but AFAICT, POSIX SEM_NSEMS_MAX limit doesn't
seem to make a distinction between named and anonymous semaphores.
From POSIX sem_init() man page:
"""
[ENOSPC]
A resource required to initialise the semaphore has been exhausted, or
the limit on semaphores (SEM_NSEMS_MAX) has been reached.
"""
Also, a quick search returned those links:
http://ftp.es.freebsd.org/pub/FreeBSD/development/FreeBSD-CVS/src/sys/sys/ksem.h,v
http://translate.google.fr/translate?hl=fr&sl=ru&tl=en&u=http%3A%2F%2Fforum.nginx.org%2Fread.php%3F21%2C202865%2C202865
So it seems that sem_init() can fail when the max number of semaphores
is reached.
|
msg150377 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2011-12-30 11:42 |
> If, on the other hand, we really want to reduce the number of cases
> where a deadlock would occur by increasing the locking granularity,
> then it's the way to go.
Yes, that's the point. Today you have to be careful when mixing imports
and threads. The problems is that imports can be implicit, inside a
library call for example (and putting imports inside functions is a way
of avoiding circular imports, or can also allow to reduce startup time).
Some C functions try to circumvent the problem by calling
PyImport_ModuleNoBlock, which is ugly and fragile in its own way (it
will fail if the import lock is taken, even if there wouldn't be a
deadlock: a very primitive kind of deadlock avoidance indeed).
> Also, a quick search returned those links:
> http://ftp.es.freebsd.org/pub/FreeBSD/development/FreeBSD-CVS/src/sys/sys/ksem.h,v
> http://translate.google.fr/translate?hl=fr&sl=ru&tl=en&u=http%3A%2F%2Fforum.nginx.org%2Fread.php%3F21%2C202865%2C202865
> So it seems that sem_init() can fail when the max number of semaphores
> is reached.
As they say, "Kritirii choice of the number 30 in the XXI century is
unclear." :-)
File objects also have a per-object lock, so I guess opening 30 files
under FreeBSD would fail. Perhaps we need to fallback on the other lock
implementation on certain platforms?
(our FreeBSD 8.2 buildbot looks pretty much stable, was the number of
semaphores tweaked on it?)
|
msg150385 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2011-12-30 18:08 |
I believe this new patch should be much more robust than the previous one.
It also adds a test for the improvement (it freezes an unpatched interpreter).
|
msg159540 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2012-04-28 20:57 |
Ok, here is a draft patch for the new importlib.
Several issues with this patch:
- introduces a pure Python function (_lock_unlock_module) on the fast import path
- synchronization issues due to interruptibility of pure Python code (see _ModuleLock.acquire)
- afterfork fix-up necessary
- relies on _thread.RLock for bootstrapping reasons
- module locks are immortal
|
msg159548 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2012-04-28 23:16 |
New patch gets rid of the reliance on _thread.RLock (uses non-recursive locks instead), and should solve the synchronization issue. Other issues remain.
|
msg159557 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2012-04-29 00:54 |
Updated patch fixes the performance issue and disposes of module locks when they aren't used anymore.
Only the afterfork question remains. Should I hook in threading's own facility? Should we wait for an atfork module? Something else.
|
msg160014 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2012-05-05 18:08 |
Updated patch also makes PyImport_ImportModuleNoBlock a simple alias of PyImport_ImportModule.
|
msg160021 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2012-05-05 19:03 |
Updated patch also adds unit tests for the module locks and the deadlock avoidance algorithm.
|
msg160023 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2012-05-05 19:16 |
Updated patch with a couple new tests.
|
msg160024 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2012-05-05 19:19 |
I still wonder whether Graham Dumpleton's observation has merits.
Suppose we have these modules
# a.py
time.sleep(10)
import b
# b.py
time.sleep(10)
import a
# main.py
def x():
import a
def y():
import b
Now, if x and y are executed in separate threads - won't it deadlock?
|
msg160026 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2012-05-05 19:21 |
> Now, if x and y are executed in separate threads - won't it deadlock?
Well, the patch has a deadlock avoidance mechanism, and it includes unit
tests for precisely this situation.
I cannot promise the algorithm is perfect (although there *are* a bunch
of tests), but it looks correct from here. :)
|
msg160029 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2012-05-05 19:33 |
Can you please elaborate in the patch what the deadlock avoidance does? AFAICT, the comment explains that it is able to detect deadlocks, but nowhere says what it does when it has detected a deadlock.
Also, please submit patches against default's head, or stop using git-style diffs, to enable Rietveld review.
|
msg160034 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2012-05-05 19:50 |
Updated patch against tip, and with a comment of what deadlock avoidance does (in _ModuleLock.acquire's docstring).
|
msg160037 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2012-05-05 20:12 |
The patch parser of Rietveld actually choked on the git binary diff. It now skips over these chunks.
|
msg160205 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2012-05-08 13:39 |
Updated patch against tip. I also changed the internal API of module locks a bit (acquire() raises _DeadlockError instead of returning False, and deadlock detection is not optional anymore).
|
msg160354 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2012-05-10 16:01 |
I had forgotten to tackle threadless builds, this patch fixes it.
|
msg160570 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2012-05-13 20:31 |
Does anyone else want to review this patch?
|
msg160577 - (view) |
Author: Brett Cannon (brett.cannon) * |
Date: 2012-05-13 20:52 |
I don't feel the need to, but I can in a few days if you want me to (just let me know if you do).
|
msg160983 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2012-05-17 17:03 |
New changeset edb9ce3a6c2e by Antoine Pitrou in branch 'default':
Issue #9260: A finer-grained import lock.
http://hg.python.org/cpython/rev/edb9ce3a6c2e
|
msg160984 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2012-05-17 17:10 |
I have now pushed the patch.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:03 | admin | set | github: 53506 |
2012-05-18 14:53:42 | pitrou | set | status: pending -> closed |
2012-05-18 11:56:27 | pitrou | link | issue8098 superseder |
2012-05-17 17:10:20 | pitrou | set | status: open -> pending |
2012-05-17 17:10:11 | pitrou | set | status: pending -> open stage: patch review -> resolved |
2012-05-17 17:10:06 | pitrou | set | status: open -> pending resolution: fixed messages:
+ msg160984
|
2012-05-17 17:07:06 | Arfrever | set | nosy:
+ Arfrever
|
2012-05-17 17:03:05 | python-dev | set | nosy:
+ python-dev messages:
+ msg160983
|
2012-05-13 20:52:53 | brett.cannon | set | messages:
+ msg160577 |
2012-05-13 20:31:51 | pitrou | set | messages:
+ msg160570 |
2012-05-10 16:02:06 | pitrou | set | files:
+ module_locks9.patch
messages:
+ msg160354 |
2012-05-08 13:39:29 | pitrou | set | files:
+ module_locks8.patch
messages:
+ msg160205 |
2012-05-08 13:21:29 | asvetlov | set | nosy:
+ asvetlov
|
2012-05-05 20:12:45 | loewis | set | messages:
+ msg160037 |
2012-05-05 19:50:31 | pitrou | set | files:
+ module_locks7.patch
messages:
+ msg160034 |
2012-05-05 19:33:33 | loewis | set | messages:
+ msg160029 |
2012-05-05 19:21:00 | pitrou | set | messages:
+ msg160026 |
2012-05-05 19:19:12 | loewis | set | nosy:
+ loewis messages:
+ msg160024
|
2012-05-05 19:16:09 | pitrou | set | files:
+ module_locks6.patch
messages:
+ msg160023 stage: patch review |
2012-05-05 19:03:32 | pitrou | set | files:
+ module_locks5.patch
messages:
+ msg160021 |
2012-05-05 18:36:06 | eric.snow | set | nosy:
+ eric.snow
|
2012-05-05 18:21:07 | pitrou | set | files:
+ module_locks4.patch |
2012-05-05 18:20:57 | pitrou | set | files:
- module_locks4.patch |
2012-05-05 18:08:40 | pitrou | set | files:
+ module_locks4.patch
messages:
+ msg160014 |
2012-04-29 01:00:06 | pitrou | set | files:
+ module_locks3.patch |
2012-04-29 00:57:35 | pitrou | set | files:
- module_locks3.patch |
2012-04-29 00:55:10 | pitrou | set | files:
+ module_locks3.patch
messages:
+ msg159557 |
2012-04-28 23:17:01 | pitrou | set | files:
+ module_locks2.patch
messages:
+ msg159548 |
2012-04-28 20:57:39 | pitrou | set | files:
+ module_locks.patch
messages:
+ msg159540 |
2011-12-30 18:08:03 | pitrou | set | files:
+ implock5.patch
messages:
+ msg150385 |
2011-12-30 11:42:31 | pitrou | set | messages:
+ msg150377 |
2011-12-30 10:55:58 | neologix | set | messages:
+ msg150371 |
2011-12-29 15:49:35 | pitrou | set | messages:
+ msg150332 |
2011-12-29 13:45:48 | neologix | set | messages:
+ msg150330 |
2011-12-28 23:18:01 | pitrou | set | files:
+ implock3.patch versions:
+ Python 3.3, - Python 3.2 nosy:
+ neologix
messages:
+ msg150322
|
2011-01-20 13:46:11 | vstinner | set | nosy:
+ vstinner
|
2010-07-15 16:14:14 | brett.cannon | set | messages:
+ msg110379 |
2010-07-15 11:32:46 | pitrou | set | messages:
+ msg110357 |
2010-07-15 10:49:06 | pitrou | set | messages:
+ msg110356 |
2010-07-14 23:40:13 | brett.cannon | set | messages:
+ msg110340 |
2010-07-14 22:36:04 | grahamd | set | messages:
+ msg110336 |
2010-07-14 22:23:57 | belopolsky | set | files:
- unnamed |
2010-07-14 21:47:29 | pitrou | set | messages:
+ msg110332 |
2010-07-14 21:29:03 | brett.cannon | set | messages:
+ msg110331 |
2010-07-14 20:03:38 | pitrou | set | messages:
+ msg110324 |
2010-07-14 19:59:36 | brett.cannon | set | files:
+ unnamed
messages:
+ msg110323 |
2010-07-14 19:34:33 | pitrou | set | messages:
+ msg110318 |
2010-07-14 19:12:30 | brett.cannon | set | messages:
+ msg110316 |
2010-07-14 15:04:48 | belopolsky | set | nosy:
+ belopolsky
|
2010-07-14 14:48:55 | pitrou | set | files:
+ implock.patch |
2010-07-14 14:47:00 | pitrou | set | files:
- implock.patch |
2010-07-14 14:44:20 | pitrou | create | |