A finer grained import lock #53506

Closed
pitrou opened this issue Jul 14, 2010 · 35 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@pitrou
Member

pitrou commented Jul 14, 2010

BPO 9260
Nosy @gvanrossum, @loewis, @brettcannon, @ncoghlan, @abalkin, @pitrou, @vstinner, @tiran, @asvetlov, @ericsnowcurrently
Files
  • implock.patch
  • implock3.patch
  • implock5.patch
  • module_locks.patch
  • module_locks2.patch
  • module_locks3.patch
  • module_locks4.patch
  • module_locks5.patch
  • module_locks6.patch
  • module_locks7.patch
  • module_locks8.patch
  • module_locks9.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2012-05-18.14:53:42.214>
    created_at = <Date 2010-07-14.14:44:20.143>
    labels = ['interpreter-core', 'type-feature']
    title = 'A finer grained import lock'
    updated_at = <Date 2012-05-18.14:53:42.214>
    user = 'https://github.com/pitrou'

    bugs.python.org fields:

    activity = <Date 2012-05-18.14:53:42.214>
    actor = 'pitrou'
    assignee = 'none'
    closed = True
    closed_date = <Date 2012-05-18.14:53:42.214>
    closer = 'pitrou'
    components = ['Interpreter Core']
    creation = <Date 2010-07-14.14:44:20.143>
    creator = 'pitrou'
    dependencies = []
    files = ['17999', '24101', '24114', '25392', '25394', '25399', '25469', '25470', '25471', '25475', '25496', '25521']
    hgrepos = []
    issue_num = 9260
    keywords = ['patch']
    message_count = 35.0
    messages = ['110287', '110316', '110318', '110323', '110324', '110331', '110332', '110336', '110340', '110356', '110357', '110379', '150322', '150330', '150332', '150371', '150377', '150385', '159540', '159548', '159557', '160014', '160021', '160023', '160024', '160026', '160029', '160034', '160037', '160205', '160354', '160570', '160577', '160983', '160984']
    nosy_count = 14.0
    nosy_names = ['gvanrossum', 'loewis', 'brett.cannon', 'ncoghlan', 'belopolsky', 'pitrou', 'vstinner', 'christian.heimes', 'grahamd', 'Arfrever', 'asvetlov', 'neologix', 'python-dev', 'eric.snow']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue9260'
    versions = ['Python 3.3']

    @pitrou
    Member Author

    pitrou commented Jul 14, 2010

    This is an implementation of the idea suggested in:
    http://mail.python.org/pipermail/python-dev/2003-February/033445.html

    The patch creates a dictionary of reentrant locks keyed by module full name. Trying to import a module or package will first get the lock for that module (or, if necessary, create it) and then acquire it. This is done for any module type.

    The global import lock is still there, but only used for two things:

    • serializing first time creation of module-specific locks
    • protection of imports based on import hooks, since we don't know whether third-party import hooks are themselves thread-safe

    Semantics of the public C API are unchanged, because it is not clear whether they should be or not (concerns of usefulness vs. compatibility). For example, PyImport_ImportModuleNoBlock() still uses the global import lock but this could be relaxed in a later patch.
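A minimal Python sketch of the registry described above, for illustration only. The patch itself is C code inside the interpreter; the names `_global_import_lock`, `_module_locks` and `get_module_lock` are hypothetical and not taken from it.

```python
import threading

# Hypothetical stand-in for the global import lock; here it only
# serializes first-time creation of the per-module locks.
_global_import_lock = threading.Lock()

# Reentrant locks keyed by the module's full (dotted) name.
_module_locks = {}


def get_module_lock(fullname):
    """Return the lock for `fullname`, creating it on first use."""
    with _global_import_lock:
        lock = _module_locks.get(fullname)
        if lock is None:
            lock = _module_locks[fullname] = threading.RLock()
    return lock


def import_with_lock(fullname, do_import):
    """Run `do_import(fullname)` while holding that module's lock."""
    with get_module_lock(fullname):
        return do_import(fullname)
```

With this layout, two threads importing different modules take different locks and do not wait on each other, while two threads importing the same module are serialized.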

    @pitrou pitrou added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels Jul 14, 2010
    @brettcannon
    Member

    So I say we don't worry about loaders being thread-safe. If __import__ handles the locking for a specific module then it will hold the lock on behalf of the loader. Now if someone decides to call load_module on their own, that's their business, but they should be aware of what could happen if they do that without acquiring the lock themselves. Otherwise we just make sure to provide a context manager that takes the name of the module and people can use that when they make their call to loader.load_module.
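One way the suggested context manager could look, as a sketch under stated assumptions: `module_import_lock`, `_registry_guard` and `DummyLoader` are hypothetical names invented for this example, and no such public API is confirmed anywhere in this thread.

```python
import contextlib
import threading

_registry_guard = threading.Lock()   # protects the dictionary below
_module_locks = {}                   # full module name -> RLock


@contextlib.contextmanager
def module_import_lock(fullname):
    """Hold the per-module lock for `fullname` for the duration of the block."""
    with _registry_guard:
        lock = _module_locks.get(fullname)
        if lock is None:
            lock = _module_locks[fullname] = threading.RLock()
    with lock:
        yield lock


class DummyLoader:
    """Hypothetical loader standing in for a PEP 302 loader object."""

    def load_module(self, fullname):
        return "<module %s>" % fullname


# Someone calling load_module() directly would wrap the call like this.
with module_import_lock("pkg.mod"):
    module = DummyLoader().load_module("pkg.mod")
```

Because the lock is reentrant, a loader that triggers a nested import of the same name from the same thread would not block on itself.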

    @pitrou
    Member Author

    pitrou commented Jul 14, 2010

    So I say we don't worry about loaders being thread-safe. If __import__
    handles the locking for a specific module then it will hold the lock
    on behalf of the loader.

    Yes but what happens if two different modules are imported from two
    different threads, and handled by the same loader? The loader could have
    global structures which rely on serialization of imports for
    consistency.

    @brettcannon
    Member

    On Wed, Jul 14, 2010 at 12:34, Antoine Pitrou <report@bugs.python.org> wrote:

    Antoine Pitrou <pitrou@free.fr> added the comment:

    > So I say we don't worry about loaders being thread-safe. If __import__
    > handles the locking for a specific module then it will hold the lock
    > on behalf of the loader.

    Yes but what happens if two different modules are imported from two
    different threads, and handled by the same loader? The loader could have
    global structures which rely on serialization of imports for
    consistency.

    That's why I said we should supply a context decorator (or function) which
    will handle the lock appropriately, taking the name of the module to import
    as an argument so the locking is fine-grained.

    @pitrou
    Member Author

    pitrou commented Jul 14, 2010

    That's why I said we should supply a context decorator (or function) which
    will handle the lock appropriately, taking the name of the module to import
    as an argument so the locking is fine-grained.

    Ok, so what you are saying is that we can break compatibility for
    non-thread-safe import loaders which currently work fine?
    (I have nothing against it, just trying to be sure we agree on the
    implications)

    @brettcannon
    Member

    What I'm saying is that loaders are quite possibly not thread-safe already, so we don't need to do anything special for them. If you look at PEP-302 you will notice not a single mention of loaders needing to care about the import lock because there is no mention of the import lock! So changing the locking mechanism most likely won't break loaders because they are not using the current import lock anyway and so already have their own issues.

    As long as __import__ does the proper locking on behalf of loaders and we provide a way for people to use the lock if they want to then I am not worried about the impact on loaders. For example, this will change the logic in importlib where the current import lock is grabbed, but otherwise won't change a thing in terms of the code for the various loaders it implements.

    @pitrou
    Member Author

    pitrou commented Jul 14, 2010

    So changing the locking mechanism most likely won't break loaders
    because they are not using the current import lock anyway and so
    already have their own issues.

    Are you sure they aren't using it implicitly?
    In vanilla py3k, PyImport_ImportModuleLevel() takes the import lock
    therefore it protects any inner code, including the various hooks.

    @grahamd
    Mannequin

    grahamd mannequin commented Jul 14, 2010

    How is this going to deal with cyclical imports where different threads could import at the same time different modules within that cycle? I need to look through the proposed patch and work out exactly what it does, but am concerned about whether this approach would cause the classic deadlock problem if not done right?

    FWIW, this concept of a lock per module is what I used in the mod_python module importer when it was rewritten. I would have to go look back over that code and see how the way the concept is being implemented differs, but there was one remaining potential race condition in the mod_python code which could in rare instances cause a problem. I never did get around to fixing it. Anyway, what I did learn was that this approach isn't necessarily as simple as it may seem so it will need some really good analysis on whatever solution is developed to ensure subtle problems don't come up.

    @brettcannon
    Member

    That's my point; loaders are using the lock implicitly so that's why we don't need to worry about the global import lock just for path hooks. It seems like you are suggesting using the global import lock purely for compatibility, and what I am saying is that loaders don't care so there is no compatibility to care about. I say only use the global import lock for serializing creation.

    @pitrou
    Member Author

    pitrou commented Jul 15, 2010

    Graham Dumpleton <Graham.Dumpleton@gmail.com> added the comment:

    How is this going to deal with cyclical imports where different
    threads could import at the same time different modules within that
    cycle? I need to look through the proposed patch and work out exactly
    what it does, but am concerned about whether this approach would cause
    the classic deadlock problem if not done right?

    You're right, I hadn't thought about that. Additional machinery will be
    needed to detect potential deadlocks (and break them).

    @pitrou
    Member Author

    pitrou commented Jul 15, 2010

    That's my point; loaders are using the lock implicitly so that's why
    we don't need to worry about the global import lock just for path
    hooks. It seems like you are suggesting using the global import lock
    purely for compatibility, and what I am saying is that loaders don't
    care so there is no compatibility to care about. I say only use the
    global import lock for serializing creation.

    What is your take on the threadimp2.patch in bpo-9251?

    @brettcannon
    Member

    I'll have a look when I can (hopefully EuroPython).

    @pitrou
    Member Author

    pitrou commented Dec 28, 2011

    New prototype with per-module import locks and deadlock avoidance.
    When a deadlock due to threaded circular imports is detected, the offending import returns the partially constructed module object (as would happen in single-threaded mode).

    Probably lacks a test and some cleanup.
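The deadlock avoidance idea can be illustrated with a small pure-Python model. This is a sketch under assumptions, not the prototype's C code: `ModuleLock`, `DeadlockError`, `_state` and `_waiting_for` are hypothetical names. A thread that is about to wait follows the chain "lock, owner, lock the owner waits for" and refuses to wait if the chain leads back to itself.

```python
import threading


class DeadlockError(RuntimeError):
    """Waiting for this lock would close a cycle of waiting threads."""


_state = threading.Condition()   # guards every field used below
_waiting_for = {}                # thread id -> ModuleLock it is blocked on


class ModuleLock:
    def __init__(self, name):
        self.name = name
        self.owner = None        # thread id of the current holder
        self.depth = 0           # reentrancy count

    def _closes_cycle(self, me):
        """Follow lock -> owner -> lock chains; True if they reach `me`."""
        lock, visited = self, set()
        while lock is not None and lock.owner is not None:
            if lock.owner == me:
                return True
            if lock.owner in visited:
                return False     # a cycle that does not involve us
            visited.add(lock.owner)
            lock = _waiting_for.get(lock.owner)
        return False

    def acquire(self):
        me = threading.get_ident()
        with _state:
            while True:
                if self.owner in (None, me):
                    self.owner = me
                    self.depth += 1
                    return
                if self._closes_cycle(me):
                    raise DeadlockError(self.name)
                _waiting_for[me] = self
                try:
                    _state.wait()        # releases _state while sleeping
                finally:
                    del _waiting_for[me]

    def release(self):
        with _state:
            assert self.owner == threading.get_ident()
            self.depth -= 1
            if self.depth == 0:
                self.owner = None
                _state.notify_all()
```

In this model, an importer that catches `DeadlockError` would hand back whatever partially constructed module is already registered, which matches the behaviour described above for the single-threaded circular import case. The check and the registration in `_waiting_for` happen under one condition variable, so they cannot interleave with another thread's check.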

    @neologix
    Mannequin

    neologix mannequin commented Dec 29, 2011

    IIUC, the deadlock avoidance code just checks that acquiring a
    per-module lock won't create a cycle.
    However, I think there's a race, because the cycle detection and the
    lock acquisition are not atomic.

    For example, let's say we have a thread exactly here in
    acquire_import_lock():
            PyThread_acquire_lock(lock->lock, 1);
            /* thread inside PyEval_RestoreThread(), waiting for the GIL */
            PyEval_RestoreThread(saved);
            lock->waiters--;
        }
        assert(lock->level == 0);
        lock->tstate = tstate;

    It owns the lock, but hasn't yet updated the lock's owner
    (lock->tstate), so another thread calling detect_circularity() will
    think that this lock is available, and will proceed, which can
    eventually lead to a deadlock.

    Also, I think that locks will use POSIX semaphores on systems that
    support only a limited number of them (such as FreeBSD 7), and this
    might fail in case of nested imports (the infamous ENFILE). I'd have
    to double check this, though.

    @pitrou
    Member Author

    pitrou commented Dec 29, 2011

    It owns the lock, but hasn't yet updated the lock's owner
    (lock->tstate), so another thread calling detect_circularity() will
    think that this lock is available, and will proceed, which can
    eventually lead to a deadlock.

    That's true. Do you think tentatively acquiring the lock (without
    blocking) would solve the issue?

    Also, I think that locks will use POSIX semaphores on systems that
    support only a limited number of them (such as FreeBSD 7), and this
    might fail in case of nested imports (the infamous ENFILE). I'd have
    to double check this, though.

    Isn't this limit only about named semaphores? Or does it apply to
    anonymous semaphores as well?

    @neologix
    Mannequin

    neologix mannequin commented Dec 30, 2011

    That's true. Do you think tentatively acquiring the lock (without
    blocking) would solve the issue?

    I think it should work. Something along those lines:

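    while True:
        if lock.acquire(0):              # non-blocking attempt
            lock.tstate = tstate         # record the owner right away
            return True
        else:
            if detect_circularity():     # waiting here would close a cycle
                return False
            global_lock.release()
            saved = save_tstate()        # drop the GIL
            sched_yield()                # let other threads run
            restore_tstate(saved)        # take the GIL back
            global_lock.acquire()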

    However, I find this whole mechanism somewhat complicated, so the
    question really is: what are we trying to solve?
    If we just want to avoid deadlocks, a trylock with the global import
    lock will do the trick.
    If, on the other hand, we really want to reduce the number of cases
    where a deadlock would occur by increasing the locking granularity,
    then it's the way to go. But I'm not sure it's worth the extra
    complexity (increasing the locking granularity is usually a proven
    recipe to introduce deadlocks).

    Isn't this limit only about named semaphores? Or does it apply to
    anonymous semaphores as well?

    I'm no FreeBSD expert, but AFAICT, POSIX SEM_NSEMS_MAX limit doesn't
    seem to make a distinction between named and anonymous semaphores.
    From POSIX sem_init() man page:
    """
    [ENOSPC]
    A resource required to initialise the semaphore has been exhausted, or
    the limit on semaphores (SEM_NSEMS_MAX) has been reached.
    """

    Also, a quick search returned those links:
    http://ftp.es.freebsd.org/pub/FreeBSD/development/FreeBSD-CVS/src/sys/sys/ksem.h,v
    http://translate.google.fr/translate?hl=fr&sl=ru&tl=en&u=http%3A%2F%2Fforum.nginx.org%2Fread.php%3F21%2C202865%2C202865
    So it seems that sem_init() can fail when the max number of semaphores
    is reached.

    @pitrou
    Member Author

    pitrou commented Dec 30, 2011

    If, on the other hand, we really want to reduce the number of cases
    where a deadlock would occur by increasing the locking granularity,
    then it's the way to go.

    Yes, that's the point. Today you have to be careful when mixing imports
    and threads. The problem is that imports can be implicit, inside a
    library call for example (and putting imports inside functions is a way
    of avoiding circular imports, and can also reduce startup time).
    Some C functions try to circumvent the problem by calling
    PyImport_ImportModuleNoBlock, which is ugly and fragile in its own way
    (it will fail if the import lock is taken, even if there wouldn't be a
    deadlock: a very primitive kind of deadlock avoidance indeed).
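To make the "primitive deadlock avoidance" concrete, here is a hedged Python model of a try-lock import against one global lock. It is not the C implementation of PyImport_ImportModuleNoBlock; `import_lock` and `import_noblock` are hypothetical names used only for this sketch.

```python
import threading

# Hypothetical stand-in for the single global import lock.
import_lock = threading.RLock()


def import_noblock(fullname, do_import):
    """Import only if the global lock is free right now; never wait."""
    if not import_lock.acquire(blocking=False):
        # Another thread is importing *something*. We refuse even though
        # waiting might have been perfectly safe.
        raise ImportError("import lock held; not importing %r" % fullname)
    try:
        return do_import(fullname)
    finally:
        import_lock.release()
```

The refusal depends only on whether any import is in progress, not on whether waiting would actually create a cycle, which is why unrelated imports can fail spuriously under this scheme.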

    Also, a quick search returned those links:
    http://ftp.es.freebsd.org/pub/FreeBSD/development/FreeBSD-CVS/src/sys/sys/ksem.h,v
    http://translate.google.fr/translate?hl=fr&sl=ru&tl=en&u=http%3A%2F%2Fforum.nginx.org%2Fread.php%3F21%2C202865%2C202865
    So it seems that sem_init() can fail when the max number of semaphores
    is reached.

    As they say, "Kritirii choice of the number 30 in the XXI century is
    unclear." :-)

    File objects also have a per-object lock, so I guess opening 30 files
    under FreeBSD would fail. Perhaps we need to fall back on the other lock
    implementation on certain platforms?

    (our FreeBSD 8.2 buildbot looks pretty much stable, was the number of
    semaphores tweaked on it?)

    @pitrou
    Member Author

    pitrou commented Dec 30, 2011

    I believe this new patch should be much more robust than the previous one.
    It also adds a test for the improvement (it freezes an unpatched interpreter).

    @pitrou
    Member Author

    pitrou commented Apr 28, 2012

    Ok, here is a draft patch for the new importlib.
    Several issues with this patch:

    • introduces a pure Python function (_lock_unlock_module) on the fast import path
    • synchronization issues due to interruptibility of pure Python code (see _ModuleLock.acquire)
    • afterfork fix-up necessary
    • relies on _thread.RLock for bootstrapping reasons
    • module locks are immortal

    @pitrou
    Member Author

    pitrou commented Apr 28, 2012

    New patch gets rid of the reliance on _thread.RLock (uses non-recursive locks instead), and should solve the synchronization issue. Other issues remain.

    @pitrou
    Member Author

    pitrou commented Apr 29, 2012

    Updated patch fixes the performance issue and disposes of module locks when they aren't used anymore.
    Only the afterfork question remains. Should I hook in threading's own facility? Should we wait for an atfork module? Something else?

    @pitrou
    Member Author

    pitrou commented May 5, 2012

    Updated patch also makes PyImport_ImportModuleNoBlock a simple alias of PyImport_ImportModule.

    @pitrou
    Member Author

    pitrou commented May 5, 2012

    Updated patch also adds unit tests for the module locks and the deadlock avoidance algorithm.

    @pitrou
    Member Author

    pitrou commented May 5, 2012

    Updated patch with a couple new tests.

    @loewis
    Mannequin

    loewis mannequin commented May 5, 2012

    I still wonder whether Graham Dumpleton's observation has merit.

    Suppose we have these modules

    # a.py
    import time
    time.sleep(10)
    import b

    # b.py
    import time
    time.sleep(10)
    import a

    # main.py
    def x():
        import a
    def y():
        import b

    Now, if x and y are executed in separate threads - won't it deadlock?
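The scenario can be tried out with a self-contained script along these lines. It is a sketch, assuming an interpreter that already includes per-module import locks (Python 3.3 or later); the module names `cyc_a` and `cyc_b`, the shortened sleep, and the `run_demo` helper are all made up for this example.

```python
import importlib
import os
import sys
import tempfile
import threading

# Each generated module sleeps briefly, then imports the other one.
MODULE_TEMPLATE = "import time\ntime.sleep(0.2)\nimport {other}\n"


def run_demo(timeout=10.0):
    """Import two mutually dependent modules from two threads.

    Returns one boolean per thread: True if that thread finished in time.
    """
    with tempfile.TemporaryDirectory() as tmp:
        for name, other in (("cyc_a", "cyc_b"), ("cyc_b", "cyc_a")):
            with open(os.path.join(tmp, name + ".py"), "w") as f:
                f.write(MODULE_TEMPLATE.format(other=other))
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            # Daemon threads, so a hang would not block interpreter exit.
            threads = [
                threading.Thread(target=__import__, args=(name,), daemon=True)
                for name in ("cyc_a", "cyc_b")
            ]
            for t in threads:
                t.start()
            for t in threads:
                t.join(timeout)
            return [not t.is_alive() for t in threads]
        finally:
            sys.path.remove(tmp)


if __name__ == "__main__":
    print(run_demo())
```

If both entries come back true, neither thread was left waiting on the other. On an interpreter without deadlock avoidance in its per-module locks, the same script would be expected to time out instead.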

    @pitrou
    Member Author

    pitrou commented May 5, 2012

    Now, if x and y are executed in separate threads - won't it deadlock?

    Well, the patch has a deadlock avoidance mechanism, and it includes unit
    tests for precisely this situation.
    I cannot promise the algorithm is perfect (although there *are* a bunch
    of tests), but it looks correct from here. :)

    @loewis
    Mannequin

    loewis mannequin commented May 5, 2012

    Can you please elaborate in the patch what the deadlock avoidance does? AFAICT, the comment explains that it is able to detect deadlocks, but nowhere says what it does when it has detected a deadlock.

    Also, please submit patches against default's head, or stop using git-style diffs, to enable Rietveld review.

    @pitrou
    Member Author

    pitrou commented May 5, 2012

    Updated patch against tip, with a comment explaining what deadlock avoidance does (in _ModuleLock.acquire's docstring).

    @loewis
    Mannequin

    loewis mannequin commented May 5, 2012

    The patch parser of Rietveld actually choked on the git binary diff. It now skips over these chunks.

    @pitrou
    Member Author

    pitrou commented May 8, 2012

    Updated patch against tip. I also changed the internal API of module locks a bit (acquire() raises _DeadlockError instead of returning False, and deadlock detection is not optional anymore).

    @pitrou
    Member Author

    pitrou commented May 10, 2012

    I had forgotten to tackle threadless builds; this patch fixes that.

    @pitrou
    Member Author

    pitrou commented May 13, 2012

    Does anyone else want to review this patch?

    @brettcannon
    Member

    I don't feel the need to, but I can in a few days if you want me to (just let me know if you do).

    @python-dev
    Mannequin

    python-dev mannequin commented May 17, 2012

    New changeset edb9ce3a6c2e by Antoine Pitrou in branch 'default':
    Issue bpo-9260: A finer-grained import lock.
    http://hg.python.org/cpython/rev/edb9ce3a6c2e

    @pitrou
    Member Author

    pitrou commented May 17, 2012

    I have now pushed the patch.

    @pitrou pitrou closed this as completed May 18, 2012
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022