
Locks in the standard library should be sanitized on fork #50970

Closed
gpshead opened this issue Aug 17, 2009 · 134 comments
Labels: 3.7 (EOL) [end of life], stdlib [Python modules in the Lib dir], type-feature [A feature request or enhancement]

Comments

gpshead (Member) commented Aug 17, 2009

BPO 6721
Nosy @rhettinger, @gpshead, @vsajip, @jcea, @nirs, @pitrou, @vstinner, @applio, @cagney, @Birne94, @ochedru, @kevans91, @jessefarnham, @rojer, @koubaa
PRs
  • bpo-6721: Sanitize logging locks while forking #4071
  • [3.7] bpo-6721: Hold logging locks across fork() (GH-4071) #9291
  • bpo-1635741 port _curses_panel to multi-phase init (PEP 489) #21986
  • Update all Python Cookbook links after migration to GitHub #22205
  • bpo-40423: Optimization: use close_range(2) if available #22651
  Files
  • lock_fork_thread_deadlock_demo.py
  • forklocktests.patch
  • reinit_locks.diff: patch adding locks reinitialization upon fork
  • emit_warning_on_fork.patch
  • atfork.patch
  • reinit_locks_2.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2009-08-17.23:06:17.321>
    labels = ['3.7', 'type-feature', 'library']
    title = 'Locks in the standard library should be sanitized on fork'
    updated_at = <Date 2020-10-11.20:42:06.563>
    user = 'https://github.com/gpshead'

    bugs.python.org fields:

    activity = <Date 2020-10-11.20:42:06.563>
    actor = 'kevans'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2009-08-17.23:06:17.321>
    creator = 'gregory.p.smith'
    dependencies = []
    files = ['14740', '21874', '22005', '22525', '24303', '25776']
    hgrepos = []
    issue_num = 6721
    keywords = ['patch']
    message_count = 133.0
    messages = ['91674', '91936', '92766', '94102', '94115', '94133', '94135', '128282', '128307', '128311', '128316', '128369', '135012', '135067', '135069', '135079', '135083', '135095', '135096', '135143', '135157', '135173', '135543', '135857', '135866', '135897', '135899', '135948', '135965', '135984', '136003', '136039', '136045', '136047', '136120', '136147', '139084', '139245', '139470', '139474', '139480', '139485', '139488', '139489', '139509', '139511', '139521', '139522', '139584', '139599', '139608', '139800', '139808', '139850', '139852', '139858', '139869', '139897', '139929', '140215', '140402', '140550', '140658', '140659', '140668', '140689', '140690', '140691', '141286', '143174', '143274', '143279', '151168', '151266', '151267', '151845', '151846', '151853', '161019', '161029', '161389', '161405', '161470', '161953', '162019', '162031', '162034', '162036', '162038', '162039', '162040', '162041', '162053', '162054', '162063', '162113', '162114', '162115', '162117', '162120', '162137', '162160', '270015', '270017', '270018', '270019', '270020', '270021', '270022', '270023', '270028', '289716', '294726', '294834', '304714', '304716', '304722', '304723', '314983', '325326', '327267', '329474', '339369', '339371', '339393', '339418', '339454', '339458', '339473', '365169', '367528', '367702', '368882']
    nosy_count = 29.0
    nosy_names = ['rhettinger', 'gregory.p.smith', 'vinay.sajip', 'jcea', 'nirs', 'pitrou', 'vstinner', 'nirai', 'forest_atq', 'ionelmc', 'bobbyi', 'neologix', 'Giovanni.Bajo', 'sdaoden', 'tshepang', 'sbt', 'lesha', 'dan.oreilly', 'davin', 'Connor.Wolf', 'Winterflower', 'cagney', 'Birne94', 'ochedru', 'kevans', 'jesse.farnham', 'hugh', 'rojer', 'koubaa']
    pr_nums = ['4071', '9291', '21986', '22205', '22651']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue6721'
    versions = ['Python 3.7']

    gpshead (Member Author) commented Aug 17, 2009

    The Python logging module uses a lock to surround many of its operations.
    This causes deadlocks in programs that use logging, fork, and threading
    simultaneously.

    1. Spawn one or more threads in your program.
    2. Have at least one of those threads make logging calls that will be
      emitted.
    3. Have your main thread or another thread use os.fork() to run some
      Python code in a child process.
    4. If the fork happened while one of your threads was within the
      logging.Handler.handle() critical section (or anywhere else where
      handler.lock is acquired), your child process will deadlock as soon as
      it tries to log anything: it inherited a held lock.

    The deadlock is more likely to happen on a highly loaded system which
    tends to widen the deadlock opportunity window due to context switching.

    A demo of the problem simplified into one file is attached.
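
    For illustration, a minimal sketch of such a demo, following the steps
    above (this is not the attached lock_fork_thread_deadlock_demo.py, just a
    hedged reconstruction; exact timing values are arbitrary):

        import logging
        import os
        import sys
        import threading
        import time

        logging.basicConfig(stream=sys.stderr, level=logging.INFO)
        log = logging.getLogger("demo")

        def spam():
            # A background thread that logs constantly, so a fork is likely
            # to happen while Handler.handle() holds the handler's lock.
            while True:
                log.info("spam")

        threading.Thread(target=spam, daemon=True).start()
        time.sleep(0.5)

        pid = os.fork()
        if pid == 0:
            # The child inherited the handler lock, possibly in the "held"
            # state; this call can then block forever.
            log.info("hello from the child")
            os._exit(0)
        os.waitpid(pid, 0)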

    The Python standard library should not be the cause of these deadlocks.
    We need a way for all standard library locks to be cleaned up when
    forking, by doing one of the following:

    A) acquire all locks before forking and release them immediately after, or
    B) forcibly release all standard library locks after forking in the
    child process.

    Code was added to call some cleanups after forking in
    http://bugs.python.org/issue874900 but there are more things that also
    need this same sort of cleanup (logging for example).

    Rather than having to manually add after-fork code hooks into every file
    in the standard library that uses locks, a more general solution to
    track and manage locks across fork would be a good idea.

    @gpshead gpshead self-assigned this Aug 17, 2009
    gpshead (Member Author) commented Aug 24, 2009

    I've started a project to patch up this and similar messes for Python
    2.4 and later here:

    http://code.google.com/p/python-atfork/

    I'd like to take ideas or implementations from that when possible for
    future use in the python standard library.

    gpshead (Member Author) commented Sep 17, 2009

    bpo-6923 has been opened to provide a C API for an atfork mechanism for
    use by extension modules.

    @gpshead gpshead added the stdlib Python modules in the Lib dir label Sep 17, 2009
    pitrou (Member) commented Oct 15, 2009

    Rather than having a kind of global module registry, locks could keep
    track of what was the last PID, and reinitialize themselves if it changed.
    This is assuming getpid() is fast :-)

    gpshead (Member Author) commented Oct 16, 2009

    Antoine Pitrou <pitrou@free.fr> added the comment:

    Rather than having a kind of global module registry, locks could keep
    track of what was the last PID, and reinitialize themselves if it changed.
    This is assuming getpid() is fast :-)

    Locks can't blindly release themselves because they find themselves
    running in another process.

    If anything, if a lock is held and finds itself running in a new
    process, any attempt to use the lock should raise an exception so that
    the bug is noticed.

    I'm not sure a PID check is good enough. Old Linux using LinuxThreads
    had a different PID for every thread; current Linux with NPTL is more
    like other OSes, with the same PID for all threads.

    pitrou (Member) commented Oct 16, 2009

    I was suggesting "reinitialize", rather than "release". That is, create
    a new lock (mutex, semaphore, etc.) and let the old one die (or occupy
    some tiny bit of memory).

    gpshead (Member Author) commented Oct 16, 2009

    No need for that. The problem is that they're held by a thread that
    does not exist in the newly forked child process, so they will never be
    released in the new process.

    Example: if you fork while another thread is in the middle of logging
    something and then try to log something yourself in the child, your
    child process will deadlock on the logging module's lock.

    Locks are not shared between processes, so reinitializing them with a
    new object wouldn't do anything.

    @bitdancer bitdancer added the type-bug An unexpected behavior, bug, or error label Dec 14, 2010
    neologix (mannequin) commented Feb 10, 2011

    I'm not sure that releasing the mutex is enough; it can still lead to a segfault, as is probably the case in this issue:
    http://bugs.python.org/issue11148

    Quoting the pthread_atfork man page:

    To understand the purpose of pthread_atfork, recall that fork duplicates the whole memory space, including mutexes in their current locking state, but only the calling thread: other threads are not running in the child process. The mutexes are not usable after the fork and must be initialized with pthread_mutex_init in the child process. This is a limitation of the current implementation and might or might not be present in future versions.

    To avoid this, install handlers with pthread_atfork as follows: have the prepare handler lock the mutexes (in locking order), and the parent handler unlock the mutexes. The child handler should reset the mutexes using pthread_mutex_init, as well as any other synchronization objects such as condition variables.

    Locking the global mutexes before the fork ensures that all other threads are locked out of the critical regions of code protected by those mutexes. Thus when fork takes a snapshot of the parent's address space, that snapshot will copy valid, stable data. Resetting the synchronization objects in the child process will ensure they are properly cleansed of any artifacts from the threading subsystem of the parent process. For example, a mutex may inherit a wait queue of threads waiting for the lock; this wait queue makes no sense in the child process. Initializing the mutex takes care of this.

    pthread_atfork might be worth looking into
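
    For reference, Python 3.7 later grew os.register_at_fork(), which exposes
    the same prepare/parent/child hook structure at the Python level. A
    minimal sketch of the idiom quoted above, applied to a single lock (note
    that later comments in this thread argue that releasing, as opposed to
    reinitializing, in the child is not safe on every platform):

        import os
        import threading

        some_lock = threading.Lock()

        # prepare: take the lock before fork; parent and child: release it
        # again once the fork has completed.
        os.register_at_fork(
            before=some_lock.acquire,
            after_in_parent=some_lock.release,
            after_in_child=some_lock.release,
        )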

    gpshead (Member Author) commented Feb 10, 2011

    FWIW, http://bugs.python.org/issue6643 recently fixed an issue where a mutex was being closed instead of reinitialized after a fork. More are likely needed.

    Are you suggesting to use pthread_atfork to call pthread_mutex_init on all mutexes created by Python in the child after a fork? I'll have to ponder that some more. Given the mutexes are all useless post fork it does not strike me as a bad idea.

    pitrou (Member) commented Feb 10, 2011

    Are you suggesting to use pthread_atfork to call pthread_mutex_init on all mutexes created by Python in the child after a fork? I'll have to ponder that some more. Given the mutexes are all useless post fork it does not strike me as a bad idea.

    I don't really understand. It's quite similar to the idea you shot down in msg94135. Or am I missing something?

    gpshead (Member Author) commented Feb 10, 2011

    Yeah, I'm trying to figure out what I was thinking then or if I was just plain wrong. :)

    I was clearly wrong about a release being done in the child being the right thing to do (bpo-6643 proved that, the state held by a lock is not usable to another process on all platforms such that release even works).

    Part of it looks like I wanted a way to detect it was happening as any lock that is held during a fork indicates a _potential_ bug (the lock wasn't registered anywhere to be released before the fork) but not everything needs to care about that.

    neologix (mannequin) commented Feb 11, 2011

    I was clearly wrong about a release being done in the child being the right thing to do (bpo-6643 proved that, the state held by a lock is not usable to another process on all platforms such that release even works).

    Yeah, apparently OS-X is one of them, the reporter in bpo-11148 is
    experiencing segfaults under OS-X.

    Are you suggesting to use pthread_atfork to call pthread_mutex_init on all mutexes created by Python in the child after a fork? I'll have to ponder that some more. Given the mutexes are all useless post fork it does not strike me as a bad idea.

    Yes, that's what I was thinking. Instead of scattering the
    lock-reclaiming code all over the place, try to use a more standard
    API specifically designed with that in mind.
    Note the base issue is that we're authorizing things which are
    forbidden : in a multi-threaded process, only sync-safe calls are
    authorized between fork and exec.


    pitrou (Member) commented May 3, 2011

    I encountered this issue while debugging some multiprocessing code; fork() would be called from one thread while sys.stdout was in use in another thread (simply because of a couple of debugging statements). As a result the IO lock would be already "taken" in the child process and any operation on sys.stdout would deadlock.

    This is definitely something that can happen more easily than I thought.
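
    A hypothetical sketch of that scenario (one thread writes to sys.stdout
    while another forks; whether the child actually hangs depends on timing
    and platform, so treat this as an illustration only):

        import os
        import sys
        import threading
        import time

        def chatter():
            # Debugging-style output from a background thread keeps the
            # buffered writer's internal lock busy.
            while True:
                print("debug output", file=sys.stdout)

        threading.Thread(target=chatter, daemon=True).start()
        time.sleep(0.5)

        pid = os.fork()
        if pid == 0:
            # The child may have inherited the IO lock in the "taken" state,
            # so this write can block forever.
            print("child says hi")
            sys.stdout.flush()
            os._exit(0)
        os.waitpid(pid, 0)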

    pitrou (Member) commented May 3, 2011

    Here is a patch with tests for the issue (some of which fail of course).
    Do we agree that these tests are right?

    gpshead (Member Author) commented May 3, 2011

    Those tests make sense to me.

    neologix (mannequin) commented May 3, 2011

    # A lock taken from the current thread should stay taken in the
    # child process.

    Note that I'm not sure of how to implement this.
    After a fork, even releasing the lock can be unsafe; it must be re-initialized. See the following comment in glibc's malloc implementation:
    /* In NPTL, unlocking a mutex in the child process after a
    fork() is currently unsafe, whereas re-initializing it is safe and
    does not leak resources. Therefore, a special atfork handler is
    installed for the child. */

    Note that this means that even the current code allocating new locks after fork (in Lib/threading.py, _after_fork and _reset_internal_locks) is unsafe, because the old locks will be deallocated, and the lock deallocation tries to acquire and release the lock before destroying it (in issue bpo-11148 the OP experienced a segfault on OS-X when locking a mutex, but I'm not sure of the exact context).

    Also, this would imply keeping track of the thread currently owning the lock, and doesn't match the typical pthread_atfork idiom (acquire locks just before fork, release just after in parent and child, or just reinit them in the child process)

    Finally, IMHO, forking while holding a lock and expecting it to be usable after fork doesn't make much sense, since a lock is acquired by a thread, and that thread doesn't exist in the child process. It's explicitly described as "undefined" by POSIX, see http://pubs.opengroup.org/onlinepubs/007908799/xsh/sem_init.html:
    """
    The use of the semaphore by threads other than those created in the same process is undefined.
    """

    So I'm not sure whether it's feasible/wise to provide such a guarantee.

    pitrou (Member) commented May 3, 2011

    Also, this would imply keeping track of the thread currently owning
    the lock,

    Yes, we would need to keep track of the thread id and process id inside
    the lock. We also need a global variable of the main thread id after
    fork, and a per-lock "taken" flag.

    Synopsis:

        def _reinit_if_needed(self):
            # Call this before each acquire() or release()
            if self.pid != getpid():
                sem_init(self.sem, 0, 1)
                if self.taken:
                    if self.tid == main_thread_id_after_fork:
                        # Lock was taken in forked thread, re-take it
                        sem_wait(self.sem)
                    else:
                        # It's now released
                        self.taken = False
                self.pid = getpid()
                self.tid = current_thread_id()

    and doesn't match the typical pthread_atfork idiom (acquire locks just
    before fork, release just after in parent and child, or just reinit
    them in the child process)

    Well, I fail to understand how that idiom can help us. We're not a
    self-contained application, we're a whole programming language.
    Calling fork() only when no lock is held is unworkable (for example, we
    use locks around buffered I/O objects).
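
    For reference, a pure-Python sketch of the PID-check idea in the synopsis
    above. It is simplified: it drops the "taken"/thread-id bookkeeping and
    just replaces the underlying lock whenever the PID changes. The class and
    method names are illustrative only:

        import os
        import threading

        class ForkDetectingLock:
            def __init__(self):
                self._lock = threading.Lock()
                self._pid = os.getpid()

            def _reinit_if_needed(self):
                # Called before each acquire() or release(): if the PID
                # changed, we are in a freshly forked child and the inherited
                # lock may be stuck in the "held" state, so start over.
                if self._pid != os.getpid():
                    self._lock = threading.Lock()
                    self._pid = os.getpid()

            def acquire(self, blocking=True, timeout=-1):
                self._reinit_if_needed()
                return self._lock.acquire(blocking, timeout)

            def release(self):
                self._reinit_if_needed()
                self._lock.release()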

    neologix (mannequin) commented May 4, 2011

    Yes, we would need to keep track of the thread id and process id inside
    the lock. We also need a global variable of the main thread id after
    fork, and a per-lock "taken" flag.

    Synopsis:

       def _reinit_if_needed(self):
           # Call this before each acquire() or release()
           if self.pid != getpid():
               sem_init(self.sem, 0, 1)
               if self.taken:
                   if self.tid == main_thread_id_after_fork:
                       # Lock was taken in forked thread, re-take it
                       sem_wait(self.sem)
                   else:
                       # It's now released
                       self.taken = False
               self.pid = getpid()
               self.tid = current_thread_id()

    A couple remarks:

    • with linuxthreads, different threads within the same process have
      the same PID - it may be true for other implementations - so this
      would lead to spurious reinitializations
    • what's current_thread_id ? If it's thread_get_ident (pthread_self),
      since TID is not guaranteed to be inherited across fork, this won't
      work
    • calling getpid at every acquire/release is expensive, even though
      it's a trivial syscall (it'll have to be measured though)
    • imagine the following happens:

    P1                                 P2

    lock.acquire()
    fork()       ------------------->  start_new_thread T2
                                       T1               T2
                                                         lock.acquire()

    The acquisition of the lock by T2 will cause the lock's reinitialization:
    what happens to the lock wait queue? Who owns the lock?
    That's why I don't think we can delay the reinitialization of locks, but
    I could be wrong.

    Well, I fail to understand how that idiom can help us. We're not a
    self-contained application, we're a whole programming language.
    Calling fork() only when no lock is held is unworkable (for example, we
    use locks around buffered I/O objects).

    Yes, but in that case, you don't have to reacquire the locks after fork.
    In the deadlock you experienced above, the thread that forked wasn't
    the one in the I/O code, so the corresponding lock can be
    re-initialized anyway, since the thread in the I/O code at that time
    won't exist after fork.
    And it's true with every lock in the library code: they're only held
    in short critical sections (typically acquired when entering a
    function and released when leaving), and since it's not the threads
    inside those libraries that fork, all those locks can simply be
    reinitialized on fork, without having to reacquire them.

    neologix (mannequin) commented May 4, 2011

    Oops, for linuxthreads, you should of course read "different PIDs", not "same PID".

    pitrou (Member) commented May 4, 2011

    • what's current_thread_id ? If it's thread_get_ident (pthread_self),
      since TID is not guaranteed to be inherited across fork, this won't
      work

    Ouch, then the approach I'm proposing is probably doomed.

    And it's true with every lock in the library code: they're only held
    in short critical sections (typically acquired when entering a
    function and released when leaving), and since it's not the threads
    inside those libraries that fork, all those locks can simply be
    reinitialized on fork, without having to reacquire them.

    Well, this means indeed that *some* locks can be handled, but not all of
    them and not automatically, right?
    Also, how would you propose they be dealt with in practice? How do they
    get registered, and how does the reinitialization happen?

    (do note that library code can call arbitrary third-party code, by the
    way: for example through encodings in the text I/O layer)

    neologix (mannequin) commented May 4, 2011

    > - what's current_thread_id ? If it's thread_get_ident (pthread_self),
    > since TID is not guaranteed to be inherited across fork, this won't
    > work

    Ouch, then the approach I'm proposing is probably doomed.

    Well, it works on Linux with NPTL, but I'm not sure at all it holds
    for other implementations (pthread_t is only meaningful within the
    same process).
    But I'm not sure it's really the "killer" point: PID with linuxthreads
    and lock being acquired by a second thread before the main thread
    releases it in the child process also look like serious problems.

    Well, this means indeed that *some* locks can be handled, but not all of
    them and not automatically, right?
    Also, how would you propose they be dealt with in practice? How do they
    get registered, and how does the reinitialization happen?

    When a lock object is allocated in Modules/threadmodule.c
    (PyThread_allocate_lock/newlockobject), add the underlying lock
    (self->lock_lock) to a linked list (since it's called with the GIL
    held, we don't need to protect the linked list from concurrent
    access). Each thread implementation (thread_pthread.h, thread_nt.h)
    would provide a new PyThread_reinit_lock function that would do the
    right thing (pthread_mutex_destroy/init, sem_destroy/init, etc).
    Modules/threadmodule.c would provide a new PyThread_ReInitLocks that
    would walk through the linked list and call PyThread_reinit_lock for
    each lock.
    PyOS_AfterFork would call this PyThread_ReInitLocks right after fork.
    This would have the advantage of being consistent with what's already
    done to reinit the TLS key and the import lock. So, we guarantee to be
    in a consistent and usable state when PyOS_AfterFork returns. Also,
    it's somewhat simpler because we're sure that at that point only one
    thread is running (once again, no need to protect the linked-list
    walk).
    I don't think that the performance impact would be noticeable (I know
    it's O(N) where N is the number of locks), and contrary to the
    automatic approach, this wouldn't penalize every acquire/release.
    Of course, this would solve the problem of threading's module locks,
    so PyEval_ReInitThreads could be removed, along with threading.py's
    _after_fork and _reset_internal_locks.
    In short, this would reset every lock held so that they're usable in
    the child process, even locks allocated e.g. from
    Modules/_io/bufferedio.c.
    But this wouldn't allow a lock's state to be inherited across fork for
    the main thread (but like I said, I don't think that this makes much
    sense anyway, and to my knowledge no implementation makes such a
    guarantee - and definitely not POSIX).
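
    A rough Python-level analogue of that C-level design, for illustration
    only: track every lock handed out and reinitialize each one in the child
    after fork. It leans on the private, CPython-only lock._at_fork_reinit()
    method, which only appeared much later (bpo-40089, mentioned further down
    in this thread); the helper names here are made up:

        import os
        import threading

        _tracked_locks = []   # the C proposal uses a linked list and removes
                              # entries when a lock is deallocated

        def new_tracked_lock():
            lock = threading.Lock()
            _tracked_locks.append(lock)
            return lock

        def _reinit_tracked_locks():
            # Runs in the child with only one thread alive, so no extra
            # locking is needed around the registry walk.
            for lock in _tracked_locks:
                lock._at_fork_reinit()   # CPython >= 3.9, private API

        os.register_at_fork(after_in_child=_reinit_tracked_locks)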

    neologix (mannequin) commented May 5, 2011

    Please disregard my comment on PyEval_ReInitThreads and _after_fork:
    it will of course still be necessary, because it does much more than
    just reinitializing locks (e.g. stop threads).
    Also, note that both approaches don't handle synchronization
    primitives other than bare Lock and RLock. For example, Condition and
    Event used in the threading module wouldn't be reset automatically:
    that's maybe something that could be handled by Gregory's atfork
    mechanism.

    pitrou (Member) commented May 8, 2011

    Thanks for the explanations. This sounds like an interesting path.

    Each thread implementation (thread_pthread.h, thread_nt.h)
    would provide a new PyThread_reinit_lock function that would do the
    right thing (pthread_mutex_destroy/init, sem_destroy/init, etc).
    Modules/threadmodule.c would provide a new PyThread_ReInitLocks that
    would walk through the linked list and call PyThread_reinit_lock for
    each lock.

    Actually, I think the issue is POSIX-specific: Windows has no fork(),
    and we don't care about other platforms anymore (they are, are being, or
    will be soon deprecated).
    It means only the POSIX implementation needs to register its locks in a
    linked list.

    But this wouldn't allow a lock's state to be inherited across fork for
    the main thread (but like I said, I don't think that this makes much
    sense anyway, and to my knowledge no implementation makes such a
    guarantee - and definitely not POSIX).

    Well, the big difference between Python locks and POSIX mutexes is that
    Python locks can be released from another thread. They're a kind of
    trivial semaphore really, and this makes them usable for other purpose
    than mutual exclusion (you can e.g. use a lock as a simple event by
    blocking on a second acquire() until another thread calls release()).

    But even though we might not be "fixing" arbitrary Python code
    automatically, fixing the interpreter's internal locks (especially the
    IO locks) would be great already.

    (we could also imagine that the creator of the lock decides whether it
    should get reinitialized after fork)
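
    A tiny illustration of that last point: a Lock used as a one-shot event,
    released by a different thread than the one that acquired it (which a
    POSIX mutex would not allow):

        import threading

        gate = threading.Lock()
        gate.acquire()                    # main thread takes the lock

        def worker():
            print("working...")
            gate.release()                # another thread releases it

        threading.Thread(target=worker).start()
        gate.acquire()                    # blocks until worker() releases
        print("worker finished")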

    nirai (mannequin) commented May 12, 2011

    Hi,

    There seem to be two alternatives for atfork handlers:

    1. acquire locks during prepare phase and unlock them in parent and child after fork.
    2. reset library to some consistent state in child after fork.

    http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_atfork.html

    Option (2) makes sense but is probably not always applicable.
    Option (1) depends on being able to acquire locks in locking order, but how can we determine correct locking order across libraries?

    Initializing locks in child after fork without acquiring them before the fork may result in corrupted program state and so is probably not a good idea.

    On a positive note, if I understand correctly, Python signal handler functions are actually run in the regular interpreter loop (as pending calls) after the signal has been handled and so os.fork() atfork handlers will not be restricted to async-signal-safe operations (since a Python fork is never done in a signal handler).

    http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_04.html
    http://pubs.opengroup.org/onlinepubs/009695399/functions/fork.html
    "It is therefore undefined for the fork handlers to execute functions that are not async-signal-safe when fork() is called from a signal handler."

    Opinion by Butenhof who was involved in the standardization effort of POSIX threads:
    http://groups.google.com/group/comp.programming.threads/msg/3a43122820983fde

    ...so how can we establish correct (cross library) locking order during prepare stage?

    Nir

    sdaoden (mannequin) commented May 12, 2011

    @nir Aides: *thanks* for this link:
    http://groups.google.com/group/comp.programming.threads/msg/3a43122820983fde
    You made my day!

    pitrou (Member) commented May 13, 2011

    ...so how can we establish correct (cross library) locking order
    during prepare stage?

    That sounds like a lost battle, if it requires the libraries'
    cooperation. I think resetting locks is the best we can do. It might not
    work ok in all cases, but if it can handle simple cases (such as I/O and
    logging locks), it is already very good.

    neologix (mannequin) commented May 13, 2011

    Hi,

    Hello Nir,

    Option (2) makes sense but is probably not always applicable.
    Option (1) depends on being able to acquire locks in locking order, but how
    can we determine correct locking order across libraries?

    There are indeed a couple problems with 1:

    1. actually, releasing the mutex/semaphore from the child is not
      guaranteed to be safe, see this comment from glibc's malloc:
      /* In NPTL, unlocking a mutex in the child process after a
      fork() is currently unsafe, whereas re-initializing it is safe and
      does not leak resources. Therefore, a special atfork handler is
      installed for the child. */
      We could just destroy/reinit them, though.

    2. acquiring locks just before fork is probably one of the best ways to
      deadlock (acquiring a lock we already hold, or acquiring a lock needed
      by another thread before it releases its own lock). Apart from adding
      deadlock avoidance/recovery mechanisms - which would be far from
      trivial - I don't see how we could solve this, given that each library
      can use its own locks, not counting the user-created ones

    3. there's another special lock we must take into account, the GIL:
      contrarily to a typical C program, we can't have the thread forking
      blindly try to acquire all locks just before fork, because since we
      hold the GIL, other threads won't be able to proceed (unless of course
      they're in a section where they don't run without the GIL held).

    So, we would have to:

    • release the GIL
    • acquire all locks in the correct order
    • re-acquire the GIL
    • fork
    • reinit all locks after fork

    I think this is going to be very complicated.

    1. Python locks differ from usual mutexes/semaphores in that they can
      be held for quite some time (for example while performing I/O). Thus,
      acquiring all the locks could take a long time, and users might get
      irritated if fork takes 2 seconds to complete.

    2. Finally, there's a fundamental problem with this approach, because
      Python locks can be released by a thread other than the one that owns
      it.
      Imagine this happens:

    T1                                      T2
                                            lock.acquire()
                                            (do something without releasing lock)
    fork()
    lock.release()

    This is perfectly valid with the current lock implementation (for
    example, it can be used to implement a rendez-vous point so that T2
    doesn't start processing before T1 forked worker processes, or
    whatever).
    But if T1 tries to acquire the lock (held by T2) before fork, then it will
    deadlock, since it will never be released by T2.

    For all those reasons, I don't think that this approach is reasonable,
    but I could be wrong :-)

    Initializing locks in child after fork without acquiring them before the
    fork may result in corrupted program state and so is probably not a good
    idea.

    Yes, but in practice, I think that this shouldn't be too much of a
    problem. Also note that you can very well have the same type of
    problem with sections not protected explicitly by locks: for example,
    if you have a thread working exclusively on an object (maybe part of a
    threadpool), a fork can very well happen while the object is in an
    inconsistent state. Acquiring locks before fork won't help that.
    But I think this should eventually be addressed, maybe by specific
    atfork handlers.

    On a positive note, if I understand correctly, Python signal handler
    functions are actually run in the regular interpreter loop (as pending
    calls) after the signal has been handled and so os.fork() atfork handlers
    will not be restricted to async-signal-safe operations (since a Python fork
    is never done in a signal handler).

    That's correct.

    In short, I think that we could first try to avoid common deadlocks by
    just resetting locks in the child process. This is not panacea, but
    this should solve the vast majority of deadlocks, and would open the
    door to potential future refinements using atfork-like handlers.

    Attached is a first draft for such a patch (with tests).
    Synopsis:

    • when a PyThread_type_lock is created, it's added to a linked-list,
      when it's deleted, it's removed from the linked list
    • PyOS_AfterFork() calls PyThread_ReinitLocks() which calls
      PyThread_reinit_lock() for each lock in the linked list
    • PyThread_reinit_lock() does the right thing (i.e. sem_destroy/init
      for USE_SEMAPHORES and pthread_(mutex|cond)_destroy/init for emulated
      semaphores).

    Notes:

    • since it's only applicable to POSIX (since other Unix thread
      implementations will be dropped), I've only defined a
      PyThread_ReinitLocks inside Python/thread_pthread.h, so it won't build
      on other platforms. How should I proceed: like PyThread_ReInitTLS(),
      add a stub function to all Python/thread_xxx.h, or guard the call to
      PyThread_ReinitLocks() with #ifdef _POSIX_THREADS ?
    • I'm not sure of how to handle sem_init/etc failures in the reinit
      code: for now I just ignore this possibility, like what's done for the
      import lock reset
    • insertions/removals from the linked list are not protected from
      concurrent access because I assume that locks are created/deleted with
      the GIL held: is that a reasonable assumption, or should I add a mutex
      to protect those accesses?

    This fixes common deadlocks with threading.Lock, and
    PyThread_type_lock (used for example by I/O code).

    @gpshead gpshead added the 3.7 (EOL) end of life label May 30, 2017
    Birne94 (mannequin) commented May 31, 2017

    While having to deal with this bug for a while I have written a small library using pthread_atfork: https://github.com/Birne94/python-atfork-lock-release

    It allows registering atfork-hooks (similar to the ones available by now) and frees the stdout/stderr as well as manually provided io locks. I guess it uses some hacky ways to get the job done, but resolved the issue for me and has been working without problems for some weeks now.

    pitrou (Member) commented Oct 21, 2017

    I think we should somehow move forward on this, at least for logging locks which can be quite an annoyance.

    There are two possible approaches:

    • either a generic mechanism as posted by sbt in reinit_locks_2.diff
    • or a logging-specific fix using os.register_at_fork()

    What do you think?

    pitrou (Member) commented Oct 21, 2017

    Oh, I forgot that IO buffered objects also have a lock. So we would have to special-case those as well, unless we take the generic approach...

    A problem with the generic approach is that it would leave higher-level synchronization objects such as RLock, Event etc. in an inconsistent state. Not to mention the case where the lock is taken by the thread calling fork()...

    gpshead (Member Author) commented Oct 21, 2017

    logging is pretty easy to deal with so I created a PR.

    bufferedio.c is a little more work, as we either need to use the posixmodule.c os.register_at_fork API or expose it as an internal C API so that we can call it to add acquires and releases around the buffer's self->lock member when non-NULL. Either way, that needs to be written safely so that it doesn't crash if fork happens after a buffered io struct is freed (unregister the at-fork handlers when freeing it? messy).

    pitrou (Member) commented Oct 21, 2017

    Actually, we already have a doubly-linked list of buffered IO objects
    (for another purpose), so we can reuse that and register a single set of
    global callbacks.

    ochedru (mannequin) commented Apr 5, 2018

    FWIW, I encountered the same kind of issue when using the mkstemp() function: under the hood, it calls gettempdir() and this one is protected by a lock too.

    Current thread 0x00007ff10231f700 (most recent call first):
    File "/usr/lib/python3.5/tempfile.py", line 432 in gettempdir
    File "/usr/lib/python3.5/tempfile.py", line 269 in _sanitize_params
    File "/usr/lib/python3.5/tempfile.py", line 474 in mkstemp

    gpshead (Member Author) commented Sep 14, 2018

    New changeset 1900384 by Gregory P. Smith in branch 'master':
    bpo-6721: Hold logging locks across fork() (GH-4071)
    1900384

    gpshead (Member Author) commented Oct 7, 2018

    New changeset 3b69993 by Gregory P. Smith (Miss Islington (bot)) in branch '3.7':
    bpo-6721: Hold logging locks across fork() (GH-4071) (bpo-9291)
    3b69993

    vstinner (Member) commented Nov 8, 2018

    New changeset 3b69993 by Gregory P. Smith (Miss Islington (bot)) in branch '3.7':
    bpo-6721: Hold logging locks across fork() (GH-4071) (bpo-9291)

    It seems like this change caused a regression in the Anaconda installer of Fedora:
    https://bugzilla.redhat.com/show_bug.cgi?id=1644936

    But we are not sure at this point. I have to investigate to understand exactly what is happening.

    cagney (mannequin) commented Apr 2, 2019

    I suspect 3b69993 is causing a hang in libreswan's kvmrunner.py on Fedora.

    Looking at the Fedora RPMs:

    python3-3.7.0-9.fc29.x86_64 didn't contain the fix and worked
    python3-3.7.1-4.fc29.x86_64 reverted the fix (for anaconda) and worked
    python3-3.7.2-4.fc29.x86_64 included the fix; eventually hangs

    I believe the hang looks like:

    Traceback (most recent call last):
      File "/home/build/libreswan-web/master/testing/utils/fab/runner.py", line 389, in _process_test
        test_domains = _boot_test_domains(logger, test, domain_prefix, boot_executor)
      File "/home/build/libreswan-web/master/testing/utils/fab/runner.py", line 203, in _boot_test_domains
        TestDomain.boot_and_login)
      File "/home/build/libreswan-web/master/testing/utils/fab/runner.py", line 150, in submit_job_for_domain
        logger.debug("scheduled %s on %s", job, domain)
      File "/usr/lib64/python3.7/logging/__init__.py", line 1724, in debug
        
      File "/usr/lib64/python3.7/logging/__init__.py", line 1768, in log
        def __repr__(self):
      File "/usr/lib64/python3.7/logging/__init__.py", line 1449, in log
        """
      File "/usr/lib64/python3.7/logging/__init__.py", line 1519, in _log
        break
      File "/usr/lib64/python3.7/logging/__init__.py", line 1529, in handle
        logger hierarchy. If no handler was found, output a one-off error
      File "/usr/lib64/python3.7/logging/__init__.py", line 1591, in callHandlers
        
      File "/usr/lib64/python3.7/logging/__init__.py", line 905, in handle
        try:
      File "/home/build/libreswan-web/master/testing/utils/fab/logutil.py", line 163, in emit
        stream_handler.emit(record)
      File "/usr/lib64/python3.7/logging/__init__.py", line 1038, in emit
        Handler.__init__(self)
      File "/usr/lib64/python3.7/logging/__init__.py", line 1015, in flush
        name += ' '
      File "/usr/lib64/python3.7/logging/__init__.py", line 854, in acquire
        self.emit(record)
    KeyboardInterrupt

    gpshead (Member Author) commented Apr 2, 2019

    We need a small test case that can reproduce your problem. I believe 3b69993 to be correct.

    acquiring locks before fork in the thread doing the forking and releasing them afterwards is always the safe thing to do.

    Example possibility: Does your code use any C code that forks on its own without properly calling the C Python PyOS_BeforeFork(), PyOS_AfterFork_Parent(), and PyOS_AfterFork_Child() APIs?

    cagney (mannequin) commented Apr 3, 2019

    Does your code use any C code that forks on its own without properly calling the C Python PyOS_BeforeFork(), PyOS_AfterFork_Parent(), and PyOS_AfterFork_Child() APIs?

    No.

    Is there a web page explaining how to pull a python backtrace from all the threads running within a daemon?

    gpshead (Member Author) commented Apr 3, 2019

    I'd start with faulthandler.register with all_threads=True and see if that gives you what you need.

    https://docs.python.org/3/library/faulthandler.html
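
    For example, something along these lines in the daemon lets you dump all
    thread stacks on demand (signal choice is just an illustration):

        import faulthandler
        import signal

        # Dump the traceback of every Python thread when SIGUSR1 arrives,
        # e.g. via `kill -USR1 <pid>` of the hung process.
        faulthandler.register(signal.SIGUSR1, all_threads=True)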

    cagney (mannequin) commented Apr 4, 2019

    acquiring locks before fork in the thread doing the forking and releasing them afterwards is always the safe thing to do.

    It's also an easy way to cause a deadlock:

    • register_at_fork() et al. will cause per-logger locks to be acquired before the global lock (this isn't immediately obvious from the code)

    If a thread were to grab its logging lock before the global lock then it would deadlock. I'm guessing this isn't allowed - however I didn't see any comments to this effect?

    Can I suggest documenting this, and also merging the two callbacks so that the ordering of these two acquires is made explicit.

    • the per-logger locks are acquired in a random order

    If a thread were to acquire two per-logger locks in a different order then things would deadlock.

    cagney (mannequin) commented Apr 4, 2019

    Below is a backtrace from the deadlock.

    It happens because the logging code is trying to acquire two per-logger locks, in an order different from the random order used by the fork() handler.

    The code in question has a custom class DebugHandler(logging.Handler). The default logging.Handler.handle() method grabs its lock and calls .emit(), viz.:

            if rv:
                self.acquire()
                try:
                    self.emit(record)
                finally:
                    self.release()

    the custom .emit() then sends the record to a sub-logger stream, viz.:

        def emit(self, record):
            for stream_handler in self.stream_handlers:
                stream_handler.emit(record)
            if _DEBUG_STREAM:
                _DEBUG_STREAM.emit(record)

    and one of these emit() functions calls flush() which tries to acquire a further lock.

    Thread 0x00007f976b7fe700 (most recent call first):
    File "/usr/lib64/python3.7/logging/init.py", line 854 in acquire
    File "/usr/lib64/python3.7/logging/init.py", line 1015 in flush

        def flush(self):
            """
            Flushes the stream.
            """
            self.acquire()   # <---- blocked here
            try:
                if self.stream and hasattr(self.stream, "flush"):
                    self.stream.flush()
            finally:
                self.release()

    File "/usr/lib64/python3.7/logging/init.py", line 1038 in emit

            self.flush() <\----
    

    File "/home/build/libreswan-web/master/testing/utils/fab/logutil.py", line 163 in emit

        def emit(self, record):
            for stream_handler in self.stream_handlers:
                stream_handler.emit(record)   # <----
            if _DEBUG_STREAM:
                _DEBUG_STREAM.emit(record)

    File "/usr/lib64/python3.7/logging/init.py", line 905 in handle

        def handle(self, record):
            """
            Conditionally emit the specified logging record.

            Emission depends on filters which may have been added to the handler.
            Wrap the actual emission of the record with acquisition/release of
            the I/O thread lock. Returns whether the filter passed the record for
            emission.
            """
            rv = self.filter(record)
            if rv:
                self.acquire()
                try:
                    self.emit(record)   # <----
                finally:
                    self.release()
            return rv
    

    File "/usr/lib64/python3.7/logging/init.py", line 1591 in callHandlers

                        hdlr.handle(record)

    File "/usr/lib64/python3.7/logging/init.py", line 1529 in handle

                self.callHandlers(record)

    File "/usr/lib64/python3.7/logging/init.py", line 1519 in _log

            self.handle(record)

    File "/usr/lib64/python3.7/logging/init.py", line 1449 in log

            self._log(level, msg, args, **kwargs)

    File "/usr/lib64/python3.7/logging/init.py", line 1768 in log

                self.logger.log(level, msg, *args, **kwargs)

    File "/usr/lib64/python3.7/logging/init.py", line 1724 in debug

            self.log(DEBUG, msg, *args, **kwargs)

    File "/home/build/libreswan-web/master/testing/utils/fab/shell.py", line 110 in write

            self.logger.debug(self.message, ascii(text))

    gpshead (Member Author) commented Apr 5, 2019

    Thanks for the debugging details! I've filed https://bugs.python.org/issue36533 to specifically track this potential regression in the 3.7 stable branch. Let's carry on there, where the discussion thread isn't too long for bug tracker sanity.

    vstinner (Member) commented:

    I created bpo-40089: Add _at_fork_reinit() method to locks.

    pitrou (Member) commented Apr 28, 2020

    Related issue:
    https://bugs.python.org/issue40399
    """
    IO streams locking can be broken after fork() with threads
    """

    rojer (mannequin) commented Apr 29, 2020

    https://bugs.python.org/issue40442 is a fresh instance of this, entirely self-inflicted.

    vstinner (Member) commented:

    See also bpo-25920: PyOS_AfterFork should reset socketmodule's lock.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    vstinner (Member) commented Jun 6, 2022

    While it's true that "Locks in the standard library should be sanitized on fork", IMO having such a "meta-issue" to track the problem across the 300+ stdlib modules is a bad idea, since it's hard to track how many modules got fixed and how many still need fixing. Multiple modules have been fixed. I suggest opening more specific issues for the remaining ones. I'm closing this issue. Thanks to everyone who was involved in fixing issues, and good luck to the volunteers fixing the remaining ones :-) Also, avoid fork without exec: it's no longer supported on macOS, it was never supported on Windows, and it causes tons of very complex bugs on Linux :-)
