Message 135899 - Python tracker


Author	neologix
Recipients	bobbyi, gregory.p.smith, neologix, nirai, pitrou, sdaoden, vstinner
Date	2011-05-13.11:24:32
Message-id	<BANLkTinXqo5D6vm51AVoPEwneRVPoxmfKw@mail.gmail.com>
In-reply-to	<1305230511.18.0.0895360934602.issue6721@psf.upfronthosting.co.za>

Content

> Hi,
>

Hello Nir,

> Option (2) makes sense but is probably not always applicable.
> Option (1) depends on being able to acquire locks in locking order, but how
> can we determine correct locking order across libraries?
>

There are indeed a couple of problems with option (1):
1) actually, releasing the mutex/semaphore from the child is not
guaranteed to be safe, see this comment from glibc's malloc:
/* In NPTL, unlocking a mutex in the child process after a
   fork() is currently unsafe, whereas re-initializing it is safe and
   does not leak resources.  Therefore, a special atfork handler is
   installed for the child. */
We could just destroy/reinit them, though.

2) acquiring locks just before fork is probably one of the best ways to
deadlock (acquiring a lock we already hold, or acquiring a lock needed
by another thread before it releases its own lock). Apart from adding
deadlock avoidance/recovery mechanisms - which would be far from
trivial - I don't see how we could solve this, given that each library
can use its own locks, not counting the user-created ones.
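
A minimal Python-level sketch of the first failure mode in point 2
(acquiring a lock we already hold). The helper try_acquire_all() and the
two lock names are hypothetical, and the timeout exists only to make the
problem observable: a real "acquire everything before fork" hook would
simply block forever.

```python
import threading


def try_acquire_all(locks, timeout=0.2):
    """Hypothetical pre-fork hook: try to take every lock, report the stuck ones."""
    acquired, stuck = [], []
    for lock in locks:
        if lock.acquire(timeout=timeout):
            acquired.append(lock)
        else:
            stuck.append(lock)
    # Undo our own acquisitions so the caller's state is left unchanged.
    for lock in acquired:
        lock.release()
    return stuck


library_lock = threading.Lock()   # stands in for a lock owned by some library
user_lock = threading.Lock()      # stands in for a user-created lock

library_lock.acquire()            # the forking thread already holds this one
stuck = try_acquire_all([library_lock, user_lock])
library_lock.release()
```

Here stuck ends up containing only library_lock: threading.Lock is not
reentrant, so the thread that already holds it cannot take it again.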

3) there's another special lock we must take into account, the GIL:
unlike a typical C program, we can't have the forking thread blindly
try to acquire all locks just before fork, because as long as we hold
the GIL, other threads won't be able to proceed and release their locks
(unless of course they're in a section where they run without the GIL
held).

So, we would have to:
- release the GIL
- acquire all locks in the correct order
- re-acquire the GIL
- fork
- reinit all locks after fork

I think this is going to be very complicated.

4) Python locks differ from usual mutexes/semaphores in that they can
be held for quite some time (for example while performing I/O). Thus,
acquiring all the locks could take a long time, and users might get
irritated if fork takes 2 seconds to complete.

5) Finally, there's a fundamental problem with this approach, because
a Python lock can be released by a thread other than the one that
acquired it.
Imagine this happens:

T1                         T2
                          lock.acquire()
                          (do something without releasing lock)
fork()
lock.release()

This is perfectly valid with the current lock implementation (for
example, it can be used to implement a rendezvous point so that T2
doesn't start processing before T1 has forked worker processes, or
whatever).
But if T1 tries to acquire the lock (held by T2) before fork, then it
will deadlock, since the lock will never be released by T2.
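
The scenario can be sketched at the Python level as follows. This is
only an illustration: the fork itself is left out, the variable names
are made up, and the timeout stands in for what would otherwise be an
indefinite wait.

```python
import threading

lock = threading.Lock()

# T2: acquires the lock and finishes without releasing it.
t2 = threading.Thread(target=lock.acquire)
t2.start()
t2.join()

# T1 (the main thread here): a pre-fork "acquire everything" step would
# block forever on this lock; the timeout only makes that observable.
pre_fork_acquire_succeeded = lock.acquire(timeout=0.1)

# T1 is nevertheless allowed to release the lock that T2 acquired.
lock.release()
released_by_other_thread = not lock.locked()
```

The acquire attempt fails, while the release from a different thread
succeeds, which is exactly the combination that defeats the
acquire-before-fork scheme.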

For all those reasons, I don't think that this approach is reasonable,
but I could be wrong :-)

> Initializing locks in child after fork without acquiring them before the
> fork may result in corrupted program state and so is probably not a good
> idea.

Yes, but in practice, I think that this shouldn't be too much of a
problem. Also note that you can very well have the same type of
problem with sections not protected explicitly by locks: for example,
if you have a thread working exclusively on an object (maybe part of a
threadpool), a fork can very well happen while the object is in an
inconsistent state. Acquiring locks before fork won't help that.
But I think this should eventually be addressed, maybe by specific
atfork handlers.

> On a positive note, if I understand correctly, Python signal handler
> functions are actually run in the regular interpreter loop (as pending
> calls) after the signal has been handled and so os.fork() atfork handlers
> will not be restricted to async-signal-safe operations (since a Python fork
> is never done in a signal handler).

That's correct.

In short, I think that we could first try to avoid common deadlocks by
just resetting locks in the child process. This is not a panacea, but
this should solve the vast majority of deadlocks, and would open the
door to potential future refinements using atfork-like handlers.

Attached is a first draft of such a patch (with tests).
Synopsis:
- when a PyThread_type_lock is created, it's added to a linked list;
when it's deleted, it's removed from the linked list
- PyOS_AfterFork() calls PyThread_ReinitLocks() which calls
PyThread_reinit_lock() for each lock in the linked list
- PyThread_reinit_lock() does the right thing (i.e. sem_destroy/init
for USE_SEMAPHORES and pthread_(mutex|cond)_destroy/init for emulated
semaphores).
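
The same idea can be sketched in pure Python. This is an analogy under
stated assumptions, not the attached C patch: TrackedLock, _all_locks,
_reinit_locks_in_child() and child_can_acquire() are hypothetical names,
a WeakSet stands in for the linked list, and os.register_at_fork()
(available in later Python versions) stands in for the call made from
PyOS_AfterFork().

```python
import os
import threading
import weakref

# Hypothetical Python-level stand-in for the linked list of locks.
# Entries disappear automatically when a TrackedLock is garbage-collected.
_all_locks = weakref.WeakSet()


class TrackedLock:
    """Illustrative wrapper that registers itself on creation."""

    def __init__(self):
        self._lock = threading.Lock()
        _all_locks.add(self)

    def acquire(self, blocking=True, timeout=-1):
        return self._lock.acquire(blocking, timeout)

    def release(self):
        self._lock.release()

    def locked(self):
        return self._lock.locked()


def _reinit_locks_in_child():
    # Never unlock in the child: swap in a fresh, unlocked lock instead.
    for tracked in list(_all_locks):
        tracked._lock = threading.Lock()


os.register_at_fork(after_in_child=_reinit_locks_in_child)


def child_can_acquire(lock, timeout=1.0):
    """Fork and report whether the child managed to acquire `lock`."""
    pid = os.fork()
    if pid == 0:
        # Child: exit status 0 means the acquire succeeded.
        os._exit(0 if lock.acquire(timeout=timeout) else 1)
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```

With a TrackedLock held at fork time, child_can_acquire() should return
True while the parent's lock stays held; with a lock that is not reset
in the child, the acquire would be expected to time out instead.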

Notes:
- since this is only applicable to POSIX (other Unix thread
implementations will be dropped), I've only defined
PyThread_ReinitLocks() inside Python/thread_pthread.h, so it won't
build on other platforms. How should I proceed: like
PyThread_ReInitTLS(), add a stub function to all Python/thread_xxx.h,
or guard the call to PyThread_ReinitLocks() with #ifdef _POSIX_THREADS?
- I'm not sure how to handle sem_init/etc. failures in the reinit
code: for now I just ignore this possibility, like what's done for the
import lock reset.
- insertions/removals from the linked list are not protected from
concurrent access because I assume that locks are created/deleted with
the GIL held: is that a reasonable assumption, or should I add a mutex
to protect those accesses?

This fixes common deadlocks with threading.Lock and
PyThread_type_lock (used for example by I/O code).

Files
File name	Uploaded
reinit_locks.diff	neologix, 2011-05-13.11:24:31

History
Date	User	Action	Args
2011-05-13 11:24:36	neologix	set	recipients: + neologix, gregory.p.smith, pitrou, vstinner, nirai, bobbyi, sdaoden
2011-05-13 11:24:34	neologix	link	issue6721 messages
2011-05-13 11:24:32	neologix	create