Author neologix
Recipients bobbyi, gregory.p.smith, neologix, nirai, pitrou, sdaoden, vstinner
Date 2011-05-14.23:14:31
Message-id <BANLkTinffJxUVS1KZeXDgrhEVKBcOyBMJA@mail.gmail.com>
In-reply-to <1305399297.63.0.0655124420366.issue6721@psf.upfronthosting.co.za>
Content
> a) We know the correct locking order in Python's std libraries so the problem there is kind of solved.

I think you're greatly underestimating the complexity of lock ordering.
If we were just implementing malloc protected by a single mutex, then
yes, it would be simple.
But here you have multiple libraries, each with its own locks, plus
locks at the I/O layer, in the socket module (some name resolution
libraries are not thread-safe), and in many other places. And they all
interact.
For example, buffered I/O objects each have their own lock (Antoine,
correct me if I'm wrong).
It's a common cause of deadlock.
Now imagine I have a thread that logs information to a bz2 stream, so
that it's compressed on-the-fly. Sounds reasonable, no?
Well, the lock hierarchy is:

buffered stream lock
bz2-level lock
logging object I/O lock

Do you still think that getting the locking order right is easy?
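
To make the ordering problem concrete, here is a minimal sketch, not
taken from the actual logging/bz2/io code. The two locks are
hypothetical stand-ins for the logging and bz2 layers, and it assumes
a log call takes the logging lock first. An atfork "prepare" handler
that takes the same locks in the opposite order deadlocks against a
writer that is mid-call; timeouts are used only so the demo terminates.

```python
import threading

# Hypothetical stand-ins for two of the layers above; the real objects
# keep their locks private, so these names are illustrative only.
logging_lock = threading.Lock()
bz2_lock = threading.Lock()


def demo_opposite_order():
    """Return (writer_got_bz2, handler_got_logging).

    (False, False) means each side waited for the lock the other holds,
    which without the timeouts would be a deadlock.
    """
    writer_holds_logging = threading.Event()
    handler_holds_bz2 = threading.Event()
    writer_done = threading.Event()
    result = {}

    def writer():
        # Assumed normal path: logging lock first, then the bz2 lock.
        with logging_lock:
            writer_holds_logging.set()
            handler_holds_bz2.wait()
            got = bz2_lock.acquire(timeout=0.5)
            result["writer_got_bz2"] = got
            if got:
                bz2_lock.release()
            writer_done.set()

    t = threading.Thread(target=writer)
    t.start()

    # A prepare handler that takes the locks in the opposite order.
    writer_holds_logging.wait()
    with bz2_lock:
        handler_holds_bz2.set()
        got = logging_lock.acquire(timeout=0.1)
        result["handler_got_logging"] = got
        if got:
            logging_lock.release()
        writer_done.wait()  # keep holding bz2 until the writer gives up
    t.join()
    return result["writer_got_bz2"], result["handler_got_logging"]
```

Getting this right for real objects means the handler has to know the
acquisition order of every layer that can be stacked, which is the
hard part.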

Another example, with I/O locks (and if you're concerned with data
corruption, those are definitely the ones you would want to handle
with atfork):
I have a thread blocking on a write (maybe the output pipe is full, or
maybe it's an NFS file system and the server takes a long time to
respond) or on a read (maybe it's just waiting for someone to type
something on stdin).
Another thread forks.
The atfork handler will try to acquire the buffered I/O object's lock:
it won't succeed until the other thread finally manages to
write/read. It could take seconds, or forever.
And there are many other things that could go wrong because, unlike a
standalone and self-contained library, Python is made of several
components, at different layers, that can call each other in an
arbitrary order. Also, some locks can be held for an arbitrarily long
time.
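
The blocked-I/O scenario can be sketched as follows. This is only an
illustration: io_lock is a hypothetical stand-in for a buffered
object's lock, and a plain timed acquire plays the role of the atfork
handler. No fork actually happens.

```python
import os
import threading

# Hypothetical stand-in for a buffered I/O object's internal lock.
io_lock = threading.Lock()


def demo_blocked_holder(timeout=0.1):
    """Return (acquired_while_blocked, acquired_after_io_completed)."""
    r, w = os.pipe()
    holding = threading.Event()

    def reader():
        # Holds the lock for as long as the read blocks.
        with io_lock:
            holding.set()
            os.read(r, 1)  # blocks until a byte arrives

    t = threading.Thread(target=reader)
    t.start()
    holding.wait()

    # What a "prepare" handler would attempt just before fork().
    acquired_while_blocked = io_lock.acquire(timeout=timeout)
    if acquired_while_blocked:
        io_lock.release()

    os.write(w, b"x")  # the I/O finally completes
    t.join()

    acquired_after = io_lock.acquire(timeout=timeout)
    if acquired_after:
        io_lock.release()
    os.close(r)
    os.close(w)
    return acquired_while_blocked, acquired_after
```

A real handler would have no timeout to fall back on, so fork() would
simply hang for as long as the read or write does.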

That's why I still think that this can't be fully handled by atfork handlers.

But don't get me wrong: like you, I think that we should definitely
have an atfork mechanism. I just think it won't be able to solve all
the issues, and that it can also bring its own set of troubles.

Concerning the risk of corruption (invariant broken), you're right.
But resetting the locks is the approach currently in use for the
threading module, and it seems to work reasonably well there.
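
As a rough sketch of the "reset the lock in the child" idea: the
Resource class below is hypothetical, and it assumes
os.register_at_fork, which only exists in Python 3.7+ on POSIX and so
postdates this message. It is not how the threading module itself is
implemented.

```python
import os
import threading


class Resource:
    """Hypothetical object guarded by a lock, for illustration only."""

    def __init__(self):
        self.lock = threading.Lock()
        # Assumption: Python 3.7+ on POSIX. Note that registering a bound
        # method keeps this object alive for the life of the process.
        os.register_at_fork(after_in_child=self._reset_lock)

    def _reset_lock(self):
        # Whoever held the old lock does not exist in the child, so
        # replace it with a fresh one. This does not repair any state
        # the lock was protecting at the time of the fork.
        self.lock = threading.Lock()


def child_can_lock_after_fork(resource):
    """Fork while the lock is held; report whether the child could lock."""
    resource.lock.acquire()  # held across fork()
    try:
        pid = os.fork()
        if pid == 0:
            status = 1
            try:
                if resource.lock.acquire(blocking=False):
                    status = 0
            finally:
                os._exit(status)
        _, wait_status = os.waitpid(pid, 0)
    finally:
        resource.lock.release()
    return os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0
```

The child never blocks on the lock, but whatever invariant the lock
was protecting may still be broken, which is the corruption risk
mentioned above.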

Finally, I'd like to stress one point:
In a multi-threaded program, the code that runs between fork and exec
must be async-signal-safe. This means that in theory, you can't call
pthread_mutex_unlock/pthread_mutex_destroy, fwrite, malloc, etc.
Period.
This means that in theory, we shouldn't be running Python code at all!
So if we really wanted to be safe, the only solution would be to
forbid fork() in a multi-threaded program.
Since that's not really a reasonable option, and since the underlying
platform (POSIX) doesn't let us be safe either, I guess the only
choice left is to provide a best-effort implementation, knowing full
well that there will always be some corner cases that can't be
handled.
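
One way to stay close to the "nothing but exec after fork" rule from
Python is to let the subprocess module spawn the child instead of
calling os.fork() directly. In CPython on POSIX, the child side of
subprocess is implemented in C as long as no preexec_fn is passed.
The helper name run_child is made up for this sketch.

```python
import subprocess
import sys


def run_child(code):
    """Run a Python snippet in a fresh process and return its stdout."""
    completed = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        check=True,  # raise if the child exits with a non-zero status
    )
    return completed.stdout.decode().strip()
```

This only covers the fork-then-exec case. A program that forks and
keeps running Python code in the child still hits every problem
described above.
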
History
Date                 User      Action  Args
2011-05-14 23:14:33  neologix  set     recipients: + neologix, gregory.p.smith, pitrou, vstinner, nirai, bobbyi, sdaoden
2011-05-14 23:14:32  neologix  link    issue6721 messages
2011-05-14 23:14:32  neologix  create