Author lesha
Recipients Giovanni.Bajo, avian, bobbyi, gregory.p.smith, jcea, lesha, neologix, nirai, pitrou, sbt, sdaoden, vinay.sajip, vstinner
Date 2012-06-02.00:33:08

I feel like I'm failing to get my thesis across. I'll write it out fully:

== Thesis start ==

Basic fact: It is an error to use threading locks in _any_ way after a
fork. I think we mostly agree on this. The programs we are discussing
are **inherently buggy**.

We disagree on the right action when such a bug happens. I see 3 possibilities:

1) deadlock (the current behavior, if the lock was held in the parent at the time of fork)

2) continue to execute:
 a) as if nothing happened (the current behavior, if the lock was not
held in the parent)
 b) throw an Exception (equivalent to a, see below)

3) crash hard.

I think both 1 and 3 are tolerable, while 2 is **completely unsafe**,
because the resulting behavior of the program is unexpected and
unpredictable (data corruption, deletion, random actions, etc.).

== Thesis end ==
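
To make the basic fact concrete, here is a minimal sketch (Python on
POSIX; it hangs by design) of behavior 1: a parent thread holds a lock
at fork time, and the child deadlocks on its inherited copy:

    import os
    import threading
    import time

    lock = threading.Lock()
    ready = threading.Event()

    def holder():
        with lock:
            ready.set()
            time.sleep(60)      # keep the lock held across the fork

    threading.Thread(target=holder, daemon=True).start()
    ready.wait()                # ensure the lock is held before forking

    pid = os.fork()
    if pid == 0:
        # Only the forking thread survives in the child, but the lock's
        # "held" state was copied verbatim. No thread in the child will
        # ever release it, so this acquire blocks forever (behavior 1).
        lock.acquire()
        os._exit(0)             # never reached
    os.waitpid(pid, 0)

If holder() had already released the lock before the fork, the child's
acquire would succeed silently instead (behavior 2a).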



I will now address Gregory's, Richard's, and Vinay's comments in view
of this thesis:



1) Gregory suggests throwing an exception when the locks are used in a
child. He also discusses some cases in which he believes one could
safely continue execution.

My responses:

a) Throwing an exception is tantamount to continuing execution.

Imagine that the parent has a tempfile RAII object that erases the
file when the object is destroyed, or in some exception handler.

The destructor / handler will now get called in the child... and the
parent's tempfile is gone. Good luck tracking that one down.
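
A minimal sketch of that hazard (the class name is invented for
illustration): if the child raises instead of deadlocking, interpreter
shutdown runs the destructor in the child, and the file the parent
still depends on is unlinked:

    import os
    import tempfile

    class TempResource:
        # RAII-style wrapper: deletes its file when the object dies.
        def __init__(self):
            fd, self.path = tempfile.mkstemp()
            os.close(fd)
        def __del__(self):
            try:
                os.unlink(self.path)  # runs in BOTH processes after fork
            except FileNotFoundError:
                pass

    res = TempResource()
    pid = os.fork()
    if pid == 0:
        # Suppose a lock here raised an exception instead of
        # deadlocking: the child unwinds and exits, __del__ fires, and
        # the file the parent is still using vanishes from disk.
        raise SystemExit(1)
    os.waitpid(pid, 0)
    print(os.path.exists(res.path))   # False: the child unlinked it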

b) In general, it is not safe to continue execution on release(). If
you release() and reinitialize, the lock could still later be reused by
both parent and child, and there would still be contention leading to
data corruption.
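
To illustrate (the file name is invented), reinitializing merely hands
each process its own private lock around a genuinely shared resource:

    import os
    import threading

    LOG = "shared.log"
    lock = threading.Lock()

    def write_record(tag):
        with lock:                    # meant to keep records atomic
            with open(LOG, "a") as f:
                for i in range(3):
                    f.write(f"{tag} line {i}\n")

    pid = os.fork()
    if pid == 0:
        # The child "fixes" the inherited lock by reinitializing it...
        lock = threading.Lock()
        write_record("child")         # ...and acquires it instantly,
        os._exit(0)                   # racing the parent for the file.
    write_record("parent")
    os.waitpid(pid, 0)                # records from the two processes
                                      # can now interleave in LOG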

c) Re: deadlocks are unacceptable...

A deadlock is better than data corruption. Whether you prefer a
deadlock or a crash depends on whether your system is set up to dump
core. You can always debug a deadlock with gdb. A crash without a core
dump is impossible to diagnose. However, a crash is harder to ignore,
and it lets the process recover. So, in my book, though I'm not 100%
certain: hard crash > deadlock > corruption

d) However, we can certainly do better than today:

i) Right now, we sometimes deadlock, and sometimes continue execution.
It would be better to deadlock always (or crash always), no matter how
the child uses the lock.

ii) We can log before deadlocking (this is hard in general, because
it's unclear where to log to), but it would immensely speed up
debugging.

iii) We can hard-crash with an extra-verbose stack dump (i.e., dump
the lock details in addition to the stack).
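
For what it's worth, a rough sketch of combining (i) and (iii) in
today's terms. Note the assumptions: os.register_at_fork() only exists
in Python 3.7+, and CrashingLock is a hypothetical wrapper, not an
existing API:

    import os
    import sys
    import threading
    import traceback

    _forked_child = False

    def _mark_forked():
        global _forked_child
        _forked_child = True

    os.register_at_fork(after_in_child=_mark_forked)  # Python 3.7+

    class CrashingLock:
        # Hypothetical wrapper: ANY use after fork hard-crashes with a
        # verbose dump, instead of deadlocking or silently continuing.
        def __init__(self):
            self._lock = threading.Lock()
        def acquire(self, blocking=True, timeout=-1):
            if _forked_child:
                traceback.print_stack(file=sys.stderr)
                sys.stderr.write("lock %r used after fork\n" % self._lock)
                os.abort()            # hard crash; core dump if enabled
            return self._lock.acquire(blocking, timeout)
        def release(self):
            self._lock.release()
        def __enter__(self):
            self.acquire()
            return self
        def __exit__(self, *exc):
            self.release()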



2) Richard explains how my buggy snippets are buggy, and how to fix them.

I respond: Richard, thanks for explaining how to avoid these bugs!

Nonetheless, people write bugs like these all the time, especially in
areas like this. I wrote these very bugs. I now know better, mostly,
but I wouldn't bet on it.

We should choose the safest way to handle these bugs: deadlocking
always, or crashing always. Reinitializing the locks is going to cost
Python users a lot more in the long run. Deadlocking _sometimes_, as we do now, is equally bad. 

Also, even your code is potentially unsafe: when you execute the
excepthook in the child, you could be running custom exception logic,
or even a custom excepthook. Those could, well-intentioned but
misguided, destroy some of the parent's valuable data.
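
A minimal sketch of that failure mode (the hook and directory are
invented for illustration):

    import os
    import shutil
    import sys

    def cleanup_hook(exc_type, exc, tb):
        # Well-intentioned application-level cleanup on fatal errors...
        shutil.rmtree("work_dir", ignore_errors=True)
        sys.__excepthook__(exc_type, exc, tb)

    sys.excepthook = cleanup_hook
    os.makedirs("work_dir", exist_ok=True)

    pid = os.fork()
    if pid == 0:
        # If the child's inherited lock raised instead of deadlocking,
        # the uncaught exception reaches the hook, which deletes the
        # directory the parent is still working in.
        raise RuntimeError("lock used after fork")
    os.waitpid(pid, 0)
    print(os.path.exists("work_dir"))  # False: the child removed it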



3) Vinay essentially says "using logging after fork is user error". 

I respond: Yes, it is. In any other logging library, this error would
result only in mangled log lines, with no lasting harm.

In Python, you sometimes get a deadlock, and other times, mangled lines.

> logging is not doing anything to protect things *outside* of a single process 

A file is very much outside a single process. If you are logging to a file, the only correct way is to use a file lock. Thus, I stand by my assertion that "logging" is buggy.

Windows programs generally have no problems with this. fork() on UNIX gives you both the rope and the gallows to hang yourself.

Specifically for logging, I think reasonable options include:

a) [The Right Way (TM)] Using a file lock + CLOEXEC when available;
this lets multiple processes cooperate safely (see the sketch after
this list).

b) It's okay to deadlock, provided we log an explanation of why the
deadlock is happening.

c) It's okay to crash with a similar explanation.

d) It's pretty okay even to reinitialize logs, although mangled log lines do prevent automated parsing.
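
Here is a minimal sketch of option a), POSIX only, using fcntl.flock()
(the log file name is invented; os.O_CLOEXEC is spelled as in Python
3.3+):

    import fcntl
    import os

    def write_log(path, message):
        # flock() serializes at the file level, across processes, so
        # it keeps working after fork(), unlike an in-memory thread
        # lock that each process merely has a private copy of.
        fd = os.open(path,
                     os.O_WRONLY | os.O_APPEND | os.O_CREAT |
                     os.O_CLOEXEC, 0o644)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX)
            os.write(fd, message.encode() + b"\n")
            fcntl.flock(fd, fcntl.LOCK_UN)
        finally:
            os.close(fd)

    pid = os.fork()
    if pid == 0:
        write_log("app.log", "hello from the child")
        os._exit(0)
    write_log("app.log", "hello from the parent")
    os.waitpid(pid, 0)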



I really hope that my compact thesis can help us get closer to a consensus, instead of arguing about the details of specific bugs.