
Author neologix
Recipients Giovanni.Bajo, avian, bobbyi, gregory.p.smith, neologix, nirai, pitrou, sdaoden, vstinner
Date 2011-07-04.21:22:46
Message-id <CAH_1eM2_DEF4WZr04GA=ng7TUNfhLhxuqWF4ZPVAfo7YtSej1g@mail.gmail.com>
In-reply-to <1309808497.69.0.0743071616941.issue6721@psf.upfronthosting.co.za>
Content
[...]
> A caveat: since Python is an object-oriented language, it is more common than in C for code from a higher-level module to be invoked by code from a lower-level module, for example through an object method overridden by the higher-level module. This actually happens in the logging module (the emit method).

Exactly. That's why registering atfork() handlers in independent
modules can - and will - lead to deadlocks if we get the order wrong.
Also, most locks are allocated dynamically (at the same time as the
object they protect), so the import order is not really relevant here.
Furthermore, there is no strict ordering between the modules: how would
bz2 be ordered relative to loglib, for example?
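
To make the ordering problem concrete, here is a minimal sketch. The two
locks and their names (logging_lock, io_lock) are hypothetical stand-ins
for locks owned by two modules; nothing here is the actual stdlib code.
Timed acquires and events are used so the example terminates; with plain
blocking acquires both sides would wait forever.

```python
import threading

# Hypothetical stand-ins for locks owned by two independent modules.
logging_lock = threading.Lock()   # higher-level module
io_lock = threading.Lock()        # lower-level module

worker_has_logging = threading.Event()
handler_has_io = threading.Event()
worker_tried = threading.Event()
handler_done = threading.Event()
results = {}


def worker():
    # Normal code path: take the logging lock, then the I/O lock.
    with logging_lock:
        worker_has_logging.set()
        handler_has_io.wait()
        got = io_lock.acquire(timeout=0.1)
        results["worker_got_io"] = got
        if got:
            io_lock.release()
        worker_tried.set()
        handler_done.wait()


def before_fork_handlers():
    # Handlers registered in the "wrong" order: I/O lock first, then logging.
    with io_lock:
        handler_has_io.set()
        worker_has_logging.wait()
        got = logging_lock.acquire(timeout=0.1)
        results["handler_got_logging"] = got
        if got:
            logging_lock.release()
        worker_tried.wait()
    handler_done.set()


t = threading.Thread(target=worker)
t.start()
before_fork_handlers()
t.join()
```

Neither side obtains its second lock, which is the deadlock in miniature:
the handlers and the regular code disagree on the acquisition order.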

>
>> That's why I asked for a specific API: when do you register a handler?
>> When are they called? When are they reset?
>
> Read the pthread_atfork man page.
>

No, that won't be enough: when an object - and the lock protecting it -
is deallocated, the related atfork handler must be removed, for
example. You might handle this with weakrefs, but that's definitely
not something the standard pthread_atfork API accounts for.
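
A rough sketch of the weakref idea, to show what the API question is
about. AtForkRegistry, Guarded and _reinit_after_fork are hypothetical
names invented for this illustration; they are not an existing or
proposed stdlib API.

```python
import gc
import threading
import weakref


class AtForkRegistry:
    """Hypothetical registry; not part of the standard library."""

    def __init__(self):
        # A WeakSet drops an entry once the object has been collected,
        # so a dead object never leaves a stale handler behind.
        self._objects = weakref.WeakSet()

    def register(self, obj):
        self._objects.add(obj)

    def run_child_handlers(self):
        # Intended to be called in the child right after fork().
        for obj in list(self._objects):
            obj._reinit_after_fork()

    def __len__(self):
        return len(self._objects)


class Guarded:
    """Example object whose lock is allocated together with it."""

    def __init__(self, registry):
        self._lock = threading.Lock()
        registry.register(self)

    def _reinit_after_fork(self):
        # Replace the lock: the inherited one may be held by a thread
        # that does not exist in the child.
        self._lock = threading.Lock()


registry = AtForkRegistry()
g = Guarded(registry)
old_lock = g._lock
registry.run_child_handlers()
lock_replaced = g._lock is not old_lock

count_before = len(registry)
del g
gc.collect()
count_after = len(registry)
```

The registration disappears together with the object, which is the part
that plain pthread_atfork() has no equivalent for.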

>> The whole point of atfork is to avoid breaking invariants and
>> introduce invalid state in the child process. If there is one thing we
>> want to avoid, it's precisely reading/writing corrupted data from/to
>> files, so eluding the I/O problem seems foolish to me.
>
> Please don't use insulting adjectives.
> If you think I am wrong, convincing me logically will do.
>

I'm sorry if that sounded insulting to you; it was really
unintentional (English is not my mother tongue).

> you can "avoid breaking invariants" using two different strategies:
> 1) Acquire locks before the fork and release/reset them after it.
> 2) Initialize the module to some known state after the fork.
>
> For some (most?) modules it may be quite reasonable to initialize the module to a known state after the fork without acquiring its locks before the fork; this too is explained in the pthread_atfork man page:
> "Alternatively, some libraries might be able to supply just a child routine that reinitializes the mutexes in the library and all associated states to some known value (for example, what it was when the image was originally executed)."
>

The most problematic place is the I/O layer, since that is where locks
are held the longest (see for example issue #7123). And I'm not sure we
can simply reinitialize an I/O object after fork() without corrupting
or losing data.
But this approach (reinitializing after fork()) works well most of the
time, and is actually already used in multiple places (the threading
and multiprocessing modules, and probably others).

> Oops, I have always used the term "critical section" to describe a lock that protects data state as tightly as possible, ideally not even across function calls, but now I see that Wikipedia defines one as protecting any resource, including I/O.
>

Yes, that's one peculiarity of Python locks.
Another one is that a lock can be released by a process other than the
one that acquired it.
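
The same lack of ownership is easy to observe between threads of a
single process. A small illustration (the variable names are made up
for the example): a threading.Lock can be released by a thread that did
not acquire it, whereas an RLock refuses.

```python
import threading

lock = threading.Lock()
lock.acquire()                      # acquired by the main thread


def release_plain_lock():
    lock.release()                  # released by a different thread


t = threading.Thread(target=release_plain_lock)
t.start()
t.join()
released_by_other = not lock.locked()

# An RLock tracks its owner, so the same attempt fails.
rlock = threading.RLock()
rlock.acquire()
errors = []


def release_rlock():
    try:
        rlock.release()
    except RuntimeError as exc:
        errors.append(exc)


t2 = threading.Thread(target=release_rlock)
t2.start()
t2.join()
rlock.release()                     # the owner releases it normally
```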

> The logging module locks the entire emit() function which I think is wrong.
> It should let the derived handler take care of locking when it needs to, if it needs to at all.
>
> The logging module is an example for a module we should reinitialize after the fork without locking its locks before the fork.

It's possible.

As I said earlier in this thread, I'm not at all opposed to the
atfork mechanism. I'm actually in favor of it, the first reason being
that we could consolidate the various ad-hoc atfork handlers scattered
through the standard library.
My point is just that it's not as simple as it sounds because of
long-held locks, and that we've got to be really careful because of
inter-module dependencies.

Would you like to work on a patch to add an atfork mechanism?
History
Date                 User      Action  Args
2011-07-04 21:22:47  neologix  set     recipients: + neologix, gregory.p.smith, pitrou, vstinner, nirai, bobbyi, Giovanni.Bajo, sdaoden, avian
2011-07-04 21:22:46  neologix  link    issue6721 messages
2011-07-04 21:22:46  neologix  create