msg94701 - (view) |
Author: Zsolt Cserna (csernazs) * |
Date: 2009-10-30 10:57 |
Python 2.6.4 (r264:75706, Oct 29 2009, 12:00:12) [C] on sunos5
On my sunos5 system if I call os.fork() in a thread created by
thread.start_new_thread(), it raises RuntimeError with the message 'not
holding the import lock'. It's a vanilla python compiled on sunos5 with
suncc. In 2.6.3 it's ok, I think this issue is caused by patch located
at http://codereview.appspot.com/96125/show.
The problem can be re-produced by running the script attached.
I've looked into the source and it seems to me the following:
Based on the the change above, it calls fork1() system call between a
lock-release calls:
3635 » _PyImport_AcquireLock();
3636 » pid = fork1();
3637 » result = _PyImport_ReleaseLock();
_PyImport_ReleaseLock is called in both the child and parent. Digging
more into the code, _PyImport_ReleaseLock starts with the following:
long me = PyThread_get_thread_ident();
if (me == -1 || import_lock == NULL)
return 0; /* Too bad */
if (import_lock_thread != me)
return -1;
In the above code, if I interpret correctly, it compares the result of
the current thread id returned by PyThread_get_thread_ident call with
the thread id of the thread holding the lock - if it's different then
the current thread cannot release the lock because it's not owned by it.
Based on my results on solaris the 'me' variable will be different in
the parent and in the child (in parent it remains the same) - resulting
that the child thinks that it's not holding the release lock.
I'm using pthreads on both linux and solaris. On linux the code above is
working fine, but on solaris it behaves differently.
|
msg94702 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2009-10-30 11:06 |
Well, there has been no such change between 2.6.3 and 2.6.4.
|
msg94704 - (view) |
Author: Zsolt Cserna (csernazs) * |
Date: 2009-10-30 11:41 |
Sorry, the working version is not 2.6.3 (I mistyped the version), it's
2.6.1 (I've no info about 2.6.2).
|
msg94765 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2009-10-31 19:50 |
This only appears to happen on Solaris. What version of Solaris are you
using? (i doubt that matters, i expect it happens on all versions)
I haven't look closely enough at the code yet, but reinitializing the
import lock in the child process should make sense here.
|
msg94820 - (view) |
Author: Zsolt Cserna (csernazs) * |
Date: 2009-11-02 08:31 |
I've tested it only on solaris 8, 32-bit.
|
msg94822 - (view) |
Author: Zsolt Cserna (csernazs) * |
Date: 2009-11-02 09:44 |
solaris 10 x86, 32-bit, sun-studio 11 is ok (in this case the parent's
thread has thread_id=2 and the child inherits this id)
solaris 8 sparc4, 32-bit, sun-studio 11 is not working
So it seems it's independent from sun-cc but depends from the
architecture and/or the OS.
|
msg94832 - (view) |
Author: Zsolt Cserna (csernazs) * |
Date: 2009-11-02 15:51 |
I've attached a patch which seems to fix this issue. It sets
import_lock_thread to the current thread id after forking in the child
process, but still I'm not quite sure that it's the correct way of
solving this issue.
|
msg95075 - (view) |
Author: Zsolt Cserna (csernazs) * |
Date: 2009-11-09 12:38 |
Additional info:
I've tested it on solaris 10 / sparc 32-bit, and my test script runs
fine on that.
Based on my test it seems that this bug does not affect solaris 10.
|
msg98148 - (view) |
Author: Zsolt Cserna (csernazs) * |
Date: 2010-01-22 15:09 |
Could you please provide any advise on this bug?
I've submitted my patch, would be curious if there any chance to get it merged.
|
msg98211 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2010-01-24 10:01 |
One relevant difference may be the change to the fork semantics in Solaris 10: fork() now means the same as fork1(). Before, fork() meant the same as forkall(), unless the applications was linked with -lpthread, in which case fork() also meant fork1(). See fork(2).
So I'd be curious if a) the test case passes on Solaris 8 if fork1 is being used, and b) whether it passes if you make sure -lpthread is being used. If so, I'd rather fix this issue by changing the linkage for Solaris 8 (or declaring that system as unsupported altogether), instead of fiddling yet again with the fork handling. In Python, posix.fork definitely needs to mean "POSIX fork".
If it's something else (i.e. the thread ID changes in Solaris 8 whether fork1 or forkall is used), then the patch is fine in principle. However, I would always reset the thread ID. In your patch, it is not clear why you apply this resetting to Solaris and AIX, but not other systems.
|
msg98287 - (view) |
Author: Zsolt Cserna (csernazs) * |
Date: 2010-01-25 13:32 |
I compile it with -lpthread.
os.fork1() was not available by default, I enabled it by removing two lines from posixmodule.c (it seems it's only enabled when #if defined(__USLC__) && defined(__SCO_VERSION__) is true).
With os.fork1() I have the same results, RuntimeError.
I was not able to compile it without pthread because I haven't found any configure options for that. If it's possible I'm happy to try it.
In my patch I wanted to reduce the effect on systems where forking in thread is working (eg. linux), that's the reason why I added "(defined (__SVR4) && defined (__sun)". But it just checks for solaris, not the OS version (on solaris 10/intel my demo is working).
(btw forking in thread actually happens in a unittest related to BaseHTTPServer, which obviously fails on my platform)
|
msg98299 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2010-01-25 21:15 |
> In my patch I wanted to reduce the effect on systems where forking in
> thread is working (eg. linux), that's the reason why I added
> "(defined (__SVR4) && defined (__sun)".
I think that's inappropriate, please change that.
|
msg98335 - (view) |
Author: Zsolt Cserna (csernazs) * |
Date: 2010-01-26 13:27 |
Ok, here's the new patch. I've removed the #ifdef-#endif lines. It passed the test thread_test.py on linux (and as well on solaris).
|
msg99202 - (view) |
Author: Dagobert Michelsen (dagobert) |
Date: 2010-02-11 08:35 |
I verified patch_2.diff on Solaris 10 w/SOS11 and it actually resolves a number of issues I had with Mercurial.
|
msg100090 - (view) |
Author: Greg Jednaszewski (jednaszewski) |
Date: 2010-02-25 16:20 |
I verified that patch_2.diff resolves the problem for me on Solaris 9/SPARC.
|
msg100100 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2010-02-25 19:45 |
The patch should really add an unit test for this.
|
msg100143 - (view) |
Author: Zsolt Cserna (csernazs) * |
Date: 2010-02-26 10:18 |
There's a bundled unittest in test_httpservers which actually fails if this patch is not applied.
See the unittest attached. Unfortunatelly RuntimError is raised in the child process after fork() so I cannot re-raise it to the parent, instead I send a message from the child to the parent via a pipe if the child is ok.
|
msg100144 - (view) |
Author: Zsolt Cserna (csernazs) * |
Date: 2010-02-26 10:20 |
I've run unittest with both patched and vanilla versions of python 2.6.4 and I confirm that it fails when it's run by vanilla on solaris 8, but passes on the patched version.
|
msg100171 - (view) |
Author: Greg Jednaszewski (jednaszewski) |
Date: 2010-02-26 20:27 |
I spent some time working on and testing a unit test as well. It's the same basic idea as Zsolt Cserna's, but with a slightly different approach. See 7242_unittest.diff. My unittest fails pre-patch and succeeds post-patch.
However, I still have reservations about the patch. The existing test test_threading.ThreadJoinOnShutdown.test_3_join_in_forked_from_thread hangs with the patch in place.
Vanilla 2.6.2 - test passes
Vanilla 2.6.4 - test fails
Patched 2.6.4 - test hangs
Note: the code of the test_threading test is identical in all 3 cases. I'd feel more confident about the patch if this test didn't hang with the patch in place.
|
msg100200 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2010-02-28 09:01 |
Using issue7242-gps01.diff on release26-maint and a freshly downloaded opensolaris 2009-06 VM test_thread, test_threading and test_subprocess all pass for me both before -and- after the patch. Nor does the original thread_test.py cause the problem for me (meaning I'm unable to reproduce the problem in this VM in the first place so... someone else needs to)
Regardless, I altered patch_2 a bit in that diff to do what I though _PyImport_ReInitLock() should really do. I also added the thread_unittest testcase to that diff.
% uname -a
SunOS opensolaris-vm 5.11 snv 111b i86pc i386 i86pc Solaris"
Can someone who could reproduce the problem in the first place please test that patch.
The logic change makes sense to me. I don't know why test_threading.ThreadJoinOnShutdown.test_3_join_in_forked_from_thread would be changing behavior for you.
|
msg100204 - (view) |
Author: Greg Jednaszewski (jednaszewski) |
Date: 2010-02-28 15:22 |
The problem only seems to appear on Solaris 9 and earlier. I'll try to test the updated patch tonight or tomorrow and let you know what I find.
|
msg100211 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2010-02-28 17:50 |
If you have a chance tonight that'd be awesome. I'd love to get this in before 2.6.5rc1 (being cut tomorrow) but as its platform specific (and a pretty-old platform at that) its not worth holding up the release.
|
msg100231 - (view) |
Author: Greg Jednaszewski (jednaszewski) |
Date: 2010-03-01 00:56 |
I tested the updated patch, and the new unit test passes on my Sol 8 sparc, but the test_threading test still hangs on my system. However, given that the test is skipped on several platforms and it does work on more relevant versions of Solaris, it's probably OK. It's possible that an OS bug is causing that particular hang.
Plus, the original patch fixed the 'real world' scenario I was running into, so I'd like to see it get into the release candidate if you're OK with it.
|
msg100233 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2010-03-01 01:23 |
Agreed.
Committed to trunk r78527. I'll wait for the buildbots before merging to release26-maint.
|
msg100245 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2010-03-01 04:41 |
r78543 backports this to release26-maint, it should be in 2.6.5 assuming no show stopper issue comes up with this during the release candidates.
|
msg124294 - (view) |
Author: Oren Held (Oren_Held) |
Date: 2010-12-18 16:55 |
Just adding some info:
This bug is not Solaris-specific; I reproduced it on HP-UX 11.31.
On Python 2.6.4, thread_test.py fails with the same RunTime error exception.
On Python 2.6.6, it passes and things look good.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:56:54 | admin | set | github: 51491 |
2010-12-18 16:55:12 | Oren_Held | set | nosy:
+ Oren_Held messages:
+ msg124294
|
2010-03-01 04:41:35 | gregory.p.smith | set | status: open -> closed resolution: fixed messages:
+ msg100245
|
2010-03-01 01:23:29 | gregory.p.smith | set | messages:
+ msg100233 |
2010-03-01 00:56:43 | jednaszewski | set | messages:
+ msg100231 |
2010-02-28 17:50:17 | gregory.p.smith | set | messages:
+ msg100211 |
2010-02-28 15:22:30 | jednaszewski | set | messages:
+ msg100204 |
2010-02-28 09:01:18 | gregory.p.smith | set | files:
+ issue7242-gps01.diff priority: normal -> high nosy:
loewis, twouters, gregory.p.smith, csernazs, pitrou, doughellmann, dagobert, jednaszewski messages:
+ msg100200
components:
+ Extension Modules, - Library (Lib) |
2010-02-26 20:27:33 | jednaszewski | set | files:
+ 7242_unittest.diff
messages:
+ msg100171 |
2010-02-26 10:20:20 | csernazs | set | messages:
+ msg100144 |
2010-02-26 10:18:17 | csernazs | set | files:
+ thread_unittest.py
messages:
+ msg100143 |
2010-02-25 19:45:03 | pitrou | set | messages:
+ msg100100 |
2010-02-25 16:20:16 | jednaszewski | set | nosy:
+ jednaszewski messages:
+ msg100090
|
2010-02-25 00:39:15 | doughellmann | set | nosy:
+ doughellmann
|
2010-02-11 08:35:04 | dagobert | set | nosy:
+ dagobert messages:
+ msg99202
|
2010-01-26 13:27:08 | csernazs | set | files:
+ patch_2.diff
messages:
+ msg98335 |
2010-01-25 21:15:29 | loewis | set | messages:
+ msg98299 |
2010-01-25 13:32:41 | csernazs | set | messages:
+ msg98287 |
2010-01-24 10:01:45 | loewis | set | messages:
+ msg98211 |
2010-01-22 16:11:37 | r.david.murray | set | nosy:
+ loewis
|
2010-01-22 15:09:30 | csernazs | set | messages:
+ msg98148 |
2009-11-09 12:38:08 | csernazs | set | messages:
+ msg95075 |
2009-11-02 15:51:54 | csernazs | set | files:
+ patch_1.diff keywords:
+ patch messages:
+ msg94832
|
2009-11-02 09:44:50 | csernazs | set | messages:
+ msg94822 |
2009-11-02 08:31:13 | csernazs | set | messages:
+ msg94820 |
2009-10-31 19:50:03 | gregory.p.smith | set | priority: normal versions:
+ Python 3.1, Python 2.7, Python 3.2 nosy:
+ twouters
messages:
+ msg94765
assignee: gregory.p.smith |
2009-10-30 11:41:27 | csernazs | set | messages:
+ msg94704 |
2009-10-30 11:06:41 | pitrou | set | nosy:
+ gregory.p.smith, pitrou messages:
+ msg94702
|
2009-10-30 10:58:01 | csernazs | create | |