msg18469 - (view) |
Author: Jan Olderdissen (janixia) |
Date: 2003-10-01 06:22 |
Note: This may be a dupe or a generalization of 595601.
Running the code snippet below a few times on the 2.3.1 release
and debug builds on Windows 2000/XP will inevitably lead
to a crash. 2.2.2 also exhibits this behavior.
The crashing code:
------------
import thread

f = open("tmp1", "w")

def worker():
    global f
    while 1:
        f.close()
        f = open("tmp1", "w")
        f.seek(0, 0)

thread.start_new_thread(worker, ())
thread.start_new_thread(worker, ())

while 1: pass
-------------
The issue appears to be this (and similar) code sections
from fileobject.c:
Py_BEGIN_ALLOW_THREADS
errno = 0;
ret = _portable_fseek(f->f_fp, offset, whence);
Py_END_ALLOW_THREADS
Note that due to
Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS,
f->f_fp can be set to NULL by file_close prior to entering
_portable_fseek. Similar crashes can be observed with
read, write and close itself when they are mixed with a
concurrent close.
Obviously, the offending Python code is buggy and a lock
should be used to prevent concurrent access to f.
However, it seems unreasonable for Python to crash
because of it.
A mutex preventing concurrent access to the file object
seems like a reasonable way to fix this.
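As a sketch of the workaround suggested above (written in modern Python with the threading module rather than the old thread module), serializing every close/reopen/seek on the shared file object with a single lock removes the race:

```python
import threading

lock = threading.Lock()
f = open("tmp1", "w")

def worker(iterations):
    global f
    for _ in range(iterations):
        with lock:  # no other thread can close f while we reopen and seek it
            f.close()
            f = open("tmp1", "w")
            f.seek(0, 0)

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
f.close()
```

With the lock in place no thread can ever observe a half-closed file object, so the interpreter-level race described in this report never gets a chance to trigger.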
|
msg18470 - (view) |
Author: Anthony Baxter (anthonybaxter) |
Date: 2003-10-01 07:03 |
Attaching the failing code as an attachment, because SF
mangles source code.
|
msg18471 - (view) |
Author: Jan Olderdissen (janixia) |
Date: 2003-10-01 07:08 |
My apologies for the mangled source. I suppose there isn't a
way for me to remedy the situation.
|
msg18472 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2003-10-04 08:01 |
janixia, don't worry about the formatting - this is
partially SF's fault, too.
Would you be interested in developing and testing a patch? I
think it would be sufficient to move the f->f_fp access out
of the section where the GIL is released.
|
msg18473 - (view) |
Author: Jeremy Hylton (jhylton) |
Date: 2003-10-06 03:48 |
Patch 595601 is an attempt to address this problem, but it's
incomplete. The file object API allows an extension to
extract the FILE * and squirrel it away. That's clearly
unsafe, because it can't participate in a locking scheme
without rewriting extensions.
Shane Hathaway proposed another solution here:
http://mail.python.org/pipermail/python-dev/2003-June/036543.html
The problem in this case is that we cause the call to
close() to raise an exception. I'd prefer to see the
exception raised elsewhere, because close() seldom fails and
is often called from routines that are cleaning up at the
end. On the other hand, this solution would be easier to
implement, so I'm at least +0 on it.
Let's do one or the other.
|
msg18474 - (view) |
Author: Jan Olderdissen (janixia) |
Date: 2003-10-06 23:02 |
I'm inclined to go with Shane's suggested solution of
reference counting while file access is in progress. It requires
no extra synchronisation (the increment, decrement and check
all happen while the GIL is still held) and should have the
smallest performance impact.
I don't think the FILE * extraction problem can be solved at
all. Once the horse is out of the barn... However, for
the "standard" case Shane's suggestion provides a neat and
clean solution to the problem.
If the community can agree on this solution, I could be talked
into implementing it.
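A hypothetical pure-Python model of that reference-counting scheme (all names here are invented for illustration; the real fix would live in C inside fileobject.c): each I/O method bumps a counter for the duration of the call that would run with the GIL released, and close() refuses to free the underlying FILE while the counter is nonzero:

```python
class GuardedFile:
    """Toy model of the proposed use-count protection (not the real C code)."""

    def __init__(self):
        self.unlocked_count = 0  # operations currently running "without the GIL"
        self.closed = False

    def seek(self, pos):
        if self.closed:
            raise ValueError("I/O operation on closed file")
        self.unlocked_count += 1  # done while still holding the GIL
        try:
            pass  # the C code would release the GIL and call fseek() here
        finally:
            self.unlocked_count -= 1  # done after re-acquiring the GIL

    def close(self):
        if self.unlocked_count > 0:
            # another thread is inside an I/O call: fail instead of crashing
            raise IOError("close() called during concurrent operation")
        self.closed = True
```

The key property is that close() can never pull the FILE out from under a thread that is mid-operation; at worst it raises an exception the caller can handle.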
|
msg55960 - (view) |
Author: Sean Reifschneider (jafo) * |
Date: 2007-09-17 10:15 |
If I read the 2003 python-dev thread correctly, there isn't a solution
to this. Does this need to go back to python-dev, or do we just call it
"won't fix"? Or...?
|
msg57338 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2007-11-09 22:14 |
I'm still able to reproduce the bug in Python 2.5 (svn) and 2.6 (trunk).
import thread

f = open("tmp1", "w")

def worker():
    global f
    while 1:
        f.close()
        f = open("tmp1", "w")
        f.seek(0, 0)

thread.start_new_thread(worker, ())
thread.start_new_thread(worker, ())
Unhandled exception in thread started by <function worker at 0xb7d01aac>
Traceback (most recent call last):
*** glibc detected *** ./python: malloc(): memory corruption: 0xb7efc008 ***
======= Backtrace: =========
/lib/tls/i686/cmov/libc.so.6[0xb7dbe636]
/lib/tls/i686/cmov/libc.so.6(__libc_malloc+0x90)[0xb7dbffc0]
/lib/tls/i686/cmov/libc.so.6[0xb7dad03f]
/lib/tls/i686/cmov/libc.so.6(fopen64+0x2c)[0xb7daf61c]
./python(PyTraceBack_Print+0x1a4)[0x80ef0f4]
./python(PyErr_Display+0x76)[0x80e73a6]
./python[0x80ed80d]
./python(PyObject_Call+0x27)[0x805c927]
./python(PyEval_CallObjectWithKeywords+0x6c)[0x80c151c]
./python(PyErr_PrintEx+0xbe)[0x80e7e9e]
./python[0x80f37b1]
/lib/tls/i686/cmov/libpthread.so.0[0xb7ed146b]
/lib/tls/i686/cmov/libc.so.6(clone+0x5e)[0xb7e276de]
======= Memory map: ========
08048000-0813d000 r-xp 00000000 fe:01 10586072
/home/heimes/dev/python/release25-maint/python
0813d000-08162000 rw-p 000f4000 fe:01 10586072
/home/heimes/dev/python/release25-maint/python
08162000-081fe000 rw-p 08162000 00:00 0 [heap]
b6a00000-b6a21000 rw-p b6a00000 00:00 0
b6a21000-b6b00000 ---p b6a21000 00:00 0
b6bc1000-b6bc2000 ---p b6bc1000 00:00 0
b6bc2000-b73c2000 rw-p b6bc2000 00:00 0
b73c2000-b73c3000 ---p b73c2000 00:00 0
b73c3000-b7bc3000 rw-p b73c3000 00:00 0
b7bc3000-b7bff000 r-xp 00000000 08:05 325941 /lib/libncurses.so.5.6
b7bff000-b7c07000 rw-p 0003b000 08:05 325941 /lib/libncurses.so.5.6
b7c07000-b7c4e000 r-xp 00000000 08:05 325837 /lib/libncursesw.so.5.6
b7c4e000-b7c56000 rw-p 00046000 08:05 325837 /lib/libncursesw.so.5.6
b7c56000-b7c82000 r-xp 00000000 08:05 325955 /lib/libreadline.so.5.2
b7c82000-b7c86000 rw-p 0002c000 08:05 325955 /lib/libreadline.so.5.2
b7c86000-b7c87000 rw-p b7c86000 00:00 0
b7c87000-b7c8a000 r-xp 00000000 fe:01 10716611
/home/heimes/dev/python/release25-maint/build/lib.linux-i686-2.5/readline.so
b7c8a000-b7c8b000 rw-p 00003000 fe:01 10716611
/home/heimes/dev/python/release25-maint/build/lib.linux-i686-2.5/readline.so
b7c8b000-b7c92000 r--s 00000000 08:05 557857
/usr/lib/gconv/gconv-modules.cache
b7c92000-b7cd1000 r--p 00000000 08:05 570306
/usr/lib/locale/de_DE.utf8/LC_CTYPE
b7cd1000-b7d54000 rw-p b7cd1000 00:00 0
b7d54000-b7e98000 r-xp 00000000 08:05 326311
/lib/tls/i686/cmov/libc-2.6.1.so
b7e98000-b7e99000 r--p 00143000 08:05 326311
/lib/tls/i686/cmov/libc-2.6.1.so
b7e99000-b7e9b000 rw-p 00144000 08:05 326311
/lib/tls/i686/cmov/libc-2.6.1.so
b7e9b000-b7e9e000 rw-p b7e9b000 00:00 0
b7e9e000-b7ec1000 r-xp 00000000 08:05 326315
/lib/tls/i686/cmov/libm-2.6.1.so
b7ec1000-b7ec3000 rw-p 00023000 08:05 326315
/lib/tls/i686/cmov/libm-2.6.1.so
b7ec3000-b7ec5000 r-xp 00000000 08:05 326330
/lib/tls/i686/cmov/libutil-2.6.1.so
b7ec5000-b7ec7000 rw-p 00001000 08:05 326330
/lib/tls/i686/cmov/libutil-2.6.1.so
b7ec7000-b7ec8000 rw-p b7ec7000 00:00 0
b7ec8000-b7eca000 r-xp 00000000 08:05 326314
/lib/tls/i686/cmov/libdl-2.6.1.so
b7eca000-b7ecc000 rw-p 00001000 08:05 326314
/lib/tls/i686/cmov/libdl-2.6.1.so
b7ecc000-b7ee0000 r-xp 00000000 08:05 326325
/lib/tls/i686/cmov/libpthread-2.6.1.so
b7ee0000-b7ee2000 rw-p 00013000 08:05 326325
/lib/tls/i686/cmov/libpthread-2.6.1.so
b7ee2000-b7ee4000 rw-p b7ee2000 00:00 0
b7ef1000-b7efb000 r-xp 00000000 08:05 325908 /lib/libgcc_s.so.1
b7efb000-b7efc000 rw-p 0000a000 08:05 325908 /lib/libgcc_s.so.1
b7efc000-b7f01000 rw-p b7efc000 00:00 0
b7f01000-b7f1b000 r-xp 00000000 08:05 326530 /lib/ld-2.6.1.so
b7f1b000-b7f1d000 rw-p 00019000 08:05 326530 /lib/ld-2.6.1.so
bfcd2000-bfcee000 rw-p bfcd2000 00:00 0 [stack]
ffffe000-fffff000 r-xp 00000000 00:00 0 [vdso]
However, Python 3.0 doesn't crash:
Unhandled exception in thread started by <function worker at 0x840860c>
Traceback (most recent call last):
File "<stdin>", line 6, in worker
File "/home/heimes/dev/python/py3k/Lib/io.py", line 1234, in seek
self.buffer.seek(pos)
File "/home/heimes/dev/python/py3k/Lib/io.py", line 877, in seek
return self.raw.seek(pos, whence)
IOError: [Errno 9] Bad file descriptor
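For comparison, current Python 3 releases fail the same way but with a different exception type: the io layer checks for a closed file before touching the underlying descriptor and raises ValueError (the exact message varies between versions):

```python
f = open("tmp1", "w")
f.close()

raised = None
try:
    f.seek(0, 0)
except ValueError as exc:
    raised = type(exc).__name__

print(raised)
```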
|
msg64579 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2008-03-27 11:00 |
A small addition to Christian's code snippet allows me to reproduce the
problem as well:
import thread

f = open("tmp1", "w")

def worker():
    global f
    while 1:
        f.close()
        f = open("tmp1", "w")
        f.seek(0, 0)

thread.start_new_thread(worker, ())
thread.start_new_thread(worker, ())

while 1:
    pass
|
msg64642 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2008-03-28 18:09 |
This is a preliminary patch which shows how things might be done better.
It only addresses close(), seek() and dealloc right now. However, as
mentioned in test_close_open_seek, if I raise the number of workers, I
get crashes (while test_close_open is fine). Perhaps fseek() in the
glibc is thread unsafe when operating on the same file descriptor?
|
msg64643 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2008-03-28 18:20 |
Another approach would be to add a dedicated lock for each PyFileObject.
This sounds a bit bad performance-wise but after all the GIL itself is a
lock, and we release and re-acquire it on each file operation, so why not?
|
msg64645 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2008-03-28 18:47 |
On the other hand, surprisingly enough, the flockfile/funlockfile
manpage tells me that:
    The stdio functions are thread-safe. This is achieved by assigning
    to each FILE object a lockcount and (if the lockcount is nonzero)
    an owning thread. For each library call, these functions wait until
    the FILE object is no longer locked by a different thread, then
    lock it, do the requested I/O, and unlock the object again.
This leaves me wondering what is happening in the above-mentioned test.
|
msg64654 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2008-03-28 22:45 |
I should have read #595601 in detail sooner; it has an explanation:
[quoting Jeremy Hylton]
The universal newline code squirrels the FILE * away in a
local variable, which is worse. If it happens that
another thread closes the file, at best the local
points to a closed FILE *. But that memory could get
recycled and then there's no way to know what it points to.
[/quoting]
Even with careful coding, there's a small window between releasing the
GIL on our side, and acquiring the FILE-specific lock in the glibc,
during which the fclose() function can be invoked and release the FILE
just before we invoke another function (e.g. fseek()) on it.
|
msg64661 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2008-03-29 00:30 |
Some good news: I've found a way to continue with my approach and make
it work. The close() method now raises an appropriate IOError when
another method is being called from another thread. Attaching a patch
which protects a bunch of file object methods (together with tests).
Now the bad news: the protection logic in fileobject.c is, hmm, a bit
contrived (and I'm not even sure it's 100% correct). If someone wants to
read it and veto it before I go further...
|
msg64667 - (view) |
Author: Georg Brandl (georg.brandl) * |
Date: 2008-03-29 01:27 |
Closed #595601 as a duplicate.
|
msg64719 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2008-03-29 19:39 |
Actually, my approach was not 100% correct, it failed in some rare
cases. I've come to the conclusion that using an unlock count on the
PyFileObject is the correct way to proceed. I'm now attaching a working
and complete patch which protects all methods of the PyFileObject. The
original test suite runs fine, as well as the added test cases and Tim
Peters' crasher here:
http://mail.python.org/pipermail/python-dev/2003-June/036537.html
To sum up the changes brought by this patch:
- no supplementary locking
- but each time we release the GIL to do an operation on a FILE, we
increase a dedicated counter on the PyFileObject
- when close()ing a PyFileObject, if the aforementioned counter is
non-zero, we throw an IOError rather than risking calling fclose(). By
construction this cannot happen in the PyFileObject destructor, but if
it ever happens (for example if a C extension decides to put its hands
in the PyFileObject struct), we throw a SystemError instead.
|
msg64724 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2008-03-29 20:39 |
Ah, I had forgotten to protect the print statement as well. Here is a
new patch :-)
|
msg65017 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2008-04-06 03:07 |
I'm reviewing this patch now and plan to commit it after some testing.
A couple comments:
I'd rename your sts variables to status.
Also FYI:
Your use of volatile on the int unlocked_count member of PyFileObject
does not do what you think it does and isn't needed here anyway.
Access to the variable is always protected by the GIL; the unlocking
and locking of the GIL should cause an implicit memory barrier,
guaranteeing that all other CPUs in the system will see the same value
stored in the structure in memory.
The C volatile keyword, on the other hand, does not guarantee this.
volatile is useful for memory-mapped IO, but it makes no guarantees
about cache-coherent access between multiple CPUs. (The atomic types
in the recent C++ standards are meant for that.)
Both of the above are trivial changes, no need for another patch.
|
msg65022 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2008-04-06 06:25 |
I've attached my patch that I want to commit. The main change from
filethread4 is some cleanup in file_test to make it run a lot faster and
add verbose mode output to indicate how well it is actually testing the
problem (counting the times that close raises IOError).
One concern holding up my commit:
Will this test pass on Windows? It is opening and closing the same file
in 'w+' mode from multiple threads of the same process at once.
Can someone with a Windows dev environment please apply this patch and
test it? If it dislikes the above file behavior, can you propose a fix
for it (set Windows file non-exclusive flags or whatever you're supposed
to do... the worse alternative would be to use a new filename on each
open, but that could cause a nightmare of thousands of new files being
created by the test which then have to be cleaned up)?
thanks,
-gps
|
msg65029 - (view) |
Author: Trent Nelson (trent) * |
Date: 2008-04-06 10:40 |
Patched and tested on one of my buildbots; test_file passes without
error with your latest patch, Greg.
|
msg65031 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2008-04-06 11:09 |
Ok Greg, I wasn't sure locking/unlocking the GIL would create a memory
barrier but it sounds logical after all. Your patch looks fine to me.
|
msg65056 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2008-04-06 23:14 |
Committed to trunk in revision 62195.
Misc/NEWS entry added.
I also added two new C API functions: PyFile_IncUseCount and
PyFile_DecUseCount along with documentation. They should be used by any
C extension code that uses PyFile_AsFile and wants to make use of the
returned FILE* with the GIL released.
The net effect of not using them is no change from the existing behavior
(crashes would be possible) for those C extension modules.
|
msg71388 - (view) |
Author: Kevin Watters (kevinwatters) |
Date: 2008-08-18 21:48 |
I know this is long closed, but no one on the nosy list happens to have
this fix backported to 2.5, do they? :) If not, I'll attach one here
eventually...
|
msg71389 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2008-08-18 21:52 |
> I know this is long closed, but no one on the nosy list happens to have
> this fix backported to 2.5, do they? :)
I think that at the time no one was sure the patch was 100% harmless. It
also subtly changes the behaviour of close() when called while another
IO operation is in progress in another thread, which is arguably a bug
fix but can still raise an exception it wouldn't have raised in 2.5.
So all in all I'm not sure this should be backported, although it would
probably be an improvement in most cases. I'll let someone else take the
decision.
|
msg71474 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2008-08-19 19:57 |
The fix cannot be committed to Python 2.5 because it breaks
compatibility by adding another field to the PyFileObject struct and
adding two new C API functions.
|
msg118715 - (view) |
Author: Paul Bolle (pebolle) |
Date: 2010-10-14 20:48 |
0) I ran into some (small) problems with the documentation added by revision 62195. It seems more efficient to reuse this issue to report these. Feel free to ask me to open another issue if that's not appreciated.
1) A small patch that addresses two problems with the current (i.e., 2.7) documentation should be attached:
- link three occurrences of "GIL" to the GIL entry in the glossary; and
- add some example code to clarify the usage of PyFile_IncUseCount() and PyFile_DecUseCount().
2) That patch also adds a link to the "Thread State and the Global Interpreter Lock" section to the GIL entry in the documentation index. That is a separate problem, but fixing that minor one could also help people (like me) who need to better understand the GIL aspects of those two functions.
|
msg118723 - (view) |
Author: Ned Deily (ned.deily) * |
Date: 2010-10-14 22:09 |
Paul, please open a new issue and attach your patch(s) there; it should address issues in 2.7 and 3.2 only.
|
msg118744 - (view) |
Author: Paul Bolle (pebolle) |
Date: 2010-10-15 06:54 |
> please open a new issue and attach your patch(s) there
Issue 10111 now tracks the documentation problems.
|
|
Date |
User |
Action |
Args |
2022-04-10 16:11:30 | admin | set | github: 39342 |
2010-10-15 06:54:52 | pebolle | set | messages:
+ msg118744 |
2010-10-14 22:09:45 | ned.deily | set | nosy:
+ ned.deily messages:
+ msg118723
|
2010-10-14 20:48:58 | pebolle | set | files:
+ python-2.7-Doc-gil.patch nosy:
+ pebolle messages:
+ msg118715
|
2008-08-19 19:57:27 | gregory.p.smith | set | messages:
+ msg71474 |
2008-08-18 21:52:47 | pitrou | set | messages:
+ msg71389 |
2008-08-18 21:48:43 | kevinwatters | set | nosy:
+ kevinwatters messages:
+ msg71388 |
2008-04-06 23:14:16 | gregory.p.smith | set | status: open -> closed resolution: accepted messages:
+ msg65056 versions:
- Python 3.0 |
2008-04-06 11:09:26 | pitrou | set | messages:
+ msg65031 |
2008-04-06 10:40:51 | trent | set | nosy:
+ trent messages:
+ msg65029 |
2008-04-06 06:25:10 | gregory.p.smith | set | files:
+ filethread4-gps01.patch messages:
+ msg65022 |
2008-04-06 03:07:15 | gregory.p.smith | set | assignee: tim.peters -> gregory.p.smith versions:
+ Python 3.0 messages:
+ msg65017 nosy:
+ gregory.p.smith |
2008-03-31 20:48:34 | jyasskin | set | nosy:
+ jyasskin |
2008-03-29 20:39:27 | pitrou | set | files:
+ filethread4.patch messages:
+ msg64724 |
2008-03-29 19:39:33 | pitrou | set | files:
+ filethread3.patch messages:
+ msg64719 |
2008-03-29 01:27:31 | georg.brandl | link | issue595601 superseder |
2008-03-29 01:27:19 | georg.brandl | set | nosy:
+ georg.brandl messages:
+ msg64667 |
2008-03-29 00:30:35 | pitrou | set | files:
+ filethread2.patch messages:
+ msg64661 |
2008-03-28 22:45:07 | pitrou | set | messages:
+ msg64654 |
2008-03-28 18:47:37 | pitrou | set | messages:
+ msg64645 |
2008-03-28 18:20:44 | pitrou | set | messages:
+ msg64643 |
2008-03-28 18:09:52 | pitrou | set | files:
+ filethread1.patch keywords:
+ patch messages:
+ msg64642 |
2008-03-27 11:00:07 | pitrou | set | nosy:
+ pitrou messages:
+ msg64579 |
2008-01-04 00:06:38 | christian.heimes | set | versions:
- Python 2.3 |
2007-11-09 22:14:34 | christian.heimes | set | nosy:
+ christian.heimes messages:
+ msg57338 versions:
+ Python 2.6, Python 2.5 |
2007-09-17 10:16:00 | jafo | set | assignee: tim.peters messages:
+ msg55960 nosy:
+ jafo, tim.peters |
2007-08-28 18:34:36 | georg.brandl | set | priority: normal -> critical type: crash severity: normal -> major |
2007-08-28 18:33:45 | georg.brandl | link | issue1778376 superseder |
2003-10-01 06:22:02 | janixia | create | |