classification
Title: file (& socket) I/O are not thread safe
Type: Stage:
Components: Interpreter Core Versions: Python 2.6
process
Status: closed Resolution: duplicate
Dependencies: Superseder: thread unsafe file objects cause crash
View: 815646
Assigned To: Nosy List: aegis, barry, christian.heimes, gregory.p.smith, gvanrossum, jackjansen, jhylton, pitrou, tim.peters
Priority: low Keywords:

Created on 2002-08-15 16:34 by jhylton, last changed 2008-04-06 23:23 by gregory.p.smith. This issue is now closed.

Files
File name Uploaded Description Edit
file-lock.patch jhylton, 2002-08-20 20:49
Messages (13)
msg11992 - (view) Author: Jeremy Hylton (jhylton) (Python triager) Date: 2002-08-15 16:34
We recently found an assertion failure in the universal
newline support when running a multithreaded program
where two threads used the same Python file object. 
The assert(stream != NULL) test in 
Py_UniversalNewlineFread() fails once in a blue moon,
where stream is the stdio FILE * that the fileobject
wraps.  Further analysis suggests that there is a race
condition between checking FILE * and using FILE * that
exists in at least Python 2.1 and up.

I'll actually describe the problem as it exists in
Python 2.2, because it is simpler to avoid the
universal newline code.  That code isn't the source of
the problem, although it's assert() uncovers it in a
clear way.

In file_read() (rev 2.141.6.5), the first thing it does
is check if f_fp (the FILE *) is NULL.  If so it raises
an IOError -- operation on closed file object.  Later,
file_read() enters a for loop that calls fread() until
enough bytes have been read.

	for (;;) {
		Py_BEGIN_ALLOW_THREADS
		errno = 0;
		chunksize = fread(BUF(v) + bytesread, 1,
				  buffersize - bytesread, f->f_fp);
		Py_END_ALLOW_THREADS
		if (chunksize == 0) {
			if (!ferror(f->f_fp))
				break;
			PyErr_SetFromErrno(PyExc_IOError);
			clearerr(f->f_fp);
			Py_DECREF(v);
			return NULL;
		}

The problem is that fread() is called after the global
interpreter lock is released.  Since the lock is
released, another Python thread could run and modify
the file object, changing the value of f->f_fp.  Under
the current interpreter lock scheme, it isn't safe to
use f->f_fp without holding the interpreter lock.

The current file_read() code can fail in a variety of
ways.  It's possible for a second thread to close the
file, which will set f->f_fp to NULL.  Who knows what
fread() will do when NULL is passed.

The universal newline code is squirrels the FILE * in a
local variable, which is worse.  If it happens that
another thread closes the file, at best the local
points to a closed FILE *.  But that memory could get
recycled and then there's no way to know what it points to.

socket I/O has a similar problem with unsafe sharing of
the file descriptor.  However, this problem seems less
severe in general, because we'd just be passing a bogus
file descriptor to a system call.  We don't have to
worry about whether stdio will dump core when passed a
bogus pointer.  There is a chance the a socket will be
closed and its file descriptor used for a different
socket.  So a call to recv() with one socket ends up
using a different socket.  That will be a nightmare to
debug, but it won't cause a segfault.  (And, in
general, files and sockets shouldn't be shared between
application threads unless the application is going to
make sure its safe.)

The solution to this problem is to use a
per-file-object lock to guard access to f->f_fp.  No
thread should read or right f->f_fp without holding the
lock.  To make sure that other threads get a chance to
run when there is contention for the file, the
file-object lock should never be held when the GIL is held.
msg11993 - (view) Author: Jack Jansen (jackjansen) * (Python committer) Date: 2002-08-15 20:49
Logged In: YES 
user_id=45365

I'm not sure your solution of locking in the file object is feasible. Every call of PyFile_AsFile() is dangerous, if the  code proceeds to release the GIL before using the FILE* it's in danger. And this code could be in extension modules too (I know there's quite a few in my own img package).

When I did the universal newlines patch I basically ignored the problem (thinking that no-one in their right mind would close a file in one thread while it's still in use in another: how wrong can one be:-).

The only real solution may be to get rid of PyFile_AsFile altogether, and replace it with PyFile_AsFileLocked() and PyFile_Unlock() and make these calls somehow work even when the GIL is held (possibly by temporarily releasing the GIL while acquiring the file lock).
msg11994 - (view) Author: Jeremy Hylton (jhylton) (Python triager) Date: 2002-08-16 15:04
Logged In: YES 
user_id=31392

I think I'm less worried about PyFile_AsFile() than I am
about PyFile_WriteObject().  The latter is invoked by the
print statement.  It extracts the FILE* using
PyFile_AsFile() and proceeds to pass it to PyObject_Print().
 PyObject_Print() MUST hold the GIL, but since it has the
raw FILE* it MUST hold the file lock.  But you can't hold
both locks :-(.
msg11995 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2002-08-16 15:26
Logged In: YES 
user_id=31435

To cheer you up, it's probably worse than that <wink>.  
While you're holding a file lock, in theory you dare not 
execute any code that may call back into Python, because 
the called code may try to do an operation on the file 
you've got locked, and then the whole system freezes. That 
is, independent of the GIL, it's not thoroughly safe to call 
PyObject_Print with a file lock held, because an object's 
__str__ may try to "do something" with the Python-level file 
object too.  The more depressing news is that a reentrant 
file lock would solve that one.
msg11996 - (view) Author: Jack Jansen (jackjansen) * (Python committer) Date: 2002-08-16 22:21
Logged In: YES 
user_id=45365

Uhm... Isn't it good enough if you don't hold the GIL while you acquire the file lock? When I looked at it I thought it would be (but then, I didn't look very hard:-).

And there may be another way around this problem too. If you're really only worried about closing (Python at the moment couldn't care less that when two threads write to the same file at the same time the output may be intermingled or other fun things may happen) then don't lock the file, but simply set a flag "keep your hands off the FILE*, I'm using it!" and in that case delay the close. You may want to protect the flag and the delayed close with a lock, I guess.

Or you could define this as a programmer error and raise an exception if one thread tries to close a file another thread is using. (The application will know much better how to handle the contention with locking or some other mechanism).
msg11997 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2002-08-16 22:43
Logged In: YES 
user_id=12800

Jack, your last paragraph sounds like a good idea <waiting
for Tim to shoot holes in it>.  Isn't the whole point to
avoid core dumps, not to "make it work right" whatever that
means?  We should take the simplest approach to avoiding
crashes.
msg11998 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2002-08-20 19:43
Logged In: YES 
user_id=6380

Wow. This is harder than I thought! :-(

We should be able to solve whatever problem there is with
'print'. The PyFile_AsFile() API may have to become
obsolete, *or* we may have to add a separate call to access
the lock. I'll have to think about it more.
msg11999 - (view) Author: Jeremy Hylton (jhylton) (Python triager) Date: 2002-08-20 20:49
Logged In: YES 
user_id=31392

Here's a checkpoint of current progress.  The patch applies
cleanly and even compiles.  It works most of the time, but
it causes a bunch of test failures.  I haven't had time to
debug the errors, two likely errors are incorrect
propagation of errors from across the release lock boundary.
 (The error checking goes on inside so that clearerr() can
be called while the file lock is held but
PyErr_SetFromErrno() can be called while the GIL is held.) 
The other source of errors is typos -- like calling LOCK
when you mean UNLOCK.
msg12000 - (view) Author: Chad Austin (aegis) Date: 2006-05-07 11:38
Logged In: YES 
user_id=7212

I'd like to add that this particular problem cost me about a
week of trying to figure out what the heck was going on, a
stack trace thrown from Python is MUCH better than
intermittent last-chance exceptions thrown from our binaries
in the field.  :)

http://aegisknight.livejournal.com/128191.html
msg59321 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2008-01-05 19:52
The race condition issue should get more attention.
msg59340 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-01-05 23:50
Given how long this has been sitting without action, it's ludicrous to
keep it at "high" priority.  I'm also taking it away from Jermy since
his interest must be minimal by now.  And I'm removing 3.0 since it has
a completely different I/O implementation (with much more severe
multi-threading issues, but nothing in common with the 2.x series).
msg64662 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-03-29 00:39
I've posted a preliminary patch for the close()-should-raise-an-error
approach here: #815646
msg65057 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-04-06 23:23
Fix committed to trunk in revision 62195 via Issue 815646
History
Date User Action Args
2008-04-06 23:23:08gregory.p.smithsetmessages: + msg65057
2008-03-29 01:27:31georg.brandlsetstatus: open -> closed
resolution: duplicate
superseder: thread unsafe file objects cause crash
2008-03-29 00:39:35pitrousetnosy: + pitrou
messages: + msg64662
2008-01-17 01:50:09gregory.p.smithsetnosy: + gregory.p.smith
2008-01-05 23:50:14gvanrossumsetpriority: high -> low
assignee: jhylton ->
messages: + msg59340
versions: - Python 3.0
2008-01-05 19:52:17christian.heimessetpriority: normal -> high
nosy: + christian.heimes
messages: + msg59321
versions: + Python 2.6, Python 3.0
2002-08-15 16:34:16jhyltoncreate