This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author jhylton
Recipients
Date 2002-08-15.16:34:16
SpamBayes Score
Marked as misclassified
Message-id
In-reply-to
Content
We recently found an assertion failure in the universal
newline support when running a multithreaded program
where two threads used the same Python file object. 
The assert(stream != NULL) test in 
Py_UniversalNewlineFread() fails once in a blue moon,
where stream is the stdio FILE * that the fileobject
wraps.  Further analysis suggests that there is a race
condition between checking FILE * and using FILE * that
exists in at least Python 2.1 and up.

I'll actually describe the problem as it exists in
Python 2.2, because it is simpler to avoid the
universal newline code.  That code isn't the source of
the problem, although it's assert() uncovers it in a
clear way.

In file_read() (rev 2.141.6.5), the first thing it does
is check if f_fp (the FILE *) is NULL.  If so it raises
an IOError -- operation on closed file object.  Later,
file_read() enters a for loop that calls fread() until
enough bytes have been read.

	for (;;) {
		Py_BEGIN_ALLOW_THREADS
		errno = 0;
		chunksize = fread(BUF(v) + bytesread, 1,
				  buffersize - bytesread, f->f_fp);
		Py_END_ALLOW_THREADS
		if (chunksize == 0) {
			if (!ferror(f->f_fp))
				break;
			PyErr_SetFromErrno(PyExc_IOError);
			clearerr(f->f_fp);
			Py_DECREF(v);
			return NULL;
		}

The problem is that fread() is called after the global
interpreter lock is released.  Since the lock is
released, another Python thread could run and modify
the file object, changing the value of f->f_fp.  Under
the current interpreter lock scheme, it isn't safe to
use f->f_fp without holding the interpreter lock.

The current file_read() code can fail in a variety of
ways.  It's possible for a second thread to close the
file, which will set f->f_fp to NULL.  Who knows what
fread() will do when NULL is passed.

The universal newline code is squirrels the FILE * in a
local variable, which is worse.  If it happens that
another thread closes the file, at best the local
points to a closed FILE *.  But that memory could get
recycled and then there's no way to know what it points to.

socket I/O has a similar problem with unsafe sharing of
the file descriptor.  However, this problem seems less
severe in general, because we'd just be passing a bogus
file descriptor to a system call.  We don't have to
worry about whether stdio will dump core when passed a
bogus pointer.  There is a chance the a socket will be
closed and its file descriptor used for a different
socket.  So a call to recv() with one socket ends up
using a different socket.  That will be a nightmare to
debug, but it won't cause a segfault.  (And, in
general, files and sockets shouldn't be shared between
application threads unless the application is going to
make sure its safe.)

The solution to this problem is to use a
per-file-object lock to guard access to f->f_fp.  No
thread should read or right f->f_fp without holding the
lock.  To make sure that other threads get a chance to
run when there is contention for the file, the
file-object lock should never be held when the GIL is held.
History
Date User Action Args
2007-08-23 14:04:57adminlinkissue595601 messages
2007-08-23 14:04:57admincreate