
Author johansen
Recipients johansen
Date 2008-05-30.21:54:06
Content
We're using Python to build the new packaging system for OpenSolaris. 
Yesterday, a user reported that when they ran the pkg command, piped the
output to grep, and then typed ^C, sometimes they'd get this error:

$ pkg list | grep office
^Cclose failed: [Errno 11] Resource temporarily unavailable

We assumed that this might be a problem in the signal handling we've
employed to catch SIGPIPE; however, it turns out that the problem is in
the file_dealloc() code.

For the perversely curious, additional details may be found in the
original bug located here:

http://defect.opensolaris.org/bz/show_bug.cgi?id=2083

Essentially we found the following:

The error message is emitted from fileobject.c: file_dealloc()

The relevant portion of the routine looks like this:

static void
file_dealloc(PyFileObject *f)
{
        int sts = 0;
        if (f->weakreflist != NULL)
                PyObject_ClearWeakRefs((PyObject *) f);
        if (f->f_fp != NULL && f->f_close != NULL) {
                Py_BEGIN_ALLOW_THREADS
                sts = (*f->f_close)(f->f_fp);
                Py_END_ALLOW_THREADS
                if (sts == EOF) 
#ifdef HAVE_STRERROR
                        PySys_WriteStderr("close failed: [Errno %d] %s\n",
                                errno, strerror(errno));

In the cases we encountered, the function pointer f_close points to
sysmodule.c: _check_and_flush()

That routine looks like this:

static int
_check_and_flush (FILE *stream)
{
  int prev_fail = ferror (stream);
  return fflush (stream) || prev_fail ? EOF : 0;
}

_check_and_flush() calls ferror(3C) and then fflush(3C) on the FILE stream
associated with the file object.  There's just one problem here.  If it
finds an error that was previously encountered on the file stream,
there's no guarantee that errno will be valid.  Should an error be
encountered in fflush(3C), errno will get set; however, the contents of
errno are undefined should fflush() return successfully.

Here's what happens in the code I observed:

I set a write watchpoint on errno and observed each time it was
written.  After sifting through a bunch of red herrings, I found a call
to PyThread_acquire_lock() that sets errno to 11 (EAGAIN).  This occurs
when PyThread_acquire_lock() calls sem_trywait(3C) and finds the
semaphore already locked.  Errno doesn't get accessed again until a
call to libc.so.1`isseekable() that simply saves and restores the
existing errno.
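
The sem_trywait(3C) behaviour described above can be checked in isolation.
The following is only a minimal sketch, not code from the interpreter: the
helper name errno_after_contended_trywait() is hypothetical, and it assumes
a platform that supports unnamed POSIX semaphores via sem_init().

```c
#include <errno.h>
#include <semaphore.h>

/*
 * Hypothetical helper: try to take a semaphore whose count is already
 * zero and return the errno value left behind by sem_trywait().
 * Returns -1 if the semaphore could not be created at all.
 */
static int
errno_after_contended_trywait(void)
{
    sem_t sem;
    int observed = 0;

    /* Initial value 0 means the semaphore is already "locked". */
    if (sem_init(&sem, 0, 0) != 0)
        return -1;

    errno = 0;
    if (sem_trywait(&sem) == -1)
        observed = errno;   /* POSIX specifies EAGAIN here */

    sem_destroy(&sem);
    return observed;
}
```

Nothing in this path performs I/O, yet errno is left holding EAGAIN for
whichever later caller happens to read it.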

Since we've taken a ^C (SIGINT), the interpreter begins the finalization
process and eventually calls file_dealloc().  This routine calls
_check_and_flush().  In the case that I observed, ferror(3C)
returns a non-zero value but fflush(3C) completes successfully.  This
causes the routine to return EOF to the caller.  file_dealloc() assumes
that since it received an EOF an error occurred and it should call
strerror(errno).  However, since this is just returning the state of a
previous error, errno is invalid.
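
The sequence above can be approximated outside the interpreter.  This is a
sketch under stated assumptions rather than the code path I traced:
check_and_flush() copies the logic of the routine quoted earlier,
errno_reported_on_close() is a hypothetical stand-in for the reporting done
in file_dealloc(), and the assignment of EAGAIN to errno stands in for the
unrelated sem_trywait(3C) failure.

```c
#include <errno.h>
#include <stdio.h>

/* Same logic as the _check_and_flush() routine quoted earlier. */
static int
check_and_flush(FILE *stream)
{
    int prev_fail = ferror(stream);
    return fflush(stream) || prev_fail ? EOF : 0;
}

/*
 * Hypothetical helper: return the errno value that a caller such as
 * file_dealloc() would end up reporting, or 0 if no failure is signalled.
 */
static int
errno_reported_on_close(FILE *stream)
{
    /* Stand-in for the unrelated sem_trywait() failure seen in the trace. */
    errno = EAGAIN;

    if (check_and_flush(stream) == EOF)
        return errno;   /* possibly stale: fflush() itself may not have failed */
    return 0;
}
```

If the stream already carries an error indicator and fflush(3C) succeeds,
the value returned is whatever errno held beforehand.  In this sketch that
is EAGAIN, provided the C library leaves errno untouched on a successful
flush, which is common but not guaranteed.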

This is what causes the spurious EAGAIN message.  Just to be sure, I
traced the return value and errno of failed syscalls that were invoked
by the interpreter.  I was unable to observe any syscalls returning
EAGAIN.  This is because (at least on OpenSolaris) sem_trywait(3C) calls
sema_trywait(3C).  sema_trywait() returns EBUSY if the semaphore is
held, and sem_trywait() converts this to EAGAIN.  Neither of these
errors is passed out of the kernel.


It's not clear to me whether _check_and_flush(), file_dealloc(), or both
need modification.  At a minimum, it's not safe for file_dealloc() to
assume that errno is set correctly if the function underneath it is
using ferror(3C) to detect an earlier error on the stream.
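
One possible direction, offered only as a sketch and not as a proposed
patch: have the flush helper tell its caller whether errno is meaningful.
The enum, the function name check_and_flush_status(), and the saved_errno
out-parameter are all hypothetical and do not exist in CPython.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical result codes; not part of CPython. */
enum flush_status {
    FLUSH_OK = 0,           /* no earlier error, flush succeeded */
    FLUSH_FAILED = 1,       /* fflush() failed; *saved_errno is meaningful */
    FLUSH_PRIOR_ERROR = 2   /* only the stream's error indicator was set */
};

static enum flush_status
check_and_flush_status(FILE *stream, int *saved_errno)
{
    /* Sample the sticky error indicator before flushing. */
    int prev_fail = ferror(stream);

    *saved_errno = 0;
    errno = 0;
    if (fflush(stream) != 0) {
        /* errno was set by this fflush() call, so it can be reported. */
        *saved_errno = errno;
        return FLUSH_FAILED;
    }

    /* An earlier failure left no trustworthy errno behind. */
    return prev_fail ? FLUSH_PRIOR_ERROR : FLUSH_OK;
}
```

With something along these lines, a caller could print strerror() only for
FLUSH_FAILED and fall back to a generic message for FLUSH_PRIOR_ERROR.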