classification
Title: Added clearerr() to clear EOF state
Type: behavior Stage: patch review
Components: Interpreter Core, IO Versions: Python 3.1, Python 2.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: belopolsky, benjamin.peterson, gpolo, josm, loewis, pitrou, scott.dial (7)
Priority: normal Keywords patch

Created on 2007-04-23 17:21 by josm, last changed 2009-05-15 12:55 by pitrou.

Files
File name Uploaded Description Edit Remove
fileobject.diff josm, 2007-04-23 17:21 Objects/fileobject.c patch against trunk
eof.diff josm, 2007-04-28 17:03 revised patch and its test cases
eof2.diff josm, 2007-05-03 10:03 Yet another patch
fileobject.diff scott.dial, 2008-12-22 14:42 Patch against the r67739 on release25-maint
issue1706039.diff belopolsky, 2008-12-23 15:47
Messages (28)
msg52505 - (view) Author: John Smith (josm) Date: 2007-04-23 17:21
This patch is to fix bug 1523853.
Now file.read() works as written in the doc, which says
"An empty string is returned when EOF is encountered immediately."
Before this fix, An empty string was returned even when the last
read() encountered EOF.
msg52506 - (view) Author: Alexander Belopolsky (belopolsky) Date: 2007-04-23 20:22
I was not able to reproduce the problem.  I tried to read through the end of the file, append to the file from an external process and read again.  It worked as expected: more data was read.

The proposed patch seems to be redundant: once ferror(f->f_fp) is known to be 0, clearerr will do nothing and in the alternative branch clearerr is already there. 
msg52507 - (view) Author: John Smith (josm) Date: 2007-04-23 23:27
I can easily reproduce this problem on OS X 10.4.9 using Python 2.4.3 and Python 2.5.

Here's the code
#################
import os, sys
filename = "spam"

f = open(filename, "w+")
f.seek(0, 2)

line = f.read() # EOF

# Writing the same file using another fd
f2 = open(filename, "a+")
f2.write("Spam")
f2.flush()
f2.close()

statinfo = os.stat(filename)
print "file size: %d" % statinfo.st_size
print "position : %d" % f.tell()
line = f.read() # read() returns emtpy!! readlines?() works ok
print "line     : [%s]" % line
#################

On my machine, it outputs the following
###
file size: 4
position : 0
line     : []
###

And just now  I found that on ubuntu on the same machine (vmware),
it works collect.
###
file size: 4
position : 0
line     : [Spam]
###

My patched version python outputs the same result as above on OS X.

We need clearerr() there, too because clearerr()'s job is not only only clearing error indicators
but EOF indicators also.
msg52508 - (view) Author: Alexander Belopolsky (belopolsky) Date: 2007-04-24 04:28
Hmm. It looks like glibc's fread will attempt reading even if EOF flag is already set. 

I've reproduced the problem on OS X 10.4.9 and Python 2.6a0 (trunk:54926M).

This patch fixes it.
msg52509 - (view) Author: John Smith (josm) Date: 2007-04-24 17:15
I confirmed that fread can read data from fp in EOF state on Linux. On OS X, freading from fp in EOF state doesn't work,
just return 0.

I tested this behavior using the following code.
----------------------------

#include <stdio.h>
#include <stdlib.h>

main(void)
{
  FILE *fp1, *fp2;
  int n;
  char buf[128];

  fp1 = fopen("spam.txt", "w+");
  fputs("foo", fp1);

  fseek(fp1, 0L, SEEK_END);
  if (fread(buf, 1, sizeof(buf), fp1) == 0)
    fprintf(stderr, "first fread failed\n");

  fp2 = fopen("spam.txt", "a+");
  fputs("bar", fp2);
  fclose(fp2);

  if (feof(fp1))
    fprintf(stderr, "after appending some text, fp1 is still eof\n");

  if (fread(buf, 1, sizeof(buf), fp1) == 0)
    fprintf(stderr, "second fread failed\n");
  printf("buf: %s\n", buf);

  fclose(fp1);

  return 0;
}
----------------------------
=============
On Linux
=============
first fread failed
after appending some text, fp1 is still eof
buf: bar

=============
On OS X
=============
first fread failed
after appending some text, fp1 is still eof
second fread failed
buf: 

Anyway, I think it's safe and preferable to clear EOF indicator in this case.
msg52510 - (view) Author: Alexander Belopolsky (belopolsky) Date: 2007-04-25 03:14
I have just verified that this issue affects readlines and readinto methods on Mac OS X as well.  This tells me that rather than patching all the individual methods, maybe Py_UniversalNewlineFread should be fixed instead.  I believe that the current situation where a function that looks like Python API reports errors by setting flags on a FILE object instead of settin the an exception is against python's design.  I suggest changing Py_UniversalNewlineFgets so that if fread returns 0, it checks whether that was due to EOF or an error and sets an exception in a case of an error and always calls clearerr.
msg52511 - (view) Author: John Smith (josm) Date: 2007-04-28 17:03
New patch Attached.
I believe this one includes all belopolsky's suggestions.

Now all dirty jobs are done in Py_UniversalNewlineFread:
1. calls clearerr if ferror or feof
2. set IOErrorr exception and return

The caller just checks whether IOError is occurred or not
and does appropriate jobs.

The attachment also contains unittest for this problem.

File Added: eof.diff
msg52512 - (view) Author: Alexander Belopolsky (belopolsky) Date: 2007-05-01 14:38
The latest patch works on Mac OS 10.4 . I would suggest a few cosmetic changes:

1. Put the flag checking logic in a separate function and call it instead of fread.  This will eliminate code duplication.
2. "else" is redundant in

+                       if (ferror(stream)) {
+                               clearerr(stream);
+                               PyErr_SetFromErrno(PyExc_IOError);
+                               return 0;
+                       } else if (feof(stream)) {
+                               clearerr(stream);
+                       }

The flow will be more obvious if you write

+                       if (ferror(stream)) {
+                               clearerr(stream);
+                               PyErr_SetFromErrno(PyExc_IOError);
+                               return 0;
+                       } 
+                       if (feof(stream)) {
+                               clearerr(stream);
+                       }

3. Please mention bug 1523853 in a comment near the test.  

otherwise, the patch looks fine.
msg52513 - (view) Author: John Smith (josm) Date: 2007-05-03 10:03
The flag checking is revised.
I removed feof() so the checking logic become shorter.
I didn't factor out that logic 
because I think the logic is simple enough

Added a comment that mentions bug 1523853.


File Added: eof2.diff
msg52514 - (view) Author: John Smith (josm) Date: 2007-06-05 14:56
Could someone please review my patch and give me some feedback?
msg58043 - (view) Author: John Smith (josm) Date: 2007-12-01 12:19
Any chance to get this fix commmited in?
msg58134 - (view) Author: Raghuram Devarakonda (draghuram) Date: 2007-12-03 17:47
I get an error when I try to read 1523853. Is this not moved to the new
tracker or others are able to access it?
msg58137 - (view) Author: Martin v. Löwis (loewis) Date: 2007-12-03 18:05
Yes, a number of items were not moved, as SF failed to provide them on
export.
msg58181 - (view) Author: Martin v. Löwis (loewis) Date: 2007-12-04 09:02
jos, can you please provide a real name in "Your Details" of this
tracker? We cannot accept anonymous/pseudonymous patches.
msg58185 - (view) Author: John Smith (josm) Date: 2007-12-04 13:02
What's in a name? :p
Done, anyway.
msg76622 - (view) Author: John Smith (josm) Date: 2008-11-29 19:44
Hi, when will this patch be checkedin?
msg77748 - (view) Author: Martin v. Löwis (loewis) Date: 2008-12-13 15:37
Thanks for the patch. Committed into the 2.5 branch as r67740. This
still needs to be applied to the other branches.
msg78161 - (view) Author: Martin v. Löwis (loewis) Date: 2008-12-21 23:35
Unfortunately, the patch is broken. The program

fname='test123'
f=open(fname,'w')
f.read()

crashes with the patch applied.

I think I will revert the patch in 2.5.3, release 2.5.4, and reject the
patch.
msg78162 - (view) Author: Guilherme Polo (gpolo) Date: 2008-12-21 23:42
It isn't being careful when calling PyErr_SetFromErrno inside the
Py_UniversalNewlineFread function since this function is being called
all over fileobject after releasing the GIL.. so, isn't this just a
matter of adding pairs of PyGILState_Ensure/PyGILState_Release around
these calls to PyErr_SetFromErrno in this specific function ?
msg78163 - (view) Author: Martin v. Löwis (loewis) Date: 2008-12-21 23:52
> It isn't being careful when calling PyErr_SetFromErrno inside the
> Py_UniversalNewlineFread function since this function is being called
> all over fileobject after releasing the GIL.. so, isn't this just a
> matter of adding pairs of PyGILState_Ensure/PyGILState_Release around
> these calls to PyErr_SetFromErrno in this specific function ?

Perhaps that could fix this problem (or perhaps not - is
PyGILState_Ensure guaranteed to do the right thing, even in the
presence of multiple interpreters?)

My concern is that this patch may contain other bugs. I am apparently
incapable of reviewing it correctly, and nobody else has offered a
careful review.
msg78164 - (view) Author: Guilherme Polo (gpolo) Date: 2008-12-21 23:55
On Sun, Dec 21, 2008 at 9:52 PM, Martin v. Löwis <report@bugs.python.org> wrote:
>
> Martin v. Löwis <martin@v.loewis.de> added the comment:
>
>> It isn't being careful when calling PyErr_SetFromErrno inside the
>> Py_UniversalNewlineFread function since this function is being called
>> all over fileobject after releasing the GIL.. so, isn't this just a
>> matter of adding pairs of PyGILState_Ensure/PyGILState_Release around
>> these calls to PyErr_SetFromErrno in this specific function ?
>
> Perhaps that could fix this problem (or perhaps not - is
> PyGILState_Ensure guaranteed to do the right thing, even in the
> presence of multiple interpreters?)

It is said to be unsupported in that case, but I guess you knew that.
msg78187 - (view) Author: Scott Dial (scott.dial) Date: 2008-12-22 14:42
I believe the original patch is overreaching. I believe the changes to
fileobject.c should've been much simpler, just as the title describes
(adding clearerr() just before performing reads on the FILE handles).

I've attached a much simpler patch that I believe corrects the issue,
but I don't have an OS X platform to try it on.
msg78238 - (view) Author: Martin v. Löwis (loewis) Date: 2008-12-23 15:07
I have now reverted the patch in r67914. I won't reject the patch
because of Scott's alternative, but leave it open for review. Since
Scott's patch is not approved yet, this is not a release blocker anymore.
msg78239 - (view) Author: Alexander Belopolsky (belopolsky) Date: 2008-12-23 15:47
I've combined Scott's patch with John's (?) test case in issue1706039.diff.  Confirmed that the patch fixes the issue on Mac OS X 
10.5.6 (Intel).
msg78240 - (view) Author: Alexander Belopolsky (belopolsky) Date: 2008-12-23 16:03
Maybe not a problem, but an inconsistency: in Py_UniversalNewlineFgets 
clearerr is added inside univ_newline conditional and then again before 
the loop over characters, but in Py_UniversalNewlineFread it is added only 
once before "if (!f->f_univ_newline)".  I am not sure which approach is 
right, but I don't think there is a reason for the two functions to be 
different.
msg78242 - (view) Author: John Smith (josm) Date: 2008-12-23 17:05
Sorry for inconvenience caused with the patch.
I should've got more reviews before asking it checked in.

I verified eof2.diff-applied-python gets "Bus error"
and Scott Dial's fileobject.diff fixes this.
(Test was on 9.6.0 Darwin Kernel Version 9.6.0)
msg78243 - (view) Author: Scott Dial (scott.dial) Date: 2008-12-23 17:13
They differ because in Py_UniversalNewlineFgets() there is a call to
FLOCKFILE(), and it seemed appropriate to ensure that clearerr() was
called after locking the FILE stream. I certainly pondered over whether
it made a difference to do it before or after, and it would seem to me
that if we are bothering to lock the FILE stream then we would care to
ensure that the first GETC() after locking reaps the benefit of having
called clearerr(). In the cases that we fall into fread and fgets, we
may actually need to be doing:

    FLOCKFILE(stream)
    clearerr(stream);
    result = fgets(buf, n, stream);
    FUNLOCKFILE(stream);
    return result;

One nitpick is that the testcase was appended to the StdoutTests class
instead of OtherFileTests class, which would seem more appropriate.
msg87815 - (view) Author: Antoine Pitrou (pitrou) Date: 2009-05-15 12:55
Does it apply to 3.1?
History
Date User Action Args
2009-05-15 12:55:17pitrousetmessages: + msg87815
2009-05-15 02:10:20ajaksu2setversions: + Python 3.1, - Python 3.0
nosy: + benjamin.peterson, pitrou

priority: normal
components: + IO
stage: patch review
2008-12-23 17:13:22scott.dialsetmessages: + msg78243
2008-12-23 17:05:12josmsetmessages: + msg78242
2008-12-23 16:03:25belopolskysetmessages: + msg78240
2008-12-23 15:47:23belopolskysetfiles: + issue1706039.diff
messages: + msg78239
2008-12-23 15:07:50loewissetpriority: release blocker -> (no value)
assignee: loewis ->
messages: + msg78238
2008-12-22 14:42:40scott.dialsetfiles: + fileobject.diff
nosy: + scott.dial
messages: + msg78187
2008-12-21 23:55:25gpolosetmessages: + msg78164
2008-12-21 23:52:12loewissetmessages: + msg78163
2008-12-21 23:42:45gpolosetnosy: + gpolo
messages: + msg78162
2008-12-21 23:35:21loewissetmessages: + msg78161
2008-12-20 14:37:04loewissetversions: - Python 2.5.3
2008-12-20 02:41:45loewissetpriority: deferred blocker -> release blocker
2008-12-13 15:37:23loewissetpriority: release blocker -> deferred blocker
messages: + msg77748
2008-12-10 09:10:00loewissetpriority: normal -> release blocker
2008-11-29 19:44:48josmsetmessages: + msg76622
versions: + Python 2.6, Python 3.0, Python 2.5.3
2008-01-21 21:36:44draghuramsetnosy: - draghuram
2007-12-04 13:02:06josmsetmessages: + msg58185
2007-12-04 09:02:08loewissetmessages: + msg58181
2007-12-03 18:05:17loewissetnosy: + loewis
messages: + msg58137
2007-12-03 17:47:56draghuramsetnosy: + draghuram
messages: + msg58134
2007-12-01 12:19:34josmsettype: behavior
messages: + msg58043
2007-08-24 19:39:00georg.brandlsetassignee: loewis
2007-04-23 17:21:41josmcreate