classification
Title: Added clearerr() to clear EOF state
Type: behavior Stage: resolved
Components: Interpreter Core, IO Versions: Python 2.7
process
Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: Nosy List: belopolsky, benjamin.peterson, gpolo, josm, loewis, pitrou, scott.dial
Priority: normal Keywords: needs review, patch

Created on 2007-04-23 17:21 by josm, last changed 2020-05-31 11:55 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description Edit
fileobject.diff josm, 2007-04-23 17:21 Objects/fileobject.c patch against trunk review
eof.diff josm, 2007-04-28 17:03 revised patch and its test cases review
eof2.diff josm, 2007-05-03 10:03 Yet another patch review
fileobject.diff scott.dial, 2008-12-22 14:42 Patch against the r67739 on release25-maint review
issue1706039.diff belopolsky, 2008-12-23 15:47 review
issue1706039-py3k.patch scott.dial, 2009-12-10 09:21 review
issue1706039-py3k.patch belopolsky, 2010-07-25 01:11 review
issue1706039-py3k.patch scott.dial, 2011-03-09 07:32 updated to apply cleanly to tip (68339:71d2147ce600)
Messages (39)
msg52505 - (view) Author: John Smith (josm) Date: 2007-04-23 17:21
This patch is to fix bug 1523853.
Now file.read() works as written in the doc, which says
"An empty string is returned when EOF is encountered immediately."
Before this fix, An empty string was returned even when the last
read() encountered EOF.
msg52506 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2007-04-23 20:22
I was not able to reproduce the problem.  I tried to read through the end of the file, append to the file from an external process and read again.  It worked as expected: more data was read.

The proposed patch seems to be redundant: once ferror(f->f_fp) is known to be 0, clearerr will do nothing and in the alternative branch clearerr is already there. 
msg52507 - (view) Author: John Smith (josm) Date: 2007-04-23 23:27
I can easily reproduce this problem on OS X 10.4.9 using Python 2.4.3 and Python 2.5.

Here's the code
#################
import os, sys
filename = "spam"

f = open(filename, "w+")
f.seek(0, 2)

line = f.read() # EOF

# Writing the same file using another fd
f2 = open(filename, "a+")
f2.write("Spam")
f2.flush()
f2.close()

statinfo = os.stat(filename)
print "file size: %d" % statinfo.st_size
print "position : %d" % f.tell()
line = f.read() # read() returns emtpy!! readlines?() works ok
print "line     : [%s]" % line
#################

On my machine, it outputs the following
###
file size: 4
position : 0
line     : []
###

And just now  I found that on ubuntu on the same machine (vmware),
it works collect.
###
file size: 4
position : 0
line     : [Spam]
###

My patched version python outputs the same result as above on OS X.

We need clearerr() there, too because clearerr()'s job is not only only clearing error indicators
but EOF indicators also.
msg52508 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2007-04-24 04:28
Hmm. It looks like glibc's fread will attempt reading even if EOF flag is already set. 

I've reproduced the problem on OS X 10.4.9 and Python 2.6a0 (trunk:54926M).

This patch fixes it.
msg52509 - (view) Author: John Smith (josm) Date: 2007-04-24 17:15
I confirmed that fread can read data from fp in EOF state on Linux. On OS X, freading from fp in EOF state doesn't work,
just return 0.

I tested this behavior using the following code.
----------------------------

#include <stdio.h>
#include <stdlib.h>

main(void)
{
  FILE *fp1, *fp2;
  int n;
  char buf[128];

  fp1 = fopen("spam.txt", "w+");
  fputs("foo", fp1);

  fseek(fp1, 0L, SEEK_END);
  if (fread(buf, 1, sizeof(buf), fp1) == 0)
    fprintf(stderr, "first fread failed\n");

  fp2 = fopen("spam.txt", "a+");
  fputs("bar", fp2);
  fclose(fp2);

  if (feof(fp1))
    fprintf(stderr, "after appending some text, fp1 is still eof\n");

  if (fread(buf, 1, sizeof(buf), fp1) == 0)
    fprintf(stderr, "second fread failed\n");
  printf("buf: %s\n", buf);

  fclose(fp1);

  return 0;
}
----------------------------
=============
On Linux
=============
first fread failed
after appending some text, fp1 is still eof
buf: bar

=============
On OS X
=============
first fread failed
after appending some text, fp1 is still eof
second fread failed
buf: 

Anyway, I think it's safe and preferable to clear EOF indicator in this case.
msg52510 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2007-04-25 03:14
I have just verified that this issue affects readlines and readinto methods on Mac OS X as well.  This tells me that rather than patching all the individual methods, maybe Py_UniversalNewlineFread should be fixed instead.  I believe that the current situation where a function that looks like Python API reports errors by setting flags on a FILE object instead of settin the an exception is against python's design.  I suggest changing Py_UniversalNewlineFgets so that if fread returns 0, it checks whether that was due to EOF or an error and sets an exception in a case of an error and always calls clearerr.
msg52511 - (view) Author: John Smith (josm) Date: 2007-04-28 17:03
New patch Attached.
I believe this one includes all belopolsky's suggestions.

Now all dirty jobs are done in Py_UniversalNewlineFread:
1. calls clearerr if ferror or feof
2. set IOErrorr exception and return

The caller just checks whether IOError is occurred or not
and does appropriate jobs.

The attachment also contains unittest for this problem.

File Added: eof.diff
msg52512 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2007-05-01 14:38
The latest patch works on Mac OS 10.4 . I would suggest a few cosmetic changes:

1. Put the flag checking logic in a separate function and call it instead of fread.  This will eliminate code duplication.
2. "else" is redundant in

+                       if (ferror(stream)) {
+                               clearerr(stream);
+                               PyErr_SetFromErrno(PyExc_IOError);
+                               return 0;
+                       } else if (feof(stream)) {
+                               clearerr(stream);
+                       }

The flow will be more obvious if you write

+                       if (ferror(stream)) {
+                               clearerr(stream);
+                               PyErr_SetFromErrno(PyExc_IOError);
+                               return 0;
+                       } 
+                       if (feof(stream)) {
+                               clearerr(stream);
+                       }

3. Please mention bug 1523853 in a comment near the test.  

otherwise, the patch looks fine.
msg52513 - (view) Author: John Smith (josm) Date: 2007-05-03 10:03
The flag checking is revised.
I removed feof() so the checking logic become shorter.
I didn't factor out that logic 
because I think the logic is simple enough

Added a comment that mentions bug 1523853.


File Added: eof2.diff
msg52514 - (view) Author: John Smith (josm) Date: 2007-06-05 14:56
Could someone please review my patch and give me some feedback?
msg58043 - (view) Author: John Smith (josm) Date: 2007-12-01 12:19
Any chance to get this fix commmited in?
msg58134 - (view) Author: Raghuram Devarakonda (draghuram) (Python triager) Date: 2007-12-03 17:47
I get an error when I try to read 1523853. Is this not moved to the new
tracker or others are able to access it?
msg58137 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2007-12-03 18:05
Yes, a number of items were not moved, as SF failed to provide them on
export.
msg58181 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2007-12-04 09:02
jos, can you please provide a real name in "Your Details" of this
tracker? We cannot accept anonymous/pseudonymous patches.
msg58185 - (view) Author: John Smith (josm) Date: 2007-12-04 13:02
What's in a name? :p
Done, anyway.
msg76622 - (view) Author: John Smith (josm) Date: 2008-11-29 19:44
Hi, when will this patch be checkedin?
msg77748 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-12-13 15:37
Thanks for the patch. Committed into the 2.5 branch as r67740. This
still needs to be applied to the other branches.
msg78161 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-12-21 23:35
Unfortunately, the patch is broken. The program

fname='test123'
f=open(fname,'w')
f.read()

crashes with the patch applied.

I think I will revert the patch in 2.5.3, release 2.5.4, and reject the
patch.
msg78162 - (view) Author: Guilherme Polo (gpolo) * (Python committer) Date: 2008-12-21 23:42
It isn't being careful when calling PyErr_SetFromErrno inside the
Py_UniversalNewlineFread function since this function is being called
all over fileobject after releasing the GIL.. so, isn't this just a
matter of adding pairs of PyGILState_Ensure/PyGILState_Release around
these calls to PyErr_SetFromErrno in this specific function ?
msg78163 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-12-21 23:52
> It isn't being careful when calling PyErr_SetFromErrno inside the
> Py_UniversalNewlineFread function since this function is being called
> all over fileobject after releasing the GIL.. so, isn't this just a
> matter of adding pairs of PyGILState_Ensure/PyGILState_Release around
> these calls to PyErr_SetFromErrno in this specific function ?

Perhaps that could fix this problem (or perhaps not - is
PyGILState_Ensure guaranteed to do the right thing, even in the
presence of multiple interpreters?)

My concern is that this patch may contain other bugs. I am apparently
incapable of reviewing it correctly, and nobody else has offered a
careful review.
msg78164 - (view) Author: Guilherme Polo (gpolo) * (Python committer) Date: 2008-12-21 23:55
On Sun, Dec 21, 2008 at 9:52 PM, Martin v. Löwis <report@bugs.python.org> wrote:
>
> Martin v. Löwis <martin@v.loewis.de> added the comment:
>
>> It isn't being careful when calling PyErr_SetFromErrno inside the
>> Py_UniversalNewlineFread function since this function is being called
>> all over fileobject after releasing the GIL.. so, isn't this just a
>> matter of adding pairs of PyGILState_Ensure/PyGILState_Release around
>> these calls to PyErr_SetFromErrno in this specific function ?
>
> Perhaps that could fix this problem (or perhaps not - is
> PyGILState_Ensure guaranteed to do the right thing, even in the
> presence of multiple interpreters?)

It is said to be unsupported in that case, but I guess you knew that.
msg78187 - (view) Author: Scott Dial (scott.dial) Date: 2008-12-22 14:42
I believe the original patch is overreaching. I believe the changes to
fileobject.c should've been much simpler, just as the title describes
(adding clearerr() just before performing reads on the FILE handles).

I've attached a much simpler patch that I believe corrects the issue,
but I don't have an OS X platform to try it on.
msg78238 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-12-23 15:07
I have now reverted the patch in r67914. I won't reject the patch
because of Scott's alternative, but leave it open for review. Since
Scott's patch is not approved yet, this is not a release blocker anymore.
msg78239 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2008-12-23 15:47
I've combined Scott's patch with John's (?) test case in issue1706039.diff.  Confirmed that the patch fixes the issue on Mac OS X 
10.5.6 (Intel).
msg78240 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2008-12-23 16:03
Maybe not a problem, but an inconsistency: in Py_UniversalNewlineFgets 
clearerr is added inside univ_newline conditional and then again before 
the loop over characters, but in Py_UniversalNewlineFread it is added only 
once before "if (!f->f_univ_newline)".  I am not sure which approach is 
right, but I don't think there is a reason for the two functions to be 
different.
msg78242 - (view) Author: John Smith (josm) Date: 2008-12-23 17:05
Sorry for inconvenience caused with the patch.
I should've got more reviews before asking it checked in.

I verified eof2.diff-applied-python gets "Bus error"
and Scott Dial's fileobject.diff fixes this.
(Test was on 9.6.0 Darwin Kernel Version 9.6.0)
msg78243 - (view) Author: Scott Dial (scott.dial) Date: 2008-12-23 17:13
They differ because in Py_UniversalNewlineFgets() there is a call to
FLOCKFILE(), and it seemed appropriate to ensure that clearerr() was
called after locking the FILE stream. I certainly pondered over whether
it made a difference to do it before or after, and it would seem to me
that if we are bothering to lock the FILE stream then we would care to
ensure that the first GETC() after locking reaps the benefit of having
called clearerr(). In the cases that we fall into fread and fgets, we
may actually need to be doing:

    FLOCKFILE(stream)
    clearerr(stream);
    result = fgets(buf, n, stream);
    FUNLOCKFILE(stream);
    return result;

One nitpick is that the testcase was appended to the StdoutTests class
instead of OtherFileTests class, which would seem more appropriate.
msg87815 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-05-15 12:55
Does it apply to 3.1?
msg96196 - (view) Author: Scott Dial (scott.dial) Date: 2009-12-10 09:18
I've attached a patch that applies cleanly against py3k.

I do not have an in-depth understanding of the new io module, but at a
cursory glance, it seems to not use FILE streams, and would not be
subject to this bug. However, Py_UniversalNewlineFgets() still remains,
and I have patched it accordingly. Additionally, I have added the test
to test_fileio.

py3k doesn't fail any of these tests for me, but the original issue was
only exposed on OS X.
msg110586 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2010-07-17 17:48
The patch adds one line and moves one line in fileobject.c.  The rest of the patch is new tests.  I've successfully applied the patch on Windows and the tests were fine.  Can someone kindly repeat the tests on OS X as this is where the problem was originally reported.
msg111502 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2010-07-24 18:59
Could somebody with an OS X system please try out the py3k patch, thanks.
msg111516 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-25 01:11
The patch passes tests on OSX, but the old code passes the new tests as well, so I am not sure what this patch fixes anymore.  Looking back through the messages I see that I had trouble reproducing the problem, then I succeeded, but I don't remember the details.

In any case, I am replacing issue1706039-py3k.patch with the same patch against recent SVN revision and with tabs replaced by spaces in C code.
msg122920 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-11-30 17:12
I don't know how to reproduce the issue and without unit tests this patch cannot be committed.
msg122922 - (view) Author: Scott Dial (scott.dial) Date: 2010-11-30 17:26
The patch includes unittests; the issue is that the tests pass without the changes. That is an entirely different state to be in. The tests should be committed unless somebody wants to object to the behavior that they are testing (although I believe this behavior is already codified in documentation). If the tests start failing on someone's platform, then you have a patch already waiting to be applied. Nevertheless, I don't see the harm in the patch as-is, because it is a quite innocuous change to Py_UniversalNewlineFgets(), unless (again) you want to argue about the correctness of that change despite no test failing.
msg123669 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-12-09 05:45
The tests need to be cleaned up a little.  The setup code should go to setUp() method and instead of calling different methods in a loop with a switch over method names, it should just have a separate test_ method for each method tested.  A "with" statement can also reduce some boilerplate.
msg130430 - (view) Author: Scott Dial (scott.dial) Date: 2011-03-09 07:32
I've updated the patch to apply to the current tip. (This patch was an opportunity for me to update to an Hg workflow.)

Alexander, I disagree with you about the tests. The unittests use the exact same pattern/model that testIteration uses. I find your complaint that a unittest, which does test the feature under question, is not good enough, despite the prevailing unittests being designed in the same manner, to be absurd. Practicality beats purity -- a test that works is better than no test at all.

By rejecting unittests on the merits of its coding style, you are creating a double-standard for people like me (outside of the core committers), which eventually wears out my interest in helping you get this improvement into your project. I have been chased around this obstacle course before. For example, issue5949, when I was asked to provide unittests for a module that had *none*, for a 3-line patch that nobody disagreed it being correct. I had a vested interest in jumping through the obstacles of getting that patch in, but here I am again being blocked from making a 3-line patch, except this time, purely for stylistic reasons.
msg130472 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-03-09 22:44
I'm afraid there's a misunderstanding here about the scope of the issue. fileobject.c is not really used in py3k (except for a couple very specific uses such as the tokenizer or the import machinery).
The standard I/O stack uses object in the _io module (that is, in Modules/_io/*.c).
That's also why you won't see the tests failing, even without the patch...
msg130481 - (view) Author: Scott Dial (scott.dial) Date: 2011-03-10 02:09
I'm well aware of the limited use of Py_UniversalNewlineFgets() in py3k, but it remains the case that it is a public API that fails to work correctly under the conditions specified by the reporter, and Alexander confirmed the original patch fixed the issue. AFAICT, there no longer are any test cases (beyond the indirect testing of the dependent code) for Py_UniversalNewlineFgets().

One of the issues with applying the patch to tip was due to commenting out some code in it:

     /* if ( c == EOF && skipnextlf )
         newlinetypes |= NEWLINE_CR; */

For issue8914, which really should've just deleted those two lines, but there was not such an extensive review of that change.

The unittests have there own merit. The file object API is supposed to behave in the manner that the tests exercise them. There are currently no tests that would inform us if any change broke this documented behavior. If you want to split the patch in two to treat them as independent items, then fine.

Otherwise, close the issue as WONTFIX due to obsolescence.
msg130581 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2011-03-11 15:56
On Wed, Mar 9, 2011 at 2:32 AM, Scott Dial <report@bugs.python.org> wrote:
..
> By rejecting unittests on the merits of its coding style, you are creating a double-standard for people like
> me (outside of the core committers), which eventually wears out my interest in helping you get this
> improvement into your project.

I don't think there is a double-standard.  What you see is more of an
evolving standard in which we seek higher quality for the new code
than that of the code already in.  The readability of the test code is
as important if not more important than the readability of the main
code.  It may be more important because in some cases you can trust
obscure code if it is clear that tests cover all corner cases and are
easy to understand.

I don't think there are overly strict standards for how unit tests
should be written, but when they are written in a style that is
unfamiliar to the reviewer, it makes it harder to review.  Note that a
reviewer needs to worry about tests passing on multiple platforms,
their behavior in failure scenarios, as well as other issues that
contributors typically overlook.  Stylistic issues often point out to
real problems, but are easier to spot.  For example, try/finally is
more error prone than with statement.   In fact, in your test
"append" stream will not get closed if writing fails.  You may think
the resource leak on failure is not a big deal, but for some runners
that execute tests repeatedly this may lead to spurious failures.

The acceptance rate for a patch is proportional to the severity of an
issue that it fixes and obviousness of the fix (including tests.)
This patch fails on both counts.  Looking at the history of the issue,
I see that the patch was applied once but reverted because it caused a
crash.  See msg78161.  This shows that the tests were not sufficiently
thorough.  Next, a patch was suggested with tests that did not fail
without a patch.  See msg111516.  This probably shows that additional
tests are needed for 3.2.

These are the reasons I unassigned this issue.  In my opinion it would
take a committer too much effort to bring the patch to commit ready
shape for the marginal benefit of fixing a platform dependent corner
case.  If the tests were better written so that it would be more
obvious what is fixed and that nothing gets broken, my calculation
would probably be different.
History
Date User Action Args
2020-05-31 11:55:40serhiy.storchakasetstatus: open -> closed
resolution: out of date
stage: patch review -> resolved
2011-03-11 15:56:21belopolskysetnosy: loewis, belopolsky, scott.dial, pitrou, josm, benjamin.peterson, gpolo
messages: + msg130581
2011-03-10 02:09:07scott.dialsetnosy: loewis, belopolsky, scott.dial, pitrou, josm, benjamin.peterson, gpolo
messages: + msg130481
2011-03-09 22:44:30pitrousetnosy: loewis, belopolsky, scott.dial, pitrou, josm, benjamin.peterson, gpolo
versions: - Python 3.1, Python 3.2
messages: + msg130472
stage: test needed -> patch review
2011-03-09 07:32:22scott.dialsetfiles: + issue1706039-py3k.patch
nosy: loewis, belopolsky, scott.dial, pitrou, josm, benjamin.peterson, gpolo
messages: + msg130430
2010-12-09 05:45:44belopolskysetnosy: - BreamoreBoy
messages: + msg123669
2010-11-30 17:26:06scott.dialsetmessages: + msg122922
2010-11-30 17:12:47belopolskysetassignee: belopolsky ->
messages: + msg122920
2010-07-25 01:11:11belopolskysetfiles: + issue1706039-py3k.patch
messages: + msg111516

keywords: + needs review
resolution: accepted -> (no value)
stage: commit review -> test needed
2010-07-24 18:59:06BreamoreBoysetmessages: + msg111502
2010-07-17 21:30:23belopolskysetassignee: belopolsky
resolution: accepted
stage: patch review -> commit review
2010-07-17 17:48:10BreamoreBoysetnosy: + BreamoreBoy

messages: + msg110586
versions: + Python 2.7, Python 3.2, - Python 2.6
2009-12-10 09:21:52scott.dialsetfiles: - issue1706039-py3k.patch
2009-12-10 09:21:48scott.dialsetfiles: + issue1706039-py3k.patch
2009-12-10 09:18:22scott.dialsetfiles: + issue1706039-py3k.patch

messages: + msg96196
2009-05-15 12:55:17pitrousetmessages: + msg87815
2009-05-15 02:10:20ajaksu2setversions: + Python 3.1, - Python 3.0
nosy: + benjamin.peterson, pitrou

priority: normal
components: + IO
stage: patch review
2008-12-23 17:13:22scott.dialsetmessages: + msg78243
2008-12-23 17:05:12josmsetmessages: + msg78242
2008-12-23 16:03:25belopolskysetmessages: + msg78240
2008-12-23 15:47:23belopolskysetfiles: + issue1706039.diff
messages: + msg78239
2008-12-23 15:07:50loewissetpriority: release blocker -> (no value)
assignee: loewis -> (no value)
messages: + msg78238
2008-12-22 14:42:40scott.dialsetfiles: + fileobject.diff
nosy: + scott.dial
messages: + msg78187
2008-12-21 23:55:25gpolosetmessages: + msg78164
2008-12-21 23:52:12loewissetmessages: + msg78163
2008-12-21 23:42:45gpolosetnosy: + gpolo
messages: + msg78162
2008-12-21 23:35:21loewissetmessages: + msg78161
2008-12-20 14:37:04loewissetversions: - Python 2.5.3
2008-12-20 02:41:45loewissetpriority: deferred blocker -> release blocker
2008-12-13 15:37:23loewissetpriority: release blocker -> deferred blocker
messages: + msg77748
2008-12-10 09:10:00loewissetpriority: normal -> release blocker
2008-11-29 19:44:48josmsetmessages: + msg76622
versions: + Python 2.6, Python 3.0, Python 2.5.3
2008-01-21 21:36:44draghuramsetnosy: - draghuram
2007-12-04 13:02:06josmsetmessages: + msg58185
2007-12-04 09:02:08loewissetmessages: + msg58181
2007-12-03 18:05:17loewissetnosy: + loewis
messages: + msg58137
2007-12-03 17:47:56draghuramsetnosy: + draghuram
messages: + msg58134
2007-12-01 12:19:34josmsettype: behavior
messages: + msg58043
2007-08-24 19:39:00georg.brandlsetassignee: loewis
2007-04-23 17:21:41josmcreate