classification
Title: file.write doesn't raise IOError when it should
Type: behavior Stage: resolved
Components: Interpreter Core Versions: Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: jasujm, neologix, pitrou, python-dev, serhiy.storchaka, vstinner
Priority: low Keywords: patch

Created on 2013-05-14 15:10 by jasujm, last changed 2013-12-17 15:33 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description Edit
fileobject-fix.patch jasujm, 2013-05-14 15:10 review
fullwrite.c serhiy.storchaka, 2013-05-14 16:39
fileobject-fix2.patch jasujm, 2013-05-14 19:04 review
fileobject-fix3.patch jasujm, 2013-05-14 22:50 review
fileobject-fix4.patch jasujm, 2013-05-15 08:53 review
test_dev_null.py vstinner, 2013-05-17 20:42
fileobject-fix5.patch jasujm, 2013-05-20 09:09 review
fullwrite2.c vstinner, 2013-12-17 14:00
Messages (24)
msg189226 - (view) Author: Jaakko Moisio (jasujm) Date: 2013-05-14 15:10
file.write doesn't sometimes raise IOError when it should, e.g. writing to /dev/full in line buffered mode:

jaakko@jm-laptop:~$ python
Python 2.7.5+ (2.7:a32a3b79f5e8, May 14 2013, 14:20:11) 
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> f=open('/dev/full','w',1)
>>> f.write('hello\n')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 28] No space left on device
>>> f.close()
>>> f=open('/dev/full','w',1)
>>> f.write('hello')
>>> f.write('\n')
>>> f.close()
>>> # No IOError!
... 
>>> 

The current implementation of file.write relies on comparing the return value of fwrite to the expected number of bytes written to detect errors. I haven't dug deep enough into the C standard to know for sure if fwrite return value == expected should always imply no error, but in my example it clearly doesn't (I assume my glibc and fwrite aren't broken though). However using ferror to detect the error works fine and IOError was raised as expected.

Python3 and io module use different implementation where this is no longer an issue. For us still using Python 2.7 I suggest the attached simple patch to fix the issue.
msg189233 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-05-14 16:26
> I assume my glibc and fwrite aren't broken though

Actually, it's a glibc bug when the last character is a '\n':

$ python -c "f = open('/dev/full', 'w', 1); f.write('hello'); f.close()"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
IOError: [Errno 28] No space left on device

Normal.

Now, you add a trailing newline:
$ strace -e write python -c "f = open('/dev/full', 'w', 1); f.write('hello'); f.write('\n'); f.close()"
write(3, "hello\n", 6)                  = -1 ENOSPC (No space left on device)

write() still returns ENOSPC, but it gets ignored by fwrite().

I've had a quick look at the source, and I think the culprit is here:
http://sourceware.org/git/?p=glibc.git;a=blob;f=libio/fileops.c#l1270

1336       if (do_write)
1337         {
1338           count = new_do_write (f, s, do_write);
1339           to_do -= count;
1340           if (count < do_write)
1341             return n - to_do;
1342         }

It looks like there's a check missing here for count < 0.
msg189235 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-05-14 16:39
Indeed, fwrite() can return expected number of items and set errno. Here is a simple example on C. An output is:

setvbuf 0 0
fwrite 5 0
fwrite 1 28
fwrite 1 28

On writing "\n" fwrite returns 1 and set errno to ENOSPC.
msg189237 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-05-14 16:52
> Indeed, fwrite() can return expected number of items and set errno. Here is a simple example on C. An output is:

Yeah, who's volunteering to report it to the glibc?

That's not a python bug, but I feel bad ignoring it.

Note that ferror() isn't reliable (not on my box at least), so we have to use errno directly.
And of course keep the check that the value returned by fwrite matches.
msg189240 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-14 17:11
Yuck. We can of course workaround this (the glibc is commonly used :-)). Why is ferror() not reliable?
msg189241 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-05-14 17:59
> Why is ferror() not reliable?

Because the glibc doesn't check the errno return code after the
write() syscall, and thus doesn't set the file's stream error flag
(ferror() just checks this flag).

That's what I saw from the code.

I was a little surprised when Jaako says that ferror() is enough to
detect this, so I modified Serhiy code to print ferror(), and actually
ferror() reports an error for subsequent writes, not for the first one
(probably because the error goes unnoticed only when the buffer is in
a particular state).

So in short, errno is the only reliable way to check for errors :-(
msg189246 - (view) Author: Jaakko Moisio (jasujm) Date: 2013-05-14 19:04
Thank you for your comments.

> I was a little surprised when Jaako says that ferror() is enough to
> detect this, so I modified Serhiy code to print ferror(), and actually
> ferror() reports an error for subsequent writes, not for the first one
> (probably because the error goes unnoticed only when the buffer is in
> a particular state).

Strange. I too modified Serchiy's code and my version of glibc (2.15) set the error flag at the same fwrite call as errno was set:

setvbuf 0 0 0
fwrite 5 0 0
fwrite 1 28 1
fwrite 1 28 1

(the last column being the return value of ferror after each fwrite call)

I've trusted ferror until now but I'm not an expert on the subject. It's a good point that errno is set by the underlying system call and is thus more reliable. So would checking errno be sufficient (in addition to checking that the lengths agree)? It can only serve to make file_write more robust.
msg189251 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-05-14 22:09
> Strange. I too modified Serchiy's code and my version of glibc (2.15) set the error flag at the same fwrite call as errno was set:
>
> setvbuf 0 0 0
> fwrite 5 0 0
> fwrite 1 28 1
> fwrite 1 28 1
>
> (the last column being the return value of ferror after each fwrite call)

Hum, you're right, I just re-ran the test, and I'm finding the same
thing as you (I must have been dreaming).

I just re-checked the glibc code, and indeed the write() error is
checked, and the error flag is set:
http://sourceware.org/git/?p=glibc.git;a=blob;f=libio/fileops.c#l1255
1241 _IO_ssize_t
1242 _IO_new_file_write (f, data, n)
[...]
1251       count = (__builtin_expect (f->_flags2
1252                                  & _IO_FLAGS2_NOTCANCEL, 0)
1253                ? write_not_cancel (f->_fileno, data, to_do)
1254                : write (f->_fileno, data, to_do));
1255       if (count < 0)
1256         {
1257           f->_flags |= _IO_ERR_SEEN;
1258           break;
1259         }

But then this value is returned, and depending on the position in the
buffer, this -1 ends up being incremented to what's left to write, and
can result in returning exactly the same size that was requested.

That's definitely a bug, and a bad one since you can get silent
corruption (fclose() won't even return an error in most cases).

Anyway, this means that ferror() should be enough - in addition to
checking that the returned value matches.

> I've trusted ferror until now but I'm not an expert on the subject. It's a good point that errno is set by the underlying system call and is thus more reliable. So would checking errno be sufficient (in addition to checking that the lengths agree)? It can only serve to make file_write more robust.

Yeah, would you like to write a patch?
msg189253 - (view) Author: Jaakko Moisio (jasujm) Date: 2013-05-14 22:50
> Yeah, would you like to write a patch?

Yes. It's fileobject-fix3.patch attached to this issue record.
msg189278 - (view) Author: Jaakko Moisio (jasujm) Date: 2013-05-15 09:04
I tried to reply to the review of my last patch but this tracker software itself crashed. Is there anyone who would be interested in the traceback?

Anyway, I'll reply here.

On 2013/05/15 08:37:29, Charles-François Natali wrote:
> http://bugs.python.org/review/17976/diff/8158/Objects/fileobject.c
> File Objects/fileobject.c (right):
> 
> http://bugs.python.org/review/17976/diff/8158/Objects/fileobject.c#newcode1856
> Objects/fileobject.c:1856: if (n2 != n || errno != 0) {
> Hum, we saw that ferror() could be used to detect the error, so it should be
> used instead.
> 
> Also, there's a problem with this patch: it's checking errno too late: for
> example, it could have been cleared by FILE_END_ALLOW_THREADS or Py_XDECREF in
> the meantime.

Ok. However I must point out that if errno is cleared in the meantime, that also affects PyErr_SetFromErrno. I've submitted another patch where I'm extra careful with errno.
msg189281 - (view) Author: Jaakko Moisio (jasujm) Date: 2013-05-15 10:45
> I tried to reply to the review of my last patch but this tracker
> software itself crashed. Is there anyone who would be interested in
> the traceback?

Never mind. I found the meta tracker and posted my traceback there.
msg189484 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-05-17 20:42
Attached: Test expressed as an unit test, test_dev_null.py.

The test pass with Python 3 which does not use the "FILE*" API anymore. So you should maybe migrate to Python 3 :-)


$ python3.4 test_dev_null.py
..
----------------------------------------------------------------------
Ran 2 tests in 0.005s

OK


$ python2.7 test_dev_null.py
.F
======================================================================
FAIL: test_write2 (__main__.TestFull)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_dev_null.py", line 13, in test_write2
    self.assertRaises(IOError, self.check_write, 'hello', '\n')
AssertionError: IOError not raised

----------------------------------------------------------------------
Ran 2 tests in 0.002s

FAILED (failures=1)
msg189645 - (view) Author: Jaakko Moisio (jasujm) Date: 2013-05-20 09:09
> The test pass with Python 3 which does not use the "FILE*" API
> anymore. So you should maybe migrate to Python 3 :-)

Yes. I will eventually. But not all the libraries I'm using are migrated yet.
msg206422 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-12-17 11:53
LGTM. But err_flag is not needed, valid error numbers are all nonzero.
msg206426 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-12-17 12:40
New changeset 33c27b76a4d0 by Serhiy Storchaka in branch '2.7':
Issue #17976: Fixed potential problem with file.write() not detecting IO error
http://hg.python.org/cpython/rev/33c27b76a4d0
msg206428 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-12-17 12:42
Thank you Jaakko Moisio for your report and patch.
msg206431 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-12-17 12:54
New changeset 237deaf9ba64 by Serhiy Storchaka in branch '2.7':
Skip test for issue #17976 if /dev/null is not available.
http://hg.python.org/cpython/rev/237deaf9ba64
msg206437 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-12-17 14:00
I played with fullwrite.c and now think that the fix is incomplete. fwrite() may succeed to write data into the buffer, but you may get the error on fflush() or fclose().

Try attached fullwrite2.c: fwrite() succeed (written=len, errno=result=0), whereas fclose() fails. If you enable fflush(), it will also fail.

Output:
---
fwrite(hello
) => 6 bytes written/6 [errno=0, ferror=0]
fclose() => -1 [errno=28]
---

The complete fix is maybe to write fflush() before fclose(), or at least raise an exception if fclose() returns a non-zero result. Correctly, file.close() returns a number in case of an error...

But now comes the question of backward compatibility, may such change "break" applications? It's maybe a nice thing to "break" applications if it warns users that they are going to loose data (USB key full, cannot write the whole important document!).
msg206438 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-12-17 14:01
(I was trying to report the issue upstream.)
msg206441 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-12-17 14:07
Are you have Python code which exposes a bug?
msg206447 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-12-17 14:29
> The complete fix is maybe to write fflush() before fclose(), or at least raise an exception if fclose() returns a non-zero result. Correctly, file.close() returns a number in case of an error...

Oh sorry, I missed the line "if (sts == -1) ..." which raises an error.

But I'm still unable to reproduce the glibc bug mentioned by Charles-François :

> Yeah, who's volunteering to report it to the glibc?
msg206452 - (view) Author: Jaakko Moisio (jasujm) Date: 2013-12-17 14:49
The new patch is fine as it is, but my logic behind using err_flag was the following: err_flag was set solely based on the inspection of return value of fwrite and ferror, without referencing to errno. It is of course true that at the same time errno is set to non-zero in any correct C standard library implementation (err_flag != 0 iff err != 0), but as the patch was originally for circumventing a bug in glibc, I decided to be use that extra flag for the purpose.

> But I'm still unable to reproduce the glibc bug mentioned by Charles-François :

Yes. It seems that the bug in glibc has been fixed. But at least Python 2.7 is now a little bit better guarded against exotic file IO bugs that might emerge in C standard libraries :)
msg206461 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-12-17 15:32
New changeset 24a043355050 by Serhiy Storchaka in branch '2.7':
Circumventing a bug in glibc (issue #17976).
http://hg.python.org/cpython/rev/24a043355050
msg206463 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-12-17 15:33
Agree.
History
Date User Action Args
2013-12-17 15:33:22serhiy.storchakasetmessages: + msg206463
2013-12-17 15:32:48python-devsetmessages: + msg206461
2013-12-17 14:49:46jasujmsetmessages: + msg206452
2013-12-17 14:29:42vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg206447
2013-12-17 14:07:24serhiy.storchakasetmessages: + msg206441
2013-12-17 14:01:05vstinnersetmessages: + msg206438
2013-12-17 14:00:42vstinnersetstatus: closed -> open
files: + fullwrite2.c
resolution: fixed -> (no value)
messages: + msg206437
2013-12-17 12:54:17python-devsetmessages: + msg206431
2013-12-17 12:42:19serhiy.storchakasetstatus: open -> closed
resolution: fixed
messages: + msg206428

stage: commit review -> resolved
2013-12-17 12:40:53python-devsetnosy: + python-dev
messages: + msg206426
2013-12-17 11:53:52serhiy.storchakasetassignee: serhiy.storchaka
messages: + msg206422
stage: needs patch -> commit review
2013-05-20 09:09:02jasujmsetfiles: + fileobject-fix5.patch

messages: + msg189645
2013-05-17 20:42:59vstinnersetfiles: + test_dev_null.py
nosy: + vstinner
messages: + msg189484

2013-05-15 10:45:01jasujmsetmessages: + msg189281
2013-05-15 09:04:58jasujmsetmessages: + msg189278
2013-05-15 08:53:30jasujmsetfiles: + fileobject-fix4.patch
2013-05-14 22:50:34jasujmsetfiles: + fileobject-fix3.patch

messages: + msg189253
2013-05-14 22:09:27neologixsetmessages: + msg189251
2013-05-14 19:04:38jasujmsetfiles: + fileobject-fix2.patch

messages: + msg189246
2013-05-14 17:59:56neologixsetmessages: + msg189241
2013-05-14 17:11:20pitrousetpriority: normal -> low

messages: + msg189240
stage: needs patch
2013-05-14 16:52:35neologixsetnosy: + pitrou
messages: + msg189237
2013-05-14 16:39:02serhiy.storchakasetfiles: + fullwrite.c
nosy: + serhiy.storchaka
messages: + msg189235

2013-05-14 16:26:07neologixsetnosy: + neologix
messages: + msg189233
2013-05-14 15:10:27jasujmcreate