msg189226 - (view) |
Author: Jaakko Moisio (jasujm) |
Date: 2013-05-14 15:10 |
file.write doesn't sometimes raise IOError when it should, e.g. writing to /dev/full in line buffered mode:
jaakko@jm-laptop:~$ python
Python 2.7.5+ (2.7:a32a3b79f5e8, May 14 2013, 14:20:11)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> f=open('/dev/full','w',1)
>>> f.write('hello\n')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IOError: [Errno 28] No space left on device
>>> f.close()
>>> f=open('/dev/full','w',1)
>>> f.write('hello')
>>> f.write('\n')
>>> f.close()
>>> # No IOError!
...
>>>
The current implementation of file.write relies on comparing the return value of fwrite to the expected number of bytes written to detect errors. I haven't dug deep enough into the C standard to know for sure if fwrite return value == expected should always imply no error, but in my example it clearly doesn't (I assume my glibc and fwrite aren't broken though). However using ferror to detect the error works fine and IOError was raised as expected.
Python3 and io module use different implementation where this is no longer an issue. For us still using Python 2.7 I suggest the attached simple patch to fix the issue.
|
msg189233 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2013-05-14 16:26 |
> I assume my glibc and fwrite aren't broken though
Actually, it's a glibc bug when the last character is a '\n':
$ python -c "f = open('/dev/full', 'w', 1); f.write('hello'); f.close()"
Traceback (most recent call last):
File "<string>", line 1, in <module>
IOError: [Errno 28] No space left on device
Normal.
Now, you add a trailing newline:
$ strace -e write python -c "f = open('/dev/full', 'w', 1); f.write('hello'); f.write('\n'); f.close()"
write(3, "hello\n", 6) = -1 ENOSPC (No space left on device)
write() still returns ENOSPC, but it gets ignored by fwrite().
I've had a quick look at the source, and I think the culprit is here:
http://sourceware.org/git/?p=glibc.git;a=blob;f=libio/fileops.c#l1270
1336 if (do_write)
1337 {
1338 count = new_do_write (f, s, do_write);
1339 to_do -= count;
1340 if (count < do_write)
1341 return n - to_do;
1342 }
It looks like there's a check missing here for count < 0.
|
msg189235 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-05-14 16:39 |
Indeed, fwrite() can return expected number of items and set errno. Here is a simple example on C. An output is:
setvbuf 0 0
fwrite 5 0
fwrite 1 28
fwrite 1 28
On writing "\n" fwrite returns 1 and set errno to ENOSPC.
|
msg189237 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2013-05-14 16:52 |
> Indeed, fwrite() can return expected number of items and set errno. Here is a simple example on C. An output is:
Yeah, who's volunteering to report it to the glibc?
That's not a python bug, but I feel bad ignoring it.
Note that ferror() isn't reliable (not on my box at least), so we have to use errno directly.
And of course keep the check that the value returned by fwrite matches.
|
msg189240 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-05-14 17:11 |
Yuck. We can of course workaround this (the glibc is commonly used :-)). Why is ferror() not reliable?
|
msg189241 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2013-05-14 17:59 |
> Why is ferror() not reliable?
Because the glibc doesn't check the errno return code after the
write() syscall, and thus doesn't set the file's stream error flag
(ferror() just checks this flag).
That's what I saw from the code.
I was a little surprised when Jaako says that ferror() is enough to
detect this, so I modified Serhiy code to print ferror(), and actually
ferror() reports an error for subsequent writes, not for the first one
(probably because the error goes unnoticed only when the buffer is in
a particular state).
So in short, errno is the only reliable way to check for errors :-(
|
msg189246 - (view) |
Author: Jaakko Moisio (jasujm) |
Date: 2013-05-14 19:04 |
Thank you for your comments.
> I was a little surprised when Jaako says that ferror() is enough to
> detect this, so I modified Serhiy code to print ferror(), and actually
> ferror() reports an error for subsequent writes, not for the first one
> (probably because the error goes unnoticed only when the buffer is in
> a particular state).
Strange. I too modified Serchiy's code and my version of glibc (2.15) set the error flag at the same fwrite call as errno was set:
setvbuf 0 0 0
fwrite 5 0 0
fwrite 1 28 1
fwrite 1 28 1
(the last column being the return value of ferror after each fwrite call)
I've trusted ferror until now but I'm not an expert on the subject. It's a good point that errno is set by the underlying system call and is thus more reliable. So would checking errno be sufficient (in addition to checking that the lengths agree)? It can only serve to make file_write more robust.
|
msg189251 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2013-05-14 22:09 |
> Strange. I too modified Serchiy's code and my version of glibc (2.15) set the error flag at the same fwrite call as errno was set:
>
> setvbuf 0 0 0
> fwrite 5 0 0
> fwrite 1 28 1
> fwrite 1 28 1
>
> (the last column being the return value of ferror after each fwrite call)
Hum, you're right, I just re-ran the test, and I'm finding the same
thing as you (I must have been dreaming).
I just re-checked the glibc code, and indeed the write() error is
checked, and the error flag is set:
http://sourceware.org/git/?p=glibc.git;a=blob;f=libio/fileops.c#l1255
1241 _IO_ssize_t
1242 _IO_new_file_write (f, data, n)
[...]
1251 count = (__builtin_expect (f->_flags2
1252 & _IO_FLAGS2_NOTCANCEL, 0)
1253 ? write_not_cancel (f->_fileno, data, to_do)
1254 : write (f->_fileno, data, to_do));
1255 if (count < 0)
1256 {
1257 f->_flags |= _IO_ERR_SEEN;
1258 break;
1259 }
But then this value is returned, and depending on the position in the
buffer, this -1 ends up being incremented to what's left to write, and
can result in returning exactly the same size that was requested.
That's definitely a bug, and a bad one since you can get silent
corruption (fclose() won't even return an error in most cases).
Anyway, this means that ferror() should be enough - in addition to
checking that the returned value matches.
> I've trusted ferror until now but I'm not an expert on the subject. It's a good point that errno is set by the underlying system call and is thus more reliable. So would checking errno be sufficient (in addition to checking that the lengths agree)? It can only serve to make file_write more robust.
Yeah, would you like to write a patch?
|
msg189253 - (view) |
Author: Jaakko Moisio (jasujm) |
Date: 2013-05-14 22:50 |
> Yeah, would you like to write a patch?
Yes. It's fileobject-fix3.patch attached to this issue record.
|
msg189278 - (view) |
Author: Jaakko Moisio (jasujm) |
Date: 2013-05-15 09:04 |
I tried to reply to the review of my last patch but this tracker software itself crashed. Is there anyone who would be interested in the traceback?
Anyway, I'll reply here.
On 2013/05/15 08:37:29, Charles-François Natali wrote:
> http://bugs.python.org/review/17976/diff/8158/Objects/fileobject.c
> File Objects/fileobject.c (right):
>
> http://bugs.python.org/review/17976/diff/8158/Objects/fileobject.c#newcode1856
> Objects/fileobject.c:1856: if (n2 != n || errno != 0) {
> Hum, we saw that ferror() could be used to detect the error, so it should be
> used instead.
>
> Also, there's a problem with this patch: it's checking errno too late: for
> example, it could have been cleared by FILE_END_ALLOW_THREADS or Py_XDECREF in
> the meantime.
Ok. However I must point out that if errno is cleared in the meantime, that also affects PyErr_SetFromErrno. I've submitted another patch where I'm extra careful with errno.
|
msg189281 - (view) |
Author: Jaakko Moisio (jasujm) |
Date: 2013-05-15 10:45 |
> I tried to reply to the review of my last patch but this tracker
> software itself crashed. Is there anyone who would be interested in
> the traceback?
Never mind. I found the meta tracker and posted my traceback there.
|
msg189484 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2013-05-17 20:42 |
Attached: Test expressed as an unit test, test_dev_null.py.
The test pass with Python 3 which does not use the "FILE*" API anymore. So you should maybe migrate to Python 3 :-)
$ python3.4 test_dev_null.py
..
----------------------------------------------------------------------
Ran 2 tests in 0.005s
OK
$ python2.7 test_dev_null.py
.F
======================================================================
FAIL: test_write2 (__main__.TestFull)
----------------------------------------------------------------------
Traceback (most recent call last):
File "test_dev_null.py", line 13, in test_write2
self.assertRaises(IOError, self.check_write, 'hello', '\n')
AssertionError: IOError not raised
----------------------------------------------------------------------
Ran 2 tests in 0.002s
FAILED (failures=1)
|
msg189645 - (view) |
Author: Jaakko Moisio (jasujm) |
Date: 2013-05-20 09:09 |
> The test pass with Python 3 which does not use the "FILE*" API
> anymore. So you should maybe migrate to Python 3 :-)
Yes. I will eventually. But not all the libraries I'm using are migrated yet.
|
msg206422 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-12-17 11:53 |
LGTM. But err_flag is not needed, valid error numbers are all nonzero.
|
msg206426 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2013-12-17 12:40 |
New changeset 33c27b76a4d0 by Serhiy Storchaka in branch '2.7':
Issue #17976: Fixed potential problem with file.write() not detecting IO error
http://hg.python.org/cpython/rev/33c27b76a4d0
|
msg206428 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-12-17 12:42 |
Thank you Jaakko Moisio for your report and patch.
|
msg206431 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2013-12-17 12:54 |
New changeset 237deaf9ba64 by Serhiy Storchaka in branch '2.7':
Skip test for issue #17976 if /dev/null is not available.
http://hg.python.org/cpython/rev/237deaf9ba64
|
msg206437 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2013-12-17 14:00 |
I played with fullwrite.c and now think that the fix is incomplete. fwrite() may succeed to write data into the buffer, but you may get the error on fflush() or fclose().
Try attached fullwrite2.c: fwrite() succeed (written=len, errno=result=0), whereas fclose() fails. If you enable fflush(), it will also fail.
Output:
---
fwrite(hello
) => 6 bytes written/6 [errno=0, ferror=0]
fclose() => -1 [errno=28]
---
The complete fix is maybe to write fflush() before fclose(), or at least raise an exception if fclose() returns a non-zero result. Correctly, file.close() returns a number in case of an error...
But now comes the question of backward compatibility, may such change "break" applications? It's maybe a nice thing to "break" applications if it warns users that they are going to loose data (USB key full, cannot write the whole important document!).
|
msg206438 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2013-12-17 14:01 |
(I was trying to report the issue upstream.)
|
msg206441 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-12-17 14:07 |
Are you have Python code which exposes a bug?
|
msg206447 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2013-12-17 14:29 |
> The complete fix is maybe to write fflush() before fclose(), or at least raise an exception if fclose() returns a non-zero result. Correctly, file.close() returns a number in case of an error...
Oh sorry, I missed the line "if (sts == -1) ..." which raises an error.
But I'm still unable to reproduce the glibc bug mentioned by Charles-François :
> Yeah, who's volunteering to report it to the glibc?
|
msg206452 - (view) |
Author: Jaakko Moisio (jasujm) |
Date: 2013-12-17 14:49 |
The new patch is fine as it is, but my logic behind using err_flag was the following: err_flag was set solely based on the inspection of return value of fwrite and ferror, without referencing to errno. It is of course true that at the same time errno is set to non-zero in any correct C standard library implementation (err_flag != 0 iff err != 0), but as the patch was originally for circumventing a bug in glibc, I decided to be use that extra flag for the purpose.
> But I'm still unable to reproduce the glibc bug mentioned by Charles-François :
Yes. It seems that the bug in glibc has been fixed. But at least Python 2.7 is now a little bit better guarded against exotic file IO bugs that might emerge in C standard libraries :)
|
msg206461 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2013-12-17 15:32 |
New changeset 24a043355050 by Serhiy Storchaka in branch '2.7':
Circumventing a bug in glibc (issue #17976).
http://hg.python.org/cpython/rev/24a043355050
|
msg206463 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-12-17 15:33 |
Agree.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:45 | admin | set | github: 62176 |
2013-12-17 15:33:22 | serhiy.storchaka | set | messages:
+ msg206463 |
2013-12-17 15:32:48 | python-dev | set | messages:
+ msg206461 |
2013-12-17 14:49:46 | jasujm | set | messages:
+ msg206452 |
2013-12-17 14:29:42 | vstinner | set | status: open -> closed resolution: fixed messages:
+ msg206447
|
2013-12-17 14:07:24 | serhiy.storchaka | set | messages:
+ msg206441 |
2013-12-17 14:01:05 | vstinner | set | messages:
+ msg206438 |
2013-12-17 14:00:42 | vstinner | set | status: closed -> open files:
+ fullwrite2.c resolution: fixed -> (no value) messages:
+ msg206437
|
2013-12-17 12:54:17 | python-dev | set | messages:
+ msg206431 |
2013-12-17 12:42:19 | serhiy.storchaka | set | status: open -> closed resolution: fixed messages:
+ msg206428
stage: commit review -> resolved |
2013-12-17 12:40:53 | python-dev | set | nosy:
+ python-dev messages:
+ msg206426
|
2013-12-17 11:53:52 | serhiy.storchaka | set | assignee: serhiy.storchaka messages:
+ msg206422 stage: needs patch -> commit review |
2013-05-20 09:09:02 | jasujm | set | files:
+ fileobject-fix5.patch
messages:
+ msg189645 |
2013-05-17 20:42:59 | vstinner | set | files:
+ test_dev_null.py nosy:
+ vstinner messages:
+ msg189484
|
2013-05-15 10:45:01 | jasujm | set | messages:
+ msg189281 |
2013-05-15 09:04:58 | jasujm | set | messages:
+ msg189278 |
2013-05-15 08:53:30 | jasujm | set | files:
+ fileobject-fix4.patch |
2013-05-14 22:50:34 | jasujm | set | files:
+ fileobject-fix3.patch
messages:
+ msg189253 |
2013-05-14 22:09:27 | neologix | set | messages:
+ msg189251 |
2013-05-14 19:04:38 | jasujm | set | files:
+ fileobject-fix2.patch
messages:
+ msg189246 |
2013-05-14 17:59:56 | neologix | set | messages:
+ msg189241 |
2013-05-14 17:11:20 | pitrou | set | priority: normal -> low
messages:
+ msg189240 stage: needs patch |
2013-05-14 16:52:35 | neologix | set | nosy:
+ pitrou messages:
+ msg189237
|
2013-05-14 16:39:02 | serhiy.storchaka | set | files:
+ fullwrite.c nosy:
+ serhiy.storchaka messages:
+ msg189235
|
2013-05-14 16:26:07 | neologix | set | nosy:
+ neologix messages:
+ msg189233
|
2013-05-14 15:10:27 | jasujm | create | |