classification
Title: Serious interpreter crash and/or arbitrary memory leak using .read() on writable file
Type: crash Stage: resolved
Components: Interpreter Core, IO Versions: Python 2.7, Python 2.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: benjamin.peterson, loewis, nneonneo, pitrou, pmpp, skrah
Priority: high Keywords: patch

Created on 2009-04-03 01:24 by nneonneo, last changed 2010-02-05 17:13 by pitrou. This issue is now closed.

Files
File name Uploaded Description Edit
fileread-errno.patch skrah, 2010-01-31 15:22 Handle EBADF on Windows.
errno-ebadf.patch skrah, 2010-02-02 14:41
errno-ebadf-tests.patch skrah, 2010-02-02 14:41
add-readable-writable.patch skrah, 2010-02-03 13:52 Add readable/writable to file object.
add-readable-writable2.patch skrah, 2010-02-05 13:31 Add readable/writable to file object #2.
add-readable-writable3.patch skrah, 2010-02-05 16:38 Add readable/writable to file object #3.
Messages (18)
msg85286 - (view) Author: Robert Xiao (nneonneo) * Date: 2009-04-03 01:24
(tested and verified on Windows and Solaris SPARC)

Running this code in Python 2.4, 2.5 or 2.6 (all minor versions)
produces garbage.

f=open("anyfile","w")
f.write("garbage")
f.readline()

Mac OS X and Linux appear to simply throw an "IOError: [Errno 9] Bad
file descriptor" exception, while using a read method without writing
first produces the same exception on Windows and certain versions under
Solaris.

Under Solaris, it is further possible to segfault the interpreter with
this code:
f=open("anyfile","w")
f.read()

In the former case, it appears as if the data is simply read from the
disk block containing the file. In the latter, I have no clue what is
going on.

In Python 3.0, file objects opened with "w" don't even support any .read
methods, so this does not affect Py3k.
msg85287 - (view) Author: Robert Xiao (nneonneo) * Date: 2009-04-03 01:34
OK, it's not a memory leak, rather, it's something leaking from the
files. However, this still winds up filling the affected file full of
garbage, which is quite undesirable (and, I think, quite likely to be
unwanted behaviour, considering that *some* read methods throw IOError 9
and some others don't).
msg98577 - (view) Author: pmpp (pmpp) Date: 2010-01-30 22:24
related:
	f=open('test','w+b')
	f.write('123456789ABCDEF')
	#f.seek(0)
	print "position",f.tell()
	print '[',len(f.read()),']'
	f.close()

windows: 2.6.2 mingw/ 2.6.4 msvc builds
len(f.read()) > 0   ( 4081 when i did the test ) but should be 0
printing the buffer show binary garbage.

linux is ok f.read() return '' as expected
msg98582 - (view) Author: Robert Xiao (nneonneo) * Date: 2010-01-30 23:40
It seems like this is actually a problem in Windows libc or something (tested using MinGW on Windows XP):

#include <stdio.h>

main() {
    FILE *f = fopen("test", "wb");
    fwrite("test", 1, 4, f);
    char buf[2048];
    size_t k = fread(buf, 1, 2048, f);
    printf("%d\n", k);
    int i=0;
    for(; i<k; i++) printf("%02x", buf[i]);
}

This causes a lot of garbage to be printed out. Removing the fwrite causes "0" to be printed with no further output.

The garbage is not from the uninitialized buffer, since I've verified that the original contents of the buffer are not what is being printed out. Furthermore, adjusting 2048 produces a maximum output of 4092 bytes (even with 9999 in place of 2048).

Short of simply denying read() on writable files, I don't really see an easy way around this libc bug.
msg98613 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-01-31 13:17
I wonder if this is technically a bug. The stream is not opened for reading and yet you do an fread. I quickly glanced through the
C-Standard and I did not find an _explicit_ paragraph that this is
undefined behavior, but personally I'd expect it to be.

Regarding the last C example. Also here, you do not open using "w+".
Now, even if you used "w+", by the standard you'd have to do an fflush,
fseek, fsetpos or rewind before reading. I don't see a libc bug.
msg98617 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-01-31 15:22
Bug or not, this can be handled since fread sets EBADF on Windows.
msg98619 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-01-31 16:18
The patch should come with a test.
msg98663 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-02-01 14:15
I see that a complete errno based solution would get messy. To avoid
interfering with EAGAIN special cases, this would be needed:

#if defined(EBADF)
#define ERRNO_EBADF(x) ((x) == EBADF)
#else
#define ERRNO_EBADF(x) 0
#endif

Then, additional checks would need to be added to get_line,
getline_via_fgets and a couple of other places.


I think the right thing is to add a field f_modeflags to the file
object, to be initialized on creation. Or use f->readable, f->writable
like py3k. Thoughts?
msg98664 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-02-01 14:37
I think checking errno would be fine (provided it eliminates all variations ot this bug).
msg98740 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-02-02 14:41
An errno+ferror approach works except for two cases:

  (1) MS fgets() does not consider reading from a write-only stream
      an error. While annoying, I think this is standard conforming.

  (2) OpenSolaris sets errno unreliably after getc.


I do not know how to get around those without using OS specifics.
msg98741 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-02-02 14:51
> An errno+ferror approach works except for two cases:
> 
>   (1) MS fgets() does not consider reading from a write-only stream
>       an error. While annoying, I think this is standard conforming.
> 
>   (2) OpenSolaris sets errno unreliably after getc.
> 
> 
> I do not know how to get around those without using OS specifics.

Ok, then perhaps it's better to have some internal {readable, writable}
flags based on the original mode string.
msg98782 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-02-03 13:52
The new patch takes over the logic from fileio.c. Tested on Linux, Windows x86/x64 and OpenSolaris.
msg98831 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-02-04 12:41
Hmm, test_file.py is for the new IO library (backported from 3.x). You should use test_file2k.py for tests against the 2.x file object.

Also, readinto() should take something such as a bytearray() as argument, not a string.
msg98870 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-02-05 13:31
Thanks Antoine, I fixed both issues.
msg98871 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-02-05 13:55
Oh and the following line shouldn't be needed:

data = b'xxx' if 'b' in mode else u'xxx'

Since old-style file objects always take bytestrings (it's harmless, though, because the unicode string will be implicitly converted).
msg98874 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-02-05 14:35
Ok, I'll fix that. Perhaps I should also remove the comment in err_mode(). I wonder if the ValueError in fileio.c is intentional:

>>> import io
>>> f = io.open("testfile", "w")
>>> f.write("xxx")
3
>>> f.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: not readable
>>> 
>>> f = io.FileIO("testfile", "w")
>>> f.write("xxx")
3
>>> f.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: File not open for reading
msg98875 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-02-05 14:48
> I wonder if the ValueError in fileio.c is intentional:

I don't know, but this would be the subject of a separate issue anyway.
msg98885 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-02-05 17:13
The patch was committed in r77989 (trunk) and r77990 (2.6). Thank you Stefan!
History
Date User Action Args
2010-02-05 17:13:40pitrousetstatus: open -> closed
resolution: fixed
messages: + msg98885

stage: test needed -> resolved
2010-02-05 16:38:23skrahsetfiles: + add-readable-writable3.patch
2010-02-05 14:48:00pitrousetmessages: + msg98875
2010-02-05 14:35:22skrahsetmessages: + msg98874
2010-02-05 13:55:41pitrousetmessages: + msg98871
2010-02-05 13:31:11skrahsetfiles: + add-readable-writable2.patch

messages: + msg98870
2010-02-04 12:41:23pitrousetmessages: + msg98831
versions: + Python 2.7
2010-02-04 12:29:41pitrousetnosy: + loewis
2010-02-03 13:52:16skrahsetfiles: + add-readable-writable.patch

messages: + msg98782
2010-02-02 14:51:49pitrousetmessages: + msg98741
2010-02-02 14:41:41skrahsetfiles: + errno-ebadf-tests.patch
2010-02-02 14:41:17skrahsetfiles: + errno-ebadf.patch

messages: + msg98740
2010-02-01 14:37:12pitrousetmessages: + msg98664
2010-02-01 14:15:57skrahsetmessages: + msg98663
2010-01-31 16:18:50pitrousetmessages: + msg98619
2010-01-31 15:22:30skrahsetfiles: + fileread-errno.patch
keywords: + patch
messages: + msg98617
2010-01-31 13:17:38skrahsetnosy: + skrah
messages: + msg98613
2010-01-30 23:40:20nneonneosetmessages: + msg98582
2010-01-30 22:24:46pmppsetnosy: + pmpp

messages: + msg98577
versions: - Python 2.5
2009-05-16 01:13:42ajaksu2setversions: - Python 2.4
nosy: + pitrou, benjamin.peterson

priority: high
components: + IO
stage: test needed
2009-04-03 01:34:48nneonneosetmessages: + msg85287
2009-04-03 01:24:42nneonneocreate