Bug in file.read(), can access unknown data. #57380
The tempfile module shows strange behavior under certain conditions. This might lead to data leaking or other problems. The test session looks as follows:
Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import tempfile
>>> tmp = tempfile.TemporaryFile()
>>> tmp.read()
''
>>> tmp.write('test')
>>> tmp.read()
'P\xf6D\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\ [omitted]'
or similar behavior in text mode:
Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import tempfile
>>> tmp = tempfile.TemporaryFile('w+t')
>>> tmp.read()
''
>>> tmp.write('test')
>>> tmp.read()
'\x00\xa5\x8b\x02int or long, hash(a) is used instead.\n i\x10 [omitted]'
>>> tmp.seek(0)
>>> tmp.readline()
'test\x00\xa5\x8b\x02int or long, hash(a) is used instead.\n'
This bug seems to be triggered by calling tmp.read() before tmp.seek(). I am running Python 2.7.2 on Windows 7 x64; other people have reproduced the problem on Windows XP, but not under Linux or Cygwin (see also http://stackoverflow.com/questions/7757663/python-tempfile-broken-or-am-i-doing-it-wrong). Thank you for looking into this. |
I wonder if it is a bug in Windows? Have you tried similar experiments with regular files? tempfile is really just about *where* the files are located (and what happens when they are closed), not about their fundamental nature as OS file objects. (I could be wrong about that on Windows of course, I'm more familiar with Linux.) |
Hi David, I followed your suggestion and tried to reproduce the problem without the tempfile module. It turns out that there is indeed an underlying issue. I am not sure what the root cause is, but this is now an even bigger problem: read() returns information from some file/memory that it was never intended to access. The session looks similar to the tempfile session:
>>> tmp = open('tmp', 'w+t')
>>> tmp.read()
''
>>> tmp.write('test')
>>> tmp.read()
'hp\'\x02\xe4\xb9>7\x80\x88\x81\x02\x01\x00\x00\x00\x00\x00\x00\x00\x12\x00\x00\
x00\xe86(\x02p\x11\x8d\x02\x01\x00\x00\x00@\xfd)\x02\xe7Y\x9aN\x01\x00\x00\x00\x
00\x00\x00\x00\x14\x00\x00\x00\x087(\x02\x00\x00\x00\x00\xe9Y\x0b\xa2\x00\x93+\x
02\x01\x00\x00\x00\x00\x00\x00\x00\x00\x9b,\x02\x02\x00\x00\x00\xe06(\x02\xc0W5\
At the moment the bug could only be reproduced using CPython 2.7.1 on Windows XP and Windows 7. Alexander |
Additionally, after calling tmp.close(), the file 'tmp' contains the string 'test' followed by about 4 kB of binary data similar to the previous output of tmp.read(). |
This issue is a duplicate of the issue bpo-1394612 which has been closed as invalid. Read the following message: http://bugs.python.org/issue1394612#msg27200 I suppose that Python 3 is not affected by this issue because it doesn't use fread/fwrite anymore, but directly read/write (the low level, unbuffered, API). It looks like Python cannot do anything for this issue, except documenting this surprising behaviour. Would you like to write a patch for the documentation? |
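The supposition about Python 3 can be checked with a small sketch like the one below. It assumes a Python 3 interpreter, where file objects come from the io module; the helper name `read_right_after_write` is made up for illustration. The read is issued directly after the write, with no flush or seek in between.

```python
import tempfile


def read_right_after_write(mode):
    """Write 'test' to a fresh temporary file and read immediately afterwards.

    Hypothetical helper for illustration; assumes Python 3, where the io
    module manages buffering itself instead of relying on C stdio.
    """
    with tempfile.TemporaryFile(mode) as tmp:
        payload = b'test' if 'b' in mode else 'test'
        tmp.write(payload)
        # No flush() or seek() here on purpose: the file position is at the
        # end of the data just written, so there is nothing left to read.
        return tmp.read()


binary_result = read_right_after_write('w+b')
text_result = read_right_after_write('w+t')
print(repr(binary_result), repr(text_result))
```

Under Python 3 both results should come back empty rather than containing unrelated data.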
Thank you for the update, Victor. It seems to me that this is exactly the same issue. The current documentation says (http://docs.python.org/library/stdtypes.html#bltin-file-objects): "Note: This function is simply a wrapper for the underlying fread() C function, and will behave the same in corner cases, such as whether the EOF value is cached." This hints at the current behavior, but I would not expect from it that file.read() can return arbitrary data when used directly after file.write(). Maybe one could include a link to, or a snippet of, the C standard, which states that one shall not do this: "When a file is opened with update mode ('+' as the second or third character in the above list of mode argument values), both input and output may be performed on the associated stream. However, output shall not be directly followed by input without an intervening call to the fflush function or to a file positioning function (fseek, fsetpos, or rewind), and input shall not be directly followed by output without an [...]" (from http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf, page 272) |
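Expressed with Python file methods, the quoted rule amounts to placing a seek() (or flush()) between a write() and a following read() on a file opened with '+' in the mode. A minimal sketch, with the helper name `write_then_read` made up for illustration:

```python
import tempfile


def write_then_read(text):
    """Alternate output and input on an update-mode file, repositioning each time.

    Hypothetical helper for illustration only.
    """
    with tempfile.TemporaryFile('w+t') as tmp:
        tmp.write(text)
        # Output followed by input: reposition before reading.
        tmp.seek(0)
        first = tmp.read()
        # Input followed by output: reposition again before writing.
        tmp.seek(0, 2)  # whence=2 seeks to the end of the file
        tmp.write(text)
        tmp.seek(0)
        return first, tmp.read()


first, both = write_then_read('test')
print(repr(first), repr(both))
```

With the intervening seek() calls, the first read returns only the text that was written, and the second returns both copies.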
On 14/10/2011 14:37, Alexander Steppke wrote:
You can just say " '+' in the file mode ".
You should translate these names into Python method names: |
This issue has come up enough (tracker and python-list) that I think adding a mild adaptation of the C standard paragraph might be a good idea. Changing to a doc issue. |