Bug in file.read(), can access unknown data. #57380

Closed
AlexanderSteppke mannequin opened this issue Oct 13, 2011 · 8 comments
Labels
docs Documentation in the Doc dir type-bug An unexpected behavior, bug, or error

Comments

@AlexanderSteppke
Mannequin

AlexanderSteppke mannequin commented Oct 13, 2011

BPO 13171
Nosy @terryjreedy, @vstinner, @bitdancer

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2020-05-31.12:07:21.121>
created_at = <Date 2011-10-13.18:40:10.345>
labels = ['type-bug', 'docs']
title = 'Bug in file.read(), can access unknown data.'
updated_at = <Date 2020-05-31.12:07:21.120>
user = 'https://bugs.python.org/AlexanderSteppke'

bugs.python.org fields:

activity = <Date 2020-05-31.12:07:21.120>
actor = 'serhiy.storchaka'
assignee = 'docs@python'
closed = True
closed_date = <Date 2020-05-31.12:07:21.121>
closer = 'serhiy.storchaka'
components = ['Documentation']
creation = <Date 2011-10-13.18:40:10.345>
creator = 'Alexander.Steppke'
dependencies = []
files = []
hgrepos = []
issue_num = 13171
keywords = []
message_count = 8.0
messages = ['145477', '145480', '145501', '145502', '145508', '145513', '145541', '145577']
nosy_count = 5.0
nosy_names = ['terry.reedy', 'vstinner', 'r.david.murray', 'docs@python', 'Alexander.Steppke']
pr_nums = []
priority = 'normal'
resolution = 'out of date'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue13171'
versions = ['Python 2.7']

@AlexanderSteppke
Mannequin Author

AlexanderSteppke mannequin commented Oct 13, 2011

The tempfile module shows strange behavior under certain conditions. This might lead to data leaking or other problems.

The test session looks as follows:

Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import tempfile
>>> tmp = tempfile.TemporaryFile()
>>> tmp.read()
''
>>> tmp.write('test')
>>> tmp.read()
'P\xf6D\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\ [omitted]'

or similar behavior in text mode:

Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import tempfile
>>> tmp = tempfile.TemporaryFile('w+t')
>>> tmp.read()
''
>>> tmp.write('test')
>>> tmp.read()
'\x00\xa5\x8b\x02int or long, hash(a) is used instead.\n    i\x10 [omitted]'
>>> tmp.seek(0)
>>> tmp.readline()
'test\x00\xa5\x8b\x02int or long, hash(a) is used instead.\n'

This bug seems to be triggered by calling tmp.read() before tmp.seek(). I am running Python 2.7.2 on Windows 7 x64; other people have reproduced the problem on Windows XP, but not under Linux or Cygwin (see also http://stackoverflow.com/questions/7757663/python-tempfile-broken-or-am-i-doing-it-wrong).

Thank you for looking into this.
Alexander
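
For comparison, here is a minimal sketch of the same write-then-read sequence with a seek() between the write and the read. It is written for Python 3 and is not the reporter's code; the helper name write_then_read is made up for illustration.

```python
import tempfile


def write_then_read(text):
    """Write text to a temporary file and read it back (hypothetical helper)."""
    with tempfile.TemporaryFile('w+t') as tmp:
        tmp.write(text)
        # Reposition the stream before switching from writing to reading.
        tmp.seek(0)
        return tmp.read()


result = write_then_read('test')
print(repr(result))
```

With the seek(0) in place, the read starts from the beginning of the file and returns only what was written.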

@AlexanderSteppke AlexanderSteppke mannequin added stdlib Python modules in the Lib dir OS-windows type-bug An unexpected behavior, bug, or error labels Oct 13, 2011
@bitdancer
Member

I wonder if it is a bug in Windows? Have you tried similar experiments with regular files? tempfile is really just about *where* the files are located (and what happens when they are closed), not about their fundamental nature as OS file objects. (I could be wrong about that on Windows of course, I'm more familiar with Linux.)

@AlexanderSteppke
Mannequin Author

AlexanderSteppke mannequin commented Oct 14, 2011

Hi David,

I followed your suggestion and tried to reproduce the problem without the tempfile module. It turns out that there is indeed an underlying issue. I am not sure what the root cause is, but this is now an even bigger problem: read() returns data from some file or memory region that it was never intended to access.

The session looks similar to the tempfile session:

>>> tmp = open('tmp', 'w+t')
>>> tmp.read()
''
>>> tmp.write('test')
>>> tmp.read()
'hp\'\x02\xe4\xb9>7\x80\x88\x81\x02\x01\x00\x00\x00\x00\x00\x00\x00\x12\x00\x00\
x00\xe86(\x02p\x11\x8d\x02\x01\x00\x00\x00@\xfd)\x02\xe7Y\x9aN\x01\x00\x00\x00\x
00\x00\x00\x00\x14\x00\x00\x00\x087(\x02\x00\x00\x00\x00\xe9Y\x0b\xa2\x00\x93+\x
02\x01\x00\x00\x00\x00\x00\x00\x00\x00\x9b,\x02\x02\x00\x00\x00\xe06(\x02\xc0W5\

At the moment the bug could only be reproduced using CPython 2.7.1 on Windows XP and Windows 7.

Alexander

@AlexanderSteppke AlexanderSteppke mannequin added the topic-IO label Oct 14, 2011
@AlexanderSteppke AlexanderSteppke mannequin changed the title Bug in tempfile module Bug in file.read(), can access unknown data. Oct 14, 2011
@AlexanderSteppke
Mannequin Author

AlexanderSteppke mannequin commented Oct 14, 2011

Additionally, after calling tmp.close(), the file 'tmp' contains the string 'test' followed by about 4 kB of binary data similar to the previous output of tmp.read().

@vstinner
Member

This issue is a duplicate of the issue bpo-1394612 which has been closed as invalid. Read the following message:

http://bugs.python.org/issue1394612#msg27200

I suppose that Python 3 is not affected by this issue because it doesn't use fread/fwrite anymore, but calls read/write directly (the low-level, unbuffered API).
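
As a rough illustration of that point, the sketch below (Python 3 only, not part of the original report) inspects the layers that open() returns for an update-mode file. Python 3's io module does its own buffering on top of a raw io.FileIO object instead of going through C stdio streams. The variable names are chosen for this example.

```python
import io
import os
import tempfile

# Create a scratch file path for the demonstration.
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    # Binary update mode: a BufferedRandom wrapping a raw FileIO.
    with open(path, 'w+b') as binary_file:
        binary_layers = (type(binary_file), type(binary_file.raw))
    # Text update mode: a TextIOWrapper on top of the same two layers.
    with open(path, 'w+') as text_file:
        text_layers = (
            type(text_file),
            type(text_file.buffer),
            type(text_file.buffer.raw),
        )
finally:
    os.remove(path)

print(binary_layers)
print(text_layers)
```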

It looks like Python cannot do anything about this issue except document this surprising behaviour. Would you like to write a patch for the documentation?

@AlexanderSteppke
Mannequin Author

AlexanderSteppke mannequin commented Oct 14, 2011

Thank you for the update Victor. It seems to me that this is exactly the same issue.

The current documentation says (http://docs.python.org/library/stdtypes.html#bltin-file-objects):

"Note: This function is simply a wrapper for the underlying fread() C function, and will behave the same in corner cases, such as whether the EOF value is cached."

This hints at the current behavior, but I would not infer from it that file.read() can return arbitrary data when called directly after file.write(). Maybe one could include a link to, or a snippet of, the C standard, which states that one shall not do this:

"When a file is opened with update mode ('+' as the second or third character in the above list of mode argument values), both input and output may be performed on the associated stream. However, output shall not be directly followed by input without an intervening call to the fflush function or to a file positioning function (fseek, fsetpos, or rewind), and input shall not be directly followed by output without an intervening call to a file positioning function, unless the input operation encounters end-of-file."

(from http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf, page 272)
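
Expressed in Python terms, both halves of that rule might look like the sketch below. It is an illustration under Python 3, not text proposed for the documentation: a seek() is placed at each switch between writing and reading, including a no-op seek at the current position before writing after a partial read.

```python
import os
import tempfile

with tempfile.TemporaryFile('w+b') as f:
    f.write(b'hello world')

    # Output followed by input: reposition (or flush) first.
    f.seek(0)
    first = f.read(5)  # reads b'hello' and stops before end-of-file

    # Input followed by output: a positioning call is needed because
    # the read did not hit end-of-file. Seeking 0 bytes from the
    # current position keeps the offset unchanged.
    f.seek(0, os.SEEK_CUR)
    f.write(b'_')  # overwrites the space after 'hello'

    # Switch back to reading from the start.
    f.seek(0)
    final = f.read()

print(first, final)
```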

@vstinner
Member

On 14/10/2011 14:37, Alexander Steppke wrote:

> "When a file is opened with update mode ('+' as the second or third character in the above list of mode argument values),

You can just say " '+' in the file mode ".

> the fflush function or to a file positioning function (fseek, fsetpos, or rewind),

You should translate these names into Python method names:
fflush -> file.flush()
fseek/fsetpos -> file.seek()
rewind -> (not exposed in Python)
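
The mapping above can be tried out with a short sketch (Python 3). The rewind() helper here is hypothetical and only approximates the C function by seeking to offset 0; C's rewind also clears the stream's error indicator, which has no counterpart in this example.

```python
import os
import tempfile


def rewind(f):
    """Hypothetical stand-in for C rewind(): seek to the start of the file."""
    f.seek(0, os.SEEK_SET)


with tempfile.TemporaryFile('w+b') as f:
    f.write(b'test')
    f.flush()             # counterpart of fflush()
    at_end = f.read()     # position is still at the end of the data
    rewind(f)             # counterpart of rewind(), built on file.seek()
    from_start = f.read()

print(at_end, from_start)
```

Note that flush() alone does not move the file position, so a read right after it starts where the write ended; seek() is what makes the written data readable.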

@terryjreedy
Member

This issue has come up enough (tracker and python-list) that I think adding a mild adaptation of the C standard paragraph might be a good idea. Changing to a doc issue.

@terryjreedy terryjreedy added docs Documentation in the Doc dir and removed stdlib Python modules in the Lib dir OS-windows topic-IO labels Oct 15, 2011
@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022