This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: File IO r+ read, write, read causes garbage data write.
Type: behavior Stage: resolved
Components: Interpreter Core, IO, Windows Versions: Python 2.7
process
Status: closed Resolution: third party
Dependencies: Superseder:
Assigned To: Nosy List: eryksun, jan, paul.moore, steve.dower, tim.golden, zach.ware
Priority: normal Keywords:

Created on 2017-03-15 09:47 by jan, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (3)
msg289657 - (view) Author: Jan Pijpers (jan) Date: 2017-03-15 09:47
In Python 2.7.12, when reading, writing, and then reading again from a file opened in "r+" mode, Python seems to write garbage.

For example, when running this in IDLE:

import os 
testPath = r"myTestFile.txt"

## Make sure the file exists and it's empty
with open(testPath,"w") as tFile:
    tFile.write("")

print "Our Test File: ", os.path.abspath(testPath)

with open(testPath, "r+") as tFile:
    ## First we read the file 
    data = tFile.read()

    ## Now we write some data 
    tFile.write('Some Data')

    ## Now we read the file again
    tFile.read()


When now looking at the file the data is the following:

Some Data @ sb d Z d d l m Z d d d ・ ・ YZ e d k r^ d d l m Z e d d d d e ・n d S( s9
Implement Idle Shell history mechanism with History
...<omitted the rest of the data> 

As mentioned in the comments on Stack Overflow (see link below), this might be a buffer overrun, but I am not sure. I also suspect this could be exploited as a security vulnerability...

http://stackoverflow.com/questions/40373457/python-r-read-write-read-writes-garbage-to-a-file?noredirect=1#comment72580538_40373457
msg289675 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2017-03-15 13:39
This is a bug in the C runtime's handling of "r+" mode with buffering. 

The CRT FILE stream's internal _cnt field, from the POV of the write() call, is the number of bytes that can be written to the internal buffer before it's full. The default buffer size is 4096 bytes. Thus after writing "Some Data", _cnt is at 4096 - 9 == 4087 bytes. 

On the other hand, from the POV of the subsequent read() call, this means there are 4087 bytes in the buffer available to be read. If you change your code to keep the result, you'll see that it is indeed 4087 bytes. 

After the read, _cnt is at 0 and the stream's internal _ptr and _base pointers indicate a full buffer, which gets flushed to disk when the file is closed. If you change your code to print os.path.getsize(testPath) after the file is closed, then you should see that the size is 4096 bytes -- exactly one buffer. 

If you open the file with buffering=512, then this changes to 503 bytes read and creates a 512 byte file.

Can and should Python do anything to work around this problem in the CRT? Or should this issue simply be closed as third party? I'm inclined to close it.
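For reference, the C standard requires a file-positioning call (such as fseek) when switching between reading and writing on an update stream, so inserting an explicit seek sidesteps the CRT's buffer confusion. A minimal sketch of a portable workaround (the file name is illustrative, not from the report):

```python
import os

test_path = "myTestFile.txt"  # illustrative name

# Start with an empty file.
with open(test_path, "w") as f:
    f.write("")

with open(test_path, "r+") as f:
    data = f.read()    # read to EOF
    f.seek(f.tell())   # explicit positioning call before switching to writing
    f.write("Some Data")
    f.seek(0)          # reposition again before switching back to reading
    result = f.read()

print(result)                      # Some Data
print(os.path.getsize(test_path))  # 9, not 4096
```

With the intervening seeks, the stream's internal buffer accounting stays consistent and the file ends up containing exactly the nine bytes written.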
msg289676 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2017-03-15 14:32
Also, this is a Python 2 only issue. The problem doesn't happen in Python 3.6 (at least in my quick experiment). I'm not 100% sure whether this is because the internal implementation of IO changed in 3.x, or just because we're now using a newer CRT in which the issue is fixed.

I agree that there's no point in Python trying to work around this behaviour.
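A quick way to confirm the Python 3 behaviour is to repeat the reporter's read/write/read sequence and check the resulting file size (a sketch for Python 3; the file name is illustrative):

```python
import os

test_path = "myTestFile.txt"  # illustrative name

# Start with an empty file.
with open(test_path, "w") as f:
    f.write("")

with open(test_path, "r+") as f:
    f.read()              # read the (empty) file
    f.write("Some Data")  # write without an intervening seek
    tail = f.read()       # read again

# Python 3's own io implementation leaves exactly the nine bytes
# written -- no stray buffer contents flushed to disk.
print(repr(tail))                  # ''
print(os.path.getsize(test_path))  # 9
```

Since Python 3 uses its own io module rather than the CRT's FILE streams for buffering, the CRT's "r+" accounting bug never comes into play.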
History
Date User Action Args
2022-04-11 14:58:44adminsetgithub: 74003
2017-03-15 14:32:50paul.mooresetstatus: open -> closed
stage: resolved
2017-03-15 14:32:25paul.mooresetresolution: third party
messages: + msg289676
2017-03-15 13:39:51eryksunsetnosy: + eryksun
messages: + msg289675
2017-03-15 09:48:09jansettitle: File IO read, write, read causes garbage data write. -> File IO r+ read, write, read causes garbage data write.
2017-03-15 09:47:30jancreate