classification
Title: seek doesn't properly handle file buffer, leads to silent data corruption
Type: behavior Stage: needs patch
Components: IO Versions: Python 3.1, Python 3.2, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: pitrou Nosy List: lorentey, pitrou
Priority: critical Keywords: patch

Created on 2009-08-03 02:00 by lorentey, last changed 2009-08-06 20:54 by pitrou. This issue is now closed.

Files
File name Uploaded Description
issue6629.patch pitrou, 2009-08-05 12:05
Messages (5)
msg91211 - Author: Karoly Lorentey (lorentey) Date: 2009-08-03 02:00
The new io.BufferedRandom implementation in Python 3.1 has a broken seek 
that seems not to properly handle the case when the target of the seek 
lies inside the contents of the file buffer.  It leaves the file object 
in a confused state, such that the next write operation applies after 
the end of the buffer(!) instead of the specified target.

I could reproduce the following symptoms on both Debian Lenny and Mac OS 
X Leopard.  I downloaded the Python 3.1 tarball from python.org, and 
built it by hand using './configure && make'.

$ ./python.exe
Python 3.1 (r31:73572, Aug  3 2009, 02:32:10) 
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> open("test", "wb").write(b"A" * 10000)
10000
>>> file = open("test", "rb+")
>>> file.read(10)       # Reads 4096 bytes into file buffer
b'AAAAAAAAAA'
>>> file.tell()
10
>>> file.seek(0)
0
>>> file.tell()
0
>>> file.write(b"B" * 10000)  # This should overwrite the whole file
10000
>>> file.tell()
14096  # Hmm, 0 + 10000 == 14096?
>>> file.close()
>>> d = open("test", "rb").read()
>>> len(d)
14096  # ?!
>>> d[0:10]      # The file should now consist of 10000 Bs...
b'AAAAAAAAAA'
>>> d[4090:4100]
b'AAAAAABBBB'    # ... but the Bs only start after a buffer's worth of As.

This bug has actually caused me some subtle, silent data corruption that 
went undetected for quite a while.  Hurray for backups!

The above code works fine in Python 3.0, and its Python 2.5 port also 
produces correct results.

A workaround for 3.1 is to call flush before every seek.
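
The workaround can be sketched as a small wrapper that flushes the buffered file before each seek. This is only an illustration of the suggestion above, not code from the patch; `seek_flushed` is a hypothetical helper name, and the temporary path stands in for the "test" file used in the session.

```python
import os
import tempfile


def seek_flushed(f, offset, whence=os.SEEK_SET):
    """Hypothetical helper: flush the buffered file, then seek.

    Mirrors the workaround from the report (call flush before every seek).
    """
    f.flush()
    return f.seek(offset, whence)


# Same sequence as the session above, with the workaround applied.
path = os.path.join(tempfile.mkdtemp(), "test")
with open(path, "wb") as f:
    f.write(b"A" * 10000)

with open(path, "rb+") as f:
    f.read(10)                 # fills the read buffer
    seek_flushed(f, 0)         # flush first, then seek back to the start
    f.write(b"B" * 10000)      # should overwrite the whole file
    end_pos = f.tell()

with open(path, "rb") as f:
    data = f.read()
```

On an interpreter without the bug, `end_pos` is 10000 and `data` is 10000 Bs with or without the extra flush, so the helper costs nothing beyond the flush itself.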
msg91246 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2009-08-03 22:51
I'll look into this as soon as possible.
msg91316 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2009-08-05 12:05
Here is an initial patch. Could you try it on your workload?
msg91320 - Author: Karoly Lorentey (lorentey) Date: 2009-08-05 15:44
The patch does fix my issue, thank you.
msg91388 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2009-08-06 20:54
I've committed the patch + tests in r74336 (trunk), r74338 (py3k) and
r74339 (3.1).
Thanks for the report, and don't hesitate to do more stress testing of
the IO lib!
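
One possible shape for such stress testing is a randomized comparison of a buffered read/write file against an in-memory model. This is a rough sketch and not the tests committed with the patch; the function name `stress_buffered_random` and its parameters are made up for illustration.

```python
import os
import random
import tempfile


def stress_buffered_random(rounds=2000, size=10000, seed=0):
    """Hypothetical stress test: random seeks, reads and writes on an
    "rb+" file, checked step by step against a bytearray model."""
    rng = random.Random(seed)
    model = bytearray(b"A" * size)
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(bytes(model))
        with open(path, "rb+") as f:
            for _ in range(rounds):
                pos = rng.randrange(size)
                f.seek(pos)
                assert f.tell() == pos
                n = rng.randrange(1, 200)
                if rng.random() < 0.5:
                    # Reads must see everything written so far.
                    assert f.read(n) == bytes(model[pos:pos + n])
                else:
                    payload = bytes(rng.randrange(256) for _ in range(n))
                    f.write(payload)
                    # Slice assignment extends the model past its end,
                    # just as the write extends the file.
                    model[pos:pos + n] = payload
                    assert f.tell() == pos + n
        with open(path, "rb") as f:
            on_disk = f.read()
    finally:
        os.remove(path)
    return on_disk == bytes(model), len(on_disk), len(model)
```

The sizes are chosen so that most seeks land inside the buffered region, which is the case described in the original report. Any mismatch between the file and the model points at a seek or write that was applied at the wrong offset.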
History
Date User Action Args
2009-08-06 20:54:07 pitrou set status: open -> closed; resolution: fixed; messages: + msg91388; versions: + Python 2.7
2009-08-05 15:44:02 lorentey set messages: + msg91320
2009-08-05 12:05:01 pitrou set files: + issue6629.patch; keywords: + patch; messages: + msg91316
2009-08-03 22:51:27 pitrou set priority: critical; assignee: pitrou; versions: + Python 3.2; nosy: + pitrou; messages: + msg91246; stage: needs patch
2009-08-03 02:00:39 lorentey create