
Title: seek doesn't properly handle file buffer, leads to silent data corruption
Type: behavior
Stage: needs patch
Components: IO
Versions: Python 3.1, Python 3.2, Python 2.7
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: pitrou
Nosy List: lorentey, pitrou
Priority: critical
Keywords: patch

Created on 2009-08-03 02:00 by lorentey, last changed 2022-04-11 14:56 by admin. This issue is now closed.

File name        Uploaded                  Description
issue6629.patch  pitrou, 2009-08-05 12:05
Messages (5)
msg91211 - (view) Author: Karoly Lorentey (lorentey) Date: 2009-08-03 02:00
The new io.BufferedRandom implementation in Python 3.1 has a broken seek 
that seems not to properly handle the case when the target of the seek 
lies inside the contents of the file buffer.  It leaves the file object 
in a confused state, such that the next write operation applies after 
the end of the buffer(!) instead of the specified target.

I could reproduce the following symptoms on both Debian Lenny and Mac OS 
X Leopard.  I downloaded the Python 3.1 tarball from, and 
built it by hand using './configure && make'.

$ ./python.exe
Python 3.1 (r31:73572, Aug  3 2009, 02:32:10) 
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> open("test", "wb").write(b"A" * 10000)
>>> file = open("test", "rb+")
>>>       # Reads 4096 bytes into file buffer
>>> file.tell()
>>> file.tell()
>>> file.write(b"B" * 10000)  # This should overwrite the whole file
>>> file.tell()
14096  # Hmm, 0 + 10000 == 14096?
>>> file.close()
>>> d = open("test", "rb").read()
>>> len(d)
14096  # ?!
>>> d[0:10]      # The file should now consist of 10000 Bs...
>>> d[4090:4100]
b'AAAAAABBBB'    # ... but the Bs only start after a buffer's worth of As
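
For anyone who wants to rerun the session non-interactively, it can be approximated as a script. This is only a sketch: the function name `reproduce` is hypothetical, the read size of 10 bytes is an assumption (any small read should pull a buffer's worth of data in), and a temporary file stands in for "test".

```python
import os
import tempfile


def reproduce(size=10000):
    """Replay the reported steps: read, seek(0), overwrite, then inspect."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(b"A" * size)
        with open(path, "rb+") as f:
            f.read(10)            # assumed size; fills the read buffer
            f.seek(0)             # seek target lies inside the buffered data
            f.write(b"B" * size)  # should overwrite the whole file
            end = f.tell()
        with open(path, "rb") as f:
            data = f.read()
        return end, data
    finally:
        os.remove(path)


end, data = reproduce()
print(end, len(data), data[:10])
```

On an interpreter that includes the fix, `end` and `len(data)` should both be 10000 and the file should contain only Bs. On an affected 3.1 build, the report above suggests 14096 instead.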

This bug has actually caused me some subtle, silent data corruption that 
went undetected for quite a while.  Hurray for backups!

The above code works fine in Python 3.0, and its Python 2.5 port also 
produces correct results.

A workaround for 3.1 is to call flush before every seek.
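
The workaround can be wrapped in a small helper. This is a minimal sketch, not part of the io module: `seek_flushed` and `overwrite_after_read` are hypothetical names, and the 10-byte read is an assumption used only to populate the buffer.

```python
import io
import os
import tempfile


def seek_flushed(f, offset, whence=io.SEEK_SET):
    """Flush the buffered file object, then seek (the suggested workaround)."""
    f.flush()
    return f.seek(offset, whence)


def overwrite_after_read(path, payload):
    """Read a little, rewind with the workaround, then overwrite from offset 0."""
    with open(path, "rb+") as f:
        f.read(10)          # assumed size; populates the read buffer
        seek_flushed(f, 0)  # flush before seeking back into the buffer
        f.write(payload)
        return f.tell()


# Small demonstration on a temporary file.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
try:
    with open(demo_path, "wb") as f:
        f.write(b"A" * 10000)
    print(overwrite_after_read(demo_path, b"B" * 10000))
finally:
    os.remove(demo_path)
```

Routing every seek through one helper keeps the workaround in a single place, so it is easy to remove once a fixed release is in use.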
msg91246 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-08-03 22:51
I'll look into this as soon as possible.
msg91316 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-08-05 12:05
Here is an initial patch. Could you try it on your workload?
msg91320 - (view) Author: Karoly Lorentey (lorentey) Date: 2009-08-05 15:44
The patch does fix my issue, thank you.
msg91388 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-08-06 20:54
I've committed the patch + tests in r74336 (trunk), r74338 (py3k) and
r74339 (3.1).
Thanks for the report, and don't hesitate to do more stress testing of
the IO lib!
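
One possible shape for such stress testing is a randomized comparison against an in-memory model. The sketch below is an illustration only and is not the test that was committed with the patch: the function name `stress` and all of its parameters are hypothetical. It mixes reads, writes, and seeks on a buffered "rb+" file and checks each result against a `bytearray`.

```python
import os
import random
import tempfile


def stress(rounds=2000, size=10000, seed=0, buffering=4096):
    """Run random read/write/seek operations and compare with a bytearray model."""
    rng = random.Random(seed)
    model = bytearray(b"A" * size)
    pos = 0
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(model)
        with open(path, "rb+", buffering=buffering) as f:
            for _ in range(rounds):
                op = rng.choice(("read", "write", "seek"))
                if op == "read":
                    n = rng.randint(0, 200)
                    got = f.read(n)
                    assert got == bytes(model[pos:pos + n])
                    pos += len(got)
                elif op == "write":
                    length = rng.randint(1, 200)
                    chunk = bytes(rng.randrange(256) for _ in range(length))
                    f.write(chunk)
                    # Overwrite in place; extends the model when writing past EOF.
                    model[pos:pos + len(chunk)] = chunk
                    pos += len(chunk)
                else:
                    pos = rng.randint(0, len(model))
                    f.seek(pos)
                # The reported position must always match the model.
                assert f.tell() == pos
        with open(path, "rb") as f:
            assert f.read() == bytes(model)
    finally:
        os.remove(path)
    return len(model)


print(stress())
```

A fixed seed keeps any failure reproducible. Varying `seed` and `buffering` changes how often a seek target falls inside the buffer.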
History
Date                 User      Action  Args
2022-04-11 14:56:51  admin     set     github: 50878
2009-08-06 20:54:07  pitrou    set     status: open -> closed; resolution: fixed; messages: + msg91388; versions: + Python 2.7
2009-08-05 15:44:02  lorentey  set     messages: + msg91320
2009-08-05 12:05:01  pitrou    set     files: + issue6629.patch; keywords: + patch; messages: + msg91316
2009-08-03 22:51:27  pitrou    set     priority: critical; assignee: pitrou; versions: + Python 3.2; nosy: + pitrou; messages: + msg91246; stage: needs patch
2009-08-03 02:00:39  lorentey  create