classification
Title: Problems with files opened in append mode with io module
Type: Stage:
Components: IO Versions: Python 3.4, Python 3.3, Python 2.7, Python 2.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: erik.bray, haypo, neologix, pitrou, python-dev, r.david.murray
Priority: normal Keywords: patch

Created on 2013-08-29 17:07 by erik.bray, last changed 2013-09-06 11:58 by pitrou.

Files
File name Uploaded Description Edit
issue_18876_patch_1.diff erik.bray, 2013-08-29 22:11 review
issue_18876_patch_2.diff erik.bray, 2013-09-04 16:23 An updated patch incorporating suggestions from review review
Messages (13)
msg196464 - (view) Author: Erik Bray (erik.bray) * Date: 2013-08-29 17:07
I've come across a few difficulties of late with the io module's handling of files opened in append mode (any variation on 'a', 'ab', 'a+', 'ab+', etc.

The biggest problem is that the io module does not in any way keep track of whether a file was opened in append mode, and it's essentially impossible to determine the original mode string that was provided by the user.  For example:

>>> f = open('test', mode='ab+', buffering=0)
>>> f
<_io.FileIO name='test' mode='rb+'>

The 'a' is gone.  That doesn't mean the file *isn't* in append mode.  If supported, in fileio_init this still causes the O_APPEND flag to be added to the open() call.  But the *only* way to find out after the fact that the file  was actually opened in append mode is with fcntl:

>>> fcntl.fcntl(f.fileno(), fnctl.F_GETFL) & os.O_APPEND
1024

but this is hardly easily accessible or portable.  So it's possible to have two files open in 'rb+' mode but that have wildly differing behaviors.

The only other thing fileio_init does differently with append mode is that it seeks to the end of the file by default.  But that does not make the append behavior "portable".  If, on a system where O_APPEND was not supported, I seek to a different part of the file and the call write() it will *not* append to the end of the file.  Whereas the behavior of O_APPEND causes an automatic seek to the end before any write().

The fact that no record of the request for 'append' mode is kept leads to further bugs, particularly in BufferedWriter.  It doesn't know the raw file was opened with O_APPEND so the writes it shows in the buffer differ from what will actually end up in the file.  For example:

>>> f = open('test', 'wb')
>>> f.write(b'testest')
7
>>> f.close()
>>> f = open('test', 'ab+')
>>> f.tell()
7
>>> f.write(b'A')
1
>>> f.seek(0)
0
>>> f.read()
b'testestA'
>>> f.seek(0)
0
>>> f.read(1)
b't'
>>> f.write(b'B')
1
>>> f.seek(0)
0
>>> f.read()
b'tBstestA'
>>> f.flush()
>>> f.seek(0)
0
>>> f.read()
b'testestAB'

In this example, I read 1 byte from the beginning of the file, then write one byte.  Because of O_APPEND, the effect of the write() call on the raw file is to append, regardless of where BufferedWriter seeks it to first.  But before the f.flush() call f.read() just shows what's in the buffer which is not what will actually be written to the file.  (Naturally, unbuffered io does not have this particular problem.)

So, I'm thinking maybe the fileio struct needs to grow an 'append' member.  This could be used to provide a more accurate mode string, and could for example in fileio_write to provide append-like support where it isn't natively supported (though perhaps without any guarantees as to atomicity).
msg196466 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-08-29 17:18
Hi, Erik.  I changed the versions because we use versions in this tracker to indicate which versions we intend to *fix* the problem in.  I left 2.7 and 3.3 marked for the moment, but I have a feeling that this will need to be a feature-release-only change.  Not only does it require a new attribute, it also changes the semantics (albeit in a sensible way), and as such could break working code if applied to a maintenance release.
msg196468 - (view) Author: Erik Bray (erik.bray) * Date: 2013-08-29 17:29
Ah right, sorry about that. I just came over from the Trac site for one of my projects where the version field is used for affected versions :)
msg196469 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-29 17:49
On which systems is O_APPEND not supported? It's part of the POSIX standard, and even Windows seems to have it.

> Whereas the behavior of O_APPEND causes an automatic seek to the end
> before any write().

True, but IIRC some systems seek on open() and some systems seek just before write().

Agreed about the "mode" attribute problem, though. And the BufferedWriter bug is truly embarassing, sorry :-(
msg196478 - (view) Author: Charles-Fran├žois Natali (neologix) * (Python committer) Date: 2013-08-29 18:46
> On which systems is O_APPEND not supported? It's part of the POSIX standard, and even Windows seems to have it.

That would be surprising.
An easy way to find this out would be to remove the ifdef:
352 #ifdef O_APPEND
353 if (append)
354 flags |= O_APPEND;
355 #endif

>> Whereas the behavior of O_APPEND causes an automatic seek to the end
>> before any write().
>
> True, but IIRC some systems seek on open() and some systems seek just before write().

POSIX makes it clear that only the later behavior is legal:
O_APPENDIf set, the file offset shall be set to the end of the file
prior to each write.
msg196490 - (view) Author: Erik Bray (erik.bray) * Date: 2013-08-29 20:44
>> Whereas the behavior of O_APPEND causes an automatic seek to the end
>> before any write().

> True, but IIRC some systems seek on open() and some systems seek just before write().

I figured that workaround that seeks to the end on open was an attempt to normalize behavior between different systems, and seems to me like a reasonable compromise to support systems that somehow don't support append.

But a better compromise I think would be to seek to the beginning of the file and modify fileio_write and friends to always seek to the end before a write if the file was opened in append mode. This would be more in line with the POSIX behavior.

I have no idea on what systems O_APPEND isn't supported, but I figured they must exist because someone put an #ifdef in there once :)
msg196495 - (view) Author: Erik Bray (erik.bray) * Date: 2013-08-29 22:11
Here's an initial stab at a simple patch that just addresses the issue of 'append' not being in the mode string.  Amazingly this did not break a single existing test, though I added a new one to test how FileIO objects display their mode string after being initialized (normalizes modes like 'r+b' to 'rb+', etc.)  Most of what it tests is the existing behavior, but it also tests for the reintroduction of the 'a' mode.

This doesn't address the BufferedWriter issue, nor does it address cross-platform issues.
msg196666 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-31 19:05
Thanks for the patch. It looks mostly ok.
Could you sign a contributor agreement so that we can move forward?
http://www.python.org/psf/contrib/
msg196943 - (view) Author: Roundup Robot (python-dev) Date: 2013-09-04 18:58
New changeset 33bd39b67cc1 by Antoine Pitrou in branch '3.3':
Issue #18876: The FileIO.mode attribute now better reflects the actual mode under which the file was opened.
http://hg.python.org/cpython/rev/33bd39b67cc1

New changeset b5530669ef70 by Antoine Pitrou in branch 'default':
Issue #18876: The FileIO.mode attribute now better reflects the actual mode under which the file was opened.
http://hg.python.org/cpython/rev/b5530669ef70
msg196964 - (view) Author: Roundup Robot (python-dev) Date: 2013-09-04 22:03
New changeset fc2b88a27fa1 by Antoine Pitrou in branch '2.7':
Issue #18876: The FileIO.mode attribute now better reflects the actual mode under which the file was opened.
http://hg.python.org/cpython/rev/fc2b88a27fa1
msg196965 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-09-04 22:04
I have committed your patch. Thank you!
msg197032 - (view) Author: Erik Bray (erik.bray) * Date: 2013-09-05 22:04
Thank you!  Has there been a separate issue opened for the BufferedWriter bug or can that be covered by this issue as well?
msg197063 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-09-06 11:58
> Erik Bray added the comment:
> 
> Thank you!  Has there been a separate issue opened for the
> BufferedWriter bug or can that be covered by this issue as well?

No other issue has been opened as I know of, but indeed it would be
clearer to open a separate one.
History
Date User Action Args
2013-09-06 11:58:52pitrousetmessages: + msg197063
2013-09-05 22:04:11erik.braysetmessages: + msg197032
2013-09-04 22:04:59pitrousetmessages: + msg196965
2013-09-04 22:03:49python-devsetmessages: + msg196964
2013-09-04 18:58:06python-devsetnosy: + python-dev
messages: + msg196943
2013-09-04 16:23:02erik.braysetfiles: + issue_18876_patch_2.diff
2013-08-31 19:05:33pitrousetmessages: + msg196666
2013-08-29 22:11:02erik.braysetfiles: + issue_18876_patch_1.diff
keywords: + patch
messages: + msg196495
2013-08-29 20:47:01hayposetnosy: + haypo
2013-08-29 20:44:36erik.braysetmessages: + msg196490
2013-08-29 18:46:37neologixsetmessages: + msg196478
2013-08-29 17:49:46pitrousetnosy: + neologix
messages: + msg196469
2013-08-29 17:29:53erik.braysetmessages: + msg196468
2013-08-29 17:18:21r.david.murraysetnosy: + r.david.murray
messages: + msg196466
2013-08-29 17:14:30r.david.murraysetnosy: + pitrou

versions: + Python 3.4, - Python 3.1, Python 3.2
2013-08-29 17:07:33erik.braycreate