Message 196464 - Python tracker

Author	erik.bray
Recipients	erik.bray
Date	2013-08-29.17:07:32
Message-id	<1377796053.23.0.271574141063.issue18876@psf.upfronthosting.co.za>
In-reply-to

Content

I've come across a few difficulties of late with the io module's handling of files opened in append mode (any variation on 'a', 'ab', 'a+', 'ab+', etc.).

The biggest problem is that the io module does not in any way keep track of whether a file was opened in append mode, and it's essentially impossible to determine the original mode string that was provided by the user.  For example:

>>> f = open('test', mode='ab+', buffering=0)
>>> f
<_io.FileIO name='test' mode='rb+'>

The 'a' is gone.  That doesn't mean the file *isn't* in append mode.  Where the platform supports it, fileio_init still adds the O_APPEND flag to the open() call.  But the *only* way to find out after the fact that the file was actually opened in append mode is with fcntl:

>>> import fcntl, os
>>> fcntl.fcntl(f.fileno(), fcntl.F_GETFL) & os.O_APPEND
1024

but this is neither easily accessible nor portable.  So it's possible to have two files that both report 'rb+' mode but have wildly differing behaviors.
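The fcntl check above can be wrapped in a small helper.  This is only a sketch: is_append_mode is a hypothetical name, not something the io module provides, and it returns None wherever fcntl is unavailable, since the flag cannot be queried this way there.

```python
import os

try:
    import fcntl  # POSIX only; missing on Windows
except ImportError:
    fcntl = None


def is_append_mode(f):
    """Hypothetical helper: report whether f's descriptor has O_APPEND set.

    Returns True or False when the flag can be queried, and None when it
    cannot (no fcntl module on this platform).
    """
    if fcntl is None:
        return None
    flags = fcntl.fcntl(f.fileno(), fcntl.F_GETFL)
    return bool(flags & os.O_APPEND)
```

The three-valued result is deliberate: on a platform without fcntl the honest answer is "unknown", which is the portability gap described above.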

The only other thing fileio_init does differently with append mode is that it seeks to the end of the file by default.  But that does not make the append behavior "portable".  If, on a system where O_APPEND is not supported, I seek to a different part of the file and then call write(), it will *not* append to the end of the file, whereas O_APPEND causes an automatic seek to the end before every write().
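The difference between O_APPEND and a one-time seek to the end can be seen at the file descriptor level.  The following is a minimal sketch, assuming a POSIX-like system; write_after_seek is a hypothetical helper that writes b'testest', seeks back to offset 0, writes b'B', and returns the resulting file contents.

```python
import os
import tempfile


def write_after_seek(use_append):
    """Hypothetical demo: seek to 0, write b'B', return the file contents."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(b"testest")
        # Open read/write, with or without O_APPEND.
        flags = os.O_RDWR | (os.O_APPEND if use_append else 0)
        fd = os.open(path, flags)
        try:
            os.lseek(fd, 0, os.SEEK_SET)  # move away from the end
            os.write(fd, b"B")            # where this lands depends on O_APPEND
        finally:
            os.close(fd)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)
```

With O_APPEND set, the byte should land at the end of the file despite the seek; without it, the byte overwrites offset 0.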

The fact that no record of the request for 'append' mode is kept leads to further bugs, particularly in BufferedWriter.  It doesn't know the raw file was opened with O_APPEND so the writes it shows in the buffer differ from what will actually end up in the file.  For example:

>>> f = open('test', 'wb')
>>> f.write(b'testest')
7
>>> f.close()
>>> f = open('test', 'ab+')
>>> f.tell()
7
>>> f.write(b'A')
1
>>> f.seek(0)
0
>>> f.read()
b'testestA'
>>> f.seek(0)
0
>>> f.read(1)
b't'
>>> f.write(b'B')
1
>>> f.seek(0)
0
>>> f.read()
b'tBstestA'
>>> f.flush()
>>> f.seek(0)
0
>>> f.read()
b'testestAB'

In this example, I read 1 byte from the beginning of the file, then write one byte.  Because of O_APPEND, the effect of the write() call on the raw file is to append, regardless of where BufferedWriter seeks it to first.  But before the f.flush() call, f.read() just shows what's in the buffer, which is not what will actually be written to the file.  (Naturally, unbuffered io does not have this particular problem.)
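Since the mismatch in the session above disappears once flush() is called, one possible stopgap is to flush after every write on a buffered append-mode file.  This is a sketch of a workaround, not a fix for the underlying issue; write_and_flush and buffered_view_matches_disk are hypothetical names, and the demo assumes a platform where O_APPEND is supported.

```python
import os
import tempfile


def write_and_flush(f, data):
    """Hypothetical helper: write, then flush so the buffer is not left stale."""
    n = f.write(data)
    f.flush()
    return n


def buffered_view_matches_disk():
    """Replay the session above, flushing after each write.

    Returns (what f.read() reports, what is actually on disk).
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(b"testest")
        with open(path, "ab+") as f:
            write_and_flush(f, b"A")
            f.seek(0)
            f.read(1)                 # position the buffered file at offset 1
            write_and_flush(f, b"B")  # raw write still appends under O_APPEND
            f.seek(0)
            view = f.read()
        with open(path, "rb") as f:
            disk = f.read()
        return view, disk
    finally:
        os.remove(path)
```

This trades away most of the benefit of buffering writes, so it only makes sense where reads and writes are interleaved on the same append-mode file.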

So, I'm thinking maybe the fileio struct needs to grow an 'append' member.  This could be used to provide a more accurate mode string, and could also be used, for example, in fileio_write to provide append-like support where it isn't natively supported (though perhaps without any guarantees as to atomicity).
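The idea can be illustrated in pure Python, although the actual change would be in the C fileio struct.  The sketch below is an assumption-laden illustration only: AppendTrackingFileIO, its append attribute, and requested_mode are hypothetical names.  The file is deliberately opened without O_APPEND so that the append behavior comes entirely from the emulation in write().

```python
import io
import os


class AppendTrackingFileIO(io.FileIO):
    """Hypothetical raw file that remembers an 'append' request."""

    def __init__(self, path, append=False):
        # Opened "rb+" on purpose: no O_APPEND, so the file must already
        # exist and any append behavior comes from write() below.
        super().__init__(path, "rb+")
        self.append = append
        if append:
            self.seek(0, os.SEEK_END)

    @property
    def requested_mode(self):
        # A mode string that keeps the 'a', unlike the 'rb+' shown earlier.
        return "ab+" if self.append else "rb+"

    def write(self, data):
        if self.append:
            # Seek and write are two separate calls, so this is not atomic:
            # another writer could extend the file in between.
            self.seek(0, os.SEEK_END)
        return super().write(data)
```

The recorded flag serves both purposes named above: it lets the object report a mode string that includes 'a', and it lets write() emulate append where the OS does not do it natively, without the atomicity that O_APPEND provides.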

History
Date	User	Action	Args
2013-08-29 17:07:33	erik.bray	set	recipients: + erik.bray
2013-08-29 17:07:33	erik.bray	set	messageid: <1377796053.23.0.271574141063.issue18876@psf.upfronthosting.co.za>
2013-08-29 17:07:33	erik.bray	link	issue18876 messages
2013-08-29 17:07:32	erik.bray	create