classification
Title: open builtin function: specifying the size of buffer has no effect for text files
Type: behavior Stage:
Components: Interpreter Core, IO Versions: Python 3.6, Python 3.5
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: direprobs, izbyshev
Priority: normal Keywords:

Created on 2017-06-20 20:18 by direprobs, last changed 2017-12-07 00:13 by izbyshev.

Messages (2)
msg296482 - Author: Mansoor Ahmed (direprobs) Date: 2017-06-20 20:18
*This behavior was tested on a Linux system with Python 3.5 and 3.6

Passing the buffer size for the builtin function `open` has no effect for files opened in text mode:

>>> import sys
>>> sys.version
'3.5.3 (default, Jan 19 2017, 14:11:04) \n[GCC 6.3.0 20170118]'

>>> f = open("/home/user/Desktop/data.txt", "r+", buffering=30)
>>> f.write("A" * 40)
40

My assumption is that `f` is a text buffer and f.buffer is the binary buffer, so the buffering argument to `open` sets the size of the binary buffer f.buffer. Confusingly, f.write("A" * 40) didn't fill that buffer: although 40 ASCII chars (40 bytes) were written to `f`, which exceeds the 30-byte buffer size, nothing was flushed by Python and the data instead stayed in the `f` object.


The problem is that `f` seems to act as a text buffer with its own buffer size and its own flushing behavior, which makes the buffering model confusing. Here are the main points:

A) Despite the buffer size passed to `open`, the `f` object acts as a text buffer whose size is set by f._CHUNK_SIZE.

B) The default buffer size of `f` renders the `buffering` argument to `open` virtually useless, because the programmer might think that Python flushes the data according to the binary buffer size passed to `open`. That is, when the programmer codes something like:

f = open("/home/user/Desktop/data.txt", "r+", buffering=30)
f.write("A" * 40) 

for a file opened by `open`, the programmer's assumption would most likely be that Python flushes the buffer once it holds more than 30 bytes, even for text files. In reality there is another buffer on top of the binary buffer: the buffering argument sets the size of the binary buffer `f.buffer`, *not* of the text buffer `f`, and `f` relies on a default buffer size that can be seen through f._CHUNK_SIZE or io.DEFAULT_BUFFER_SIZE.

C) Calling f.flush flushes both buffers (f and f.buffer) all the way down to f.buffer.raw, which further supports the point that the buffering argument is technically useless for text files.
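
Points A) to C) can be checked with a small sketch like the one below. It is only an illustration, not part of the original report: the helper name `sizes_on_disk` and the use of a temporary file are assumptions made here. It compares the size of the file on disk right after write() with the size after flush(). On the CPython versions discussed in this report, the first number would be expected to stay at 0 even though the payload exceeds the 30-byte binary buffer.

```python
import os
import tempfile


def sizes_on_disk(buffering=30, payload="A" * 40):
    """Return (size after write(), size after flush()) for a text-mode file.

    Hypothetical helper for illustration only; the name is not from the report.
    """
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w", encoding="utf-8", buffering=buffering) as f:
            f.write(payload)
            # The payload is handed to the text layer `f` first; whether any
            # of it has reached the disk yet is what this measures.
            after_write = os.path.getsize(path)
            # flush() pushes the data through f and f.buffer to f.buffer.raw.
            f.flush()
            after_flush = os.path.getsize(path)
        return after_write, after_flush
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(sizes_on_disk())
```

If the first value is 0 and the second is 40, the data was held above the binary buffer until flush(), as described in A) and C).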

From Python Documentation for `open`: 

"buffering is an optional integer used to set the buffering policy. Pass 0 to switch buffering off (only allowed in binary mode), 1 to select line buffering (only usable in text mode), and an integer > 1 to indicate the size in bytes of a fixed-size chunk buffer. When no buffering argument is given, the default buffering policy works as follows: ..." 

"and an integer > 1 to indicate the size in bytes of a fixed-size chunk buffer." If this behavior is intentional in the implementation of Python, then I think the documentation should say something like this:

and an integer > 1 sets the default buffer size.
msg307779 - Author: Alexey Izbyshev (izbyshev) * Date: 2017-12-07 00:13
Yes, clarifying buffering for text mode in open() would be nice.

@direprobs: just in case you didn't know, you can achieve what you want with something like the following in pre-3.7:

import io

with open("/dev/null", "wb", buffering=10) as outb, \
        io.TextIOWrapper(outb, write_through=True) as out:
    out.write("x" * 20)

Sadly, write_through can't be passed to open(), but since 3.7 it can be changed on an existing TextIOWrapper (via the new reconfigure() method).
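
For 3.7 and later, the reconfigure() route might look like the sketch below. This is an illustration under assumptions, not code from the thread: the helper name `size_after_write_through` and the temporary file are made up here so that the effect can be observed on a real file instead of /dev/null. With write_through enabled, the text layer hands data straight to the binary buffer, so the `buffering` size passed to open() is what governs when bytes reach the file.

```python
import os
import tempfile


def size_after_write_through(buffering=10, payload="x" * 20):
    """Return the on-disk size right after write(), with write_through on.

    Hypothetical helper for illustration only; requires Python 3.7+.
    """
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w", encoding="utf-8", buffering=buffering) as out:
            # reconfigure() is the TextIOWrapper method added in Python 3.7.
            out.reconfigure(write_through=True)
            out.write(payload)
            # No explicit flush(): the payload is larger than the binary
            # buffer, so the binary layer is the one deciding to write it out.
            return os.path.getsize(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(size_after_write_through())
```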
History
Date                 User       Action  Args
2017-12-07 00:13:20  izbyshev   set     nosy: + izbyshev; messages: + msg307779
2017-06-20 20:18:49  direprobs  create