Message 296482 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	direprobs
Recipients	direprobs
Date	2017-06-20.20:18:47
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1497989929.05.0.713607879185.issue30718@psf.upfronthosting.co.za>
In-reply-to

Content
This behavior was tested on a Linux system with Python 3.5 and 3.6 Passing the buffer size for the builtin function `open` has no effect for files opened in text mode: >>> sys.version '3.5.3 (default, Jan 19 2017, 14:11:04) \n[GCC 6.3.0 20170118]' >>> f = open("/home/user/Desktop/data.txt", "r+", buffering=30) >>> f.write("A" 40) 40 My assumption is that, `f` is a text buffer and f.buffer is the binary buffer. Therefore, the buffering argument to `open` sets the buffering size to the binary buffer f.buffer. Confusingly, f.write("A" * 40) didn't fill the buffer although the 40 ASCII chars=40 bytes have been written to `f` which exceeds its buffer size (30 bytes) nothing was flushed by Python and (instead) the data set in `f` object. The problem is that, it seems that `f` acts as a text buffer with its own buffer size and its own flushing behavior which obstructs many concepts. Here are the main points: A) Despite passing the buffer size to open, `f` object acts as a text buffer and its size is set to f._CHUNK_SIZE. B) The default buffer size set to `f` by default renders the `buffering` argument to `open` virtually useless, this is because the programmer might think that Python flushes the data according to the binary buffer size passed to `open`. That is, when the programmer codes something like: f = open("/home/user/Desktop/data.txt", "r+", buffering=30) f.write("A" * 40) for a file opened by `open`, the programmer's assumption would most likely be that Python flushes the buffer when it's greater than 30 bytes in size for text files. But it really has another buffer on top of the binary buffer and the buffering argument sets the buffer size of the binary buffer `f.buffer` not `f`, the text buffer and `f` relies on the buffer size as set by default that can be seen through f._CHUNK_SIZE or from io.DEFAULT_BUFFER_SIZE. C) Calling f.flush flushes both buffers (f and f.buffer) all the way to f.buffer.raw and this further validates the point that given the buffering argument for text files, would technically be useless. From Python Documentation for `open`: "buffering is an optional integer used to set the buffering policy. Pass 0 to switch buffering off (only allowed in binary mode), 1 to select line buffering (only usable in text mode), and an integer > 1 to indicate the size in bytes of a fixed-size chunk buffer. When no buffering argument is given, the default buffering policy works as follows: ..." "and an integer > 1 to indicate the size in bytes of a fixed-size chunk buffer." if this behavior was intentional in the implementation of Python, then I think the documentation should say something like this: and an integer > 1 sets the the default buffer size.

*This behavior was tested on a Linux system with Python 3.5 and 3.6

Passing the buffer size for the builtin function `open` has no effect for files opened in text mode:

 >>> sys.version
'3.5.3 (default, Jan 19 2017, 14:11:04) \n[GCC 6.3.0 20170118]'

>>> f = open("/home/user/Desktop/data.txt", "r+", buffering=30)
>>> f.write("A" * 40)
40

My assumption is that, `f` is a text buffer and f.buffer is the binary buffer. Therefore, the buffering argument to `open` sets the buffering size to the binary buffer f.buffer. Confusingly, f.write("A" * 40) didn't fill the buffer although the 40 ASCII chars=40 bytes have been written to `f` which exceeds its buffer size (30 bytes) nothing was flushed by Python and (instead) the data set in `f` object. 


The problem is that, it seems that `f` acts as a text buffer with its own buffer size and its own flushing behavior which obstructs many concepts. Here are the main points: 

A) Despite passing the buffer size to open, `f` object acts as a text buffer and its size is set to f._CHUNK_SIZE. 

B) The default buffer size set to `f` by default renders the `buffering` argument to `open` virtually useless, this is because the programmer might think that Python flushes the data according to the binary buffer size passed to `open`. That is, when the programmer codes something like: 

f = open("/home/user/Desktop/data.txt", "r+", buffering=30)
f.write("A" * 40) 

for a file opened by `open`, the programmer's assumption would most likely be that Python flushes the buffer when it's greater than 30 bytes in size for text files. But  it really has another buffer on top of the binary buffer and the buffering argument sets the buffer size of the binary buffer `f.buffer` *not* `f`, the text buffer and `f` relies on the buffer size as set by default that can be seen through f._CHUNK_SIZE or from io.DEFAULT_BUFFER_SIZE. 

C) Calling f.flush flushes both buffers (f and f.buffer) all the way to f.buffer.raw and this further validates the point that given the buffering argument for text files, would technically be useless.

From Python Documentation for `open`: 

"buffering is an optional integer used to set the buffering policy. Pass 0 to switch buffering off (only allowed in binary mode), 1 to select line buffering (only usable in text mode), and an integer > 1 to indicate the size in bytes of a fixed-size chunk buffer. When no buffering argument is given, the default buffering policy works as follows: ..." 

"and an integer > 1 to indicate the size in bytes of a fixed-size chunk buffer." if this behavior was intentional in the implementation of Python, then I think the documentation should  say something like this: 

and an integer > 1 sets the the default buffer size.

History
Date	User	Action	Args
2017-06-20 20:18:49	direprobs	set	recipients: + direprobs
2017-06-20 20:18:49	direprobs	set	messageid: <1497989929.05.0.713607879185.issue30718@psf.upfronthosting.co.za>
2017-06-20 20:18:48	direprobs	link	issue30718 messages
2017-06-20 20:18:47	direprobs	create