classification
Title: open() shouldn't silently ignore buffering=1 in binary mode
Type: behavior Stage:
Components: IO Versions: Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: benjamin.peterson, izbyshev, pitrou, stutzbach
Priority: normal Keywords:

Created on 2017-12-06 23:46 by izbyshev, last changed 2017-12-10 23:01 by izbyshev.

Messages (3)
msg307775 - (view) Author: Alexey Izbyshev (izbyshev) * Date: 2017-12-06 23:46
The fact that "buffering=1" is usable only in text mode is documented for open(). In binary mode, setting buffering to 1 is silently ignored and equivalent to using default buffer size. I argue that such behavior is:

1. Inconsistent
For example, attempts to use buffering=0 in text mode or to specify encoding in binary mode raise ValueError.

2. Dangerous
Consider the following code fragment running on *nix (inspired by real-world code):

with open("fifo", "wb", buffering=1) as out:
    for line in lines:
        out.write(line)

"fifo" refers to a FIFO (named pipe). For each line, len(line) <= PIPE_BUF. Because of line buffering, such code must be able to safely assume that each write() is atomic, so that even in case of multiple writers no line read by a FIFO reader will ever get intermixed with another line. And it's so in Python 2. After migration to Python 3 (assuming that line is now bytes), this code still works on Linux because of two factors:
a) PIPE_BUF is 4096, and so is the default IO buffer size (usually)
b) When a write() is going to overflow the buffer, BufferedWriter first flushes and then processes the new write() (based on my experiment). So, each OS write() is called with complete lines only and is atomic per (a).

But PIPE_BUF is 512 on Mac OS X (I don't know about default buffer size). So, we are likely to have a subtle 2-to-3 migration issue.

I suggest to raise a ValueError if buffering=1 is used for binary mode. Note that issue 10344 (msg244354) and issue 21332 would have been uncovered earlier if it was done from the beginning.
msg307968 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-12-10 18:44
> I suggest to raise a ValueError if buffering=1 is used for binary mode.

Either that, or we instead accept buffering=1 as a regular buffer size.  But since buffering=1 means something else for text mode, maybe you're right that it's better to raise in binary mode, to avoid any possible confusion.
msg308000 - (view) Author: Alexey Izbyshev (izbyshev) * Date: 2017-12-10 23:01
I'm in favor of raising an exception because it'll expose existing code with incorrect assumptions. I'll check whether it breaks any tests and submit a PR.
History
Date User Action Args
2017-12-10 23:01:09izbyshevsetmessages: + msg308000
2017-12-10 18:44:01pitrousetnosy: + benjamin.peterson, stutzbach
messages: + msg307968
2017-12-06 23:46:49izbyshevcreate