classification
Title: io.TextIOWrapper ignores silently partial write if buffer is unbuffered
Type: behavior Stage:
Components: IO Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: eryksun, methane, mjacob, pitrou, serhiy.storchaka, vstinner
Priority: normal Keywords:

Created on 2020-07-06 17:33 by mjacob, last changed 2020-07-08 10:18 by eryksun.

Messages (8)
msg373151 - (view) Author: Manuel Jacob (mjacob) * Date: 2020-07-06 17:33
Without unbuffered mode, it works as expected:

% python -c "import sys; sys.stdout.write('x'*4294967296)" | wc -c        
4294967296

% python -c "import sys; print('x'*4294967296)" | wc -c 
4294967297

With unbuffered mode, writes get truncated to 2147479552 bytes on my Linux machine:

% python -u -c "import sys; sys.stdout.write('x'*4294967296)" | wc -c           
2147479552

% python -u -c "import sys; print('x'*4294967296)" | wc -c 
2147479553

I didn’t try, but it’s probably an even bigger problem on Windows, where writes might be limited to 32767 bytes: https://github.com/python/cpython/blob/v3.9.0b4/Python/fileutils.c#L1585

Without unbuffered mode, `sys.stdout.buffer` is a `io.BufferedWriter` object.

% python -c 'import sys; print(sys.stdout.buffer)'
<_io.BufferedWriter name='<stdout>'>

With unbuffered mode, `sys.stdout.buffer` is a `io.FileIO` object.

% python -u -c 'import sys; print(sys.stdout.buffer)' 
<_io.FileIO name='<stdout>' mode='wb' closefd=False>

`io.BufferedWriter` implements the `io.BufferedIOBase` interface. `io.BufferedIOBase.write()` is documented to write all passed bytes. `io.FileIO` implements the `io.RawIOBase` interface. `io.RawIOBase.write()` is documented to be able to write fewer bytes than passed.

`io.TextIOWrapper.write()` is not documented to write all characters it has been passed, but e.g. `print()` relies on that.

To fix the problem, it has to be ensured that either
* `sys.stdout.buffer` is an object that guarantees that all bytes passed to its `write()` method are written (e.g. deriving from `io.BufferedIOBase`), or
* `io.TextIOWrapper` calls the `write()` method of its underlying binary stream until all bytes have been written, or
* users of `io.TextIOWrapper` call `write()` until all characters have been written.

In the first two possibilities it probably makes sense to tighten the contract of `io.TextIOBase.write` to guarantee that all passed characters are written.
msg373167 - (view) Author: Manuel Jacob (mjacob) * Date: 2020-07-06 20:02
2147479552 is the 0x7ffff000 bytes limit documented for write() on Linux (source: https://man7.org/linux/man-pages/man2/write.2.html). The limit could be even smaller in other circumstances or other systems.
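
As a quick sanity check on the number, the observed byte count matches the hexadecimal constant from the man page, and it equals the largest signed 32-bit integer rounded down to a page boundary. The 4096-byte page size used below is an assumption that holds on common Linux configurations but not everywhere:

```python
# The truncated size seen in the `wc -c` output above.
observed = 2147479552

# The limit quoted from the write(2) man page.
limit = 0x7ffff000

# Assumption: 4096-byte pages. INT_MAX with the low 12 bits cleared.
int_max = 2**31 - 1
page_size = 4096
rounded = int_max & ~(page_size - 1)

print(observed == limit, limit == rounded)
```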

I’m adding Victor Stinner to the nosy list, as he added the code limiting writes to the console on Windows in e0daff1c61e323d2a39dd8241de67082d1f10fd7, and he might have an opinion on the topic.
msg373203 - (view) Author: Manuel Jacob (mjacob) * Date: 2020-07-07 03:05
`io.TextIOWrapper.write()` returns the length of the passed string instead of the actually written number of characters.

% python -u -c "import sys; print(sys.stdout.write('x'*4294967296), file=sys.stderr)" | wc -c 
4294967296
2147479552

So the possibility "users of `io.TextIOWrapper` call `write()` until all characters have been written" would not be sufficient.
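
The mismatch between the return value and the delivered data can also be reproduced at small scale. The sketch below uses a hypothetical `LimitedRaw` class (not part of the `io` module) that accepts at most 10 bytes per `write()` call, standing in for the OS-level limit. The delivered count is what I would expect on the versions discussed here; it may differ if `TextIOWrapper` is changed to retry:

```python
import io


class LimitedRaw(io.RawIOBase):
    """Hypothetical raw stream: accepts at most 10 bytes per write() call."""

    def __init__(self):
        super().__init__()
        self.received = bytearray()

    def writable(self):
        return True

    def write(self, b):
        chunk = bytes(memoryview(b)[:10])
        self.received += chunk
        return len(chunk)  # may be smaller than len(b), as RawIOBase allows


raw = LimitedRaw()
# write_through=True hands each write() to the underlying stream immediately,
# similar to how `python -u` sets up sys.stdout.
text = io.TextIOWrapper(raw, encoding="ascii", write_through=True)
returned = text.write("x" * 100)

print("returned:", returned)            # length of the string passed in
print("delivered:", len(raw.received))  # what the raw stream actually accepted
```

Because the caller only ever sees the length of the string it passed, it has no way to tell how much to retry.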
msg373207 - (view) Author: Manuel Jacob (mjacob) * Date: 2020-07-07 06:14
It’s possible to trigger the problem on Unix with much smaller sizes, e.g. by interrupting the write() with a signal handler (even if the signal handler doesn’t do anything). The following script starts a subprocess doing a 16MiB write and sends a signal, which is handled but is a no-op, after reading a bit from the pipe:

import signal
import subprocess
import sys

CHILD_PROCESS = '''
import signal, sys
signal.signal(signal.SIGINT, lambda *x: None)
written = sys.stdout.write('x' * 16777216)
print('written:', written, file=sys.stderr, flush=True)
'''

proc = subprocess.Popen(
    [sys.executable, '-u', '-c', CHILD_PROCESS],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
read = len(proc.stdout.read(1))
proc.send_signal(signal.SIGINT)
read += len(proc.stdout.read())
stdout, stderr = proc.communicate()
assert stdout == b''
print('stderr:', stderr)
assert read == 16777216, "read: {}".format(read)


% python3 test_interrupted_write.py
stderr: b'written: 16777216\n'
Traceback (most recent call last):
  File "test_interrupted_write.py", line 24, in <module>
    assert read == 16777216, "read: {}".format(read)
AssertionError: read: 69632

If I remove the '-u' that gets passed to the subprocess:

% python3 test_interrupted_write.py
stderr: b'written: 16777216\n'
msg373216 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-07-07 10:35
cc Antoine Pitrou who was involved in io module design.

Currently, the io.TextIOWrapper implementation doesn't handle partial writes: it doesn't fully support an unbuffered 'buffer' object.

It should either handle partial writes internally, or it should inject a buffered writer between itself (TextIOWrapper) and the unbuffered buffer, so that partial writes would be handled by the buffered writer.
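
The second option can be approximated from user code today. This is a minimal sketch, not the proposed patch: `LimitedRaw` is a hypothetical raw stream that accepts at most 10 bytes per call, and an `io.BufferedWriter` is placed between it and the `TextIOWrapper`:

```python
import io


class LimitedRaw(io.RawIOBase):
    """Hypothetical raw stream: accepts at most 10 bytes per write() call."""

    def __init__(self):
        super().__init__()
        self.received = bytearray()

    def writable(self):
        return True

    def write(self, b):
        chunk = bytes(memoryview(b)[:10])
        self.received += chunk
        return len(chunk)


raw = LimitedRaw()
# BufferedWriter.flush() keeps calling raw.write() until its buffer is
# drained, so the text layer no longer has to deal with partial writes.
text = io.TextIOWrapper(io.BufferedWriter(raw), encoding="ascii")
returned = text.write("x" * 100)
text.flush()  # text layer -> BufferedWriter -> raw stream

print("returned:", returned)
print("delivered:", len(raw.received))
```

The cost is that data only reaches the raw stream on flush() or when the buffer fills, so an unbuffered mode built this way would have to flush after every write.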

The socket.socket class has a sendall() method which helps to handle such problems. In the io module, write() can sometimes do a partial write (unbuffered writer like FileIO) and sometimes cannot (buffered writer like BufferedWriter).

== C implementation ==

Modules/_io/text.c. The _io_TextIOWrapper_write_impl() function puts the encoded string into an internal pending_bytes list. If needed, it calls flush(): _textiowrapper_writeflush().

In pseudo-code, _textiowrapper_writeflush() calls "self.buffer.write(b)" where b is made of all "pending bytes". The result of write() is ignored, so a partial write goes unnoticed.

== Python implementation ==

_pyio.TextIOWrapper.write() simply calls "self.buffer.write(b)". It silently ignores partial writes.
msg373287 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-07-08 08:29
Oh, this is a serious problem.

AFAIK TextIOWrapper initially supported only buffered writers, and support for non-buffered writers was added later. We can make TextIOWrapper call write() of the underlying binary stream repeatedly, but it will break code which uses TextIOWrapper with other file-like objects whose write() method does not return the number of written bytes. For example:

    buffer = []
    class Writer: write = buffer.append
    t = TextIOWrapper(Writer())

Even if we fix writing to sys.stdout and sys.stderr, there will still be a problem with programs which write directly to sys.stdout.buffer or use open(buffering=0).

This is a complex issue and it needs a complex solution.
msg373289 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-07-08 08:32
> but it will break the code which uses TextIOWrapper with other file-like objects whose write() method does not return the number of written bytes.

We can detect whether write() returns an integer, and if it doesn't, not attempt to call write() in a loop (until all bytes are written).
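
A rough sketch of that detection, written as a standalone helper rather than as a change to TextIOWrapper. The names `write_all`, `Trickle` and `ListWriter` are hypothetical, and the loop deliberately ignores non-blocking streams and streams that keep returning 0:

```python
def write_all(stream, data):
    """Write all of `data`, retrying after partial writes.

    If stream.write() returns something other than an int (for example
    None from list.append), assume the whole chunk was consumed and stop.
    """
    remaining = data
    while len(remaining) > 0:
        n = stream.write(remaining)
        if not isinstance(n, int):
            return  # duck-typed writer with no byte count: nothing to retry
        remaining = memoryview(remaining)[n:]


class Trickle:
    """Hypothetical stream that accepts at most 3 bytes per call."""

    def __init__(self):
        self.received = bytearray()

    def write(self, b):
        chunk = bytes(b[:3])
        self.received += chunk
        return len(chunk)


trickle = Trickle()
write_all(trickle, b"hello world")

# The kind of writer from the previous message: write() returns None.
chunks = []


class ListWriter:
    write = chunks.append


write_all(ListWriter(), b"hello world")

print(bytes(trickle.received), chunks)
```

The first call always receives the original object, so a writer like `ListWriter` still stores plain bytes; only retries see a memoryview slice.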
msg373300 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2020-07-08 10:18
> it’s probably an even bigger problem on Windows, where writes might be
> limited to 32767 bytes:

This check and workaround in _Py_write_impl doesn't affect disk files and pipes. It only affects character devices for which isatty is true, and really it's only concerned with console screen-buffer files. The workaround is not required in Windows 8.1+, so it can be removed in Python 3.9+.
History
Date                 User              Action  Args
2020-07-08 10:18:50  eryksun           set     nosy: + eryksun; messages: + msg373300
2020-07-08 08:32:24  vstinner          set     messages: + msg373289
2020-07-08 08:29:17  serhiy.storchaka  set     messages: + msg373287
2020-07-07 10:36:06  vstinner          set     nosy: + methane, serhiy.storchaka
2020-07-07 10:35:55  vstinner          set     nosy: + pitrou; messages: + msg373216; title: Output of print() might get truncated in unbuffered mode -> io.TextIOWrapper ignores silently partial write if buffer is unbuffered
2020-07-07 06:14:11  mjacob            set     messages: + msg373207
2020-07-07 03:05:32  mjacob            set     messages: + msg373203
2020-07-06 20:02:26  mjacob            set     nosy: + vstinner; messages: + msg373167
2020-07-06 17:33:53  mjacob            create