Issue 7955: TextIOWrapper Buffering Inconsistent Between _io and _pyio

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/52203

classification

Title:	TextIOWrapper Buffering Inconsistent Between _io and _pyio
Type:	behavior	Stage:
Components:	Documentation, IO	Versions:	Python 3.10, Python 3.9, Python 3.8

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:	docs@python	Nosy List:	amaury.forgeotdarc, amcnabb, georg.brandl, pitrou
Priority:	normal	Keywords:

Created on 2010-02-18 06:41 by amcnabb, last changed 2022-04-11 14:56 by admin.

Files
File name	Uploaded	Description
testpyio.py	amcnabb, 2010-02-18 06:41
testio.py	amcnabb, 2010-02-18 06:41

Messages (5)
msg99496	Author: Andrew McNabb (amcnabb)	Date: 2010-02-18 06:41
The following snippet behaves differently in the C IO implementation than in the Python IO implementation:

    import sys
    sys.stdout.write('unicode ')
    sys.stdout.buffer.write(b'bytes ')

To test this, I have created two scripts, testpyio.py (using _pyio) and testio.py (using _io). The output is as follows:

    % python3 testpyio.py
    unicode bytes % python3 testio.py
    bytes unicode %

In my opinion, the behavior exhibited by _pyio is more correct. It appears that to get the C implementation to print the lines in the correct order, there must be a flush in between the statements. This extra flush would create a lot of overhead. I am attaching the two test scripts.

The C implementation prints the output in the correct order if each write ends with a newline.
msg99497	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *	Date: 2010-02-18 09:33
This is by design: for performance, the C TextIOWrapper stores the encoded strings in a list and calls buffer.write() less often. You can try adding stdout._CHUNK_SIZE = 1 to get the _pyio behavior.
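The suggestion above can be tried on an in-memory stream. This is a sketch only: `_CHUNK_SIZE` is a private attribute of the C `TextIOWrapper`, so the behavior shown is an implementation detail of CPython and may change between versions.

```python
import io

raw = io.BytesIO()
text = io.TextIOWrapper(raw, encoding="utf-8")

# Private attribute (implementation detail): with a chunk size of 1,
# any non-empty write exceeds the threshold and is passed to the
# underlying buffer right away instead of being held back.
text._CHUNK_SIZE = 1

text.write("unicode ")
raw.write(b"bytes ")
text.flush()

result = raw.getvalue()
print(result)
```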
msg99519	Author: Andrew McNabb (amcnabb)	Date: 2010-02-18 19:17
This seems like a common need (particularly for stdout and stderr), and setting `stdout._CHUNK_SIZE = 1` is relying on an implementation detail.
1) Can the documentation for TextIOWrapper be updated to clearly describe this extra buffering (how often buffer.write is called, etc.)?
2) Can there be a flush-like method, say write_to_buffer(), to force a buffer.write() without the overhead of a flush?
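Later Python versions offer a public option close to what the second question asks for, although it is not mentioned in this thread: the `write_through` argument of `io.TextIOWrapper` (added in Python 3.3) hands each write to the underlying binary buffer immediately, without flushing that buffer. A minimal sketch on an in-memory stream:

```python
import io

raw = io.BytesIO()
# write_through=True: text is encoded and passed to `raw` on every write,
# so nothing is left pending in the text layer.
text = io.TextIOWrapper(raw, encoding="utf-8", write_through=True)

text.write("unicode ")   # reaches raw immediately
raw.write(b"bytes ")     # lands after the text

in_order = raw.getvalue()
print(in_order)

# For an existing stream such as sys.stdout, Python 3.7+ provides
# sys.stdout.reconfigure(write_through=True) when sys.stdout is a
# regular TextIOWrapper.
```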
msg99520	Author: Antoine Pitrou (pitrou) *	Date: 2010-02-18 19:23
I agree this deserves documentation. I'm not convinced it's a common need, though. Usually you use stdin/stdout either in binary mode or in text mode, but you don't interleave the two very often.
msg99521	Author: Andrew McNabb (amcnabb)	Date: 2010-02-18 19:42
I would imagine that this would come up in most programs that read data from a pipe or from a socket (which carry binary data) and then output to stdout or stderr. I ran across the problem in my first non-trivial port to Python 3, and it seems like a common case to me. But having the weird behavior documented is the most important thing.
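For the pipe-or-socket case described here, the portable approach mentioned in the first message is to flush the text layer before writing bytes. The sketch below assumes a hypothetical helper name, `write_bytes_in_order`, and uses an in-memory stream in place of stdout; it pays the cost of a flush on every binary write.

```python
import io


def write_bytes_in_order(text_stream, data):
    """Flush pending text, then write raw bytes to the underlying buffer."""
    text_stream.flush()
    return text_stream.buffer.write(data)


raw = io.BytesIO()
out = io.TextIOWrapper(raw, encoding="utf-8")

out.write("header: ")                           # text output
write_bytes_in_order(out, b"\x00\x01payload")   # binary data, e.g. from a socket
out.flush()

print(raw.getvalue())
```

The same helper can be called with `sys.stdout` as the first argument when it is a regular text stream with a `buffer` attribute.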

History
Date	User	Action	Args
2022-04-11 14:56:57	admin	set	github: 52203
2020-11-16 21:51:41	iritkatriel	set	versions: + Python 3.8, Python 3.9, Python 3.10, - Python 2.6, Python 3.1, Python 2.7, Python 3.2
2010-10-29 10:07:21	admin	set	assignee: georg.brandl -> docs@python
2010-02-18 19:42:37	amcnabb	set	messages: + msg99521
2010-02-18 19:23:20	pitrou	set	assignee: georg.brandl components: + Documentation versions: + Python 2.6, Python 2.7, Python 3.2 nosy: + georg.brandl messages: + msg99520 resolution: not a bug -> stage: resolved ->
2010-02-18 19:17:01	amcnabb	set	status: closed -> open messages: + msg99519
2010-02-18 14:02:59	r.david.murray	set	status: open -> closed priority: normal resolution: not a bug stage: resolved
2010-02-18 09:33:26	amaury.forgeotdarc	set	nosy: + amaury.forgeotdarc, pitrou messages: + msg99497
2010-02-18 06:41:38	amcnabb	set	files: + testio.py
2010-02-18 06:41:15	amcnabb	create