classification
Title: TextIOWrapper Buffering Inconsistent Between _io and _pyio
Type: behavior Stage:
Components: Documentation, IO Versions: Python 3.1, Python 3.2, Python 2.7, Python 2.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: amaury.forgeotdarc, amcnabb, georg.brandl, pitrou
Priority: normal Keywords:

Created on 2010-02-18 06:41 by amcnabb, last changed 2010-10-29 10:07 by admin.

Files
File name Uploaded Description
testpyio.py amcnabb, 2010-02-18 06:41
testio.py amcnabb, 2010-02-18 06:41
Messages (5)
msg99496 - Author: Andrew McNabb (amcnabb) Date: 2010-02-18 06:41
The following snippet behaves differently in the C IO implementation than in the Python IO implementation:

  import sys
  sys.stdout.write('unicode ')
  sys.stdout.buffer.write(b'bytes ')

To test this, I have created two scripts, testpyio.py (using _pyio) and testio.py (using _io).  The output is as follows:

% python3 testpyio.py
unicode bytes
% python3 testio.py
bytes unicode
%

In my opinion, the behavior exhibited by _pyio is more correct.  It appears that to get the C implementation to print the lines in the correct order, there must be a flush in between the statements.  This extra flush would create a lot of overhead.

I am attaching the two test scripts.

The C implementation prints the output in the correct order if each write ends with a newline.
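
The ordering described in this report can be reproduced without touching the real stdout. The following is a minimal sketch, assuming an in-memory io.BytesIO as a stand-in for sys.stdout.buffer; the helper name interleave is hypothetical and not part of the attached scripts. It shows that flushing the text layer before writing bytes gives the expected order.

```python
import io

def interleave(flush_between):
    """Write text, then bytes, to the same underlying binary stream."""
    raw = io.BytesIO()                      # stand-in for sys.stdout.buffer
    text = io.TextIOWrapper(raw, encoding="ascii")
    text.write("unicode ")
    if flush_between:
        text.flush()                        # push pending text down to raw
    raw.write(b"bytes ")
    text.flush()                            # make sure no text is left pending
    return raw.getvalue()

if __name__ == "__main__":
    # Without the intermediate flush, the order depends on the implementation.
    print(interleave(flush_between=False))
    print(interleave(flush_between=True))
```

With the intermediate flush, the text always reaches the binary stream before the bytes, whichever implementation is in use.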
msg99497 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-02-18 09:33
This is by design: for performance, the C TextIOWrapper stores the encoded strings in a list and calls buffer.write() less often.
You can try adding
   stdout._CHUNK_SIZE = 1
to get the _pyio behavior.
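
A sketch of the suggestion above, assuming CPython's C implementation, where _CHUNK_SIZE is a private attribute that may change or be absent in other implementations. An in-memory io.BytesIO stands in for stdout.buffer, and the function name is hypothetical.

```python
import io

def write_with_small_chunk():
    raw = io.BytesIO()                      # stand-in for stdout.buffer
    text = io.TextIOWrapper(raw, encoding="ascii")
    # Private implementation detail (assumption: CPython's C TextIOWrapper).
    text._CHUNK_SIZE = 1
    text.write("unicode ")                  # exceeds the chunk size, so it is
                                            # handed to raw right away
    raw.write(b"bytes ")
    return raw.getvalue()

if __name__ == "__main__":
    print(write_with_small_chunk())
```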
msg99519 - Author: Andrew McNabb (amcnabb) Date: 2010-02-18 19:17
This seems like a common need (particularly for stdout and stderr), and setting `stdout._CHUNK_SIZE = 1` relies on an implementation detail.

1) Can the documentation for TextIOWrapper be updated to clearly describe this extra buffering (how often buffer.write is called, etc.)?

2) Can there be a flush-like method, say write_to_buffer(), that forces a buffer.write() without the overhead of a flush?
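
For readers on later releases: Python 3.3 and newer accept a write_through argument to io.TextIOWrapper, which hands text to the underlying buffer on every write. It did not exist in the versions listed on this issue. The sketch below uses an in-memory io.BytesIO as a stand-in for the real buffer; the function name is hypothetical.

```python
import io

def write_through_demo():
    raw = io.BytesIO()                      # stand-in for sys.stdout.buffer
    # write_through=True: each write() is passed straight to the buffer.
    text = io.TextIOWrapper(raw, encoding="ascii", write_through=True)
    text.write("unicode ")
    raw.write(b"bytes ")
    return raw.getvalue()

if __name__ == "__main__":
    print(write_through_demo())
```

On Python 3.7 and newer, an existing text stream can be switched with reconfigure(write_through=True), provided it is a TextIOWrapper.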
msg99520 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-02-18 19:23
I agree this deserves documentation. I'm not convinced it's a common need, though. Usually you use stdin/stdout either in binary mode or in text mode, and you rarely interleave the two.
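
One way to stay in a single mode when the data is mostly bytes is to send everything through the binary layer and encode text explicitly. This is only a sketch under that assumption; the helper names emit and demo are hypothetical, and io.BytesIO stands in for sys.stdout.buffer.

```python
import io

def emit(binary_stream, data, encoding="utf-8"):
    """Write str or bytes to a binary stream, encoding text explicitly."""
    if isinstance(data, str):
        data = data.encode(encoding)
    binary_stream.write(data)

def demo():
    out = io.BytesIO()                      # stand-in for sys.stdout.buffer
    emit(out, "unicode ")                   # text is encoded before writing
    emit(out, b"bytes ")                    # bytes are written unchanged
    return out.getvalue()

if __name__ == "__main__":
    print(demo())
```

Because every write goes to the same binary stream, the order of the output matches the order of the calls.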
msg99521 - Author: Andrew McNabb (amcnabb) Date: 2010-02-18 19:42
I would imagine that this would come up in most programs that read data from a pipe or from a socket (both of which carry binary data) and then output to stdout or stderr.  I ran across the problem in my first non-trivial port to Python 3, and it seems like a common case to me.

But having the weird behavior documented is the most important thing.
History
Date User Action Args
2010-10-29 10:07:21 admin set assignee: georg.brandl -> docs@python
2010-02-18 19:42:37 amcnabb set messages: + msg99521
2010-02-18 19:23:20 pitrou set assignee: georg.brandl
    components: + Documentation
    versions: + Python 2.6, Python 2.7, Python 3.2
    nosy: + georg.brandl
    messages: + msg99520
    resolution: not a bug ->
    stage: resolved ->
2010-02-18 19:17:01 amcnabb set status: closed -> open
    messages: + msg99519
2010-02-18 14:02:59 r.david.murray set status: open -> closed
    priority: normal
    resolution: not a bug
    stage: resolved
2010-02-18 09:33:26 amaury.forgeotdarc set nosy: + amaury.forgeotdarc, pitrou
    messages: + msg99497
2010-02-18 06:41:38 amcnabb set files: + testio.py
2010-02-18 06:41:15 amcnabb create