This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: sys.stdin does not iterate correctly on '\r' line separator
Type: behavior
Stage:
Components: Library (Lib)
Versions: Python 3.8
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: eric.araujo, eryksun, kbrazil
Priority: normal
Keywords:

Created on 2021-10-26 19:20 by kbrazil, last changed 2022-04-11 14:59 by admin.

Messages (7)
msg405057 - Author: Kelly Brazil (kbrazil) Date: 2021-10-26 19:20
When iterating on sys.stdin lines, '\r\n' and '\n' are handled correctly, but '\r' is not handled, though it is documented that it should be supported.

Example code (linetest.py):
import sys

for line in sys.stdin:
    print(repr(line))

Results in Python 3.8.9:
$ echo -e 'line1\nline2\nline3' | python3 linetest.py 
'line1\n'
'line2\n'
'line3\n'

$ echo -e 'line1\r\nline2\r\nline3' | python3 linetest.py 
'line1\r\n'
'line2\r\n'
'line3\n'

$ echo -e 'line1\rline2\rline3' | python3 linetest.py 
'line1\rline2\rline3\n'
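The same behavior can be reproduced without a shell pipe. A minimal sketch, not part of the original report: `read_lines` is a hypothetical helper that wraps the bytes produced by `echo -e 'line1\rline2\rline3'` in `io.TextIOWrapper`. Using `newline='\n'` is an assumption meant to stand in for sys.stdin on non-Windows platforms; `newline=None` is the universal newlines default used by open().

```python
import io

# Bytes produced by: echo -e 'line1\rline2\rline3'
data = b"line1\rline2\rline3\n"


def read_lines(newline):
    # Hypothetical helper: wrap the bytes in a text stream with the given newline setting
    stream = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8", newline=newline)
    return list(stream)


# newline='\n' (assumed to mirror sys.stdin on Linux/macOS): '\r' is not a line break
print(read_lines("\n"))  # ['line1\rline2\rline3\n']
# newline=None (universal newlines): '\r' ends a line and is translated to '\n'
print(read_lines(None))  # ['line1\n', 'line2\n', 'line3\n']
```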
msg405067 - Author: Eryk Sun (eryksun) (Python triager) Date: 2021-10-27 03:02
> '\r' is not handled, though it is documented that it 
> should be supported.

Where is it documented that sys.stdin uses universal newlines mode? The newline behavior isn't documented in the sys module [1], though it should be. It's hard coded in create_stdio() in Python/pylifecycle.c [2]. In Windows it uses universal newlines mode with translation (newline=None). On all other platforms it uses newline="\n", which splits lines at "\n" (so "\r\n" endings work, with the "\r" kept in the line) but never at a lone "\r".

---
[1] https://docs.python.org/3/library/sys.html#sys.stdin
[2] https://github.com/python/cpython/blob/v3.10.0/Python/pylifecycle.c#L2216
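To make the platform split concrete, here is a rough Python paraphrase of the choice described above. It is an illustration only: the real logic is C code in create_stdio(), and `stdin_newline` and `wrap_like_stdin` are hypothetical names, with `sys.platform == "win32"` assumed as the Python-level stand-in for the C-level Windows check.

```python
import io


def stdin_newline(platform):
    # Hypothetical paraphrase of the check in create_stdio():
    # universal newlines (None) on Windows, a plain line feed elsewhere.
    return None if platform == "win32" else "\n"


def wrap_like_stdin(raw_bytes, platform):
    # Build a text stream with the newline setting sys.stdin would get on `platform`
    return io.TextIOWrapper(io.BytesIO(raw_bytes), encoding="utf-8",
                            newline=stdin_newline(platform))


data = b"a\r\nb\rc\n"
for plat in ("win32", "linux"):
    print(plat, list(wrap_like_stdin(data, plat)))
# win32 ['a\n', 'b\n', 'c\n']
# linux ['a\r\n', 'b\rc\n']
```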
msg405103 - Author: Kelly Brazil (kbrazil) Date: 2021-10-27 15:19
'\r' support is implicitly documented under the sys.stdin section [0]:

"These streams are regular text files like those returned by the open() function. Their parameters are chosen as follows..."

By following the link to the open()[1] docs, it says:

"newline controls how universal newlines mode works (it only applies to text mode). It can be None, '', '\n', '\r', and '\r\n'. It works as follows:

When reading input from the stream, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller. If it is '', universal newlines mode is enabled, but line endings are returned to the caller untranslated. If it has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated."
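The five legal `newline` values quoted above can be compared side by side on one input that mixes all three line endings. This is a small illustrative sketch, not from the thread; `split_with` is a hypothetical helper.

```python
import io

data = b"one\rtwo\r\nthree\n"


def split_with(newline):
    # Hypothetical helper: read every line of `data` using the given newline setting
    stream = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8", newline=newline)
    return list(stream)


for newline in (None, "", "\n", "\r", "\r\n"):
    print(repr(newline), split_with(newline))
# None ['one\n', 'two\n', 'three\n']      universal, translated to '\n'
# '' ['one\r', 'two\r\n', 'three\n']      universal, endings kept
# '\n' ['one\rtwo\r\n', 'three\n']        only '\n' ends a line
# '\r' ['one\r', 'two\r', '\nthree\n']    only '\r' ends a line
# '\r\n' ['one\rtwo\r\n', 'three\n']      only '\r\n' ends a line
```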

When inspecting a newly created sys.stdin object I see that it creates an instance of _io.TextIOWrapper and its newlines attribute is set to None:

>>> sys.stdin
<_io.TextIOWrapper name='<stdin>' mode='r' encoding='utf-8'>
>>> print(sys.stdin.newlines)
None

Note: an oddity here is that the attribute name is newlines instead of newline.

Interestingly, when opening STDIN directly it seems to work fine:

import sys
for line in open(0, sys.stdin.mode):
    print(repr(line))

Result:
$ echo -e 'line1\rline2\rline3' | python3 linetest.py 
'line1\n'
'line2\n'
'line3\n'
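This works because open() defaults to newline=None. A sketch of the same idea as a reusable function, under stated assumptions: `read_fd_lines` is a hypothetical name, an os.pipe() stands in for the shell pipeline, and closefd=False is added (it is not in the snippet above) so that closing the file object does not also close the descriptor, which would be stdin itself when fd is 0.

```python
import os


def read_fd_lines(fd, encoding="utf-8"):
    # Hypothetical helper. open() defaults to newline=None (universal newlines).
    # closefd=False keeps the descriptor open when the file object is closed;
    # with the default closefd=True, open(0, ...) would close the process's stdin.
    with open(fd, "r", encoding=encoding, closefd=False) as f:
        return list(f)


# Simulate `echo -e 'line1\rline2\rline3' | python3 linetest.py` with a pipe
rfd, wfd = os.pipe()
os.write(wfd, b"line1\rline2\rline3\n")
os.close(wfd)
lines = read_fd_lines(rfd)
print(lines)  # ['line1\n', 'line2\n', 'line3\n']
os.close(rfd)  # still open because closefd=False
```

In a script, the call would be `read_fd_lines(sys.stdin.fileno())`. Avoid mixing it with reads from sys.stdin, since each file object buffers input independently.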

So, perhaps the sys.stdin documentation should be updated to reflect this exception or it could be considered a bug to make its behavior consistent?

[0] https://docs.python.org/3/library/sys.html#sys.stdin
[1] https://docs.python.org/3/library/functions.html#open
msg405111 - Author: Eryk Sun (eryksun) (Python triager) Date: 2021-10-27 16:43
> like those returned by the open() function. Their parameters are 
> chosen as follows..."

The `newline` argument for sys.std* isn't documented, but it should be. It happens to be newline='\n' on every platform except Windows.

> its newlines attribute is set to None

The `newlines` attribute is based on the `newlines` attribute of the incremental decoder. It defaults to None if the decoder lacks this attribute. AFAIK, only the universal newlines decoder, io.IncrementalNewlineDecoder, implements this attribute. If it's not None, the value is a string or tuple that tracks the types of newlines that have been seen thus far. For example:

    >>> import os
    >>> fdr, fdw = os.pipe()
    >>> f = open(fdr, 'r', newline=None, closefd=False)
    >>> os.write(fdw, b'a\r\n')
    3
    >>> f.readline()
    'a\n'
    >>> f.newlines
    '\r\n'
    >>> os.write(fdw, b'a\n')
    2
    >>> f.readline()
    'a\n'
    >>> f.newlines
    ('\n', '\r\n')
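Following this explanation, a `newlines` value of None does not by itself reveal which mode a stream is in. The sketch below (illustrative; `newlines_after_reading` is a hypothetical helper) shows that a universal-mode stream reports None until it has seen a line ending, while a stream opened with newline='\n', like sys.stdin on non-Windows platforms, reports None even after reading, because its decoder is not an io.IncrementalNewlineDecoder.

```python
import io


def newlines_after_reading(newline):
    # Hypothetical helper: return the `newlines` attribute before and after a full read
    stream = io.TextIOWrapper(io.BytesIO(b"a\r\nb\n"), encoding="utf-8",
                              newline=newline)
    before = stream.newlines
    stream.read()
    return before, stream.newlines


print(newlines_after_reading(None))  # (None, ('\n', '\r\n'))
print(newlines_after_reading("\n"))  # (None, None)
```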
msg405113 - Author: Kelly Brazil (kbrazil) Date: 2021-10-27 16:53
Also, I believe this docstring is inherited from io.TextIOWrapper, but it is another place where '\r' appears to be documented as working with sys.stdin:

>>> print(sys.stdin.__doc__)
Character and line based layer over a BufferedIOBase object, buffer.

encoding gives the name of the encoding that the stream will be
decoded or encoded with. It defaults to locale.getpreferredencoding(False).

errors determines the strictness of encoding and decoding (see
help(codecs.Codec) or the documentation for codecs.register) and
defaults to "strict".

newline controls how line endings are handled. It can be None, '',
'\n', '\r', and '\r\n'.  It works as follows:

* On input, if newline is None, universal newlines mode is
  enabled. Lines in the input can end in '\n', '\r', or '\r\n', and
  these are translated into '\n' before being returned to the
  caller. If it is '', universal newline mode is enabled, but line
  endings are returned to the caller untranslated. If it has any of
  the other legal values, input lines are only terminated by the given
  string, and the line ending is returned to the caller untranslated.

* On output, if newline is None, any '\n' characters written are
  translated to the system default line separator, os.linesep. If
  newline is '' or '\n', no translation takes place. If newline is any
  of the other legal values, any '\n' characters written are translated
  to the given string.

If line_buffering is True, a call to flush is implied when a call to
write contains a newline character.
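The output rules in this docstring can be checked the same way. A minimal sketch, where `encoded_output` is a hypothetical helper that writes through a TextIOWrapper and returns the raw bytes; the newline=None case depends on os.linesep, so it is compared rather than hard-coded.

```python
import io
import os


def encoded_output(newline, text="a\nb\n"):
    # Hypothetical helper: write `text` through a TextIOWrapper, return the bytes produced
    buf = io.BytesIO()
    stream = io.TextIOWrapper(buf, encoding="utf-8", newline=newline)
    stream.write(text)
    stream.flush()
    return buf.getvalue()


for nl in ("", "\n", "\r", "\r\n"):
    print(repr(nl), encoded_output(nl))
# '' b'a\nb\n'
# '\n' b'a\nb\n'
# '\r' b'a\rb\r'
# '\r\n' b'a\r\nb\r\n'

# newline=None translates '\n' to os.linesep, so this result is platform dependent
print(encoded_output(None) == "a\nb\n".replace("\n", os.linesep).encode("utf-8"))  # True
```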

I understand that sys.stdin is slightly different from an actual file opened in text mode, but the documentation suggests that it behaves essentially the same. In practice, though, there is a slight difference in behavior.
msg405381 - Author: Éric Araujo (eric.araujo) (Python committer) Date: 2021-10-30 17:58
> '\r' support is implicitly documented under the sys.stdin section[0]:
> "These streams are regular text files like those returned by the open() function"

I read that to mean that the streams are file-like objects (or TextIOWrappers) without using that term, not to mean that they behave 100% similarly.
msg405459 - Author: Kelly Brazil (kbrazil) Date: 2021-11-01 18:00
Are there other scenarios where line-splitting behavior deviates from the default of newline=None (Universal Newlines)? It seems sys.stdin (on non-Windows OS) is the outlier.

All of these use Universal Newlines:
- sys.stdin (on Windows)
- open(0, 'r')
- str.splitlines() (which splits on a superset of the universal newline characters)
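One nuance about the last item: str.splitlines() recognizes '\r', but it also treats other characters (such as '\v', '\f', '\x1c' to '\x1e', '\x85', '\u2028' and '\u2029') as line boundaries, which universal newlines mode does not. A short illustrative comparison using a form feed:

```python
import io

text = "line1\rline2\x0cline3\n"

# str.splitlines() treats '\r' and also form feed ('\x0c') as line boundaries
print(text.splitlines())  # ['line1', 'line2', 'line3']

# Universal newlines mode only recognizes '\n', '\r' and '\r\n'
stream = io.TextIOWrapper(io.BytesIO(text.encode("utf-8")), encoding="utf-8",
                          newline=None)
universal = list(stream)
print(universal)  # ['line1\n', 'line2\x0cline3\n']
```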

For the sake of consistency, it seems that sys.stdin on non-Windows platforms should use the Universal Newlines behavior. Since the difference in behavior is not documented, users are likely to be confused by it.

Also, unless there is a technical reason for the difference, I'm not sure what the rationale would be for keeping the behavior different. All types of data can be piped to STDIN on non-Windows systems. Just because the pipeline runs on Unix/Linux doesn't mean the data in it uses '\n' newlines.

I believe Universal Newlines should be the default (as with the other scenarios) and the user should be able to decide if another newline option should be configured.
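Until the default changes, one workaround not discussed in this thread is to switch the stream to universal newlines at startup with TextIOWrapper.reconfigure(), which accepts a `newline` argument since Python 3.7. A minimal sketch under stated assumptions: `use_universal_newlines` is a hypothetical helper, and an io.TextIOWrapper with newline='\n' stands in for sys.stdin on a non-Windows platform.

```python
import io


def use_universal_newlines(stream):
    # Hypothetical helper: switch an io.TextIOWrapper (such as sys.stdin) to
    # universal newlines mode. reconfigure() accepts newline since Python 3.7;
    # call it before the first read from the stream.
    stream.reconfigure(newline=None)
    return stream


# Stand-in for sys.stdin on a non-Windows platform, which uses newline='\n'
fake_stdin = io.TextIOWrapper(io.BytesIO(b"line1\rline2\rline3\n"),
                              encoding="utf-8", newline="\n")
lines = list(use_universal_newlines(fake_stdin))
print(lines)  # ['line1\n', 'line2\n', 'line3\n']
```

In a real script the equivalent would be calling `sys.stdin.reconfigure(newline=None)` before iterating over sys.stdin.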
History
Date                 User         Action  Args
2022-04-11 14:59:51  admin        set     github: 89780
2021-11-01 18:00:35  kbrazil      set     messages: + msg405459
2021-10-30 17:58:24  eric.araujo  set     nosy: + eric.araujo
                                          messages: + msg405381
2021-10-27 16:53:25  kbrazil      set     messages: + msg405113
2021-10-27 16:43:26  eryksun      set     messages: + msg405111
2021-10-27 15:19:30  kbrazil      set     messages: + msg405103
2021-10-27 03:02:04  eryksun      set     nosy: + eryksun
                                          messages: + msg405067
2021-10-26 21:01:16  kbrazil      set     components: + Library (Lib)
2021-10-26 19:20:29  kbrazil      create