Issue45617
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2021-10-26 19:20 by kbrazil, last changed 2022-04-11 14:59 by admin.
Messages (7) | |||
---|---|---|---|
msg405057 | Author: Kelly Brazil (kbrazil) | Date: 2021-10-26 19:20 | |
When iterating on sys.stdin lines, '\r\n' and '\n' are handled correctly, but '\r' is not handled, though it is documented that it should be supported.

Example code:

    import sys

    for line in sys.stdin:
        print(repr(line))

Results in Python 3.8.9:

    $ echo -e 'line1\nline2\nline3' | python3 linetest.py
    'line1\n'
    'line2\n'
    'line3\n'

    $ echo -e 'line1\r\nline2\r\nline3' | python3 linetest.py
    'line1\r\n'
    'line2\r\n'
    'line3\n'

    $ echo -e 'line1\rline2\rline3' | python3 linetest.py
    'line1\rline2\rline3\n'
|||
msg405067 | Author: Eryk Sun (eryksun) * | Date: 2021-10-27 03:02 | |
> '\r' is not handled, though it is documented that it
> should be supported.

Where is it documented that sys.stdin uses universal newlines mode? The newline behavior isn't documented in the sys module [1], though it should be. It's hard coded in create_stdio() in Python/pylifecycle.c [2]. In Windows it uses universal-translated mode. On all other platforms, it uses "\n", which includes "\r\n" but not "\r".

---
[1] https://docs.python.org/3/library/sys.html#sys.stdin
[2] https://github.com/python/cpython/blob/v3.10.0/Python/pylifecycle.c#L2216
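Editorial illustration: the two modes described above can be compared without touching the real standard streams. The following is a minimal sketch, assuming an in-memory io.BytesIO stands in for the piped input. The helper name read_lines is hypothetical and is not part of CPython.

```python
import io

# Same bytes that `echo -e 'line1\rline2\rline3'` would send through the pipe.
DATA = b"line1\rline2\rline3\n"


def read_lines(raw: bytes, newline):
    """Decode raw bytes through a TextIOWrapper using the given newline mode.

    Hypothetical helper for illustration only.
    """
    stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline=newline)
    return list(stream)


# newline="\n": the mode the thread says sys.stdin gets on non-Windows platforms.
posix_like = read_lines(DATA, "\n")

# newline=None: universal newlines, the default used by open().
universal = read_lines(DATA, None)

print(posix_like)  # a lone '\r' does not end a line here
print(universal)   # every '\r' is treated as a line ending and translated
```

With newline="\n" the whole input comes back as one line, which matches the third result in the original report. With newline=None it is split into three lines ending in '\n'.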
|||
msg405103 | Author: Kelly Brazil (kbrazil) | Date: 2021-10-27 15:19 | |
'\r' support is implicitly documented under the sys.stdin section [0]:

"These streams are regular text files like those returned by the open() function. Their parameters are chosen as follows..."

By following the link to the open() [1] docs, it says:

"newline controls how universal newlines mode works (it only applies to text mode). It can be None, '', '\n', '\r', and '\r\n'. It works as follows:

When reading input from the stream, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller. If it is '', universal newlines mode is enabled, but line endings are returned to the caller untranslated. If it has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated."

When inspecting a newly created sys.stdin object I see that it creates an instance of _io.TextIOWrapper and its newlines attribute is set to None:

    >>> import sys
    >>> sys.stdin
    <_io.TextIOWrapper name='<stdin>' mode='r' encoding='utf-8'>
    >>> print(sys.stdin.newlines)
    None

Note: an oddity here is that the attribute name is newlines instead of newline.

Interestingly, when opening STDIN directly it seems to work fine:

    import sys

    for line in open(0, sys.stdin.mode):
        print(repr(line))

Result:

    $ echo -e 'line1\rline2\rline3' | python3 linetest.py
    'line1\n'
    'line2\n'
    'line3\n'

So, perhaps the sys.stdin documentation should be updated to reflect this exception, or it could be considered a bug and fixed to make its behavior consistent?

[0] https://docs.python.org/3/library/sys.html#sys.stdin
[1] https://docs.python.org/3/library/functions.html#open
|||
msg405111 | Author: Eryk Sun (eryksun) * | Date: 2021-10-27 16:43 | |
> like those returned by the open() function. Their parameters are
> chosen as follows..."

The `newline` argument for sys.std* isn't documented, but it should be. It happens to be newline='\n' on every platform except Windows.

> its newlines attribute is set to None

The `newlines` attribute is based on the `newlines` attribute of the incremental decoder. It defaults to None if the decoder lacks this attribute. AFAIK, only the universal newlines decoder, io.IncrementalNewlineDecoder, implements this attribute. If it's not None, the value is a string or tuple that tracks the types of newlines that have been seen thus far. For example:

    >>> import os
    >>> fdr, fdw = os.pipe()
    >>> f = open(fdr, 'r', newline=None, closefd=False)
    >>> os.write(fdw, b'a\r\n')
    3
    >>> f.readline()
    'a\n'
    >>> f.newlines
    '\r\n'
    >>> os.write(fdw, b'a\n')
    2
    >>> f.readline()
    'a\n'
    >>> f.newlines
    ('\n', '\r\n')
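Editorial illustration: the point about the `newlines` attribute can also be checked on in-memory streams. This is a small sketch, assuming io.BytesIO as the underlying buffer. The helper name seen_newlines is hypothetical.

```python
import io


def seen_newlines(raw: bytes, newline):
    """Read all of raw through a TextIOWrapper and return its newlines attribute.

    Hypothetical helper for illustration only.
    """
    stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline=newline)
    stream.read()  # consume everything so every line ending has been decoded
    return stream.newlines


# newline="\n": no universal newlines decoder is involved, so nothing is tracked.
fixed = seen_newlines(b"a\r\nb\n", "\n")

# newline=None: the universal newlines decoder records the endings it has seen.
universal = seen_newlines(b"a\r\nb\n", None)

print(repr(fixed))
print(repr(universal))
```

This suggests why `sys.stdin.newlines` being None says nothing about the `newline` argument the stream was created with: a stream created with newline='\n' reports None even after it has read lines.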
|||
msg405113 | Author: Kelly Brazil (kbrazil) | Date: 2021-10-27 16:53 | |
Also, I believe this docstring is being inherited, but this is also where it seems that '\r' is documented to work with sys.stdin:

    >>> import sys
    >>> print(sys.stdin.__doc__)
    Character and line based layer over a BufferedIOBase object, buffer.

    encoding gives the name of the encoding that the stream will be decoded or encoded with. It defaults to locale.getpreferredencoding(False).

    errors determines the strictness of encoding and decoding (see help(codecs.Codec) or the documentation for codecs.register) and defaults to "strict".

    newline controls how line endings are handled. It can be None, '', '\n', '\r', and '\r\n'. It works as follows:

    * On input, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller. If it is '', universal newline mode is enabled, but line endings are returned to the caller untranslated. If it has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated.

    * On output, if newline is None, any '\n' characters written are translated to the system default line separator, os.linesep. If newline is '' or '\n', no translation takes place. If newline is any of the other legal values, any '\n' characters written are translated to the given string.

    If line_buffering is True, a call to flush is implied when a call to write contains a newline character.

I understand that sys.stdin is slightly different than an actual file being opened in text mode, but the documentation seems to suggest that it works pretty much the same. In practice, though, there is a slight difference in behavior.
|||
msg405381 | Author: Éric Araujo (eric.araujo) * | Date: 2021-10-30 17:58 | |
> '\r' support is implicitly documented under the sys.stdin section[0]:
> "These streams are regular text files like those returned by the open() function"

I read that to mean that the streams are file-like objects (or TextIOWrappers) without using that term, not to mean that they behave 100% similarly.
|||
msg405459 | Author: Kelly Brazil (kbrazil) | Date: 2021-11-01 18:00 | |
Are there other scenarios where splitlines behavior deviates from the default of newline=None (Universal Newlines)? It seems sys.stdin (on non-Windows OS) is the outlier. All of these use Universal Newlines:

- sys.stdin (on Windows)
- open(0, 'r')
- str.splitlines()

For sake of consistency it seems that sys.stdin on non-Windows should use the Universal Newlines behavior. Since the difference in behavior is not documented, it is safe to assume users can be confused by this difference.

Also, unless there is a technical reason for the difference, I'm not sure what the rationale would be to keep the behavior different. All types of data can be piped to STDIN on non-Windows systems. Just because the pipeline is happening on unix/linux doesn't mean the data inside conforms to \n newlines.

I believe Universal Newlines should be the default (as with the other scenarios) and the user should be able to decide if another newline option should be configured.
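Editorial illustration: a program that wants universal newlines regardless of the platform default may be able to opt in itself. The sketch below assumes Python 3.7 or later, where io.TextIOWrapper.reconfigure() accepts a newline argument, and uses an in-memory stream created with newline='\n' to mimic the non-Windows behavior described in this thread. The helper name use_universal_newlines and the variable fake_stdin are hypothetical.

```python
import io


def use_universal_newlines(stream: io.TextIOWrapper) -> io.TextIOWrapper:
    """Switch a text stream to universal newlines mode.

    Hypothetical helper for illustration only. It should be called before
    anything has been read from the stream.
    """
    stream.reconfigure(newline=None)
    return stream


# Stand-in for sys.stdin on a non-Windows platform (assumption: newline="\n").
fake_stdin = io.TextIOWrapper(
    io.BytesIO(b"line1\rline2\rline3\n"), encoding="utf-8", newline="\n"
)

lines = list(use_universal_newlines(fake_stdin))
print(lines)
```

For the real stream the equivalent call would presumably be `sys.stdin.reconfigure(newline=None)` at program start, assuming sys.stdin is an io.TextIOWrapper that has not been read from yet. Opening file descriptor 0 with open(), as shown earlier in the thread, is another way to get the same result.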
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:59:51 | admin | set | github: 89780 |
2021-11-01 18:00:35 | kbrazil | set | messages: + msg405459 |
2021-10-30 17:58:24 | eric.araujo | set | nosy: + eric.araujo; messages: + msg405381 |
2021-10-27 16:53:25 | kbrazil | set | messages: + msg405113 |
2021-10-27 16:43:26 | eryksun | set | messages: + msg405111 |
2021-10-27 15:19:30 | kbrazil | set | messages: + msg405103 |
2021-10-27 03:02:04 | eryksun | set | nosy: + eryksun; messages: + msg405067 |
2021-10-26 21:01:16 | kbrazil | set | components: + Library (Lib) |
2021-10-26 19:20:29 | kbrazil | create |