classification
Title: Python treats ASCII record separator ('\x1e') as a newline
Type:
Stage:
Components: Documentation
Versions: Python 3.8, Python 3.7, Python 3.6
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: docs@python
Nosy List: apatrushev, docs@python, martin.panter, terry.reedy, timClicks
Priority: normal
Keywords:

Created on 2018-07-28 10:09 by timClicks, last changed 2022-04-11 14:59 by admin.

Messages (5)
msg322537 - (view) Author: Tim McNamara (timClicks) * Date: 2018-07-28 10:09
Hello,

I apologize if this is expected behavior, however it doesn't appear to be documented.

>>> "single\x1eline\x1estring".splitlines()
['single', 'line', 'string']
msg322538 - (view) Author: Tim McNamara (timClicks) * Date: 2018-07-28 10:14
Hello,

I apologize if this is expected behavior, however it doesn't appear to be documented.

>>> "single\x1eline\x1estring".splitlines()
['single', 'line', 'string']

The glossary refers to the universal newlines as:


> universal newlines
>    A manner of interpreting text streams in which all of the 
>    following are recognized as ending a line: the Unix end-of-line
>    convention '\n', the Windows convention '\r\n', and the old 
>    Macintosh convention '\r'. See PEP 278 and PEP 3116, as well as 
>    bytes.splitlines() for an additional use.
https://docs.python.org/3/glossary.html#term-universal-newlines

According to Wikipedia, pre-POSIX QNX uses `\x1e` as a newline (https://en.wikipedia.org/wiki/Newline#Representation), but I don't think that it should be treated as the default.
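
To make the difference concrete, here is a minimal sketch comparing str.splitlines() with two APIs that follow the narrower universal-newlines definition quoted above. The variable names are only illustrative, and the behavior shown is what I would expect from CPython 3.

```python
import io

text = "single\x1eline\x1estring"

# str.splitlines() uses the broader set of line boundaries,
# which includes the ASCII record separator.
str_lines = text.splitlines()

# bytes.splitlines() only recognizes '\n', '\r' and '\r\n'.
bytes_lines = text.encode("ascii").splitlines()

# Text streams in universal newlines mode also only recognize
# '\n', '\r' and '\r\n', so the record separator is left alone.
stream_lines = list(io.StringIO(text, newline=None))

print(str_lines)
print(bytes_lines)
print(stream_lines)
```

So the record separator only acts as a line boundary for str.splitlines(), not for bytes.splitlines() or for text streams.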
msg322607 - (view) Author: Anton Patrushev (apatrushev) Date: 2018-07-29 03:43
0x1e is listed as a line break character in the tests:

Lib/test/test_unicodedata.py:317
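
A rough way to reproduce what that test checks, without reading the test source, is to probe every code point in the Basic Multilingual Plane and record the ones that str.splitlines() splits on. This is only a sketch of the idea, not a copy of the test; the name line_breaks is illustrative.

```python
# Collect every BMP code point that str.splitlines() treats as a line
# boundary: a line break followed by "A" splits into two pieces.
line_breaks = [
    cp for cp in range(0x10000)
    if len((chr(cp) + "A").splitlines()) == 2
]

print([hex(cp) for cp in line_breaks])
```

On CPython 3 the resulting list should include 0x1e alongside 0x1c and 0x1d (the file and group separators).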
msg323066 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2018-08-03 19:47
A database record is equivalent to a logical line, possibly wrapped onto multiple physical lines. So it is plausible.

The 7643 in the test name refers to issue #7643, "What is a Unicode line break character?"  It contains this:
"
> We may add some words to the documentation for str.splitlines() and bytes.splitlines() to explain what is considered a line break character.

For ASCII we should make the list of characters explicit.
For Unicode, we should mention the above definition and give
the table as example list (the Unicode database may add more
such characters in the future).
"
The test was added but the doc change was not.  I agree that it would be useful.  Feel free to suggest a doc change.
msg323097 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2018-08-03 23:55
What documentation were you looking at? I remember adding 0x1E and others to the list in Issue 12855. See <https://docs.python.org/3.5/library/stdtypes.html#str.splitlines>:

'''
str.splitlines([keepends])
  . . .

  This method splits on the following line boundaries. . . .

  Representation  Description
  ==============  ===========
  . . .
  \x1e            Record Separator
  . . .
'''
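
For code that needs to keep record separators intact, one possible workaround is to split only on the three universal-newlines conventions. The helper below is a hypothetical sketch (split_universal is not a standard library function); it drops a single trailing empty piece so that a final newline behaves the way it does with str.splitlines().

```python
import re

# Matches only '\r\n', '\r' and '\n'; '\r\n' comes first so it is
# consumed as one terminator rather than two.
_NEWLINE = re.compile(r"\r\n|\r|\n")


def split_universal(text):
    """Split text on '\\n', '\\r\\n' and '\\r' only (hypothetical helper)."""
    lines = _NEWLINE.split(text)
    # re.split leaves a trailing empty string after a final terminator
    # (and for empty input); str.splitlines() does not, so drop it.
    if lines and lines[-1] == "":
        lines.pop()
    return lines


print(split_universal("rec1\x1erec2\nnext\r\nlast\n"))
```

Here '\x1e' stays inside the first element, while the same input passed to str.splitlines() would be split at the record separator as well.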
History
Date                 User           Action  Args
2022-04-11 14:59:03  admin          set     github: 78437
2018-08-03 23:55:15  martin.panter  set     nosy: + martin.panter
                                            messages: + msg323097
2018-08-03 19:47:21  terry.reedy    set     assignee: docs@python
                                            components: + Documentation
                                            title: Python treats ASCII record seperator ('\x1e') as a newline -> Python treats ASCII record separator ('\x1e') as a newline
                                            nosy: + terry.reedy, docs@python
                                            versions: + Python 3.7, Python 3.8, - Python 3.5
                                            messages: + msg323066
2018-07-29 03:43:04  apatrushev     set     nosy: + apatrushev
                                            messages: + msg322607
2018-07-28 10:14:39  timClicks      set     messages: + msg322538
                                            title: Python treats ASCII record seperator ('\x1e as a newline -> Python treats ASCII record seperator ('\x1e') as a newline
2018-07-28 10:09:09  timClicks      create