This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Bizarre StringIO(newline="\r\n") translation
Type: behavior Stage: resolved
Components: Documentation, IO Versions: Python 3.6, Python 3.4, Python 3.5, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: martin.panter Nosy List: berker.peksag, docs@python, gvanrossum, martin.panter, pitrou, python-dev, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2014-09-15 03:04 by martin.panter, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
stringio_newline_2.patch serhiy.storchaka, 2015-09-25 05:56 Half-baked patch review
newline-doc.patch martin.panter, 2015-10-03 09:45 review
Messages (14)
msg226895 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2014-09-15 03:04
I noticed that the newline translation in the io.StringIO class does not behave as I would expect:

>>> text = "NL\n" "CRLF\r\n" "CR\r" "EOF"
>>> s = StringIO(text, newline="\r\n")
>>> s.getvalue()
'NL\r\nCRLF\r\r\nCR\rEOF'  # Why is this not just equal to “text”?
>>> tuple(s)
('NL\r\n', 'CRLF\r\r\n', 'CR\rEOF')  # Too many lines, butchered EOL sequence
>>> tuple(TextIOWrapper(BytesIO(text.encode("ascii")), "ascii", newline="\r\n"))
('NL\nCRLF\r\n', 'CR\rEOF')  # This seems more reasonable

Although I have never had a use for newline="\r", it also seems broken:

>>> tuple(StringIO(text, newline="\r"))
('NL\r', 'CRLF\r', '\r', 'CR\r', 'EOF')  # Way too many lines
>>> tuple(TextIOWrapper(BytesIO(text.encode("ascii")), "ascii", newline="\r"))
('NL\nCRLF\r', '\nCR\r', 'EOF')

The other newline options ("\n", "", and None) seem to behave correctly though. There seem to be quite a few bug reports to do with newline translation in StringIO, but I couldn’t see anything specifically about this one. However the issue was mentioned at <https://bugs.python.org/issue20423#msg209581>.

I noticed there are test cases which appear to bless the current behaviour, as seen in the patch for Issue 20498. IMO these tests are wrong.
msg226902 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-09-15 07:52
See issue20435.
msg251570 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-09-25 03:52
According to <https://bugs.python.org/issue20423#msg209581> Serhiy and Antoine both seem to agree this behaviour is a bug. The reason why newline="\r\n" and newline="\r" cause these funny translations is that __init__() internally passes the initial buffer value through write(). So I propose to set the buffer directly without using write().

However, doing this would have another effect. In the C implementation, write() also does universal newline decoding. The Python implementation had similar decoding added in getvalue() to compensate (Issue 20435). I propose to revert the getvalue() decoding, and to move the universal decoding from write() to read() in the C implementation.

I anticipate the combined effect of these changes would be:

1. StringIO("\n", newline="\r\n") no longer encodes the newline to CRLF in the internal buffer, so reading or getvalue() will return "\n" unencoded

2. StringIO("\r\n", newline=None).getvalue() returns "\r\n" undecoded, rather than universal decoding changing it to "\n"

3. s = StringIO(newline=None); s.write("\r\n"); s.getvalue() also returns "\r\n" undecoded. It is undocumented, but StringIO intentionally does not encode to os.linesep (yet another bug IMO).

4. StringIO.newlines would only get updated during reading, rather than during construction and writing, since newline decoding will only take place during reading.

There is another bug where the universal newline decoding does not anticipate split CRLF sequences. This would hopefully be fixed at the same time.

>>> s = io.StringIO(newline=None)
>>> s.write("\r\n" "\r")
3
>>> s.write("\n")  # Complete second CRLF
1
>>> s.getvalue()  # Should be "\r\n\r\n", at least when os.linesep == "\n"
'\n\n\n'
msg251575 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-09-25 05:56
I agree with you, but I think that Antoine disagrees.

My half-baked patch for issue20435 did a half of the work. It initialized the buffer directly in __init__. Here is the rebased version. There is a difference between C and Python implementations for universal newlines and test_newline_none fails.
msg251718 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-09-27 19:04
At this point in time, I don't think it's a good idea to change the semantics at all. Some people might unknowingly rely on the current semantics, and the consequences of a change in 3.6 might be hard to debug.

The larger issue here is that the newline translation layer is meant as an adaptation layer between Python (where a newline is always "\n") and the outside world (where newlines are system-dependent or even file-dependent). But what is the "outside world" with StringIO? The data always comes from and goes to Python. So there is no obviously right decision except, perhaps, the decision not to have newline translation at all.

So I'd just recommend closing this as won't fix.
msg251726 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2015-09-27 20:56
Agreed we shouldn't change this. It looks like the behavior is consistent if you consider `a = StringIO(stuff, newline=...)` merely a shorthand for `a = StringIO(newline=...); a.write(stuff)`.

I understand you would like to have a way to set the internal buffer directly, without newline translation; to support that we'd have to add a new argument. Although really, it's probably better to just do the \r\n translation in your app anyway.

You're very unlikely to ever need \r translation -- that was last seen on MacOS 9, which has been dead for 14 years now. Fighting \r\n or pretending it's a Windows-only thing is pretty hopeless -- most text-based internet protocols (like HTTP) require it.
msg251727 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-09-27 21:00
Ok, I'm closing, then.
msg251994 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-10-01 05:19
I understand it may not be worth changing the behaviour. Would you instead accept a change to the documentation to point out that “newline” does _not_ actually work like TextIOWrapper? Or perhaps even deprecating or recommending against using “newline”?
msg252045 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2015-10-01 18:42
I don't see a reason to deprecate anything. Can you write up in one paragraph how StringIO's newline flag differs from the one to TextIOWrapper? (What happens to the initial value is a separate issue AFAIC.)
msg252204 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-10-03 09:45
Here is a suggested patch. I did include details of the initializer and getvalue(); this is the heart of the problem IMO. In a limited sense the newline flag _is_ similar to TextIOWrapper, but more broadly this implied to me that newlines should be encoded in the buffer, just like in TextIOWrapper’s wrapped “buffer” and on disk.

My patch also adds to a comment in the C code and removes another comment made out of date by Argument Clinic.

In the documentation I didn’t mention the problem with split CRLFs; I think that is a separate bug.
msg252340 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2015-10-05 16:38
The patch fails to apply in the 2.7 branch. It works in 3.4. Could you look into the 2.7 issue?
msg252347 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2015-10-05 17:12
It looks like we don't merge 2.7 into 3.4 any more, so that will have to be a separate patch anyway.

So you can commit the patch to 3.4, merge into 3.5 and 3.6. Good luck! And thanks for your perseverance.
msg252679 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-10-10 02:39
Thanks for the feedback. Yeah, 2.7 is an independent branch, but I will try porting the changes there.
msg252701 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-10-10 10:31
New changeset 57fc950298bb by Martin Panter in branch '2.7':
Issue #22413: Document newline effect on StringIO initializer and getvalue
https://hg.python.org/cpython/rev/57fc950298bb

New changeset cba4bf2a1721 by Martin Panter in branch '3.4':
Issue #22413: Document newline effect on StringIO initializer and getvalue
https://hg.python.org/cpython/rev/cba4bf2a1721

New changeset 451da3327f68 by Martin Panter in branch '3.5':
Issue #22413: Merge StringIO doc from 3.4 into 3.5
https://hg.python.org/cpython/rev/451da3327f68

New changeset 46df76819b79 by Martin Panter in branch '3.5':
Issue #22413: Remove comment made out of date by Argument Clinic
https://hg.python.org/cpython/rev/46df76819b79

New changeset c12d3f941731 by Martin Panter in branch 'default':
Issue #22413: Merge StringIO doc from 3.5
https://hg.python.org/cpython/rev/c12d3f941731
History
Date User Action Args
2022-04-11 14:58:08adminsetgithub: 66603
2015-10-10 10:37:49martin.pantersetstatus: open -> closed
resolution: wont fix -> fixed
stage: commit review -> resolved
2015-10-10 10:31:23python-devsetnosy: + python-dev
messages: + msg252701
2015-10-10 02:39:45martin.pantersetnosy: + berker.peksag
messages: + msg252679

assignee: docs@python -> martin.panter
stage: patch review -> commit review
2015-10-05 17:12:04gvanrossumsetmessages: + msg252347
2015-10-05 16:38:20gvanrossumsetmessages: + msg252340
2015-10-03 09:45:41martin.pantersetstatus: closed -> open
files: + newline-doc.patch

components: + Documentation
assignee: docs@python
versions: + Python 2.7, Python 3.6
nosy: + docs@python

messages: + msg252204
stage: patch review
2015-10-01 18:42:59gvanrossumsetmessages: + msg252045
2015-10-01 05:19:16martin.pantersetmessages: + msg251994
2015-09-27 21:00:57pitrousetstatus: open -> closed
resolution: wont fix
messages: + msg251727
2015-09-27 20:56:48gvanrossumsetmessages: + msg251726
2015-09-27 19:04:27pitrousetnosy: + gvanrossum
messages: + msg251718
2015-09-25 05:56:23serhiy.storchakasetfiles: + stringio_newline_2.patch
keywords: + patch
messages: + msg251575
2015-09-25 03:52:20martin.pantersetmessages: + msg251570
2014-09-15 07:52:30serhiy.storchakasetnosy: + serhiy.storchaka, pitrou

messages: + msg226902
versions: + Python 3.5
2014-09-15 03:04:03martin.pantercreate