Message 327094 - Python tracker


Author	nascheme
Recipients	nascheme, skip.montanaro, terry.reedy, vmax
Date	2018-10-04.22:33:40
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1538692420.85.0.545547206417.issue30825@psf.upfronthosting.co.za>
In-reply-to

Content
There is another issue related to this.  If you use the codecs module to get a reader, it uses str.splitlines() internally, which treats many characters besides \n and \r (for example \v, \f, \x1c, \x1d, \x1e, \x85, \u2028 and \u2029) as line terminators.  See issue #18291 and:

https://docs.python.org/3.8/library/stdtypes.html#str.splitlines
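
To make the problem concrete, here is a small sketch (the sample data and the helper names rows_via_codecs and rows_via_textio are made up for illustration).  It feeds the same bytes to csv.reader once through a codecs reader and once through a TextIOWrapper opened with newline='', using a form feed inside a field:

```python
import codecs
import csv
import io

# Hypothetical sample: the first row has a form feed (\x0c) inside a field.
RAW = "a,b\x0cc\r\n1,2\r\n".encode("utf-8")

# str.splitlines() treats the form feed as a line boundary.
SPLIT = RAW.decode("utf-8").splitlines(keepends=True)
# → ['a,b\x0c', 'c\r\n', '1,2\r\n']


def rows_via_codecs(raw):
    """Parse through a codecs StreamReader, which splits lines via splitlines()."""
    reader = codecs.getreader("utf-8")(io.BytesIO(raw))
    return list(csv.reader(reader))


def rows_via_textio(raw):
    """Parse through a TextIOWrapper opened with newline='' as the csv docs suggest."""
    text = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline="")
    return list(csv.reader(text))


if __name__ == "__main__":
    print(rows_via_codecs(RAW))   # first row gets cut at the form feed
    print(rows_via_textio(RAW))   # first row stays intact
```

With the TextIOWrapper the form feed stays inside the second field; with the codecs reader the first record gets broken apart.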

I was thinking about different ways to fix this.  First, the csv module documentation recommends opening the file with newline=''.  I suspect most people don't know to do that.  So, I thought maybe the csv module should inspect the file object that gets passed in and warn if newline='' has not been used or if the file is a codecs reader object.
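
A rough sketch of the inspection idea, covering only the codecs half.  The function name check_csv_source and the warning text are hypothetical, nothing like this exists in the csv module today.  Detecting a missing newline='' is left out: as far as I know, TextIOWrapper does not publicly expose the newline argument it was opened with, which is part of why this approach gets complicated.

```python
import codecs
import warnings


def check_csv_source(source):
    """Hypothetical helper: warn when `source` is a codecs reader.

    codecs readers split lines with str.splitlines(), so characters such
    as \\x0c or \\u2028 inside a field end the line early.
    """
    if isinstance(source, (codecs.StreamReader, codecs.StreamReaderWriter)):
        warnings.warn(
            "csv input is a codecs reader; fields containing unusual "
            "line-boundary characters may be split incorrectly",
            RuntimeWarning,
            stacklevel=2,
        )
    # Return the object unchanged so the call can wrap csv.reader's argument.
    return source
```

Usage would be something like csv.reader(check_csv_source(f)).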

However, that seems fairly complicated.  Would it be better if we changed the 'csv' module to do its own line splitting?  I think that would be better, although I'm not sure about backwards compatibility.  Currently, the reader expects to call iter() on the input file.  Would it be okay if it used the file's 'read' method in preference to iter()?  It could still fall back to iter() if there was no read method.
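
Something along these lines is what I have in mind, written as a pure-Python sketch outside the csv module.  The name iter_csv_lines and the chunk_size parameter are invented for illustration, and the real change would live in the C reader.  It prefers read(), splits only on \r\n, \r and \n, and falls back to iter() when there is no read method:

```python
import codecs
import csv
import io
import re

# Only these three sequences count as line endings; "\r\n" is tried first.
_LINE_END = re.compile(r"\r\n|\r|\n")


def iter_csv_lines(source, chunk_size=8192):
    """Yield lines (with endings kept) from `source`, preferring read()."""
    read = getattr(source, "read", None)
    if read is None:
        # No read method: keep the current behaviour and just iterate.
        yield from iter(source)
        return
    pending = ""
    while True:
        chunk = read(chunk_size)
        if not chunk:
            break
        pending += chunk
        start = 0
        for match in _LINE_END.finditer(pending):
            # A lone "\r" at the very end may be half of a "\r\n" that is
            # split across two chunks, so wait for more data.
            if match.group() == "\r" and match.end() == len(pending):
                break
            yield pending[start:match.end()]
            start = match.end()
        pending = pending[start:]
    if pending:
        # Whatever is left at EOF is the final (possibly unterminated) line.
        yield pending


if __name__ == "__main__":
    raw = "a,b\x0cc\r\n1,2\r\n".encode("utf-8")
    reader = codecs.getreader("utf-8")(io.BytesIO(raw))
    print(list(csv.reader(iter_csv_lines(reader))))
```

Because line endings are kept, csv.reader can still join quoted fields that span several lines.  The backwards-compatibility question remains: objects whose read() and iteration behave differently would see a change.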

History
Date	User	Action	Args
2018-10-04 22:33:40	nascheme	set	recipients: + nascheme, skip.montanaro, terry.reedy, vmax
2018-10-04 22:33:40	nascheme	set	messageid: <1538692420.85.0.545547206417.issue30825@psf.upfronthosting.co.za>
2018-10-04 22:33:40	nascheme	link	issue30825 messages
2018-10-04 22:33:40	nascheme	create