Message 327082 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author	nascheme
Recipients	nascheme, xtreak
Date	2018-10-04.20:17:40
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1538684260.3.0.545547206417.issue34801@psf.upfronthosting.co.za>
In-reply-to

Content

Thank you for the research. The problem is indeed that \v (vertical tab) is being treated as a line separator. That is an intentional design choice; see:

https://bugs.python.org/issue12855

It would seem to have some surprising implications for CSV parsing.  E.g. if someone embeds a \v character in a quoted field, parsing the file using codecs.getreader() will cause the field to be split across two rows.
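As a rough illustration, the effect is already visible at the line level, before the csv module is involved. This sketch assumes CPython 3 and a UTF-8 byte stream, and the sample record is made up. The `StreamReader` returned by `codecs.getreader()` splits lines using `str.splitlines()`, which treats \v as a boundary. `io.TextIOWrapper` with `newline=""` only ends lines at \n, \r or \r\n.

```python
import codecs
import io

# str.splitlines() treats \v (U+000B) as a line boundary; split("\n") does not.
assert "x\vy".splitlines() == ["x", "y"]
assert "x\vy".split("\n") == ["x\vy"]

# A made-up CSV record whose quoted field contains a vertical tab.
data = b'a,"x\vy"\r\nb,c\r\n'

# Iterating a codecs StreamReader splits lines the way str.splitlines()
# does, so the first record is broken at the \v.
codecs_lines = list(codecs.getreader("utf-8")(io.BytesIO(data)))

# io.TextIOWrapper with newline="" only ends lines at \n, \r or \r\n.
wrapper = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8", newline="")
textio_lines = list(wrapper)

print(codecs_lines)  # ['a,"x\x0b', 'y"\r\n', 'b,c\r\n']
print(textio_lines)  # ['a,"x\x0by"\r\n', 'b,c\r\n']
```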

Someone else has run into the same issue:

https://www.enigma.com/blog/the-secret-world-of-newline-characters

I'm not sure anything should be done. Perhaps we should do something to reduce the chances that people trip over this issue. E.g. if I want to parse a file containing Unicode text with the CSV module, how do I do it while allowing \v characters (or other newline-like characters other than \n) within fields?
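One possible answer, following the csv module documentation's advice to open files with `newline=""`, is to decode with `io.TextIOWrapper` instead of `codecs.getreader()` and let `csv.reader` handle the line structure. This is a sketch, not an established recipe: the helper name `read_csv_rows` is hypothetical, and UTF-8 is an assumed default encoding.

```python
import csv
import io
from typing import BinaryIO, List


def read_csv_rows(raw: BinaryIO, encoding: str = "utf-8") -> List[List[str]]:
    """Hypothetical helper: decode a binary CSV stream and parse it."""
    # newline="" restricts line splitting to \n, \r and \r\n and disables
    # newline translation, so \v, \f, \x85, \u2028 and similar characters
    # are passed through to csv.reader as ordinary field characters.
    text = io.TextIOWrapper(raw, encoding=encoding, newline="")
    try:
        return list(csv.reader(text))
    finally:
        # Detach so the wrapper does not close the caller's stream.
        text.detach()


# Usage: \v survives in both a quoted and an unquoted field.
sample = b'id,note\r\n1,"line\vbreak"\r\n2,plain\vtext\r\n'
rows = read_csv_rows(io.BytesIO(sample))
print(rows)  # [['id', 'note'], ['1', 'line\x0bbreak'], ['2', 'plain\x0btext']]
```

For a file on disk, opening it with `open(path, encoding="utf-8", newline="")` should give the same line-splitting behaviour without an explicit wrapper.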

History
Date	User	Action	Args
2018-10-04 20:17:40	nascheme	set	recipients: + nascheme, xtreak
2018-10-04 20:17:40	nascheme	set	messageid: <1538684260.3.0.545547206417.issue34801@psf.upfronthosting.co.za>
2018-10-04 20:17:40	nascheme	link	issue34801 messages
2018-10-04 20:17:40	nascheme	create