Author nascheme
Recipients nascheme, xtreak
Date 2018-10-04.20:17:40
Message-id <1538684260.3.0.545547206417.issue34801@psf.upfronthosting.co.za>
In-reply-to
Content
Thank you for the research. The problem is indeed that \v is being treated as a line separator. That is an intentional design choice; see:

https://bugs.python.org/issue12855

This behavior seems to have some surprising implications for CSV parsing. For example, if someone embeds a \v character in a quoted field, parsing the file using codecs.getreader() will cause the field to be split across two rows.
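To make the line-level behavior concrete, here is a minimal sketch comparing how the two stdlib text readers split the same bytes. The sample data and the helper names (SAMPLE, lines_via_codecs, lines_via_textio) are made up for illustration. It only shows where each reader puts the line boundaries; it does not show what a CSV parser then does with those lines.

```python
import codecs
import io

# Hypothetical sample: a single line with a vertical tab (\x0b, i.e. \v)
# inside a quoted field.
SAMPLE = b'a,"x\x0by",c\n'


def lines_via_codecs(raw):
    # A codecs StreamReader finds line ends with str.splitlines(), which
    # also treats \v, \f and several other characters as line boundaries.
    reader = codecs.getreader("utf-8")(io.BytesIO(raw))
    return list(reader)


def lines_via_textio(raw):
    # io.TextIOWrapper with newline="" ends lines only at \n, \r or \r\n
    # and passes the line endings through untranslated.
    stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline="")
    return list(stream)


if __name__ == "__main__":
    print(lines_via_codecs(SAMPLE))   # two "lines", broken at the \v
    print(lines_via_textio(SAMPLE))   # one line, \v left in place
```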

Someone else has run into the same issue:

https://www.enigma.com/blog/the-secret-world-of-newline-characters

I'm not sure anything should be done. Perhaps we should do something to reduce the chances that people trip over this issue. E.g. if I want to parse a file containing Unicode text with the csv module, how do I do it while allowing \v characters (or other newline-like characters other than \n) within fields?
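One possible answer to that question, offered as a sketch rather than as a resolution of this issue: decode with io.TextIOWrapper (or open()) using newline="", which is what the csv module documentation recommends for file objects, so that only \n, \r and \r\n count as line endings. The sample bytes and the function name read_rows are hypothetical.

```python
import csv
import io

# Hypothetical sample: two CSV records; the first has a vertical tab
# (\x0b, i.e. \v) embedded in a quoted field.
SAMPLE = b'a,"x\x0by",c\r\n1,2,3\r\n'


def read_rows(raw, encoding="utf-8"):
    # newline="" disables newline translation and limits line splitting
    # to \n, \r and \r\n, so \v reaches the csv reader as ordinary data.
    text = io.TextIOWrapper(io.BytesIO(raw), encoding=encoding, newline="")
    return list(csv.reader(text))


if __name__ == "__main__":
    for row in read_rows(SAMPLE):
        print(row)
```

For a file on disk, open(path, encoding="utf-8", newline="") gives the same kind of text stream.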
History
Date                 User      Action  Args
2018-10-04 20:17:40  nascheme  set     recipients: + nascheme, xtreak
2018-10-04 20:17:40  nascheme  set     messageid: <1538684260.3.0.545547206417.issue34801@psf.upfronthosting.co.za>
2018-10-04 20:17:40  nascheme  link    issue34801 messages
2018-10-04 20:17:40  nascheme  create