Message327082
Thank you for the research. The problem is indeed that \v is getting treated as a line separator. That is an intentional design choice, see:
https://bugs.python.org/issue12855
It would seem to have some surprising implications for CSV parsing. E.g. if someone embeds a \v character in a quoted field, parsing the file using codecs.getreader() will cause the field to be split across two rows.
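A minimal sketch of the splitting behaviour, not taken from the report itself. The sample bytes are hypothetical, and the field is left unquoted because that is where the effect is easiest to observe. Iterating a codecs StreamReader breaks lines at the same boundaries as str.splitlines(), so the csv module is handed two lines for what was written as one record.

```python
import codecs
import csv
import io

# Hypothetical sample data: one CSV record whose second field contains \v (0x0B).
raw = b"a,b\x0bc,d\r\n"

# Iterating the StreamReader yields "lines"; \v counts as a line boundary here.
lines = list(codecs.getreader("utf-8")(io.BytesIO(raw)))

# csv.reader consumes those lines one by one, so the single record
# comes out as two rows.
rows = list(csv.reader(codecs.getreader("utf-8")(io.BytesIO(raw))))

print(lines)
print(rows)
```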
Someone else has run into the same issue:
https://www.enigma.com/blog/the-secret-world-of-newline-characters
I'm not sure anything should be done. Perhaps we should do something to reduce the chances that people trip over this issue. E.g. if I want to parse a file containing Unicode text with the CSV module, how do I do it while allowing \v characters (or other newline-like characters other than \n) within fields?
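One possible approach, offered as a sketch rather than as a resolution from this thread: decode with io.TextIOWrapper (or open()) and pass newline="", which is what the csv documentation recommends for files handed to csv.reader. In that mode lines end only at \n, \r, or \r\n, so \v stays inside the field. The helper name read_csv_bytes and the sample bytes are hypothetical.

```python
import csv
import io


def read_csv_bytes(raw, encoding="utf-8"):
    """Parse CSV from bytes without splitting on \\v or similar characters.

    Hypothetical helper for illustration. newline="" leaves line endings
    untranslated and recognises only \\n, \\r and \\r\\n as terminators.
    """
    text = io.TextIOWrapper(io.BytesIO(raw), encoding=encoding, newline="")
    return list(csv.reader(text))


# Hypothetical sample data: \v inside an unquoted and a quoted field.
unquoted_rows = read_csv_bytes(b"a,b\x0bc,d\r\n")
quoted_rows = read_csv_bytes(b'1,"first\x0bsecond"\r\n')

print(unquoted_rows)
print(quoted_rows)
```

For a file on disk, open(path, encoding="utf-8", newline="") gives the same line handling.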
Date | User | Action | Args
2018-10-04 20:17:40 | nascheme | set | recipients: + nascheme, xtreak
2018-10-04 20:17:40 | nascheme | set | messageid: <1538684260.3.0.545547206417.issue34801@psf.upfronthosting.co.za>
2018-10-04 20:17:40 | nascheme | link | issue34801 messages
2018-10-04 20:17:40 | nascheme | create |