Message327096
Attached is a rough patch that tries to fix this problem. With the patch, the Unicode character U+2028 (LINE SEPARATOR) is no longer treated as a line separator. It would be trivial to change the regex to support that too, if we want to preserve backwards compatibility. Personally, I think readlines() on a codecs reader should do the same line splitting as an 'io' file.
If we want to use the patch, the following still needs to be done: write tests that check the splitting on FS, RS, and GS characters, and write a NEWS entry. I didn't do any profiling to see what the performance effect of my change is, so that should be checked too.
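To illustrate the difference being discussed, here is a minimal sketch comparing str.splitlines() (which also breaks on FS, RS, GS and U+2028) with the splitting an 'io' text file does (only \n, \r and \r\n). The regex and the helper name split_like_io are hypothetical and are not taken from the attached patch; they only show one way such a regex-based split could look.

```python
import io
import re

# Hypothetical pattern (an assumption, NOT the regex from the attached patch):
# a line is a run of non-newline characters terminated by \r\n, \r or \n,
# or a final unterminated run.
_LINE_RE = re.compile(r"[^\r\n]*(?:\r\n|\r|\n)|[^\r\n]+")


def split_like_io(text):
    # Split into lines, keeping the line endings, on \n, \r and \r\n only.
    return _LINE_RE.findall(text)


# FS (0x1c), GS (0x1d), RS (0x1e) and U+2028, followed by real newlines.
sample = "a\x1cb\x1dc\x1ed\u2028e\nf\r\ng"

# str.splitlines() treats FS, GS, RS and U+2028 as line boundaries.
str_lines = sample.splitlines(keepends=True)

# An 'io' text stream in universal-newlines mode splits only on \n, \r, \r\n.
io_lines = io.StringIO(sample, newline="").readlines()

regex_lines = split_like_io(sample)

print(len(str_lines), len(io_lines), len(regex_lines))
```

Preserving the old U+2028 behavior in a sketch like this would only need one more alternative in the terminator group, which is why that change is cheap if backwards compatibility matters.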
Date | User | Action | Args
2018-10-05 00:20:23 | nascheme | set | recipients: + nascheme, lemburg, doerwalter, belopolsky, vstinner, ezio.melotti, r.david.murray, serhiy.storchaka, wpk
2018-10-05 00:20:23 | nascheme | set | messageid: <1538698823.56.0.545547206417.issue18291@psf.upfronthosting.co.za>
2018-10-05 00:20:23 | nascheme | link | issue18291 messages
2018-10-05 00:20:22 | nascheme | create |