Message 327096 - Python tracker



Author	nascheme
Recipients	belopolsky, doerwalter, ezio.melotti, lemburg, nascheme, r.david.murray, serhiy.storchaka, vstinner, wpk
Date	2018-10-05.00:20:20
Message-id	<1538698823.56.0.545547206417.issue18291@psf.upfronthosting.co.za>
In-reply-to

Content
Attached is a rough patch that tries to fix this problem. It changes the behavior so that the Unicode character U+2028 is no longer treated as a line separator. It would be trivial to change the regex to support that too, if we want to preserve backwards compatibility. Personally, I think readlines() on a codecs reader should do the same line splitting as an 'io' file.
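
To make the difference concrete, here is a minimal sketch comparing str.splitlines() with line splitting in the io module. The regex and the helper name split_like_io are illustrative assumptions, not the code from the attached patch; they only show the kind of pattern that splits on "\n", "\r", and "\r\n" and nothing else.

```python
import io
import re

# FS, GS, RS and U+2028 embedded in one logical line, then a second line.
text = "a\x1cb\x1dc\x1ed\u2028e\nf\n"

# str.splitlines() treats FS, GS, RS and U+2028 as line boundaries.
str_lines = text.splitlines(keepends=True)

# The io module only recognizes "\n", "\r" and "\r\n" as line endings;
# newline="" keeps the endings untranslated.
io_lines = io.StringIO(text, newline="").readlines()

# Hypothetical pattern (an assumption, not the patch's regex): a run of
# non-newline characters followed by a line ending, or a final unterminated run.
_LINE_RE = re.compile(r"[^\r\n]*(?:\r\n|\r|\n)|[^\r\n]+")


def split_like_io(data):
    """Split data into lines, keeping line endings, the way io does."""
    return _LINE_RE.findall(data)


print(len(str_lines), len(io_lines), split_like_io(text) == io_lines)
```

Supporting U+2028 as well, for backwards compatibility, would in this sketch only mean adding it to both character classes and to the alternation of line endings.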

If we want to use the patch, the following still needs to be done: write tests that check the splitting on FS, RS, and GS characters, and write a news entry. I haven't profiled the change, so its performance impact should be checked too.
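
The tests mentioned above could look roughly like the following sketch. The helper names (make_io_reader, check_separators_do_not_split) are hypothetical, and the check is run here only against an io.TextIOWrapper, which serves as the reference behavior; it is not the test from any committed patch.

```python
import io

# FS, GS and RS (U+001C, U+001D, U+001E).
SEPARATORS = ("\x1c", "\x1d", "\x1e")


def make_io_reader(data):
    """Reference reader: decode UTF-8 bytes with the io module."""
    return io.TextIOWrapper(io.BytesIO(data), encoding="utf-8", newline="")


def check_separators_do_not_split(make_reader):
    """Assert that readlines() does not break lines at FS, GS or RS."""
    for sep in SEPARATORS:
        payload = f"left{sep}right\nnext\n"
        lines = make_reader(payload.encode("utf-8")).readlines()
        assert lines == [f"left{sep}right\n", "next\n"], (sep, lines)


# The io-based reader passes; it is the behavior the patch aims to match.
check_separators_do_not_split(make_io_reader)
```

To exercise a codecs reader instead, one could pass a factory such as lambda data: codecs.getreader("utf-8")(io.BytesIO(data)); without the patch that check is expected to fail, since that is the behavior reported in this issue.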

History
Date	User	Action	Args
2018-10-05 00:20:23	nascheme	set	recipients: + nascheme, lemburg, doerwalter, belopolsky, vstinner, ezio.melotti, r.david.murray, serhiy.storchaka, wpk
2018-10-05 00:20:23	nascheme	set	messageid: <1538698823.56.0.545547206417.issue18291@psf.upfronthosting.co.za>
2018-10-05 00:20:23	nascheme	link	issue18291 messages
2018-10-05 00:20:22	nascheme	create