Message 270922 - Python tracker

Author	Rosuav
Recipients	Rosuav, berker.peksag, ncoghlan
Date	2016-07-21.13:28:41
Message-id	<1469107721.84.0.394649961699.issue27582@psf.upfronthosting.co.za>
In-reply-to

Content
The question was raised that there might be a problem with (UTF-8) bytes vs characters, but that's definitely not it: pythonrun.c:1362 UTF-8-decodes the line of source and then uses its length in characters as the new offset. So I don't think this is a duplicate of issue 2382.
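
To illustrate the bytes-vs-characters distinction, here is a rough Python sketch of that kind of conversion. It is not the actual C code in pythonrun.c; the function name and the sample line are made up for illustration, and the only assumption carried over is that the offset starts out as a count of UTF-8 bytes.

```python
def byte_offset_to_char_offset(line: bytes, byte_offset: int) -> int:
    """Convert a UTF-8 byte offset within a source line to a character offset.

    Hypothetical helper: decode the bytes up to the offset and count the
    resulting characters. errors="replace" keeps the call from raising if
    the offset happens to land in the middle of a multi-byte sequence.
    """
    prefix = line[:byte_offset]
    return len(prefix.decode("utf-8", errors="replace"))


# A line containing one non-ASCII character ("é" is two bytes in UTF-8).
source_line = "s = 'héllo' $".encode("utf-8")

# Byte offset of the "$" as a bytes-based tokenizer would see it.
byte_offset = source_line.index(b"$")

# Character offset of the same position after decoding.
char_offset = byte_offset_to_char_offset(source_line, byte_offset)

print(byte_offset, char_offset)
```

The two numbers differ by one for each extra byte before the offset, which is the sort of discrepancy a bytes-vs-characters bug would produce if the decode step were missing.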

(Side point: There appears to be quite a bit of complexity inside the CPython parser to cope with the fact that it does everything in UTF-8 bytes rather than simply decoding to text and lexing that. I presume that's for the sake of efficiency - that it'd be too slow to work through PyUnicode everywhere?)

Am looking into the rest.

History
Date	User	Action	Args
2016-07-21 13:28:41	Rosuav	set	recipients: + Rosuav, ncoghlan, berker.peksag
2016-07-21 13:28:41	Rosuav	set	messageid: <1469107721.84.0.394649961699.issue27582@psf.upfronthosting.co.za>
2016-07-21 13:28:41	Rosuav	link	issue27582 messages
2016-07-21 13:28:41	Rosuav	create