Author Rosuav
Recipients Rosuav, berker.peksag, ncoghlan
Date 2016-07-21.13:28:41
Message-id <1469107721.84.0.394649961699.issue27582@psf.upfronthosting.co.za>
In-reply-to
Content
The question was raised that there might be a problem with (UTF-8) bytes vs characters, but that's definitely not it: pythonrun.c:1362 UTF-8-decodes the line of source and then takes its character length to use as the new offset. So I don't think this is a duplicate of issue 2382.
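To illustrate the bytes-vs-characters distinction, here is a minimal Python sketch of converting a byte offset into a UTF-8 encoded line to a character offset by decoding the prefix and counting code points. The helper name `byte_offset_to_char_offset` is hypothetical; this is an illustration of the idea, not the actual C code in pythonrun.c.

```python
def byte_offset_to_char_offset(line_bytes: bytes, byte_offset: int) -> int:
    """Hypothetical helper: map a byte offset in a UTF-8 line to a character offset.

    Decodes only the bytes before the offset and returns how many code points
    they produce. errors="replace" keeps a split multi-byte sequence from
    raising; it is counted as a single U+FFFD replacement character.
    """
    prefix = line_bytes[:byte_offset].decode("utf-8", errors="replace")
    return len(prefix)


source = "x = 'é' + 1\n"
encoded = source.encode("utf-8")

# 'é' takes two bytes in UTF-8, so the byte offset of '+' is one larger
# than its character offset.
byte_off = encoded.index(b"+")
char_off = byte_offset_to_char_offset(encoded, byte_off)
print(byte_off, char_off)  # 9 8
```

If the offset were used as a byte count without this conversion, anything placed after a non-ASCII character would be shifted right by one column per extra byte.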

(Side point: There appears to be quite a bit of complexity inside the CPython parser to cope with the fact that it does everything in UTF-8 bytes rather than simply decoding to text and lexing that. I presume that's for the sake of efficiency, i.e. that it would be too slow to work through PyUnicode everywhere?)

Am looking into the rest.
History
Date                 User    Action  Args
2016-07-21 13:28:41  Rosuav  set     recipients: + Rosuav, ncoghlan, berker.peksag
2016-07-21 13:28:41  Rosuav  set     messageid: <1469107721.84.0.394649961699.issue27582@psf.upfronthosting.co.za>
2016-07-21 13:28:41  Rosuav  link    issue27582 messages
2016-07-21 13:28:41  Rosuav  create