Message270922
The question was raised whether there might be a problem with (UTF-8) bytes vs characters, but that's definitely not it: pythonrun.c:1362 UTF-8-decodes the line of source and then gets its character length to use as the new offset. So I don't think this is a duplicate of issue 2382.
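To illustrate the bytes-vs-characters distinction, here is a rough Python sketch of that kind of conversion. It is not the C code from pythonrun.c; the function name `byte_offset_to_char_offset` and the sample line are made up for illustration, and it assumes the offset being converted counts bytes from the start of the UTF-8 encoded line.

```python
def byte_offset_to_char_offset(line_bytes: bytes, byte_offset: int) -> int:
    """Convert a byte offset into a UTF-8 line to a character offset.

    Hypothetical helper: decode only the bytes before the offset and
    count the resulting characters.
    """
    prefix = line_bytes[:byte_offset]
    # errors="replace" keeps this from raising if the offset happens to
    # fall in the middle of a multi-byte sequence.
    return len(prefix.decode("utf-8", errors="replace"))


# 'é' takes two bytes in UTF-8, so the byte and character offsets
# diverge for anything that comes after it on the line.
line = "s = 'é' $".encode("utf-8")
byte_pos = line.index(b"$")
char_pos = byte_offset_to_char_offset(line, byte_pos)
print(byte_pos, char_pos)
```

For a pure-ASCII line the two offsets are the same, which is why a mismatch of this sort would only show up on lines containing non-ASCII text.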
(Side point: There appears to be quite a bit of complexity inside the CPython parser to cope with the fact that it does everything in UTF-8 bytes rather than simply decoding to text and lexing that. I presume that's for the sake of efficiency - that it'd be too slow to work through PyUnicode everywhere?)
Am looking into the rest.

Date | User | Action | Args
2016-07-21 13:28:41 | Rosuav | set | recipients: + Rosuav, ncoghlan, berker.peksag
2016-07-21 13:28:41 | Rosuav | set | messageid: <1469107721.84.0.394649961699.issue27582@psf.upfronthosting.co.za>
2016-07-21 13:28:41 | Rosuav | link | issue27582 messages
2016-07-21 13:28:41 | Rosuav | create |