classification
Title: Wrong offset on SyntaxError when identifier contains non-ascii characters
Type: behavior
Stage: resolved
Components: Interpreter Core
Versions: Python 3.2, Python 3.3, Python 3.4
process
Status: closed
Resolution: duplicate
Dependencies:
Superseder: [Py3k] SyntaxError cursor shifted if multibyte character is in line. (View: 2382)
Assigned To:
Nosy List: bmispelon, ezio.melotti, r.david.murray
Priority: normal
Keywords:

Created on 2012-10-09 10:23 by bmispelon, last changed 2012-10-10 04:05 by ezio.melotti. This issue is now closed.

Messages (4)
msg172470 - (view) Author: Baptiste Mispelon (bmispelon) * Date: 2012-10-09 10:23
When a syntax error happens, the exception that gets printed has an extra line with a caret that helps locate the error.

If the line also contains an identifier with non-ASCII characters, then this caret is misaligned (too far to the right).

I've investigated briefly and it seems that the offset attribute on the SyntaxError has the wrong value:

    for varname in ['a', 'é', '蟒']: # 1, 2 and 3 bytes in UTF-8
        try:
            exec("%s$" % varname) # SyntaxError
        except SyntaxError as e:
            print(e.offset) # should be 2

The example above prints 2, 3, and 4 when it should be printing 2 every time.

It seems that the calculation of the offset takes into account the size in bytes instead of the size in characters.
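To illustrate the suspected byte/character mix-up, here is a minimal sketch that maps a 1-based byte offset back to a 1-based character offset. The helper name `byte_offset_to_char_offset` is hypothetical and not part of CPython, and the sketch assumes the source line is encoded as UTF-8; it is not the interpreter's actual fix.

```python
def byte_offset_to_char_offset(line, byte_offset, encoding="utf-8"):
    """Convert a 1-based byte offset into a 1-based character offset.

    Hypothetical helper for illustration only; assumes `line` is a str
    and that the reported offset counts bytes of its encoded form.
    """
    encoded = line.encode(encoding)
    # Bytes that precede the reported position (offsets are 1-based).
    prefix = encoded[:byte_offset - 1]
    # Decoding the prefix counts characters instead of bytes;
    # errors="ignore" drops a trailing partial multibyte sequence, if any.
    return len(prefix.decode(encoding, errors="ignore")) + 1


for varname in ["a", "é", "蟒"]:
    line = "%s$" % varname
    # Byte-based offset of the "$", as the report describes it.
    byte_offset = len(varname.encode("utf-8")) + 1
    print(varname, byte_offset, byte_offset_to_char_offset(line, byte_offset))
```

For each single-character identifier, the converted offset is one more than the number of characters before the `$`, regardless of how many bytes those characters occupy.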

I've tested and reproduced the issue on 3.2.2 and on a recent clone of the Mercurial repository (dd5e98ddcd39).
msg172471 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-10-09 10:25
See #2382.
msg172488 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-10-09 15:30
Ezio, is there a reason you didn't close this as a duplicate?
msg172551 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-10-10 04:05
I was just in a hurry and didn't have time to check if they were indeed the same issue. Looks like they are, so I'm closing this as a duplicate.
History
Date User Action Args
2012-10-10 04:05:00 ezio.melotti set status: open -> closed
versions: + Python 3.3
superseder: [Py3k] SyntaxError cursor shifted if multibyte character is in line.
messages: + msg172551
resolution: duplicate
stage: resolved
2012-10-09 15:30:01 r.david.murray set nosy: + r.david.murray
messages: + msg172488
2012-10-09 10:25:19 ezio.melotti set nosy: + ezio.melotti
messages: + msg172471
2012-10-09 10:23:53 bmispelon create