classification
Title: SyntaxError should contain exact location of the invalid character in identifier
Type: enhancement
Stage: resolved
Components: Interpreter Core
Versions: Python 3.2
process
Status: closed
Resolution: duplicate
Dependencies: 10382
Superseder: 2382 ([Py3k] SyntaxError cursor shifted if multibyte character is in line.)
Assigned To:
Nosy List: BreamoreBoy, belopolsky, benjamin.peterson, ezio.melotti, haypo, terry.reedy
Priority: normal
Keywords:

Created on 2010-11-11 01:32 by belopolsky, last changed 2014-10-01 01:56 by berker.peksag. This issue is now closed.

Messages (6)
msg120936 - Author: Alexander Belopolsky (belopolsky) (Python committer) Date: 2010-11-11 01:32
Can you see the error in the following?

>>> inv​alid = 5
  File "<stdin>", line 1
    inv​alid = 5
             ^
SyntaxError: invalid character in identifier

The problem is that an invisible space character crept into the identifier:

>>> repr("inv​alid")
"'inv\\u200balid'"

With full Unicode support available in most OSes, the potential for errors like this (accidental or as a result of a practical joke) increases. It would be much easier to spot the offending character if the ^ marker pointed at its exact location rather than at the end of the identifier.


See also issue #10382.
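
A minimal sketch of how the offending character could be located, assuming Python 3 and using only str.isidentifier(). The helper name find_invalid_char is hypothetical; this is an illustration, not how the tokenizer reports the error:

```python
def find_invalid_char(identifier):
    """Return the index of the first character that cannot appear at its
    position in a Python identifier, or None if every character is allowed."""
    for index, char in enumerate(identifier):
        if index == 0:
            # The first character must be a valid identifier start
            allowed = char.isidentifier()
        else:
            # Prefix a known-valid start character to test "continue" characters
            allowed = ("a" + char).isidentifier()
        if not allowed:
            return index
    return None


name = "inv\u200balid"
index = find_invalid_char(name)
print(index, hex(ord(name[index])))
```

The returned index is the position where the ^ marker would ideally point.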
msg121059 - Author: Terry J. Reedy (terry.reedy) (Python committer) Date: 2010-11-12 19:17
I see the marker pointing to the space after '=', which is *really* not helpful. If '5' were instead an identifier, one might be really misdirected. So the best would be a message such as "Invalid char '0xnnnn' at position n in identifier 'something'".
+1 to any improvement in SyntaxError reports.
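
As a rough illustration of the wording suggested above, the message could be assembled like this. The function name describe_invalid_char and the exact format are assumptions for the sketch, not an existing CPython API:

```python
def describe_invalid_char(identifier, position):
    """Build an error message naming the code point and its position."""
    code_point = ord(identifier[position])
    # ascii() escapes non-ASCII characters, so invisible ones become visible
    return "invalid character '0x{:04x}' at position {} in identifier {}".format(
        code_point, position, ascii(identifier)
    )


print(describe_invalid_char("inv\u200balid", 3))
```

Escaping the identifier with ascii() matters here, because printing it raw would hide the zero-width character again.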
msg228006 - Author: Mark Lawrence (BreamoreBoy) Date: 2014-09-30 21:33
#10382 has been closed in favour of #2382.
msg228021 - Author: STINNER Victor (haypo) (Python committer) Date: 2014-09-30 22:47
It looks like the issue was already fixed:

haypo@smithers$ ./python
Python 3.5.0a0 (default:8e9df3414185, Oct  1 2014, 00:19:36) 
>>> inv​alid = 5
  File "<stdin>", line 1
    inv​alid = 5
           ^
SyntaxError: invalid character in identifier

The cursor is now before "=". It's not on the invalid character inside the identifier, but it's better than before ;-)
msg228023 - Author: Alexander Belopolsky (belopolsky) (Python committer) Date: 2014-09-30 23:00
The issue was not fixed.  With multiple invisible space characters I can get

Python 3.5.0a0 (default:5313b4c0bb6c, Sep 30 2014, 18:55:45)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.51)] on darwin
>>> invalid = None
  File "<stdin>", line 1
    invalid = None
                ^
SyntaxError: invalid character in identifier
msg228026 - Author: STINNER Victor (haypo) (Python committer) Date: 2014-09-30 23:12
> The issue was not fixed.  With multiple invisible space characters I can get

OK, so this issue is a duplicate of issue #2382. IMO the fix is to use wcswidth(), but see that issue for the long discussion :-/
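
To make the wcswidth() idea concrete, here is a rough pure-Python approximation of column-width counting built on the unicodedata module rather than the C function. The names display_width and caret_line are hypothetical, and the width rules are simplified (control characters, for example, are counted as one column), so treat it as a sketch rather than a faithful wcswidth() replacement:

```python
import unicodedata


def display_width(text):
    """Approximate the number of terminal columns `text` occupies."""
    width = 0
    for char in text:
        if unicodedata.combining(char) or unicodedata.category(char) in ("Mn", "Me", "Cf"):
            continue  # combining marks and format characters take no column
        if unicodedata.east_asian_width(char) in ("W", "F"):
            width += 2  # wide and fullwidth characters take two columns
        else:
            width += 1
    return width


def caret_line(source_line, char_offset, indent=4):
    """Return a line with '^' placed under source_line[char_offset]."""
    return " " * (indent + display_width(source_line[:char_offset])) + "^"


line = "inv\u200balid = 5"
print(" " * 4 + line)
print(caret_line(line, 3))
```

Computing the caret column from display width instead of character or byte count is what keeps the marker aligned when the line contains zero-width or double-width characters.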
History
Date                 User           Action  Args
2014-10-01 01:56:20  berker.peksag  set     stage: needs patch -> resolved
2014-09-30 23:12:39  haypo          set     superseder: [Py3k] SyntaxError cursor shifted if multibyte character is in line.
                                            resolution: fixed -> duplicate
                                            messages: + msg228026
2014-09-30 23:00:21  belopolsky     set     messages: + msg228023
2014-09-30 22:47:27  haypo          set     status: open -> closed
                                            nosy: + haypo
                                            messages: + msg228021
                                            resolution: fixed
2014-09-30 21:33:44  BreamoreBoy    set     nosy: + BreamoreBoy
                                            messages: + msg228006
2010-11-17 23:52:02  pitrou         set     nosy: + benjamin.peterson
2010-11-12 23:52:29  ezio.melotti   set     nosy: + ezio.melotti
2010-11-12 19:17:42  terry.reedy    set     nosy: + terry.reedy
                                            messages: + msg121059
2010-11-11 01:37:09  belopolsky     set     dependencies: + Command line error marker misplaced on unicode entry
2010-11-11 01:32:53  belopolsky     create