
classification
Title: SyntaxError should contain exact location of the invalid character in identifier
Type: enhancement
Stage: resolved
Components: Interpreter Core
Versions: Python 3.2
process
Status: closed
Resolution: duplicate
Dependencies: 10382
Superseder: #2382, [Py3k] SyntaxError cursor shifted if multibyte character is in line.
Assigned To:
Nosy List: belopolsky, benjamin.peterson, ezio.melotti, terry.reedy, vstinner
Priority: normal
Keywords:

Created on 2010-11-11 01:32 by belopolsky, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (7)
msg120936 - Author: Alexander Belopolsky (belopolsky) (Python committer) Date: 2010-11-11 01:32
Can you see the error in the following?

>>> inv​alid = 5
  File "<stdin>", line 1
    inv​alid = 5
             ^
SyntaxError: invalid character in identifier

The problem is that an invisible space character crept into the identifier:

>>> repr("inv​alid")
"'inv\\u200balid'"

With full Unicode available in most OSes, the potential for errors like this (accidental or as a result of a practical joke) increases.  It would be much easier to spot the offending character if the ^ marker pointed at its exact location rather than at the end of the identifier.


See also issue #10382.
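
A minimal sketch of how such a character can be located from Python code, for illustration only. The helper name invalid_identifier_chars is hypothetical, and the check relies on str.isidentifier(), which is only an approximation of what the tokenizer does internally:

```python
import unicodedata

def invalid_identifier_chars(name):
    """Return (index, code point, Unicode name) for each character that
    cannot appear at its position in an identifier.

    Hypothetical helper: it uses str.isidentifier() as a stand-in for the
    tokenizer's own check.
    """
    found = []
    for index, ch in enumerate(name):
        if index == 0:
            # The first character must be able to start an identifier.
            ok = ch.isidentifier()
        else:
            # Later characters only need to be able to continue one, so test
            # them after a start character that is known to be valid.
            ok = ("a" + ch).isidentifier()
        if not ok:
            found.append(
                (index, "U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed>"))
            )
    return found
```

Called on the identifier from the example, this reports index 3 and names the character as ZERO WIDTH SPACE, which is exactly the information the ^ marker fails to convey.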
msg121059 - Author: Terry J. Reedy (terry.reedy) (Python committer) Date: 2010-11-12 19:17
I see the marker pointing to the space after '=', which is *really* not helpful. If '5' were instead an identifier, one might be really misdirected. So best would be "Invalid char '0xnnnn' at position n in identifier 'something'"
+1 to any improvement in SyntaxError reports.
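
The suggested message could be assembled roughly as follows. This is a sketch under assumptions, not what the interpreter does: describe_invalid_char is a hypothetical name, the wording is copied from the proposal above, and str.isidentifier() stands in for the tokenizer's check.

```python
def describe_invalid_char(identifier):
    """Build a message in the proposed style for the first invalid character,
    or return None if the identifier is valid throughout.

    Hypothetical helper, not part of CPython.
    """
    for position, ch in enumerate(identifier):
        if position == 0:
            ok = ch.isidentifier()
        else:
            # Prefix a valid start character so that only "can this
            # character continue an identifier" is being tested.
            ok = ("a" + ch).isidentifier()
        if not ok:
            # %a applies ascii(), so an invisible character shows up as an
            # escape sequence instead of vanishing from the message.
            return "Invalid char '0x%04x' at position %d in identifier %a" % (
                ord(ch), position, identifier)
    return None
```

Using %a for the identifier is a deliberate choice here: printing the raw name would hide the very character the message is about.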
msg228006 - Author: Mark Lawrence (BreamoreBoy) Date: 2014-09-30 21:33
#10382 has been closed in favour of #2382.
msg228021 - Author: STINNER Victor (vstinner) (Python committer) Date: 2014-09-30 22:47
It looks like the issue was already fixed:

haypo@smithers$ ./python
Python 3.5.0a0 (default:8e9df3414185, Oct  1 2014, 00:19:36) 
>>> inv​alid = 5
  File "<stdin>", line 1
    inv​alid = 5
           ^
SyntaxError: invalid character in identifier

The cursor is now before "=". It's not on the invalid character inside the identifier, but it's better than before ;-)
msg228023 - Author: Alexander Belopolsky (belopolsky) (Python committer) Date: 2014-09-30 23:00
The issue was not fixed.  With multiple invisible space characters I can get

Python 3.5.0a0 (default:5313b4c0bb6c, Sep 30 2014, 18:55:45)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.51)] on darwin
>>> invalid = None
  File "<stdin>", line 1
    invalid = None
                ^
SyntaxError: invalid character in identifier
msg228026 - Author: STINNER Victor (vstinner) (Python committer) Date: 2014-09-30 23:12
> The issue was not fixed.  With multiple invisible space characters I can get

OK. So this issue is a duplicate of issue #2382. IMO the fix is to use wcswidth(), but see that issue for the long discussion :-/
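
To illustrate the wcswidth() idea: the caret column should be computed from the display width of the text before the error, not from its character or byte count. The Python standard library has no wcswidth(), so the sketch below approximates it with unicodedata. The names display_width and caret_line are hypothetical, control characters such as tabs are not handled, and this is not the fix that was applied in CPython.

```python
import unicodedata

def display_width(text):
    """Approximate the number of terminal columns that text occupies.

    Rough stand-in for C's wcswidth(): zero-width characters count as 0,
    East Asian wide and fullwidth characters as 2, everything else as 1.
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch) or unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            # Combining marks and format characters such as U+200B
            # take up no column on screen.
            continue
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width

def caret_line(line, char_index, indent=4):
    """Return a marker line whose ^ sits under line[char_index] when the
    source line is printed with the given indent."""
    return " " * (indent + display_width(line[:char_index])) + "^"

source = "inv\u200ba\u200blid = None"
print(" " * 4 + source)
print(caret_line(source, source.index("=")))
```

With this width calculation the two invisible characters no longer push the marker to the right, so the ^ lands under the "=" as intended.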
msg323735 - Author: Terry J. Reedy (terry.reedy) (Python committer) Date: 2018-08-18 21:30
When testing how IDLE handles the examples (which it does well, see #2382 msg323734), I discovered that while the single invisible char for msg120936 *is* present in the posted text, the multiple invisible chars for msg228023 are not.  The following has them before and after the 'a':
  inv​a​lid
History
Date                 User           Action  Args
2022-04-11 14:57:08  admin          set     github: 54593
2018-08-18 21:30:39  terry.reedy    set     nosy: - BreamoreBoy
                                            messages: + msg323735
2014-10-01 01:56:20  berker.peksag  set     stage: needs patch -> resolved
2014-09-30 23:12:39  vstinner       set     superseder: [Py3k] SyntaxError cursor shifted if multibyte character is in line.
                                            resolution: fixed -> duplicate
                                            messages: + msg228026
2014-09-30 23:00:21  belopolsky     set     messages: + msg228023
2014-09-30 22:47:27  vstinner       set     status: open -> closed
                                            nosy: + vstinner
                                            messages: + msg228021
                                            resolution: fixed
2014-09-30 21:33:44  BreamoreBoy    set     nosy: + BreamoreBoy
                                            messages: + msg228006
2010-11-17 23:52:02  pitrou         set     nosy: + benjamin.peterson
2010-11-12 23:52:29  ezio.melotti   set     nosy: + ezio.melotti
2010-11-12 19:17:42  terry.reedy    set     nosy: + terry.reedy
                                            messages: + msg121059
2010-11-11 01:37:09  belopolsky     set     dependencies: + Command line error marker misplaced on unicode entry
2010-11-11 01:32:53  belopolsky     create