classification
Title: Parse out invisible Unicode characters?
Type: behavior
Stage:
Components: Unicode
Versions:
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: ezio.melotti, leewz, mrabarnett, r.david.murray, v+python, vstinner
Priority: normal
Keywords:

Created on 2018-03-02 06:04 by leewz, last changed 2022-04-11 14:58 by admin.

Messages (4)
msg313127 - Author: Franklin? Lee (leewz) Date: 2018-03-02 06:04
The following line should have a character that trips up the compiler.
  ‎indices = range(5)

The character is \u200e, and was inserted by Google Keep. (I've already reported the issue to Google as a regression.)

Here's the error message:
"""
  File "<stdin>", line 3
    ‎indices = range(5)
           ^
SyntaxError: invalid character in identifier
"""
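
The failure can be reproduced without relying on a copy-paste that preserves the invisible character. The following is a minimal sketch that writes the mark as an explicit escape and compiles a one-line source string (the string and the `error` variable are illustrative). The exact wording of the message differs between Python versions, so it is printed rather than assumed.

```python
# U+200E is written as an escape so the offending character stays visible.
source = "\u200eindices = range(5)\n"

error = None
try:
    compile(source, "<stdin>", "exec")
except SyntaxError as exc:
    error = exc

# The message text varies by Python version, so print it rather than assume it.
print(type(error).__name__, "-", error.msg if error else "no error raised")
```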

Depending on the terminal or editor, it may not be possible to spot the problem just by looking. Without some knowledge of or experience with Unicode, it may not be possible to work out the problem at all.

Since Python source now uses Unicode by default, should certain invisible characters be stripped out during compilation?
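
As a workaround on the user side, a file can be scanned for such characters before it is run. This is only a sketch: `find_format_chars` is a hypothetical helper, and using the Unicode "Cf" (format) category as the test is an assumption that catches U+200E but is not a complete definition of "invisible".

```python
import unicodedata

def find_format_chars(source):
    """Return (line, column, code point, name) for each Cf character.

    Lines and columns are 1-based. "Cf" is the Unicode "format" category,
    which covers marks such as U+200E; this is an illustrative check, not
    a complete list of characters that can be invisible.
    """
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cf":
                name = unicodedata.name(ch, "<unnamed>")
                hits.append((lineno, col, f"U+{ord(ch):04X}", name))
    return hits

sample = "  \u200eindices = range(5)\n"
print(find_format_chars(sample))
# [(1, 3, 'U+200E', 'LEFT-TO-RIGHT MARK')]
```

Reporting the position instead of silently stripping the character leaves the decision about the source text with the author.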
msg313155 - Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2018-03-02 19:10
For the record, '\u200e' is '\N{LEFT-TO-RIGHT MARK}'.
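
This can be checked with the standard `unicodedata` module. The short sketch below also shows the character's category and that `str.isprintable()` treats it as non-printable, which is why it is hard to see in a terminal.

```python
import unicodedata

ch = "\u200e"
print(unicodedata.name(ch))            # LEFT-TO-RIGHT MARK
print(unicodedata.category(ch))        # Cf (format character)
print(ch == "\N{LEFT-TO-RIGHT MARK}")  # True
print(ch.isprintable())                # False
```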
msg313159 - Author: Glenn Linderman (v+python) * Date: 2018-03-02 19:46
Characters should not be stripped during compilation. But I can see that it might be helpful if the error message included the code point of the character, plus its printed form in case it is printable, and if the ^ pointer pointed to it.
msg313629 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-03-12 00:42
I think it sounds like a good idea to put the printed representation as a repr'ed string, followed by the code point representation in parentheses, in that message after "invalid character".
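
A rough sketch of what such a message could look like, written in Python for illustration only. `describe_invalid_char` is a hypothetical helper, not part of the compiler, and the exact format is an assumption based on the wording proposed in this thread.

```python
def describe_invalid_char(ch):
    """Hypothetical helper: build the suggested message for one character."""
    # repr() escapes non-printable characters, so U+200E shows up as '\u200e'.
    return f"invalid character {ch!r} (U+{ord(ch):04X})"

print(describe_invalid_char("\u200e"))  # invalid character '\u200e' (U+200E)
print(describe_invalid_char("€"))       # invalid character '€' (U+20AC)
```

Because `repr()` escapes non-printable characters and leaves printable ones alone, the same format works whether or not the character is visible.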
History
Date                 User            Action  Args
2022-04-11 14:58:58  admin           set     github: 77163
2018-03-12 00:42:44  r.david.murray  set     nosy: + r.david.murray; messages: + msg313629
2018-03-02 19:46:14  v+python        set     nosy: + v+python; messages: + msg313159
2018-03-02 19:10:57  mrabarnett      set     nosy: + mrabarnett; messages: + msg313155
2018-03-02 06:04:50  leewz           create