classification
Title: Python 3 gives misleading errors when validating unicode identifiers
Type: enhancement Stage: patch review
Components: Unicode Versions: Python 3.5
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: Matt.Bachmann, ezio.melotti, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2015-01-18 05:44 by Matt.Bachmann, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description
clarify_unicode_identifier_errors.patch Matt.Bachmann, 2015-01-18 05:44 Patch to provide a different error when a character is simply an invalid start rather than an entirely invalid one
Messages (5)
msg234222 - (view) Author: Matt Bachmann (Matt.Bachmann) * Date: 2015-01-18 05:44
PEP 3131 changed the definition of valid identifiers to match this pattern:

<XID_Start> <XID_Continue>* .

Currently, if you have an invalid character in an identifier, you get this error:

☺ = 4
SyntaxError: invalid character in identifier


This is fine in most cases. But in some cases the problem is not that the character is invalid so much as that it may not be used to START the identifier. One example of this is the "combining grave accent" (U+0300), which is an XID_CONTINUE character but not an XID_START character.

So ̀e is an invalid identifier but è is a valid identifier. So the ̀ character is not invalid in all cases.
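As a rough illustration of the distinction above, the following sketch uses only `str.isidentifier()` and the `unicodedata` module from the standard library. The constant and variable names are made up for this example and do not come from the patch.

```python
import unicodedata

GRAVE = "\u0300"   # COMBINING GRAVE ACCENT
SMILEY = "\u263a"  # WHITE SMILING FACE

# Show the official name and general category of each character.
for ch in (GRAVE, SMILEY):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

# The accent may continue an identifier but may not start one.
accent_first = (GRAVE + "e").isidentifier()
accent_second = ("e" + GRAVE).isidentifier()

# The smiley is not allowed anywhere in an identifier.
smiley_alone = SMILEY.isidentifier()
smiley_after_letter = ("a" + SMILEY).isidentifier()

print(accent_first, accent_second, smiley_alone, smiley_after_letter)
```

Note that `str.isidentifier()` checks the string as given, so the example builds the strings from explicit escapes instead of relying on how an editor or terminal renders the combining character.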

The attached patch attempts to clarify this by providing a different error when the start character is invalid.

>>> ̀e = 4
  File "<stdin>", line 1
    ̀e = 4
     ^
SyntaxError: invalid start character in identifier

However, if the character is simply not allowed (as it is neither an XID_START nor an XID_CONTINUE character), the original error is used.
>>> ☺smile = 4
  File "<stdin>", line 1
    ☺smile = 4
         ^
SyntaxError: invalid character in identifier
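
The two-message behaviour described above can be approximated in pure Python. This is only a sketch and not the code of the attached patch: the function name `describe_identifier_error` is hypothetical, and it assumes that `str.isidentifier()` applies the same start/continue rules as the tokenizer, which holds for CPython as far as I know.

```python
def describe_identifier_error(name):
    """Return None for a valid identifier, else a message for the proposed error.

    Hypothetical helper for illustration; it only looks at the first character
    to decide between the two messages.
    """
    if name.isidentifier():
        return None
    if not name:
        return "invalid character in identifier"

    first = name[0]
    # A character that is valid on its own can start an identifier.
    can_start = first.isidentifier()
    # A character that is valid after a plain letter can continue one.
    can_continue = ("a" + first).isidentifier()

    if can_continue and not can_start:
        return "invalid start character in identifier"
    return "invalid character in identifier"


print(describe_identifier_error("\u0300e"))      # combining grave accent first
print(describe_identifier_error("\u263asmile"))  # smiley first
print(describe_identifier_error("e\u0300"))      # valid identifier
```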
msg237027 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2015-03-02 08:12
While the request is reasonable, the patch seems to touch quite some code.
Since this is just to improve an error message in a somewhat obscure corner case, I'm not sure it's worth applying it.
msg237030 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-03-02 08:34
Agreed with Ezio. Adding 7 new public names just to enhance one rare error message looks like too high a cost. I am inclined to leave everything as is. The original message is not so bad.
msg237042 - (view) Author: Matt Bachmann (Matt.Bachmann) * Date: 2015-03-02 13:11
Alrighty. I'll investigate and see if I can cut down the code some. If I can't cut it down significantly, I'll let the issue die quietly. I agree that it's a pretty nitpicky ticket.

I noticed it while doing some research into Unicode and made the patch when I saw how languages like Swift handle this case.

Thanks for looking at it though!
msg237043 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-03-02 13:39
> The attached patch attempts to clarify this by providing a different error when the start character is invalid.

I dislike the patch. The error message "invalid character in identifier" is correct. I don't want to modify so much code for a slightly better error message.

If you start to use non-ASCII identifiers, you are probably already aware that you may run into some issues. I'm closing the issue.
History
Date User Action Args
2022-04-11 14:58:12 admin set github: 67452
2015-03-02 13:39:45 vstinner set status: open -> closed; resolution: wont fix; messages: + msg237043
2015-03-02 13:11:44 Matt.Bachmann set messages: + msg237042
2015-03-02 08:34:51 serhiy.storchaka set messages: + msg237030
2015-03-02 08:12:39 ezio.melotti set versions: - Python 3.2, Python 3.3, Python 3.4, Python 3.6; nosy: + serhiy.storchaka; messages: + msg237027; stage: patch review
2015-01-18 05:44:41 Matt.Bachmann create