Issue 5358: Unicode control characters are not allowed as identifiers


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/49608

classification

Title:	Unicode control characters are not allowed as identifiers
Type:	behavior	Stage:
Components:	Unicode	Versions:	Python 3.0, Python 3.1

process

Status:	closed	Resolution:	wont fix
Dependencies:		Superseder:
Assigned To:		Nosy List:	baijum, ezio.melotti, loewis, mrabarnett
Priority:	normal	Keywords:

Created on 2009-02-24 11:53 by baijum, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
identifier.py	baijum, 2009-02-24 11:53	File with Unicode control character in identifier

Messages (7)
msg82664 - (view)	Author: Baiju M (baijum)	Date: 2009-02-24 11:53
I tried to use the zero-width joiner (U+200D) as part of an identifier. It produces an exception like this:
    SyntaxError: invalid character in identifier
I have attached the Python file that produces this error. The zero-width joiner (U+200D) is a Unicode control character: http://en.wikipedia.org/wiki/Unicode_control_characters
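The report can be checked from the interpreter with a minimal sketch like the one below. It uses only the standard `unicodedata` module and `str.isidentifier()`; the names `ZWJ`, `candidate` and `accepted` are illustrative and not taken from the attached `identifier.py`. Note that the Unicode database puts U+200D in general category Cf (format) rather than Cc (control). Whether the candidate is accepted may depend on the Unicode data bundled with the running Python, so no result is asserted here.

```python
import unicodedata

ZWJ = "\u200d"  # the character from the report

zwj_name = unicodedata.name(ZWJ)
zwj_category = unicodedata.category(ZWJ)  # general category of U+200D

# Hypothetical identifier with ZWJ between two ASCII letters.
candidate = "a" + ZWJ + "b"
accepted = candidate.isidentifier()

print(zwj_name, zwj_category, accepted)
```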
msg82666 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2009-02-24 16:21
Why do you think this is a bug?
msg82820 - (view)	Author: Baiju M (baijum)	Date: 2009-02-27 06:47
On a further look at this issue, I understand that Python cannot allow all Unicode control characters in identifiers. But for many international languages, it is not possible to construct all characters with the proper visual representation without some control characters such as ZWJ and ZWNJ [1]. So, if Python really wants to support international characters in identifiers (for some reason), ZWJ and ZWNJ are unavoidable, and maybe some other characters as well.
[1] http://en.wikipedia.org/wiki/Zero-width_joiner
    http://en.wikipedia.org/wiki/Zero-width_non-joiner
msg82821 - (view)	Author: Baiju M (baijum)	Date: 2009-02-27 07:24
I think RFC 3454 [1] can be used as a basis for selecting the control characters that are valid in identifiers.
[1] http://www.rfc-editor.org/rfc/rfc3454.txt
msg82822 - (view)	Author: Ezio Melotti (ezio.melotti) *	Date: 2009-02-27 07:48
Valid identifiers should begin with a letter or '_' and contain only letters, numbers and '_'. This probably means that only the Unicode characters that belong to the categories Ll, Lu (Letter Lower/Upper case), Nd (Number, Decimal Digit) and Pc (Punctuation, Connector) - and possibly other categories like Lm, Lt, No and Nl - are valid.
Some examples:
>>> ａ－ｂ = 5 # U+FF0D, Cat: Pd, FULLWIDTH HYPHEN-MINUS
SyntaxError: invalid character in identifier
>>> a＃ = 5 # U+FF03, Cat: Po, FULLWIDTH NUMBER SIGN
SyntaxError: invalid character in identifier
>>> a）b = 5 # U+FF09, Cat: Pe, FULLWIDTH RIGHT PARENTHESIS
SyntaxError: invalid character in identifier
>>> ａ＿ｂ = 5 # U+FF3F, Cat: Pc, FULLWIDTH LOW LINE
>>> ａ＿ｂ
5
>>> a﹍b﹎c﹏d = 5 # U+FE4D, U+FE4E, U+FE4F, Cat: Pc
>>> a﹍b﹎c﹏d
5
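The categories quoted in this message can be reproduced without typing the characters into a REPL. The sketch below is only an illustration: `describe` is a hypothetical helper, and it tests each character between two ASCII letters rather than in the exact identifiers shown above.

```python
import unicodedata

def describe(ch):
    """Return (code point, general category, accepted inside 'a?b')."""
    code_point = "U+{:04X}".format(ord(ch))
    category = unicodedata.category(ch)
    accepted = ("a" + ch + "b").isidentifier()
    return code_point, category, accepted

# Characters taken from the examples in the message above.
for ch in "\uff0d\uff03\uff09\uff3f\ufe4d\ufe4e\ufe4f":
    print(*describe(ch))
```

The connector-punctuation (Pc) characters are accepted, while the dash (Pd), other (Po) and close (Pe) punctuation characters are rejected, matching the session in the message.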
msg82842 - (view)	Author: Matthew Barnett (mrabarnett) *	Date: 2009-02-27 16:54
The definition of a word in the new re module (actually targeted at Python 2.7) is currently a sequence of L&, N&, M& and Pc. I suppose ideally we want the definitions of a word and an identifier to be basically the same, except that an identifier can't start with N&.
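The difference between a "word" and an identifier can be illustrated roughly as follows. This sketch uses the standard-library `re` module, not the new module discussed in the message, so its `\w` may not match the L&/N&/M&/Pc definition exactly; `is_word` is a hypothetical helper.

```python
import re

def is_word(s):
    """True if the whole string consists of \\w characters."""
    return re.fullmatch(r"\w+", s) is not None

# A leading digit is fine for a word but not for an identifier.
for s in ["abc1", "1abc", "a_b"]:
    print(s, is_word(s), s.isidentifier())
```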
msg82858 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2009-02-27 18:32
See PEP 3131 for the specification of what an identifier is in Python. Closing this as "won't fix".
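One consequence of PEP 3131 that is relevant to the fullwidth examples earlier in the thread is that identifiers are converted to NFKC normal form while parsing. A minimal sketch, assuming CPython 3 and using illustrative names (`source`, `namespace`):

```python
import unicodedata

# FULLWIDTH a, FULLWIDTH LOW LINE, FULLWIDTH b
fullwidth_name = "\uff41\uff3f\uff42"
normalized = unicodedata.normalize("NFKC", fullwidth_name)

# Assign through the fullwidth spelling, then look the name up in its
# normalized form in the resulting namespace.
source = fullwidth_name + " = 5"
namespace = {}
exec(source, namespace)

print(normalized, namespace[normalized])
```

The variable is stored under the ASCII name `a_b`, which is why the fullwidth low line behaves like an ordinary underscore.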

History
Date	User	Action	Args
2022-04-11 14:56:46	admin	set	github: 49608
2009-02-27 18:32:17	loewis	set	status: open -> closed resolution: wont fix messages: + msg82858
2009-02-27 16:54:59	mrabarnett	set	nosy: + mrabarnett messages: + msg82842
2009-02-27 07:48:19	ezio.melotti	set	messages: + msg82822
2009-02-27 07:24:12	baijum	set	messages: + msg82821
2009-02-27 06:47:51	baijum	set	messages: + msg82820
2009-02-24 17:56:16	ezio.melotti	set	nosy: + ezio.melotti
2009-02-24 16:21:44	loewis	set	nosy: + loewis messages: + msg82666
2009-02-24 11:53:50	baijum	create