Issue 33705: Unicode is normalised after keywords are checked for


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/77886

classification

Title:	Unicode is normalised after keywords are checked for
Type:	behavior	Stage:
Components:	Interpreter Core, Unicode	Versions:

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	ezio.melotti, steven.daprano, vstinner
Priority:	normal	Keywords:

Created on 2018-05-31 04:53 by steven.daprano, last changed 2022-04-11 14:59 by admin.

Messages (2)
msg318250	Author: Steven D'Aprano (steven.daprano)	Date: 2018-05-31 04:53
There is a loophole in the Unicode normalisation which allows the creation of names matching keywords.

class Spam:
    locals()['if'] = 1

Spam.𝐢𝐟  # U+1D422 U+1D41F
# returns 1

Those two characters are 'MATHEMATICAL BOLD SMALL I' and 'MATHEMATICAL BOLD SMALL F'. They ought to be normalised to "if", which is a keyword. Of course Spam.if is a syntax error, and I believe Spam.𝐢𝐟 ought to be as well.

Another example:

py> globals()['for'] = 2
py> 𝐟or
2

I also asked about this here: https://mail.python.org/pipermail/python-dev/2018-May/153619.html
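
To make the report easier to check, here is a minimal sketch that tests whether a spelling is not a keyword as written but turns into one after NFKC normalisation, which is the form PEP 3131 specifies for identifiers. The helper name normalises_to_keyword is hypothetical and not part of CPython; it only approximates the check the report argues the parser ought to make, and it says nothing about how any particular interpreter version actually behaves.

```python
import keyword
import unicodedata


def normalises_to_keyword(name: str) -> bool:
    """True if `name` is not a keyword as written but becomes one under NFKC.

    Hypothetical helper for illustration only; not a CPython API.
    """
    normalised = unicodedata.normalize("NFKC", name)
    return not keyword.iskeyword(name) and keyword.iskeyword(normalised)


bold_if = "\U0001D422\U0001D41F"  # the two bold letters from the report
mixed_for = "\U0001D41For"        # bold f followed by ASCII "or"

# Print each candidate in ASCII-escaped form next to the result of the check.
for candidate in (bold_if, mixed_for, "if", "spam"):
    print(ascii(candidate), normalises_to_keyword(candidate))
```

The plain spellings "if" and "spam" are included only as controls: the first is already a keyword as written, and the second never becomes one.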
msg318251	Author: Steven D'Aprano (steven.daprano)	Date: 2018-05-31 04:54
Possibly the correct term is canonicalisation rather than normalisation, although I think the two are interchangeable.
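
On the terminology, it may help to note that the Unicode standard itself separates canonical forms (NFC, NFD) from compatibility forms (NFKC, NFKD). The sketch below, which assumes only the standard unicodedata module, compares the two on the bold letter from the first message; it is an illustration of the standard's vocabulary, not a statement about which word the issue title should use.

```python
import unicodedata

bold_i = "\U0001D422"  # MATHEMATICAL BOLD SMALL I

# Canonical composition (NFC) versus compatibility composition (NFKC).
canonical = unicodedata.normalize("NFC", bold_i)
compatibility = unicodedata.normalize("NFKC", bold_i)

print(unicodedata.name(bold_i))
print("NFC leaves it unchanged:", canonical == bold_i)
print("NFKC folds it to ASCII i:", compatibility == "i")
```

The character has only a compatibility decomposition, so the folding to ASCII described in the report comes from the compatibility form rather than the canonical one.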