Title: Unicode is normalised after keywords are checked for
Type: behavior
Stage:
Components: Interpreter Core, Unicode
Versions:
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: ezio.melotti, steven.daprano, vstinner
Priority: normal
Keywords:

Created on 2018-05-31 04:53 by steven.daprano, last changed 2018-05-31 04:54 by steven.daprano.

Messages (2)
msg318250 Author: Steven D'Aprano (steven.daprano) (Python committer) Date: 2018-05-31 04:53
There is a loophole in the Unicode normalisation of identifiers which allows the creation of names matching keywords.

class Spam:
    locals()['if'] = 1

Spam.𝐢𝐟    # U+1D422 U+1D41F
# returns 1

Those two characters are 'MATHEMATICAL BOLD SMALL I' (U+1D422) and 'MATHEMATICAL BOLD SMALL F' (U+1D41F). Identifier normalisation (NFKC) turns them into "if", which is a keyword.
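
The character names and the normalised form can be checked directly with the standard library's unicodedata and keyword modules. A minimal sketch, assuming CPython 3, where identifiers are converted to NFKC while parsing (PEP 3131):

```python
import keyword
import unicodedata

# The two characters used in Spam.𝐢𝐟 above.
word = "\U0001D422\U0001D41F"

for ch in word:
    # Print each code point with its official Unicode name.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# NFKC is the normalisation form Python applies to identifiers.
normalised = unicodedata.normalize("NFKC", word)
print(repr(normalised))               # 'if'
print(keyword.iskeyword(word))        # False: the raw spelling is not a keyword
print(keyword.iskeyword(normalised))  # True: the normalised spelling is
```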

Of course Spam.if is a syntax error, and I believe Spam.𝐢𝐟 ought to be as well.
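
The asymmetry can be reproduced without a REPL by parsing both spellings with ast.parse. A sketch, assuming a CPython version that still behaves as reported here (the result would change if the keyword check were moved after normalisation):

```python
import ast

# With ASCII letters the attribute name is the keyword "if", so parsing fails.
try:
    ast.parse("Spam.if")
    plain_if_rejected = False
except SyntaxError:
    plain_if_rejected = True

# With MATHEMATICAL BOLD letters the same source parses, and the attribute
# name stored in the tree has already been NFKC-normalised to "if".
tree = ast.parse("Spam.\U0001D422\U0001D41F")
attr = tree.body[0].value.attr  # Expr -> Attribute -> attr (a str)

print(plain_if_rejected)  # True
print(repr(attr))         # 'if'
```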

Another example:

py> globals()['for'] = 2
py> 𝐟or
2
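
Until the parser rejects these spellings, one way to catch them is to scan the token stream for names whose NFKC form is a keyword. The helper find_keyword_lookalikes below is hypothetical (it is not part of CPython or of any linter); it relies on the tokenize module reporting names in their raw, unnormalised spelling. A minimal sketch:

```python
import io
import keyword
import tokenize
import unicodedata


def find_keyword_lookalikes(source):
    """Hypothetical checker: report NAME tokens whose raw spelling is not a
    keyword but whose NFKC-normalised spelling is one."""
    hits = []
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    for tok in tokens:
        if tok.type != tokenize.NAME:
            continue
        normalised = unicodedata.normalize("NFKC", tok.string)
        if normalised != tok.string and keyword.iskeyword(normalised):
            # (row, col) of the token, its raw text, and the keyword it becomes.
            hits.append((tok.start, tok.string, normalised))
    return hits


src = "globals()['for'] = 2\n\U0001D41For\nx = 1\n"
for (row, col), raw, kw in find_keyword_lookalikes(src):
    print(f"line {row}, col {col}: {raw!r} normalises to keyword {kw!r}")
```

Run on the second example, it flags only line 2, where 𝐟or normalises to for; the string 'for' on line 1 is a STRING token and is ignored.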

I also asked about this here:
msg318251 Author: Steven D'Aprano (steven.daprano) (Python committer) Date: 2018-05-31 04:54
Possibly the correct term is canonicalisation rather than normalisation, although I think the two are interchangeable.
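
For reference, Unicode itself distinguishes canonical normalisation (NFC/NFD) from compatibility normalisation (NFKC/NFKD), and the bold letters only have a compatibility decomposition. A small sketch using the unicodedata module shows that canonical normalisation leaves them unchanged, while NFKC, the form Python applies to identifiers, folds them to ASCII:

```python
import unicodedata

bold_if = "\U0001D422\U0001D41F"

# The decomposition is tagged <font>, which marks it as a compatibility mapping.
print(unicodedata.decomposition("\U0001D422"))  # <font> 0069

# Canonical forms ignore compatibility mappings, so the string is unchanged...
print(unicodedata.normalize("NFC", bold_if) == bold_if)  # True
print(unicodedata.normalize("NFD", bold_if) == bold_if)  # True

# ...while the compatibility form used for identifiers yields plain "if".
print(unicodedata.normalize("NFKC", bold_if))            # if
```
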
Date                 User            Action  Args
2018-05-31 04:54:23  steven.daprano  set     messages: + msg318251
2018-05-31 04:53:08  steven.daprano  create