This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Title: Unicode-mangled names refer inconsistently to constants
Type: behavior Stage:
Components: Versions: Python 3.9
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Carl.Friedrich.Bolz, Kodiologist, SnoopJeDi, eryksun, jack1142, serhiy.storchaka
Priority: normal Keywords:

Created on 2022-01-27 21:57 by Kodiologist, last changed 2022-04-11 14:59 by admin.

Messages (8)
msg411930 - (view) Author: (Kodiologist) * Date: 2022-01-27 21:57
I'm not sure if this is a bug, but it certainly surprised me. Most reserved words, when Unicode-mangled, as in "𝕕𝕖𝕗", act like ordinary identifiers (see e.g. bpo-46520). `True`, `False`, and `None` are weird in that Unicode-mangled versions of them refer to those same constants initially, but can take on their own identity as variables if assigned to:

    Python 3.9.7 (default, Sep 10 2021, 14:59:43) 
    [GCC 11.2.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> 𝕋𝕣𝕦𝕖
    True
    >>> True = 0
      File "<stdin>", line 1
        True = 0
    SyntaxError: cannot assign to True
    >>> 𝕋𝕣𝕦𝕖 = 0
    >>> True
    True
    >>> 𝕋𝕣𝕦𝕖
    0

I think that `𝕋𝕣𝕦𝕖 = 1` should probably be forbidden. The fact that `𝕋𝕣𝕦𝕖` doesn't always mean the same thing as `True` seems to break the rule in PEP 3131 that "comparison of identifiers is based on NFKC".
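For reference, the NFKC equivalence that PEP 3131 relies on can be checked with the standard `unicodedata` module. This is only a small illustrative sketch; the variable names are made up for the example.

```python
import unicodedata

mangled = "𝕋𝕣𝕦𝕖"  # mathematical double-struck letters, not ASCII

# NFKC folds these compatibility characters to their plain ASCII forms.
normalized = unicodedata.normalize("NFKC", mangled)

print(mangled == "True")      # the raw strings are different
print(normalized == "True")   # the normalized form matches the keyword's spelling
```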
msg412070 - (view) Author: Carl Friedrich Bolz-Tereick (Carl.Friedrich.Bolz) * Date: 2022-01-29 11:42
hah, this is "great":

>>> 𝕋𝕣𝕦𝕖 = 1
>>> globals()
{'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, 'True': 1}

The problem is that the lexer assumes that anything that is not ASCII cannot be a keyword and lexes 𝕋𝕣𝕦𝕖 as an identifier.
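One way to observe this difference is to compare the AST nodes produced for the two spellings. The sketch below assumes Python 3.8 or later (where constants are `ast.Constant` nodes) and a CPython that behaves as described in this issue; the variable names are illustrative only.

```python
import ast

# The ASCII spelling is a keyword and becomes a constant node.
keyword_node = ast.parse("True", mode="eval").body

# The non-ASCII spelling is lexed as an identifier; the name stored in the
# AST is the NFKC-normalized string.
mangled_node = ast.parse("𝕋𝕣𝕦𝕖", mode="eval").body

print(type(keyword_node).__name__)
print(type(mangled_node).__name__, mangled_node.id)
```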
msg412071 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2022-01-29 11:53
True is a keyword which is compiled to an expression whose value is True; 𝕋𝕣𝕦𝕖 is an identifier which refers to the builtin variable "True", which has the value True by default. You can change the value of a builtin variable, but the value of the expression True is always True.

I do not see a problem here. Don't use 𝕋𝕣𝕦𝕖 if you do not intend to use a variable.
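The distinction between the keyword and the builtin variable can be sketched as follows. The code runs the assignment inside a scratch dictionary (a name chosen just for this example) so the interpreter's real globals are not touched; it assumes CPython behaviour as reported in this issue.

```python
import builtins

# The builtins module carries a variable literally named "True".
builtin_true = vars(builtins)["True"]

# Execute the mangled assignment in a scratch namespace.
scratch = {}
exec("𝕋𝕣𝕦𝕖 = 0\nfrom_keyword = True\nfrom_name = 𝕋𝕣𝕦𝕖", scratch)

print(builtin_true)             # the default value of the builtin variable
print(scratch["from_keyword"])  # the keyword is unaffected by the assignment
print(scratch["from_name"])     # the identifier now finds the shadowing variable
```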
msg412150 - (view) Author: (Kodiologist) * Date: 2022-01-30 14:47
> the builtin variable "True"

Is the existence of this entity, as separate from the constant `True`, documented anywhere? constants.rst doesn't seem to acknowledge it. Indeed, is its existence a feature, or is it a CPython quirk?
msg412167 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2022-01-30 18:15
msg412169 - (view) Author: Carl Friedrich Bolz-Tereick (Carl.Friedrich.Bolz) * Date: 2022-01-30 18:58
Ok, I can definitely agree with Serhiy's point of view: "True" is a keyword that always evaluates to the object that you get when you call bool(1). There is usually no name "True", and directly assigning to it is forbidden. But there are various other ways to assign to a name "True". One is e.g. globals()["True"] = 5; another (discussed in this issue) is using identifiers that NFKC-normalize to the string "True".
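A minimal sketch of the dictionary route, using a throwaway namespace in place of the real `globals()` (the names `namespace`, `via_keyword`, and `via_mangled` are only for illustration):

```python
# Same effect as globals()["True"] = 5, but confined to a scratch dict.
namespace = {"True": 5}

via_keyword = eval("True", namespace)   # keyword: compiled to the constant
via_mangled = eval("𝕋𝕣𝕦𝕖", namespace)   # identifier: looked up by name

print(via_keyword, via_mangled)
```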
msg412170 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2022-01-30 19:09
Why was it decided to not raise a syntax error when the NFKC normalization of a non-ASCII token matches a keyword? I don't see a use for cases such as `𝕚𝕗 = 1` and `𝕚𝕗 + 1`. It seems the cost in terms of confusion far outweighs any potential benefit.
msg412226 - (view) Author: James Gerity (SnoopJeDi) Date: 2022-02-01 00:41
> Why was it decided to not raise a syntax error...

I'm not sure such a decision was ever made; the error happens before normalization is applied. That is, the parser is doing two things here: (1) validating the syntax against the grammar and (2) building the AST. Normalization happens after (1), and `𝕋𝕣𝕦𝕖 = 0` is valid syntax because the grammar is NOT defined in terms of normalized identifiers; it's describing the valid (but confusing!) assignment that Carl described.

I agree that this doesn't seem like a bug, but it IS my new favorite quirk of identifier normalization.
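The same ordering can be seen with `𝕚𝕗` from Eryk's message. The sketch below assumes a CPython that behaves as described in this thread: the raw token is not recognized as a keyword, yet the variable it creates is stored under the normalized name. All variable names are illustrative.

```python
import keyword
import unicodedata

mangled = "𝕚𝕗"
normalized = unicodedata.normalize("NFKC", mangled)

# The keyword check is done on the exact string, without normalization.
raw_is_keyword = keyword.iskeyword(mangled)
normalized_is_keyword = keyword.iskeyword(normalized)

# The assignment is accepted, and the name is stored in normalized form.
scratch = {}
exec("𝕚𝕗 = 1\nresult = 𝕚𝕗 + 1", scratch)

print(raw_is_keyword, normalized_is_keyword)
print(scratch["if"], scratch["result"])
```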
Date                 User                 Action  Args
2022-04-11 14:59:55  admin                set     github: 90713
2022-02-01 00:41:18  SnoopJeDi            set     messages: + msg412226
2022-01-30 19:09:28  eryksun              set     nosy: + eryksun; messages: + msg412170
2022-01-30 18:58:53  Carl.Friedrich.Bolz  set     messages: + msg412169
2022-01-30 18:15:52  serhiy.storchaka     set     messages: + msg412167
2022-01-30 14:47:35  Kodiologist          set     messages: + msg412150
2022-01-29 17:56:53  jack1142             set     nosy: + jack1142
2022-01-29 11:53:33  serhiy.storchaka     set     nosy: + serhiy.storchaka; messages: + msg412071
2022-01-29 11:42:21  Carl.Friedrich.Bolz  set     nosy: + Carl.Friedrich.Bolz; messages: + msg412070
2022-01-29 03:39:07  SnoopJeDi            set     nosy: + SnoopJeDi
2022-01-27 21:57:22  Kodiologist          create