This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: Misleading reflective behaviour due to PEP 3131 NFKC identifiers normalization.
Type: behavior Stage: resolved
Components: Unicode Versions: Python 3.5
process
Status: closed Resolution: duplicate
Dependencies: Superseder: hasattr, delattr, getattr fail with unnormalized names
View: 13793
Assigned To: Nosy List: Iago-lito -, benjamin.peterson, ezio.melotti, vstinner
Priority: normal Keywords:

Created on 2018-01-02 16:07 by Iago-lito -, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (3)
msg309382 - Author: Iago-lito - (Iago-lito -) Date: 2018-01-02 16:07
Consistent with [PEP 3131](https://www.python.org/dev/peps/pep-3131/) and the NFKC normalization of identifiers, each of the last two lines below raises an error, because `𝜏` (U+1D70F) is automatically converted to `τ` (U+03C4):

    class Base(object):
        def __init__(self):
            self.𝜏 = 5 # defined with U+1D70F

    a = Base()
    print(a.𝜏)     # 5             # (U+1D70F) expected and intuitive
    print(a.τ)     # 5 as well     # (U+03C4)  normalized version, okay.
    d = a.__dict__ # {'τ':  5}     # (U+03C4)  still normalized version
    print(d['τ'])  # 5             # (U+03C4)  consistent with normalization
    assert hasattr(a, 'τ')         # (U+03C4)  consistent with normalization
    # But if I want to retrieve it the way I entered it because I can type (U+1D70F)
    print(d['𝜏'])  # KeyError: '𝜏' # (U+1D70F) counterintuitive
    assert hasattr(a, '𝜏') # Fails # (U+1D70F) counterintuitive

I've described and understood the problem in [this post](https://stackoverflow.com/questions/48063082/).
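The normalization itself can be reproduced with the standard `unicodedata` module. This is a minimal sketch, not code from the original report; the variable names `typed`, `stored` and `namespace` are illustrative only.

```python
import unicodedata

typed = "\U0001D70F"   # MATHEMATICAL ITALIC SMALL TAU, the character the user typed
stored = "\u03C4"      # GREEK SMALL LETTER TAU, its NFKC form

# The two strings differ, but NFKC maps the first onto the second.
print(typed == stored)                                 # False
print(unicodedata.normalize("NFKC", typed) == stored)  # True

# The compiler applies the same normalization to identifiers (PEP 3131),
# so binding a name spelled with U+1D70F actually creates a U+03C4 name.
namespace = {}
exec(typed + " = 5", namespace)
print(stored in namespace, typed in namespace)         # True False
```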

Nothing is inconsistent here. However, I am worried that:

- this behaviour might be counterintuitive and misleading, especially when the character the user can most easily type (e.g. U+1D70F) is not identical to its NFKC normalization (e.g. U+03C4);

- this behaviour makes it harder to use Python's reflective `__dict__`, `hasattr` and `getattr` features in this particular case.
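A minimal sketch of the reflective mismatch, assuming a Python 3 interpreter reading UTF-8 source (the default): the instance dictionary only holds the normalized key, and `hasattr` and `__dict__` compare strings exactly, so a lookup with the name as typed fails unless the caller normalizes it first.

```python
import unicodedata

class Base:
    def __init__(self):
        self.𝜏 = 5  # typed as U+1D70F; the compiler stores the name as U+03C4

a = Base()
typed = "\U0001D70F"                        # the name as the user typed it
key = unicodedata.normalize("NFKC", typed)  # "\u03C4", the name actually stored

print(list(vars(a)))      # ['τ'], i.e. U+03C4
print(typed in vars(a))   # False: dict keys are compared exactly, no normalization
print(hasattr(a, typed))  # False: hasattr does not normalize its argument
print(vars(a)[key])       # 5
print(hasattr(a, key))    # True
```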

Maybe it is the user's responsibility to be aware of this limitation, and to keep considering non-ASCII (UTF-8) identifiers bad practice. In that case, maybe this particular reflective limitation could be made explicit in PEP 3131.

Or maybe it is Python's responsibility to ensure intuitive and consistent behaviour even in tricky Unicode cases. In that case, reflective features like `__dict__.__getitem__`, `hasattr` or `getattr` would NFKC-normalize their arguments before the lookup, just as the compiler does for `a.𝜏`, so that:

    getattr(a, '𝜏') is getattr(a, 'τ')

always yields True.
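The proposed behaviour can be approximated in user code today. The helpers `getattr_nfkc` and `hasattr_nfkc` below are hypothetical (nothing like them exists in the standard library); this sketch simply NFKC-normalizes the name with `unicodedata.normalize` before delegating to the built-ins.

```python
import unicodedata

_MISSING = object()  # sentinel so that None can still be passed as a default

def getattr_nfkc(obj, name, default=_MISSING):
    """Hypothetical helper: getattr() that first NFKC-normalizes `name`,
    mirroring what the compiler does for `obj.name` in source code."""
    name = unicodedata.normalize("NFKC", name)
    if default is _MISSING:
        return getattr(obj, name)
    return getattr(obj, name, default)

def hasattr_nfkc(obj, name):
    """Hypothetical helper: hasattr() with the same normalization."""
    return hasattr(obj, unicodedata.normalize("NFKC", name))

class Base:
    def __init__(self):
        self.𝜏 = 5  # stored under the normalized name U+03C4

a = Base()
print(getattr_nfkc(a, "\U0001D70F") is getattr(a, "\u03C4"))  # True
print(hasattr_nfkc(a, "\U0001D70F"))                           # True
```

One caveat of this design: `setattr(obj, name, value)` with an unnormalized string stores that exact string as the key, so such an attribute would not be found by the normalizing helpers; they only match names that went through the compiler's normalization.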

I actually have no idea which philosophy to stick to, and the only purpose of this post is to inform the community about this particular, low-priority case.

Thank you for supporting Python anyway, cheers for your patience, and happy 2018 to everyone :)


--
Iago-lito
msg309383 - Author: Iago-lito - (Iago-lito -) Date: 2018-01-02 16:23
I just found out about [this](https://bugs.python.org/issue13793) very close issue. Much of the philosophy has been made very clear there.

Since the resolution of issue13793 is to *document* this NFKC normalization, I think it would be a good thing to make an explicit statement about these particular reflective limitations in PEP 3131 :)
msg309389 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2018-01-02 17:52
We don't generally update finalized PEPs. The official documentation for a feature is in the Python docs. Feel free to propose a PR if you think it could be improved.
History

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:58:56 | admin | set | github: 76664 |
| 2018-01-02 17:52:57 | benjamin.peterson | set | status: open -> closed; superseder: hasattr, delattr, getattr fail with unnormalized names; nosy: + benjamin.peterson; messages: + msg309389; resolution: duplicate; stage: resolved |
| 2018-01-02 16:23:28 | Iago-lito - | set | messages: + msg309383 |
| 2018-01-02 16:07:22 | Iago-lito - | create | |