classification
Title: wildcard imports should normalise names in `__all__`
Type: enhancement
Stage:
Components:
Versions: Python 3.11

process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: FFY00, christian.kolen, eric.smith, serhiy.storchaka, steven.daprano
Priority: normal
Keywords:

Created on 2021-08-22 20:55 by christian.kolen, last changed 2022-04-11 14:59 by admin.

Messages (8)
msg400106 - (view) Author: Kolen Cheung (christian.kolen) Date: 2021-08-22 20:55
With Python 3.9.6 on macOS,

In a file all_bug.py,

```py
__all__ = ("ϵ",)
ϵ = "ϵ"
```

Then running `from all_bug import *` results in

    AttributeError: module 'all_bug' has no attribute 'ϵ'

This happens with some other Unicode characters as well, but not all. I can provide them if needed.

Removing the `__all__` line makes the star-import succeed, and ϵ can then be used.
msg400110 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2021-08-22 23:05
Python normalizes identifiers with NFKC (see PEP 3131):

>>> e0 = "ϵ"
>>> import unicodedata
>>> e1 = unicodedata.normalize("NFKC", e0)
>>> e0 == e1
False
>>> unicodedata.name(e0)
'GREEK LUNATE EPSILON SYMBOL'
>>> unicodedata.name(e1)
'GREEK SMALL LETTER EPSILON'

If you use GREEK SMALL LETTER EPSILON as your identifier, you should be okay.
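
As a rough illustration of why the lookup fails, the sketch below runs a one-line module body with `exec` and inspects the resulting namespace. The names `LUNATE`, `SMALL`, and `namespace` are made up for this example; the point is only that the compiler normalises the identifier while a string such as an `__all__` entry is stored exactly as written.

```python
import unicodedata

LUNATE = "\u03f5"   # GREEK LUNATE EPSILON SYMBOL, as written in the report
SMALL = "\u03b5"    # GREEK SMALL LETTER EPSILON, its NFKC form

# Compile and run a one-line "module body" that assigns to the lunate epsilon.
namespace = {}
exec(f"{LUNATE} = 'value'", namespace)

# The compiler normalised the identifier, so only the NFKC form is a key.
stored_under_lunate = LUNATE in namespace
stored_under_small = SMALL in namespace

# A plain string (like an entry in __all__) is not normalised automatically,
# so it only matches the namespace key after an explicit NFKC pass.
all_entry = LUNATE
matches_after_normalising = unicodedata.normalize("NFKC", all_entry) in namespace
```

Here `stored_under_lunate` ends up false and `stored_under_small` true, which is the mismatch that star-import trips over.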
msg400111 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2021-08-23 00:13
Eric, I think you may have been too hasty to close this as "not a bug".

It has tripped people up before. See #41542 which has also been closed (I also think prematurely).

We normalise variable names with NFKC, so we should normalise the values in __all__ as well, at least when doing an import.

Note that this is different from #42680 where the right answer is just to document. We shouldn't normalise strings as dict keys.
msg400112 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2021-08-23 00:44
I'd be okay with changing it to a 3.11 enhancement to normalize elements of __all__, or maybe a request to change the documentation.
msg400143 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2021-08-23 15:53
Okay, reopening the ticket with a new description.

`from module import *` should use the same NFKC normalisation on the names in `__all__`.

To be clear, I don't propose that `__all__` should be modified.
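
A minimal pure-Python sketch of that proposal, assuming a hypothetical helper `star_import` that imitates `from module import *`. The real star-import is implemented inside the interpreter, so this only shows the intended behaviour: names are normalised at lookup time and `__all__` itself is left alone.

```python
import types
import unicodedata


def star_import(module, namespace):
    """Rough imitation of `from module import *` (hypothetical helper).

    Names from __all__ are NFKC-normalised before lookup; __all__ is not modified.
    """
    names = getattr(module, "__all__", None)
    if names is None:
        # Without __all__, star-import takes every name without a leading underscore.
        names = [n for n in vars(module) if not n.startswith("_")]
    for name in names:
        key = unicodedata.normalize("NFKC", name)
        namespace[key] = getattr(module, key)


# Build a stand-in for all_bug.py from the original report.
all_bug = types.ModuleType("all_bug")
exec('__all__ = ("\u03f5",)\n\u03f5 = "\u03f5"', all_bug.__dict__)

target = {}
star_import(all_bug, target)
```

After the call, `target` holds the value under the normalised name (GREEK SMALL LETTER EPSILON), and `all_bug.__all__` still contains the lunate epsilon exactly as written.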
msg400154 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-08-23 17:42
Should getattr() normalize the attribute name? If not, it will produce surprises further along the way. If so, it will add significant overhead.

Even star-imports could be significantly affected.
msg400155 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2021-08-23 17:48
How about making it an error to have non-NFKC normalized names in __all__?
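
One way such a check could look, as a sketch only: the function name `check_all_normalised`, the choice of `ValueError`, and the message wording are all assumptions for illustration, not anything decided in this issue. It relies on `unicodedata.is_normalized`, available since Python 3.8.

```python
import unicodedata


def check_all_normalised(all_names):
    """Raise ValueError if any name is not in NFKC form (hypothetical check)."""
    for name in all_names:
        if not unicodedata.is_normalized("NFKC", name):
            suggestion = unicodedata.normalize("NFKC", name)
            raise ValueError(
                f"name {name!r} in __all__ is not NFKC-normalised; "
                f"did you mean {suggestion!r}?"
            )


check_all_normalised(("\u03b5", "spam"))   # already normalised: passes silently

try:
    check_all_normalised(("\u03f5",))      # lunate epsilon from the report
except ValueError as exc:
    error_message = str(exc)
else:
    error_message = None
```

Compared with silently normalising, this keeps lookups cheap and pushes the fix back to the module author.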
msg400193 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2021-08-24 02:17
On Mon, Aug 23, 2021 at 05:42:59PM +0000, Serhiy Storchaka wrote:
> 
> Serhiy Storchaka <storchaka+cpython@gmail.com> added the comment:
> 
> Should getattr() normalize the attribute name? If not, it will produce
> surprises further along the way. If so, it will add significant
> overhead.

Good point.

In pure Python, normalising the string in getattr does have significant 
cost, about 125% slower on my computer using 3.9:

    >>> from functools import partial
    >>> from timeit import Timer
    >>> import unicodedata
    >>> normalise = partial(unicodedata.normalize, "NFKC")
    >>> def mygetattr(obj, name, _normalise=normalise, _getattr=getattr):
    ...     return _getattr(obj, _normalise(name))
    ... 
    >>> t1 = Timer('getattr([], "reverse")')
    >>> t2 = Timer('mygetattr([], "reverse")', setup='from __main__ import mygetattr')
    >>> 
    >>> min(t1.repeat(repeat=7))
    0.08972279605222866
    >>> min(t2.repeat(repeat=7))
    0.20272555301198736
    >>> (0.20272555301198736-0.08972279605222866)/0.08972279605222866
    1.2594653971102117

But for ASCII strings at least, I think there is an opportunity to avoid 
that cost entirely. See #44987.
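
To sketch the idea in pure Python (this is not necessarily the approach taken in #44987): pure-ASCII strings are already in NFKC form, so a cheap `str.isascii()` test can skip the normalisation call. The helper name `normalise_name` is an assumption for this example.

```python
import unicodedata


def normalise_name(name):
    """NFKC-normalise name, skipping the work for pure-ASCII strings."""
    if name.isascii():
        # ASCII text is unchanged by NFKC, so return the same object.
        return name
    return unicodedata.normalize("NFKC", name)


def mygetattr(obj, name):
    """getattr() variant that normalises the attribute name first."""
    return getattr(obj, normalise_name(name))


ascii_name = "reverse"
same_object = normalise_name(ascii_name) is ascii_name   # fast path taken
epsilon = normalise_name("\u03f5")                       # slow path: lunate -> small epsilon
```

Timing this against plain `getattr` in the same way as above would show how much of the overhead remains for ASCII names.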

> Even star-imports could be significantly affected.

I'm less worried about that for three reasons:

1. It only affects star-import, which is not "best practice", so only a 
small number of scripts will be affected.

2. In the overwhelmingly most common case, you do any star-imports once, 
at the beginning of the module, not repeatedly. Star-imports will not be 
part of a tight, performance-critical loop. So it is a one-off cost.

3. The cost of an import is a lot more than just the getattr, so any 
normalisation cost is a correspondingly smaller part of the total.

History
Date  User  Action  Args
2022-04-11 14:59:49  admin  set  github: 89144
2021-08-24 02:17:14  steven.daprano  set  messages: + msg400193
2021-08-23 17:48:14  eric.smith  set  messages: + msg400155
2021-08-23 17:42:59  serhiy.storchaka  set  nosy: + serhiy.storchaka
  messages: + msg400154
2021-08-23 15:53:21  steven.daprano  set  status: closed -> open
  title: `module has no attribute` when `__all__` includes certain unicode characters -> wildcard imports should normalise names in `__all__`
  messages: + msg400143
  versions: + Python 3.11, - Python 3.9
  type: behavior -> enhancement
  resolution: not a bug ->
  stage: resolved ->
2021-08-23 10:42:35  FFY00  set  nosy: + FFY00
2021-08-23 00:44:56  eric.smith  set  messages: + msg400112
2021-08-23 00:13:53  steven.daprano  set  nosy: + steven.daprano
  messages: + msg400111
2021-08-22 23:05:27  eric.smith  set  status: open -> closed
  nosy: + eric.smith
  messages: + msg400110
  resolution: not a bug
  stage: resolved
2021-08-22 20:55:35  christian.kolen  create