Issue 44981: wildcard imports should normalise names in `__all__`

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/89144

classification

Title:	wildcard imports should normalise names in `__all__`
Type:	enhancement	Stage:
Components:		Versions:	Python 3.11

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	FFY00, christian.kolen, eric.smith, serhiy.storchaka, steven.daprano
Priority:	normal	Keywords:

Created on 2021-08-22 20:55 by christian.kolen, last changed 2022-04-11 14:59 by admin.

Messages (8)
msg400106 - (view)	Author: Kolen Cheung (christian.kolen)	Date: 2021-08-22 20:55
With Python 3.9.6 on macOS, in a file all_bug.py:

```py
__all__ = ("ϵ",)
ϵ = "ϵ"
```

Running `from all_bug import *` then fails with:

AttributeError: module 'all_bug' has no attribute 'ϵ'

This happens with some other Unicode characters as well, but not all. I can provide them if needed. Removing the `__all__` line lets ϵ be imported and used successfully.
msg400110 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2021-08-22 23:05
Python normalizes identifiers with NFKC (see PEP 3131):

>>> e0 = "ϵ"
>>> import unicodedata
>>> e1 = unicodedata.normalize("NFKC", e0)
>>> e0 == e1
False
>>> unicodedata.name(e0)
'GREEK LUNATE EPSILON SYMBOL'
>>> unicodedata.name(e1)
'GREEK SMALL LETTER EPSILON'

If you use GREEK SMALL LETTER EPSILON as your identifier, you should be okay.
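The mismatch between the bound name and the string in `__all__` can be reproduced without a file on disk. The following is a minimal sketch, assuming the source of all_bug.py is run with `exec` into a plain dict. The names `LUNATE`, `SMALL`, `source` and `namespace` are illustrative and do not come from the report.

```python
import unicodedata

LUNATE = "\u03f5"  # GREEK LUNATE EPSILON SYMBOL
SMALL = "\u03b5"   # GREEK SMALL LETTER EPSILON

# Same two lines as all_bug.py. The compiler NFKC-normalises the
# identifier on the second line, but leaves string literals untouched.
source = f'__all__ = ("{LUNATE}",)\n{LUNATE} = "{LUNATE}"\n'
namespace = {}
exec(source, namespace)

stored_under_small = SMALL in namespace    # name actually bound
stored_under_lunate = LUNATE in namespace  # name listed in __all__
listed_name = namespace["__all__"][0]
```

The variable ends up bound under the normalised name, while `__all__` still lists the lunate form, which is why the star-import lookup fails.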
msg400111 - (view)	Author: Steven D'Aprano (steven.daprano) *	Date: 2021-08-23 00:13
Eric, I think you may have been too hasty to close this as "not a bug". It has tripped people up before. See #41542 which has also been closed (I also think prematurely). We normalise variable names with NFKC, so we should normalise the values in __all__ as well, at least when doing an import. Note that this is different from #42680 where the right answer is just to document. We shouldn't normalise strings as dict keys.
msg400112 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2021-08-23 00:44
I'd be okay with changing it to a 3.11 enhancement to normalize elements of __all__, or maybe a request to change the documentation.
msg400143 - (view)	Author: Steven D'Aprano (steven.daprano) *	Date: 2021-08-23 15:53
Okay, reopening the ticket with a new description. `from module import *` should use the same NFKC normalisation on the names in `__all__`. To be clear, I don't propose that `__all__` should be modified.
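One way to picture the proposed behaviour is a pure-Python emulation of the name lookup. This is only a sketch under stated assumptions: `star_import_names` is a hypothetical helper, not the interpreter's actual star-import code, and `demo` is a stand-in for all_bug.py.

```python
import types
import unicodedata


def star_import_names(module):
    """Return {name: value} roughly as a normalising star-import might."""
    exported = getattr(module, "__all__", None)
    if exported is None:
        # No __all__: export the names that do not start with an underscore.
        return {k: v for k, v in vars(module).items() if not k.startswith("_")}
    result = {}
    for name in exported:
        # Look up the NFKC form of each entry; __all__ itself is not modified.
        normalised = unicodedata.normalize("NFKC", name)
        result[normalised] = getattr(module, normalised)
    return result


# Hypothetical stand-in for all_bug.py, built in memory.
demo = types.ModuleType("all_bug")
exec('__all__ = ("\u03f5",)\n\u03f5 = "\u03f5"\n', demo.__dict__)
names = star_import_names(demo)
```

With this lookup the entry listed as GREEK LUNATE EPSILON SYMBOL resolves to the attribute stored under GREEK SMALL LETTER EPSILON, and `demo.__all__` is left as written.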
msg400154 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2021-08-23 17:42
Should getattr() normalize attribute name? If no, it will produce surprises further along the way. If yes, it will add significant overhead. Even star-import can get significant impact.
msg400155 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2021-08-23 17:48
How about making it an error to have non-NFKC normalized names in __all__?
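The alternative of rejecting such names could look roughly like the check below. It is a sketch only: `check_all_normalised` is a hypothetical helper, and the exception type and message are assumptions rather than anything agreed on in this issue.

```python
import unicodedata


def check_all_normalised(names):
    """Raise ValueError if any entry of `names` is not in NFKC form."""
    for name in names:
        # unicodedata.is_normalized is available from Python 3.8.
        if not unicodedata.is_normalized("NFKC", name):
            raise ValueError(f"{name!r} in __all__ is not NFKC-normalised")


try:
    check_all_normalised(("\u03f5",))  # GREEK LUNATE EPSILON SYMBOL
    rejected = False
except ValueError:
    rejected = True
```

Entries that are already normalised, including every ASCII name, pass the check unchanged.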
msg400193 - (view)	Author: Steven D'Aprano (steven.daprano) *	Date: 2021-08-24 02:17
On Mon, Aug 23, 2021 at 05:42:59PM +0000, Serhiy Storchaka wrote:
>
> Serhiy Storchaka <storchaka+cpython@gmail.com> added the comment:
>
> Should getattr() normalize attribute name? If no, it will produce
> surprises further along the way. If yes, it will add significant
> overhead.

Good point. In pure Python, normalising the string in getattr does have significant cost, about 125% slower on my computer using 3.9:

>>> from functools import partial
>>> from timeit import Timer
>>> import unicodedata
>>> normalise = partial(unicodedata.normalize, "NFKC")
>>> def mygetattr(obj, name, _normalise=normalise, _getattr=getattr):
...     return _getattr(obj, _normalise(name))
...
>>> t1 = Timer('getattr([], "reverse")')
>>> t2 = Timer('mygetattr([], "reverse")', setup='from __main__ import mygetattr')
>>> min(t1.repeat(repeat=7))
0.08972279605222866
>>> min(t2.repeat(repeat=7))
0.20272555301198736
>>> (0.20272555301198736-0.08972279605222866)/0.08972279605222866
1.2594653971102117

But for ASCII strings at least, I think there is an opportunity to avoid that cost entirely. See #44987.

> Even star-import can get significant impact.

I'm less worried about that for three reasons:

1. It only affects star-import, which is not "best practice", so only a small number of scripts will be affected.

2. In the overwhelmingly common case, you do any star-imports once, at the beginning of the module, not repeatedly. Star-imports will not be part of a tight, performance-critical loop. So it is a one-off cost.

3. The cost of an import is a lot more than just the getattr, so any normalisation cost is a correspondingly smaller part of the total.
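The ASCII shortcut mentioned above can be sketched in pure Python. NFKC leaves pure-ASCII strings unchanged, so the call to `unicodedata.normalize` can be skipped for them. The helpers `normalise_name` and `getattr_normalised` are hypothetical names for illustration; this is not necessarily how #44987 approaches the problem.

```python
import types
import unicodedata


def normalise_name(name):
    """NFKC-normalise `name`, skipping the work for pure-ASCII strings."""
    if name.isascii():
        # ASCII text is already in NFKC form, so return it untouched.
        return name
    return unicodedata.normalize("NFKC", name)


def getattr_normalised(obj, name):
    """getattr() variant that looks up the NFKC form of `name`."""
    return getattr(obj, normalise_name(name))


# Attribute stored under GREEK SMALL LETTER EPSILON,
# looked up via GREEK LUNATE EPSILON SYMBOL.
ns = types.SimpleNamespace(**{"\u03b5": 1})
value = getattr_normalised(ns, "\u03f5")
```

Whether this removes the overhead measured above would need its own timing run; the sketch only shows that the ASCII path avoids the normalisation call.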

History
Date	User	Action	Args
2022-04-11 14:59:49	admin	set	github: 89144
2021-08-24 02:17:14	steven.daprano	set	messages: + msg400193
2021-08-23 17:48:14	eric.smith	set	messages: + msg400155
2021-08-23 17:42:59	serhiy.storchaka	set	nosy: + serhiy.storchaka messages: + msg400154
2021-08-23 15:53:21	steven.daprano	set	status: closed -> open title: `module has no attribute` when `__all__` includes certain unicode characters -> wildcard imports should normalise names in `__all__` messages: + msg400143 versions: + Python 3.11, - Python 3.9 type: behavior -> enhancement resolution: not a bug -> stage: resolved ->
2021-08-23 10:42:35	FFY00	set	nosy: + FFY00
2021-08-23 00:44:56	eric.smith	set	messages: + msg400112
2021-08-23 00:13:53	steven.daprano	set	nosy: + steven.daprano messages: + msg400111
2021-08-22 23:05:27	eric.smith	set	status: open -> closed nosy: + eric.smith messages: + msg400110 resolution: not a bug stage: resolved
2021-08-22 20:55:35	christian.kolen	create