Issue 41542: module `__all__` cannot detect function name with `φ`

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/85714

classification

Title:	module `__all__` cannot detect function name with `φ`
Type:	behavior	Stage:	resolved
Components:	Unicode	Versions:	Python 3.7

process

Status:	closed	Resolution:	not a bug
Dependencies:		Superseder:
Assigned To:		Nosy List:	ezio.melotti, sethwoodworth, steven.daprano, tianrluo, vstinner
Priority:	normal	Keywords:

Created on 2020-08-13 18:46 by tianrluo, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (7)
msg375330 - (view)	Author: Tianrui Luo (tianrluo)	Date: 2020-08-13 18:46
Fairly easy to reproduce. `__all__` in a module seems unable to track name with `φ`.
The following minimal reproducing example also fails for function name of `aφ` or `φb`.

```python3
Python 3.7.7 (default, May 7 2020, 21:25:33)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tmp import *
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'tmp' has no attribute 'ϕ'
```

The `tmp.py` file.

```python
# tmp.py
__all__ = ['ϕ']

def ϕ():
    print('"__all__" doesn\'t like me')
    return
```
msg375331 - (view)	Author: Seth Woodworth (sethwoodworth)	Date: 2020-08-13 19:23
As described in PEP 3131, Unicode identifiers are normalized via NFKC before being added to the globals, and presumably __all__ as in your example.

You can see what python _did_ add via unicodedata.normalize('NFKC', 'ϕ') which returns 'φ'

    [ins] In [8]: bytes('φ', 'utf8')
    Out[8]: b'\xcf\x86'

    [ins] In [9]: bytes('ϕ', 'utf8')
    Out[9]: b'\xcf\x95'

The normalized version of Phi, I _can_ add to my globals:

    globals()['φ'] = 'foo'
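
The normalization described above can be checked without a separate `tmp.py`. The following is a minimal sketch, not part of the original thread: it builds a stand-in module in memory (the name `tmp_demo` and the constants are illustrative assumptions) and shows that the `def` statement binds the NFKC-normalized name while the string stored in `__all__` is left untouched.

```python
import types
import unicodedata

SYMBOL_PHI = "\u03d5"  # ϕ, the character typed in tmp.py
SMALL_PHI = "\u03c6"   # φ, what NFKC turns it into

normalized = unicodedata.normalize("NFKC", SYMBOL_PHI)
print(unicodedata.name(SYMBOL_PHI), "->", unicodedata.name(normalized))
# GREEK PHI SYMBOL -> GREEK SMALL LETTER PHI

# Stand-in for tmp.py, compiled from a string (hypothetical module name).
source = (
    f"__all__ = ['{SYMBOL_PHI}']\n"
    f"def {SYMBOL_PHI}():\n"
    f"    return 'ok'\n"
)
module = types.ModuleType("tmp_demo")
exec(source, module.__dict__)

# The identifier was normalized by the compiler; the string literal was not.
print(SMALL_PHI in vars(module))        # True
print(SYMBOL_PHI in vars(module))       # False
print(module.__all__ == [SYMBOL_PHI])   # True
```

This mismatch between the bound name and the `__all__` entry is what makes `from tmp import *` raise `AttributeError`.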
msg375340 - (view)	Author: Tianrui Luo (tianrluo)	Date: 2020-08-13 20:09
Thanks! I am closing this bug. I wonder if it is worth prompting warnings when unnormalized Unicode characters are used.
msg375344 - (view)	Author: Seth Woodworth (sethwoodworth)	Date: 2020-08-13 20:35
I don't think it is worth throwing a warning. This might be the desired, or at least allowed, behavior. I'm relying on the behavior in a toy library I'm working on.
msg375354 - (view)	Author: Steven D'Aprano (steven.daprano) *	Date: 2020-08-13 23:45
Hi Seth, Surely you aren't relying on the behaviour that names in `__all__` aren't normalised but others are? Rather than a warning, I think the right solution here is to normalise the names in `__all__`.
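
Since the issue was closed as "not a bug", the interpreter does not normalise `__all__` for you. A module author could apply the same idea by hand; the sketch below is only an illustration of that workaround, and the helper name `nfkc_names` is hypothetical, not an existing API.

```python
import unicodedata


def nfkc_names(names):
    """Return the given names NFKC-normalized, as the compiler does for identifiers."""
    return [unicodedata.normalize("NFKC", name) for name in names]


# "\u03d5" is ϕ (GREEK PHI SYMBOL); it normalizes to "\u03c6" (φ).
raw_all = ["\u03d5", "plain_name"]
fixed_all = nfkc_names(raw_all)
print(fixed_all == ["\u03c6", "plain_name"])  # True
```

In a real module this would be written as something like `__all__ = nfkc_names([...])`, so the exported names match what `def` actually bound.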
msg375408 - (view)	Author: Seth Woodworth (sethwoodworth)	Date: 2020-08-14 14:03
@Steven, I'm exploring what Unicode code points can be used as valid starting characters for identifiers. I'm looping over the code point ranges with the XID_START property and attempting to add them to globals() to see if they maintain the same representation. Now that I think about it, I could just check if unicodedata.normalize('NFKC', character) == character before adding it, and I wouldn't see a warning if one existed.
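
The check described here might look roughly like the sketch below. It is an approximation rather than Seth's actual code: it uses `str.isidentifier()` on a single character as a stand-in for the XID_START test, and the function name `unstable_identifier_starts` is made up for illustration.

```python
import unicodedata


def unstable_identifier_starts(code_points):
    """Return characters that can start an identifier but change under NFKC."""
    found = []
    for cp in code_points:
        ch = chr(cp)
        # A one-character string is a valid identifier only if the
        # character is allowed in the starting position.
        if ch.isidentifier() and unicodedata.normalize("NFKC", ch) != ch:
            found.append(ch)
    return found


# Scan only the Greek and Coptic block to keep the example quick.
greek = unstable_identifier_starts(range(0x0370, 0x0400))
print("\u03d5" in greek)  # True: ϕ is valid but normalizes to φ
print("\u03c6" in greek)  # False: φ is already in normalized form
```

Passing `range(sys.maxunicode + 1)` instead would cover every code point, at the cost of a longer run.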
msg375449 - (view)	Author: Steven D'Aprano (steven.daprano) *	Date: 2020-08-15 05:31
On Fri, Aug 14, 2020 at 02:03:37PM +0000, Seth Woodworth wrote:

> I'm exploring what unicode code points can be used as valid starting
> characters for identifiers.

I presume you have seen the documentation here:

https://docs.python.org/3/reference/lexical_analysis.html#identifiers

> I'm looping over the code point ranges
> with the XID_START property and attempting to add them to globals() to
> see if they maintain the same representation.

You can add any hashable key to globals, it's just a dict:

py> globals()[2] = 'test'
py> globals()[2]
'test'

Including strings that differ in their normalization:

py> import unicodedata
py> phi = 'ϕ'
py> nphi = unicodedata.normalize('NFKC', phi)
py> g = globals()
py> g[phi] == g[nphi]
False

The strings are only normalised when used as variables:

py> eval(f'{phi} == {nphi}')
True
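
A self-contained variant of this comparison is sketched below. It makes two assumptions that are not in the message: it uses a plain dict named `namespace` instead of the real `globals()`, and it first stores two distinct, purely illustrative values under the raw and normalized keys so that both lookups succeed.

```python
import unicodedata

phi = "\u03d5"                               # ϕ, not in NFKC form
nphi = unicodedata.normalize("NFKC", phi)    # φ

# As dict keys, the two strings are simply different keys.
namespace = {}
namespace[phi] = "raw key"
namespace[nphi] = "normalized key"
print(len(namespace))                        # 2
print(namespace[phi] == namespace[nphi])     # False

# As identifiers in source code, both spellings are normalized to φ,
# so the expression compares one variable with itself.
result = eval(f"{phi} == {nphi}", {nphi: "normalized key"})
print(result)                                # True
```

The `eval` call only defines the normalized name, so it would raise `NameError` if the compiler did not normalize the raw spelling.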

History
Date	User	Action	Args
2022-04-11 14:59:34	admin	set	github: 85714
2020-08-15 05:31:31	steven.daprano	set	messages: + msg375449
2020-08-14 14:03:37	sethwoodworth	set	messages: + msg375408
2020-08-13 23:45:12	steven.daprano	set	nosy: + steven.daprano messages: + msg375354
2020-08-13 21:00:45	vstinner	set	resolution: not a bug
2020-08-13 20:35:38	sethwoodworth	set	messages: + msg375344
2020-08-13 20:09:46	tianrluo	set	status: open -> closed messages: + msg375340 stage: resolved
2020-08-13 19:23:48	sethwoodworth	set	nosy: + sethwoodworth messages: + msg375331
2020-08-13 18:46:53	tianrluo	create