
classification
Title: module `__all__` cannot detect function name with `φ`
Type: behavior
Stage: resolved
Components: Unicode
Versions: Python 3.7
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: ezio.melotti, sethwoodworth, steven.daprano, tianrluo, vstinner
Priority: normal
Keywords:

Created on 2020-08-13 18:46 by tianrluo, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (7)
msg375330 - Author: Tianrui Luo (tianrluo) Date: 2020-08-13 18:46
Fairly easy to reproduce.

`__all__` in a module seems unable to track a name containing `φ`.
The following minimal reproducing example also fails for function names such as `aφ` or `φb`.

```python3
Python 3.7.7 (default, May  7 2020, 21:25:33)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tmp import *
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'tmp' has no attribute 'ϕ'
```

The `tmp.py` file.
```python
# tmp.py

__all__ = ['ϕ']


def ϕ():
    print('"__all__" doesn\'t like me')
    return
```
msg375331 - Author: Seth Woodworth (sethwoodworth) Date: 2020-08-13 19:23
As described in PEP 3131, Unicode identifiers are normalized via NFKC before being added to the globals, and presumably __all__ as in your example.

You can see what Python _did_ add via unicodedata.normalize('NFKC', 'ϕ'), which returns 'φ'.


[ins] In [8]: bytes('φ', 'utf8')
Out[8]: b'\xcf\x86'

[ins] In [9]: bytes('ϕ', 'utf8')
Out[9]: b'\xcf\x95'

The normalized version of Phi, I _can_ add to my globals: 
globals()['φ'] = 'foo'
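The two characters are easy to confuse on screen, so a minimal sketch that names them by code point may help. It uses only the standard `unicodedata` module; the variable names are illustrative and not taken from the report.

```python
import unicodedata

symbol = "\u03d5"  # the character used in tmp.py
letter = "\u03c6"  # the character the identifier is stored under

# NFKC is the normalization form PEP 3131 specifies for identifiers.
normalized = unicodedata.normalize("NFKC", symbol)

print(unicodedata.name(symbol))      # official Unicode name of the original
print(unicodedata.name(normalized))  # official Unicode name after NFKC
print(normalized == letter)          # whether NFKC maps one onto the other
```

Printing the Unicode names rather than the glyphs avoids relying on how a terminal font renders the two forms of phi.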
msg375340 - Author: Tianrui Luo (tianrluo) Date: 2020-08-13 20:09
Thanks!

I am closing this bug.

I wonder if it is worth emitting a warning when unnormalized Unicode characters are used.
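For illustration only, such a check could be approximated outside the interpreter. The sketch below is a hypothetical lint-style helper (`find_unnormalized_names` is an invented name, not an existing Python feature); it assumes the standard `tokenize` module reports NAME tokens with their original source spelling, and it does not look inside string literals such as the entries of `__all__`.

```python
import io
import tokenize
import unicodedata


def find_unnormalized_names(source):
    """Return (line, name, normalized) for NAME tokens that NFKC would change."""
    findings = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.NAME:
            continue  # only identifiers and keywords are of interest here
        normalized = unicodedata.normalize("NFKC", tok.string)
        if normalized != tok.string:
            findings.append((tok.start[0], tok.string, normalized))
    return findings


# Hypothetical sample input resembling the function in tmp.py.
sample = "def ϕ():\n    return 1\n"
report = find_unnormalized_names(sample)
for line, name, normalized in report:
    print(f"line {line}: {name!r} will be stored as {normalized!r}")
```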
msg375344 - Author: Seth Woodworth (sethwoodworth) Date: 2020-08-13 20:35
I don't think it is worth throwing a warning.  This might be the desired, or at least allowed, behavior.  I'm relying on the behavior in a toy library I'm working on.
msg375354 - Author: Steven D'Aprano (steven.daprano) (Python committer) Date: 2020-08-13 23:45
Hi Seth,

Surely you aren't relying on the behaviour that names in `__all__` *aren't* normalised but others are?

Rather than a warning, I think the right solution here is to normalise the names in `__all__`.
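Until something like that exists, a module author could apply the same idea by hand. The following is a rough sketch under stated assumptions, not the interpreter change proposed here: `normalize_all` is a hypothetical helper, and the module is built in memory with `types.ModuleType` as a stand-in for `tmp.py`.

```python
import types
import unicodedata


def normalize_all(names):
    """NFKC-normalize each name, matching how identifiers are stored."""
    return [unicodedata.normalize("NFKC", name) for name in names]


# Hypothetical in-memory stand-in for tmp.py.
source = '''
def ϕ():
    return "found"
'''
mod = types.ModuleType("tmp_demo")
exec(source, mod.__dict__)

raw_all = ["ϕ"]                       # what the original tmp.py listed
mod.__all__ = normalize_all(raw_all)  # normalized by hand

# Names that a star-import would fail to find, before and after normalizing.
missing_raw = [name for name in raw_all if not hasattr(mod, name)]
missing_normalized = [name for name in mod.__all__ if not hasattr(mod, name)]
print(missing_raw, missing_normalized)
```

With the entries normalized, every name in `__all__` matches an attribute that actually exists on the module.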
msg375408 - Author: Seth Woodworth (sethwoodworth) Date: 2020-08-14 14:03
@Steven,

I'm exploring what Unicode code points can be used as valid starting characters for identifiers.  I'm looping over the code point ranges with the XID_START property and attempting to add them to globals() to see if they maintain the same representation.  Now that I think about it, I could just check if unicodedata.normalize('NFKC', character) == character before adding it, and I wouldn't see a warning if one existed.
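A minimal sketch of that check might look as follows. It is an approximation: `unicodedata` does not expose XID_START directly, so `str.isidentifier()` on a single character is used as a stand-in (it also accepts the underscore), and the function names and the `limit` parameter are illustrative assumptions.

```python
import sys
import unicodedata


def changes_under_nfkc(ch):
    """True if NFKC normalization would rewrite this character."""
    return unicodedata.normalize("NFKC", ch) != ch


def unstable_identifier_starts(limit=sys.maxunicode + 1):
    """Yield characters below `limit` that can start an identifier
    but are not preserved by NFKC normalization."""
    for code_point in range(limit):
        ch = chr(code_point)
        if ch.isidentifier() and changes_under_nfkc(ch):
            yield ch


# Restrict the scan to the first 0x400 code points to keep the demo fast.
unstable = set(unstable_identifier_starts(0x0400))
print("\u03d5" in unstable)  # the character from the original report
```

Scanning the full range is possible by omitting `limit`, at the cost of visiting every code point.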
msg375449 - Author: Steven D'Aprano (steven.daprano) (Python committer) Date: 2020-08-15 05:31
On Fri, Aug 14, 2020 at 02:03:37PM +0000, Seth Woodworth wrote:

> I'm exploring what unicode code points can be used as valid starting 
> characters for identifiers.

I presume you have seen the documentation here:

https://docs.python.org/3/reference/lexical_analysis.html#identifiers

> I'm looping over the code point ranges 
> with the XID_START property and attempting to add them to globals() to 
> see if they maintain the same representation.

You can add any hashable key to globals; it's just a dict:

    py> globals()[2] = 'test'
    py> globals()[2]
    'test'

Including strings that differ in their normalization:

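    py> import unicodedata
    py> phi = 'ϕ'
    py> nphi = unicodedata.normalize('NFKC', phi)
    py> g = globals()
    py> g[phi] = 1   # stored under the unnormalized key
    py> g[nphi] = 2  # stored under the normalized key
    py> g[phi] == g[nphi]
    False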

The strings are only normalised when used as variables:

    py> eval(f'{phi} == {nphi}')
    True
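
The same contrast can be sketched as a standalone script rather than an interactive session. This is an illustrative restatement, with a throwaway `namespace` dict standing in for the module globals:

```python
import unicodedata

phi = "\u03d5"                             # unnormalized spelling
nphi = unicodedata.normalize("NFKC", phi)  # normalized spelling

# As dict keys, the two strings stay distinct.
as_keys = {phi: "symbol", nphi: "letter"}

# As an identifier in compiled source, the name is normalized first.
namespace = {}
exec(f"{phi} = 'assigned'", namespace)

print(len(as_keys))      # number of distinct keys
print(phi in namespace)  # is the unnormalized spelling a key?
print(nphi in namespace) # is the normalized spelling a key?
```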
History
Date                 User            Action  Args
2022-04-11 14:59:34  admin           set     github: 85714
2020-08-15 05:31:31  steven.daprano  set     messages: + msg375449
2020-08-14 14:03:37  sethwoodworth   set     messages: + msg375408
2020-08-13 23:45:12  steven.daprano  set     nosy: + steven.daprano; messages: + msg375354
2020-08-13 21:00:45  vstinner        set     resolution: not a bug
2020-08-13 20:35:38  sethwoodworth   set     messages: + msg375344
2020-08-13 20:09:46  tianrluo        set     status: open -> closed; messages: + msg375340; stage: resolved
2020-08-13 19:23:48  sethwoodworth   set     nosy: + sethwoodworth; messages: + msg375331
2020-08-13 18:46:53  tianrluo        create