Issue 30772: Normalise non-ASCII variable names in __all__


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/74955

classification

Title:	Normalise non-ASCII variable names in __all__
Type:	behavior	Stage:
Components:	Unicode	Versions:	Python 3.7

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	Nate Soares, ezio.melotti, mbussonn, mrabarnett, steven.daprano, vstinner
Priority:	normal	Keywords:

Created on 2017-06-26 18:08 by Nate Soares, last changed 2022-04-11 14:58 by admin.

Messages (6)
msg296928	Author: Nate Soares (Nate Soares)	Date: 2017-06-26 18:08
[NOTE: In this comment, I use BB to mean unicode character 0x1D539, b/c the issue tracker won't let me submit a comment with unicode characters in it.]

Directory structure:

    repro/
        foo.py
        test_foo.py

Contents of foo.py:

    BB = 1
    __all__ = ['BB']

Contents of test_foo.py:

    from .foo import *

Error message:

    AttributeError: module 'repro.foo' has no attribute 'BB'

If I change foo.py to have `__all__ = ['B']` (note that 'B' is not the same as 'BB'), then everything works "fine", modulo the fact that now foo.B is a thing and foo.BB is not a thing.

[Recall that in the above, BB is a placeholder for U+1D539, which the issue tracker prevents me from writing here.]
msg296934	Author: Ezio Melotti (ezio.melotti) *	Date: 2017-06-26 19:27
I can reproduce the issue:

    $ cat foo.py
    𝔹𝔹 = 1
    __all__ = ['𝔹𝔹']
    $ python3 -c 'import foo; print(dir(foo)); from foo import *'
    ['BB', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__']
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    AttributeError: module 'foo' has no attribute '𝔹𝔹'

(Note the ascii 'BB' in the dir(foo).)

There's also an easier way to reproduce it:

    >>> 𝔹𝔹 = 3
    >>> 𝔹𝔹
    3
    >>> BB
    3
    >>> globals()['BB']
    3
    >>> globals()['𝔹𝔹']
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    KeyError: '𝔹𝔹'
    >>> globals()
    {'__name__': '__main__', '__spec__': None, '__builtins__': <module 'builtins' (built-in)>, '__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__doc__': None, 'BB': 3, '__package__': None}
    >>> class Foo:
    ...     𝔹𝔹 = 3
    ...
    >>> Foo.𝔹𝔹
    3
    >>> Foo.BB
    3

It seems the '𝔹𝔹' gets normalized to 'BB' when it's an identifier, but not when it's a string. I'm not sure why this happens though.
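The identifier-versus-string distinction above can be seen directly in the parser's output. A minimal sketch using the standard `ast` module, assuming a CPython 3.8+ interpreter (for `ast.Constant`): the assignment target, which is an identifier, comes back already NFKC-normalized, while the string constant keeps its original code points. The variable names are illustrative only.

```python
import ast

# U+1D539 MATHEMATICAL DOUBLE-STRUCK CAPITAL B, written as an escape so this
# example file itself stays ASCII.
BB = "\U0001D539" * 2

# Source text equivalent to:  𝔹𝔹 = '𝔹𝔹'
source = f"{BB} = {BB!r}"
assign = ast.parse(source).body[0]

target_name = assign.targets[0].id   # identifier: normalized by the parser
string_value = assign.value.value    # string literal: left untouched

print(ascii(target_name))   # 'BB'
print(ascii(string_value))  # '\U0001d539\U0001d539'
```

Because `__all__` is just a list of string literals, its entries take the second path and are never normalized.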
msg296935	Author: Matthew Barnett (mrabarnett) *	Date: 2017-06-26 19:49
See PEP 3131 -- Supporting Non-ASCII Identifiers. It says:

"""All identifiers are converted into the normal form NFKC while parsing; comparison of identifiers is based on NFKC."""

    >>> import unicodedata
    >>> unicodedata.name(unicodedata.normalize('NFKC', '\N{MATHEMATICAL DOUBLE-STRUCK CAPITAL B}'))
    'LATIN CAPITAL LETTER B'
msg297284	Author: Nate Soares (Nate Soares)	Date: 2017-06-29 17:03
To be clear, the trouble I was trying to point at is that if foo.py didn't have __all__, then it would still have a BB attribute. But if the module is given __all__, the BB is normalized away into a B. This seems like pretty strange/counterintuitive behavior. For instance, I found this bug when I added __all__ to a mathy library, where other modules had previously been happily importing BB and using <module>.BB etc. with no trouble.

In other words, I could accept "BB gets normalized to B always", but the current behavior is "modules are allowed to have a BB attribute but only if they don't use __all__, because __all__ requires putting the BB through a process that normalizes it to B, and which otherwise doesn't get run".

If this is "working as intended" then w/e, I'll work around it, but I want to make sure that we all understand the inconsistency before letting this bug die in peace :-)

On Wed, Jun 28, 2017 at 10:55 AM Brett Cannon <report@bugs.python.org> wrote:
> Changes by Brett Cannon <brett@python.org>:
>
> ----------
> resolution: -> not a bug
> stage: -> resolved
> status: open -> closed
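The inconsistency described above can be reproduced without any files on disk. The sketch below builds two throwaway modules in memory and runs `from ... import *` against each; the module names `demo_without_all` and `demo_with_all` and the helpers `make_module` and `star_import` are hypothetical, introduced only for this illustration. As far as I know this matches CPython's behavior in the versions discussed here: without `__all__` the star import copies the already-normalized dictionary key `'BB'`, while with `__all__` it looks up the unnormalized string and fails.

```python
import sys
import types

BB = "\U0001D539" * 2  # the two-character name from the report

def make_module(name, source):
    """Create a module from source text and register it in sys.modules."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)  # identifiers in `source` are NFKC-normalized here
    sys.modules[name] = module
    return module

def star_import(module_name):
    """Run `from <module_name> import *` in a fresh namespace.

    Returns (sorted imported names, None) on success, or (None, exception).
    """
    namespace = {}
    try:
        exec(f"from {module_name} import *", namespace)
    except AttributeError as exc:
        return None, exc
    return sorted(k for k in namespace if k != "__builtins__"), None

make_module("demo_without_all", f"{BB} = 1\n")
make_module("demo_with_all", f"{BB} = 1\n__all__ = [{BB!r}]\n")

without_all = star_import("demo_without_all")
with_all = star_import("demo_with_all")

print(without_all)                 # (['BB'], None)
print(type(with_all[1]).__name__)  # AttributeError
```

The exception message is not printed here because it contains the non-ASCII name, which some consoles cannot encode.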
msg297333	Author: Steven D'Aprano (steven.daprano) *	Date: 2017-06-30 00:20
I think that the names in __all__ should have the same NFKC normalisation applied as the identifiers. Re-opening for 3.7.
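The proposed behavior can be approximated in pure Python. This is not how CPython implements `import *`; `star_import_normalized` is a hypothetical helper that mimics the documented star-import rules (use `__all__` if it exists, otherwise every name not starting with an underscore) and applies `unicodedata.normalize('NFKC', ...)` to each `__all__` entry before the lookup, mirroring PEP 3131's treatment of identifiers.

```python
import types
import unicodedata

def star_import_normalized(module, target):
    """Hypothetical emulation of `from module import *` into the dict `target`,
    with NFKC normalization applied to the names listed in __all__."""
    public = getattr(module, "__all__", None)
    if public is None:
        # No __all__: public names are those not starting with an underscore.
        # These keys came from the parser, so they are already normalized.
        names = [n for n in vars(module) if not n.startswith("_")]
    else:
        names = [unicodedata.normalize("NFKC", n) for n in public]
    for name in names:
        target[name] = getattr(module, name)  # still AttributeError if truly missing
    return target

# A module whose __all__ uses the unnormalized spelling from the report.
BB = "\U0001D539" * 2
mod = types.ModuleType("demo")
exec(f"{BB} = 1\n__all__ = [{BB!r}]\n", mod.__dict__)

namespace = star_import_normalized(mod, {})
print(sorted(namespace))  # ['BB']
```

Normalizing at lookup time, rather than rewriting `__all__` itself, leaves the module's own data untouched.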
msg297739	Author: Matthias Bussonnier (mbussonn) *	Date: 2017-07-05 13:36
> I think that the names in __all__ should have the same NFKC normalisation applied as the identifiers.

Does it make sense to add to this issue: ensure that all elements of __all__ are str? (At least emit a warning?)

I have encountered a small number of libraries where some members of __all__ are the actual objects. It's an easy mistake to make if you write a public decorator:

    __all__ = []

    def public(o):
        __all__.append(o)
        return o

    @public
    def bar():
        pass

Happy to open a different issue if deemed necessary. Thanks!
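A check along the lines suggested above might look like the sketch below. `check_all` and `load` are hypothetical helpers, not existing CPython APIs: `check_all` uses the standard `warnings` module to flag any `__all__` entry that is not a `str`, and also any string entry that is not in NFKC form, since such names can never match an identifier written in source. The example runs it on the `public` decorator pattern from the message and on a variant that appends `o.__name__` instead of the object.

```python
import types
import unicodedata
import warnings

def check_all(module):
    """Hypothetical validator: warn about suspicious entries in module.__all__.

    Returns the list of offending entries (empty if __all__ looks fine).
    """
    problems = []
    for entry in getattr(module, "__all__", ()):
        if not isinstance(entry, str):
            warnings.warn(f"{module.__name__}.__all__ contains a non-str entry: {entry!r}")
            problems.append(entry)
        elif unicodedata.normalize("NFKC", entry) != entry:
            # Identifiers written in source are stored in NFKC form, so this never matches.
            warnings.warn(f"{module.__name__}.__all__ entry {ascii(entry)} is not in NFKC form")
            problems.append(entry)
    return problems

# The decorator pattern from the message: it appends the function object itself.
BUGGY_SOURCE = """
__all__ = []

def public(o):
    __all__.append(o)
    return o

@public
def bar():
    pass
"""
# Same module, but the decorator records the name instead of the object.
FIXED_SOURCE = BUGGY_SOURCE.replace("__all__.append(o)", "__all__.append(o.__name__)")

def load(name, source):
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    buggy_problems = check_all(load("buggy", BUGGY_SOURCE))
    fixed_problems = check_all(load("fixed", FIXED_SOURCE))

print(len(buggy_problems), len(fixed_problems), len(caught))  # 1 0 1
```

Whether such a check should warn or raise, and where it would run (at import time or in a linter), is left open here, as it was in the discussion.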

History
Date	User	Action	Args
2022-04-11 14:58:48	admin	set	github: 74955
2017-07-05 13:36:35	mbussonn	set	nosy: + mbussonn messages: + msg297739
2017-06-30 00:20:16	steven.daprano	set	status: closed -> open title: If I make an attribute " -> Normalise non-ASCII variable names in __all__ messages: + msg297333 versions: + Python 3.7, - Python 3.6 type: behavior resolution: not a bug -> stage: resolved ->
2017-06-29 17:03:40	Nate Soares	set	messages: + msg297284 title: If I make an attribute "[a unicode version of B]", it gets assigned to "[ascii B]", and so on. -> If I make an attribute "
2017-06-28 17:55:21	brett.cannon	set	status: open -> closed resolution: not a bug stage: resolved
2017-06-27 14:54:40	steven.daprano	set	nosy: + steven.daprano
2017-06-26 19:49:27	mrabarnett	set	nosy: + mrabarnett messages: + msg296935
2017-06-26 19:27:47	ezio.melotti	set	messages: + msg296934
2017-06-26 18:08:51	Nate Soares	create