classification
Title: Normalise non-ASCII variable names in __all__
Type: behavior Stage:
Components: Unicode Versions: Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Nate Soares, ezio.melotti, mbussonn, mrabarnett, steven.daprano, vstinner
Priority: normal Keywords:

Created on 2017-06-26 18:08 by Nate Soares, last changed 2017-07-05 13:36 by mbussonn.

Messages (6)
msg296928 - (view) Author: Nate Soares (Nate Soares) Date: 2017-06-26 18:08
[NOTE: In this comment, I use BB to mean unicode character 0x1D539, b/c the issue tracker won't let me submit a comment with unicode characters in it.]

Directory structure:

repro/
  foo.py
  test_foo.py

Contents of foo.py:
    BB = 1
    __all__ = ['BB']

Contents of test_foo.py:
    from .foo import *

Error message:
    AttributeError: module 'repro.foo' has no attribute 'BB'

If I change foo.py to have `__all__ = ['B']` (note that 'B' is not the same as 'BB'), then everything works "fine", modulo the fact that now foo.B is a thing and foo.BB is not a thing.

[Recall that in the above, BB is a placeholder for U+1D539, which the issue tracker prevents me from writing here.]
msg296934 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2017-06-26 19:27
I can reproduce the issue:
$ cat foo.py 
𝔹𝔹 = 1
__all__ = ['𝔹𝔹']

$ python3 -c 'import foo; print(dir(foo)); from foo import *'
['BB', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__']
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: module 'foo' has no attribute '𝔹𝔹'

(Note the ascii 'BB' in the dir(foo))


There's also an easier way to reproduce it:
>>> 𝔹𝔹 = 3
>>> 𝔹𝔹
3
>>> BB
3
>>> globals()['BB']
3
>>> globals()['𝔹𝔹']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: '𝔹𝔹'
>>> globals()
{'__name__': '__main__', '__spec__': None, '__builtins__': <module 'builtins' (built-in)>, '__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__doc__': None, 'BB': 3, '__package__': None}
>>> class Foo:
...     𝔹𝔹 = 3
... 
>>> Foo.𝔹𝔹
3
>>> Foo.BB
3

It seems the '𝔹𝔹' gets normalized to 'BB' when it's an identifier, but not when it's a string.  I'm not sure why this happens though.
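The mismatch can also be shown without typing the character directly. This is only a minimal sketch: the module object is built in memory with types.ModuleType, and the names DOUBLE_B and mod are made up for the illustration.

```python
import types

# Two copies of U+1D539, written as an escape so the source stays ASCII.
DOUBLE_B = "\N{MATHEMATICAL DOUBLE-STRUCK CAPITAL B}" * 2

# Hypothetical stand-in for foo.py: an in-memory module whose source
# uses the non-ASCII name both as an identifier and inside a string.
mod = types.ModuleType("foo")
exec(f"{DOUBLE_B} = 1\n__all__ = ['{DOUBLE_B}']", mod.__dict__)

# The identifier was normalised while parsing, so the attribute is 'BB'.
print(hasattr(mod, "BB"))
# The string literal in __all__ was left untouched, so this lookup fails.
print(hasattr(mod, DOUBLE_B))
print(mod.__all__ == [DOUBLE_B])
```

The attribute lookup by string is what `from foo import *` effectively does for each entry of __all__, which is why the import fails.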
msg296935 - (view) Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2017-06-26 19:49
See PEP 3131 -- Supporting Non-ASCII Identifiers

It says: """All identifiers are converted into the normal form NFKC while parsing; comparison of identifiers is based on NFKC."""

>>> import unicodedata
>>> unicodedata.name(unicodedata.normalize('NFKC', '\N{MATHEMATICAL DOUBLE-STRUCK CAPITAL B}'))
'LATIN CAPITAL LETTER B'
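Given that rule, one possible workaround on the module author's side is to normalise the strings in __all__ the same way. The helper name nfkc_names below is hypothetical, and this is a sketch of a workaround rather than a proposed fix for the interpreter.

```python
import unicodedata


def nfkc_names(names):
    """Return each name NFKC-normalised, matching what the parser does to identifiers."""
    return [unicodedata.normalize("NFKC", name) for name in names]


# Two copies of U+1D539 (MATHEMATICAL DOUBLE-STRUCK CAPITAL B).
raw = ["\N{MATHEMATICAL DOUBLE-STRUCK CAPITAL B}" * 2]

# In a real module this would be: __all__ = nfkc_names([...])
normalised = nfkc_names(raw)
print(normalised)  # ['BB']
```

With __all__ built this way, its entries match the names that actually end up in the module namespace.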
msg297284 - (view) Author: Nate Soares (Nate Soares) Date: 2017-06-29 17:03
To be clear, the trouble I was trying to point at is that if foo.py didn't
have __all__, then it would still have a BB attribute. But if the module is
given __all__, the BB is normalized away into a B. This seems like pretty
strange/counterintuitive behavior. For instance, I found this bug when I
added __all__ to a mathy library, where other modules had previously been
happily importing BB and using <module>.BB etc. with no trouble.

In other words, I could accept "BB gets normalized to B always", but the
current behavior is "modules are allowed to have a BB attribute but only if
they don't use __all__, because __all__ requires putting the BB through a
process that normalizes it to B, and which otherwise doesn't get run".

If this is "working as intended" then w/e, I'll work around it, but I want
to make sure that we all understand the inconsistency before letting this
bug die in peace :-)

On Wed, Jun 28, 2017 at 10:55 AM Brett Cannon <report@bugs.python.org>
wrote:

>
> Changes by Brett Cannon <brett@python.org>:
>
>
> ----------
> resolution:  -> not a bug
> stage:  -> resolved
> status: open -> closed
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue30772>
> _______________________________________
>
msg297333 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2017-06-30 00:20
I think that the names in __all__ should have the same NFKC normalisation applied as the identifiers.

Re-opening for 3.7.
msg297739 - (view) Author: Matthias Bussonnier (mbussonn) * Date: 2017-07-05 13:36
> I think that the names in __all__ should have the same NFKC normalisation applied as the identifiers.
 
Does it make sense to add to this issue: ensure that all elements of __all__ are str? (Or at least emit a warning?)

I have encountered a small number of libraries where some members of __all__ are the actual objects. It is an easy mistake to make if you write a public decorator:

    __all__ = []

    def public(o):
        __all__.append(o)
        return o

    @public
    def bar():
        pass

Happy to open a different issue if deemed necessary. Thanks!
History
Date User Action Args
2017-07-05 13:36:35  mbussonn  set  nosy: + mbussonn
    messages: + msg297739
2017-06-30 00:20:16  steven.daprano  set  status: closed -> open
    title: If I make an attribute " -> Normalise non-ASCII variable names in __all__
    messages: + msg297333
    versions: + Python 3.7, - Python 3.6
    type: behavior
    resolution: not a bug ->
    stage: resolved ->
2017-06-29 17:03:40  Nate Soares  set  messages: + msg297284
    title: If I make an attribute "[a unicode version of B]", it gets assigned to "[ascii B]", and so on. -> If I make an attribute "
2017-06-28 17:55:21  brett.cannon  set  status: open -> closed
    resolution: not a bug
    stage: resolved
2017-06-27 14:54:40  steven.daprano  set  nosy: + steven.daprano
2017-06-26 19:49:27  mrabarnett  set  nosy: + mrabarnett
    messages: + msg296935
2017-06-26 19:27:47  ezio.melotti  set  messages: + msg296934
2017-06-26 18:08:51  Nate Soares  create