classification
Title: namedtuple raises a SyntaxError instead of ValueError on invalid identifier
Type: behavior
Stage:
Components: Library (Lib), Unicode
Versions: Python 3.2, Python 3.3, Python 3.4

process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: rhettinger
Nosy List: Ramchandra Apte, eric.smith, ezio.melotti, flox, lemburg, mark.dickinson, mathieui, pitrou, python-dev, r.david.murray, rhettinger
Priority: high
Keywords:

Created on 2013-03-01 20:53 by mathieui, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (9)
msg183291 - Author: Mathieu Pasquet (mathieui) Date: 2013-03-01 20:53
In py3k, str.isalnum(), str.isdigit(), and str.isdecimal() are broken because they take into account various Unicode numbers.

A common case is doing something like this:

num = -1
while num == -1:
    num_in = input('Enter a number> ')
    if num_in.isdigit():
        num = int(num_in)

# do stuff …

If you enter ¹, or any esoteric unicode representation of a number, all the methods referenced above will return True. I believe this is a bug.
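The mismatch described above can be reproduced with a short sketch. The helper name `parse_int_or_none` is hypothetical and only used for illustration; the point is that `str.isdigit()` accepts SUPERSCRIPT ONE while `int()` rejects it, so the loop above would raise instead of asking again.

```python
# '¹' is SUPERSCRIPT ONE (U+00B9); the escape avoids source-encoding issues.
candidate = "\u00b9"

def parse_int_or_none(text):
    """Hypothetical helper: return int(text), or None if int() rejects it."""
    try:
        return int(text)
    except ValueError:
        return None

# The guard used in the loop passes...
print(candidate.isdigit())            # True
# ...but the conversion it is meant to protect fails.
print(parse_int_or_none(candidate))   # None
# Plain ASCII digits behave as expected.
print(parse_int_or_none("42"))        # 42
```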

It also affects the stdlib, e.g. in collections.namedtuple,
A = namedtuple('A¹', 'x y') will raise an ugly SyntaxError, because the sanity check uses str.isalnum(), which says it’s ok. (n.b.: of course, no sane person should ever want to do the above, but I find it worth mentioning)
msg183295 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2013-03-01 21:43
I think this is working as designed, and can't be changed (at least not easily) because of backward compatibility even if there are bits of the design that are deemed buggy.  The issue, I believe, is what is considered a number by the Unicode consortium.

See issue 10557 for some background.

Python should at least be consistent about what is treated as a number, though, so the ugly syntax error from namedtuple is probably a bug.
msg183296 - Author: Florent Xicluna (flox) (Python committer) Date: 2013-03-01 21:55
Actually, the character is SUPERSCRIPT ONE, in the category No (Number other).
http://www.fileformat.info/info/unicode/char/b9/index.htm

This is not a valid category for identifiers; among the number categories, only "Nd" (Number, decimal) is accepted.

The issue is probably in namedtuple, which should check "unicodedata.category" or, better, use the str.isidentifier() method.
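A small sketch of the two checks mentioned here, comparing an ordinary name with the one from the report. The variable names are illustrative only.

```python
import unicodedata

# Compare an ASCII digit suffix with SUPERSCRIPT ONE (U+00B9).
names = ["A1", "A\u00b9"]

results = {}
for name in names:
    last = name[-1]
    results[name] = (
        unicodedata.category(last),  # general category of the last character
        name.isalnum(),              # the check namedtuple used at the time
        name.isidentifier(),         # the check suggested in this message
    )
    print(repr(name), results[name])

# 'A1' ('Nd', True, True)
# 'A¹' ('No', True, False)
```

Both names pass isalnum(), but only the first is an identifier, which is why the invalid name got through the sanity check.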
msg183298 - Author: Mathieu Pasquet (mathieui) Date: 2013-03-01 22:15
I understand the reasoning behind the feature, and the desire to be Unicode-compliant, but I think this might still break a lot of code (though it may never be detected).

I understand that isdecimal() is the safe way, because anything that is a decimal (Nd) can be translated to an integer by int(); however, what is the recommended way to get something that isnumeric() into an int?

unicodedata.normalize('NFKD', num) or unicodedata.normalize('NFKC', num)?

Maybe str could have a method that does this, or methods performing exclusively on ascii values?
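As a rough illustration of both ideas, here is a sketch, not a recommendation from this thread. The helper names `to_int` and `is_ascii_digits` are hypothetical. Normalization only helps for characters whose compatibility decomposition is a plain digit; something like '½' normalizes to a digit, a fraction slash and another digit, which int() still rejects.

```python
import re
import unicodedata

def to_int(text):
    """Hypothetical helper: NFKC-normalize, then let int() decide.

    Returns None when the normalized text is still not a valid integer.
    """
    normalized = unicodedata.normalize("NFKC", text)
    try:
        return int(normalized)
    except ValueError:
        return None

def is_ascii_digits(text):
    """Hypothetical ASCII-only check: one or more characters in 0-9."""
    return re.fullmatch(r"[0-9]+", text) is not None

print(to_int("\u00b9"))           # SUPERSCRIPT ONE -> 1
print(to_int("\u2461"))           # CIRCLED DIGIT TWO -> 2
print(to_int("\u00bd"))           # VULGAR FRACTION ONE HALF -> None
print(is_ascii_digits("42"))      # True
print(is_ascii_digits("\u00b9"))  # False
```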

Sorry for the noise, I did not find issue 10557 when I searched.
msg183303 - Author: Florent Xicluna (flox) (Python committer) Date: 2013-03-01 23:34
This is consistent:

>>> '¹'.isnumeric(), '¹'.isdigit(), '¹'.isdecimal()
(True, True, False)

>>> import unicodedata
>>> unicodedata.numeric('¹')
1.0
>>> unicodedata.digit('¹')
1
>>> unicodedata.decimal('¹')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: not a decimal

Changing the title to focus on the issue with collections.namedtuple.
msg183306 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2013-03-02 00:22
namedtuple should simply use isidentifier(), rather than isalnum().
msg183311 - Author: Ramchandra Apte (Ramchandra Apte) Date: 2013-03-02 05:17
> namedtuple should simply use isidentifier(), rather than isalnum().
+2
msg183312 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-03-02 07:51
New changeset 80bcc98bf939 by Raymond Hettinger in branch '3.3':
Issue #17331:  Use isidentifier() instead of isalnum() to check for valid identifiers.
http://hg.python.org/cpython/rev/80bcc98bf939
msg183333 - Author: Raymond Hettinger (rhettinger) (Python committer) Date: 2013-03-02 17:34
Thanks for the suggestion.  isidentifier() was exactly what was needed.
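
For readers who want to see the effect of the change, a minimal sketch follows. `check_name` and `error_type` are hypothetical helpers written for this illustration; they mirror the idea of the fix (validate with isidentifier() and raise ValueError) and are not the code from the changeset. The last call assumes an interpreter that includes the fix.

```python
from collections import namedtuple
from keyword import iskeyword

def check_name(name):
    """Hypothetical validator: reject names that cannot be identifiers."""
    if not name.isidentifier():
        raise ValueError("not a valid identifier: %r" % (name,))
    if iskeyword(name):
        raise ValueError("cannot be a keyword: %r" % (name,))
    return name

def error_type(func, *args):
    """Hypothetical helper: name of the exception raised by func(*args), or None."""
    try:
        func(*args)
    except Exception as exc:
        return type(exc).__name__
    return None

print(error_type(check_name, "Point"))           # None
print(error_type(check_name, "A\u00b9"))         # ValueError
# With the fix applied, namedtuple reports the bad name the same way.
print(error_type(namedtuple, "A\u00b9", "x y"))  # ValueError
```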

History
Date | User | Action | Args
2022-04-11 14:57:42 | admin | set | github: 61533
2013-03-02 17:34:55 | rhettinger | set | status: open -> closed; resolution: fixed; messages: + msg183333
2013-03-02 07:51:55 | python-dev | set | nosy: + python-dev; messages: + msg183312
2013-03-02 06:01:48 | rhettinger | set | priority: normal -> high; assignee: rhettinger
2013-03-02 05:17:23 | Ramchandra Apte | set | nosy: + Ramchandra Apte; messages: + msg183311
2013-03-02 00:22:03 | pitrou | set | nosy: + pitrou; messages: + msg183306
2013-03-01 23:34:46 | flox | set | title: Fix str methods for detecting digits with unicode -> namedtuple raises a SyntaxError instead of ValueError on invalid identifier; messages: + msg183303; components: + Library (Lib); versions: + Python 3.4
2013-03-01 22:15:35 | mathieui | set | messages: + msg183298
2013-03-01 21:55:20 | flox | set | nosy: + flox; messages: + msg183296; versions: + Python 3.2
2013-03-01 21:43:03 | r.david.murray | set | nosy: + rhettinger, eric.smith, r.david.murray, lemburg, mark.dickinson; messages: + msg183295
2013-03-01 20:53:40 | mathieui | create |