classification
Title: [doc] misleading return from isidentifier
Type: behavior
Stage: needs patch
Components: Documentation, Unicode
Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: docs@python
Nosy List: Jim.Jewett, benjamin.peterson, docs@python, ezio.melotti, mbussonn, pitrou, r.david.murray, serhiy.storchaka
Priority: normal
Keywords: easy

Created on 2012-01-19 00:54 by Jim.Jewett, last changed 2021-12-05 23:01 by iritkatriel.

Messages (9)
msg151597 - (view) Author: Jim Jewett (Jim.Jewett) * (Python triager) Date: 2012-01-19 00:54
Python identifiers are in NFKC form; string method .isidentifier() returns true on strings that are not in that form.  In some contexts, these non-canonical strings will be replaced with their NFKC equivalent, but in other contexts (such as the builtins hasattr, getattr, delattr) they will not.


>>> import types
>>> import unicodedata as uc
>>> cha = chr(170)
>>> cha
'ª'

>>> cha.isidentifier()
True

>>> uc.normalize("NFKC", cha)
'a'

>>> obj = types.SimpleNamespace()
>>> obj.ª = 5
>>> hasattr(obj, "ª")
False
>>> obj.a
5
msg151599 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2012-01-19 00:56
I don't see why that's invalid. "str.isidentifier()" returning True means Python will accept it as an identifier.
msg151600 - (view) Author: Jim Jewett (Jim.Jewett) * (Python triager) Date: 2012-01-19 01:05
My preference would be for non_NFKC.isidentifier() to return False, but that may be a problem for backwards compatibility.

It *may* be worth adding an asidentifier() method that returns either False or the canonicalized string that should be used instead.

At a minimum, the documentation (including docstring) should warn that the method doesn't check for NFKC form, and that if the input is not ASCII, the caller should first ensure this by calling str1=unicodedata.normalize("NFKC", str1)
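
A minimal sketch of what such a helper might look like, written as a plain function rather than a str method. The name as_identifier is hypothetical, and it returns None rather than False for strings that are not identifiers; neither choice comes from an existing API.

```python
import unicodedata


def as_identifier(name):
    """Return the NFKC form of `name` if it is a valid identifier, else None.

    Hypothetical helper for illustration; not part of the str API.
    """
    # Reject strings the parser would not accept as written.
    if not name.isidentifier():
        return None
    normalized = unicodedata.normalize("NFKC", name)
    # Defensive re-check on the normalized form before handing it back.
    return normalized if normalized.isidentifier() else None


print(as_identifier(chr(170)))      # a
print(as_identifier("spam"))        # spam
print(as_identifier("not valid"))   # None
```

A caller that goes on to use getattr or setattr would pass the returned string, not the original one.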
msg151601 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2012-01-19 01:06
2012/1/18 Jim Jewett <report@bugs.python.org>:
>
> Jim Jewett <jimjjewett@gmail.com> added the comment:
>
> My preference would be for non_NFKC.isidentifier() to return False

It *is* an identifier, though. Python will happily accept it.

>
> It *may* be worth adding an asidentifier() method that returns either False or the canonicalized string that should be used instead.
>
> At a minimum, the documentation (including docstring) should warn that the method doesn't check for NFKC form, and that if the input is not ASCII, the caller should first ensure this by calling str1=unicodedata.normalize("NFKC", str1)

Sounds fine to me.
msg151602 - (view) Author: Jim Jewett (Jim.Jewett) * (Python triager) Date: 2012-01-19 01:07
@Benjamin -- the catch is, if it isn't already in NFKC form, then Python won't really accept it as an identifier.  Sometimes it will silently canonicalize it for you so that it seems to work, but other times it won't.  And a program calling isidentifier is likely to be a program that uses the strings directly for access, instead of always routing them through the parser.
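
To make the two code paths concrete, here is a small sketch. The use of types.SimpleNamespace and exec is only for illustration: the assignment that goes through the parser is stored under the normalized name, while setattr with the raw string creates a second, separate attribute.

```python
import types

obj = types.SimpleNamespace()
name = chr(170)  # 'ª', whose NFKC form is 'a'

# Path 1: source text goes through the parser, which normalizes the name.
exec(f"obj.{name} = 5", {"obj": obj})
print(vars(obj))           # {'a': 5}
print(hasattr(obj, name))  # False: hasattr does not normalize its argument

# Path 2: setattr uses the string as given, so a second attribute appears.
setattr(obj, name, 6)
print(sorted(vars(obj)))   # ['a', 'ª']
```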
msg151603 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2012-01-19 01:10
2012/1/18 Jim Jewett <report@bugs.python.org>:
>
> Jim Jewett <jimjjewett@gmail.com> added the comment:
>
> @Benjamin -- the catch is, if it isn't already in NFKC form, then Python won't really accept it as an identifier.  Sometimes it will silently canonicalize it for you so that it seems to work, but other times it won't.  And a program calling isidentifier is likely to be a program that uses the strings directly for access, instead of always routing them through the parser.

AFAIK, the only time it will "silently" canonicalize it for you is
parsing. Even if it wasn't, you can't say it's not an identifier, it's
just not normalized.
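
The parsing step can be observed directly with the ast module. This is only a sketch of that one case: the name stored in the tree is already the normalized one.

```python
import ast

source = chr(170) + " = 1"   # the source text uses 'ª' as a variable name
tree = ast.parse(source)

# The assignment target in the AST carries the NFKC-normalized name.
target = tree.body[0].targets[0]
print(target.id)             # a
print(target.id == chr(170)) # False
```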
msg296754 - (view) Author: Matthias Bussonnier (mbussonn) * Date: 2017-06-24 05:48
I have been bitten by that as well. I think the doc should mention to verify that the given string is normalized, not that it **should** be normalized.

Agreed that it would be great if isidentifier could also grow an `allow_non_nfkc=True` default parameter that would allow deactivating internal normalisation and returning False or raising on non-NFKC input.

I'm also interested in having an option on ast.parse or compile to not normalize, to at least be able to lint whether users are using non-NFKC forms, but that's another issue.
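
In the meantime, one possible workaround for the linting case is to look at tokens instead of the AST, on the assumption that the tokenize module reports NAME tokens as they appear in the source text. The function name non_nfkc_names is made up for this sketch.

```python
import io
import tokenize
import unicodedata


def non_nfkc_names(source):
    """Yield (line, column, name) for each NAME token not in NFKC form.

    Hypothetical lint helper; it relies on tokenize returning the raw
    source text for NAME tokens.
    """
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type != tokenize.NAME:
            continue
        if unicodedata.normalize("NFKC", tok.string) != tok.string:
            yield tok.start[0], tok.start[1], tok.string


for line, col, name in non_nfkc_names("obj.\u00aa = 5\nobj.a = 6\n"):
    print(f"line {line}, column {col}: {name!r} is not NFKC-normalized")
```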

I'll see if I can come up with – at least – a documentation patch.
msg297270 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-06-29 14:21
IMO allow_non_nfkc=True that just returns False would be a bad idea, since as Benjamin points out it *is* a valid identifier, it's just not normalized (yet).  Raising might work, that way you could tell the difference, but that would be a weird API for such a check function.  Regardless, we should probably keep this issue to a doc patch, and open a new issue for any proposed enhancement request.  

And you probably want to discuss it on python-ideas first, since the underlying issue is a bit complex and the solution non-obvious, with possible knock-on effects.  (Or maybe I'm wrong and the consensus will be that returning False with that flag would be fine.)
msg297403 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-06-30 13:44
See also issue 30772 about the deeper problem.
History
Date User Action Args
2021-12-05 23:01:22 iritkatriel set keywords: + easy
title: misleading return from isidentifier -> [doc] misleading return from isidentifier
versions: + Python 3.9, Python 3.10, Python 3.11, - Python 3.4, Python 3.5, Python 3.6
2017-06-30 13:44:23 r.david.murray set messages: + msg297403
2017-06-29 14:38:49 vstinner set nosy: - vstinner
2017-06-29 14:21:06 r.david.murray set nosy: + r.david.murray
messages: + msg297270
2017-06-24 05:48:27 mbussonn set nosy: + mbussonn
messages: + msg296754
2015-11-16 15:30:02 rhettinger set nosy: + pitrou
2015-11-16 14:29:08 serhiy.storchaka set nosy: + serhiy.storchaka, docs@python, vstinner
versions: + Python 3.4, Python 3.5, Python 3.6
assignee: docs@python
components: + Documentation
type: behavior
stage: needs patch
2012-01-19 01:10:03 benjamin.peterson set messages: + msg151603
2012-01-19 01:07:54 Jim.Jewett set messages: + msg151602
2012-01-19 01:06:30 benjamin.peterson set messages: + msg151601
2012-01-19 01:05:12 Jim.Jewett set messages: + msg151600
2012-01-19 00:56:33 benjamin.peterson set nosy: + benjamin.peterson
messages: + msg151599
2012-01-19 00:54:19 Jim.Jewett create