This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: tokenize does not include Other_ID_Start or Other_ID_Continue in identifier
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.5
process
Status: closed Resolution: duplicate
Dependencies:
Superseder: Make tokenize recognize Other_ID_Start and Other_ID_Continue chars (view: 24194)
Assigned To: Nosy List: Joshua.Landau, serhiy.storchaka
Priority: normal Keywords:

Created on 2016-04-25 01:58 by Joshua.Landau, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (3)
msg264145 - (view) Author: Joshua Landau (Joshua.Landau) * Date: 2016-04-25 01:58
This is effectively a continuation of https://bugs.python.org/issue9712.

The line in Lib/tokenize.py

    Name = r'\w+'

must be changed to a regular expression that accepts Other_ID_Start characters at the start and Other_ID_Continue characters elsewhere. Because `\w` matches neither property, tokenize currently rejects valid identifiers such as '℘·'.


See the reference here:

    https://docs.python.org/3.5/reference/lexical_analysis.html#identifiers

I'm unsure whether Unicode normalization (i.e. the `xid` properties) needs to be dealt with too.


Credit to toriningen from http://stackoverflow.com/a/29586366/1763356.
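The mismatch is easy to demonstrate (a minimal sketch: U+2118 SCRIPT CAPITAL P '℘' carries the Other_ID_Start property and U+00B7 MIDDLE DOT '·' carries Other_ID_Continue, so both are legal in identifiers per PEP 3131, yet neither is matched by `\w`):

```python
import re

# The pattern tokenize uses for names:
Name = r'\w+'

# str.isidentifier() follows the full PEP 3131 rules, including
# Other_ID_Start and Other_ID_Continue; \w does not cover them.
for ident in ('℘', 'a·'):
    print(ident.isidentifier(), re.fullmatch(Name, ident))
# prints:
#   True None
#   True None
```

Both strings are valid identifiers according to `str.isidentifier()` (and are accepted by `compile()`), but `re.fullmatch(r'\w+', ...)` fails on them, which is why tokenize mis-handles such names.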
msg264156 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-04-25 06:04
This is a duplicate of issue24194. Yes, there has still been no progress.
msg264161 - (view) Author: Joshua Landau (Joshua.Landau) * Date: 2016-04-25 08:03
Sorry, I'd stumbled on my old comment on the closed issue and completely forgot about the *last* time I did the same thing.
History
Date User Action Args
2022-04-11 14:58:30  admin             set     github: 71030
2016-04-25 08:03:59  Joshua.Landau     set     messages: + msg264161
2016-04-25 06:05:00  serhiy.storchaka  set     status: open -> closed
                                               superseder: Make tokenize recognize Other_ID_Start and Other_ID_Continue chars
                                               nosy: + serhiy.storchaka
                                               messages: + msg264156
                                               resolution: duplicate
                                               stage: resolved
2016-04-25 01:58:44  Joshua.Landau     create