Message 220613 - Python tracker



Author	taleinat
Recipients	taleinat, terry.reedy
Date	2014-06-15.06:22:45
Message-id	<1402813366.25.0.700039555815.issue21765@psf.upfronthosting.co.za>
In-reply-to

Content

It seems that the unicodedata module already supplies relevant functions which can be used for this. For example, we can replace "char in self._id_first_chars" with something like:

from unicodedata import normalize, category
# NFKC is the normalization form that PEP 3131 specifies for identifiers
norm_char_first = normalize("NFKC", char)[0]
is_id_first_char = norm_char_first == '_' or category(norm_char_first) in {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}

I'm not sure what the "Other_ID_Start" property mentioned in [1] and [2] means, though. Can we get someone with more in-depth knowledge of Unicode to help with this?
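
One possible way to sidestep the Other_ID_Start question, offered only as a sketch: in Python 3, str.isidentifier() applies the interpreter's own identifier rules, so a one-character string can be tested directly. The helper name below is hypothetical, and picking U+2118 as an example of a character that a category-only check misses is an assumption based on the Unicode property lists, not something stated in this thread.

```python
from unicodedata import category

# Categories used by the category-only check discussed above.
ID_START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}


def is_id_first_char(char):
    """Hypothetical helper: can `char` start a Python identifier?"""
    # A one-character string is a valid identifier exactly when that
    # character is allowed in the first position of an identifier.
    return char.isidentifier()


def category_only_check(char):
    """The category-based test, kept here only for comparison."""
    return char == "_" or category(char) in ID_START_CATEGORIES


if __name__ == "__main__":
    # U+2118 (SCRIPT CAPITAL P) is assumed here to be an Other_ID_Start
    # character; print both answers so any difference is visible.
    for ch in ("a", "_", "1", "\u2118"):
        print(repr(ch), is_id_first_char(ch), category_only_check(ch))
```

If the two checks disagree on a character, the str.isidentifier() answer is the one that matches what the interpreter itself accepts.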

The real question is how to do this *fast*, since HyperParser does a *lot* of these checks. Do you think caching would be a good approach?
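
To make the caching idea concrete, here is a minimal sketch that memoizes the per-character check with functools.lru_cache. The function name, the unbounded cache size, and the ASCII fast path are all assumptions made for illustration; nothing here reflects how HyperParser is actually implemented, and whether the cache pays off would need to be measured.

```python
from functools import lru_cache
from unicodedata import category, normalize

_ID_START_CATEGORIES = frozenset({"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"})


@lru_cache(maxsize=None)  # assumption: the set of distinct characters stays small
def is_id_first_char(char):
    """Hypothetical cached check: can `char` start an identifier?"""
    # Fast path (an assumption about typical source code): for ASCII,
    # only letters and the underscore can start an identifier.
    if char < "\x80":
        return char == "_" or char.isalpha()
    # Non-ASCII: normalize with NFKC, then test the general category
    # of the first resulting code point.
    norm_char_first = normalize("NFKC", char)[0]
    return category(norm_char_first) in _ID_START_CATEGORIES


if __name__ == "__main__":
    for ch in "x_1\u00e9x":
        print(repr(ch), is_id_first_char(ch))
    # Repeated characters are answered from the cache.
    print(is_id_first_char.cache_info())
```

Because each distinct character is computed only once, repeated checks over the same text reduce to dictionary lookups. Note that this sketch does not cover Other_ID_Start characters.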

See:
.. [1]: https://docs.python.org/3/reference/lexical_analysis.html#identifiers
.. [2]: http://legacy.python.org/dev/peps/pep-3131/

History
Date	User	Action	Args
2014-06-15 06:22:46	taleinat	set	recipients: + taleinat, terry.reedy
2014-06-15 06:22:46	taleinat	set	messageid: <1402813366.25.0.700039555815.issue21765@psf.upfronthosting.co.za>
2014-06-15 06:22:46	taleinat	link	issue21765 messages
2014-06-15 06:22:45	taleinat	create