
Author vajrasky
Recipients akuchling, docs@python, vajrasky
Date 2013-08-19.07:45:15
Message-id <1376898316.85.0.0813282739116.issue18779@psf.upfronthosting.co.za>
In-reply-to
Content
According to:

http://oald8.oxfordlearnersdictionaries.com/dictionary/alphanumeric
http://en.wikipedia.org/wiki/Alphanumeric

Alphanumeric is defined as [A-Za-z0-9]; the underscore (_) is not included. One part of the Python documentation (Doc/tutorial/stdlib2.rst) distinguishes the two very clearly:

"The format uses placeholder names formed by ``$`` with valid Python identifiers
(alphanumeric characters and underscores).  Surrounding the placeholder with
braces allows it to be followed by more alphanumeric letters with no intervening
spaces.  Writing ``$$`` creates a single escaped ``$``::"
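
As a minimal illustration of the quoted paragraph (the template text and the names village and cause_1 are made up for this example and do not come from the patch), a placeholder name may contain an underscore, and braces let it be followed directly by more letters:

```python
from string import Template

# 'village' and 'cause_1' are hypothetical placeholder names.
# ${village} is braced so that 'folk' can follow it with no space;
# $cause_1 shows an underscore and a digit inside an identifier;
# $$ produces a single literal dollar sign.
template = Template('${village}folk send $$10 to $cause_1.')

result = template.substitute(village='Nottingham', cause_1='the ditch fund')
print(result)
```

Here the underscore is part of the placeholder name, which is why the tutorial lists it separately from the alphanumeric characters.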

Yet elsewhere in the documentation, as well as in regex comments, we implicitly assume that the underscore is alphanumeric.

Explicit is better than implicit!

I have attached a patch that distinguishes alphanumeric characters from the underscore in the documentation and in regex comments.

This matters in case someone is confused by this code:
>>> import re
>>> re.split(r'\W', 'haha$hihi*huhu_hehe hoho')
['haha', 'hihi', 'huhu_hehe', 'hoho']
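
To make the difference concrete, here is a small sketch contrasting \W with an explicit negated class. The pattern [^A-Za-z0-9] is only my assumption of what a strictly alphanumeric reading would look like; it is not something the re module defines:

```python
import re

text = 'haha$hihi*huhu_hehe hoho'

# \W matches any character that is not a "word" character. The underscore
# counts as a word character, so 'huhu_hehe' stays in one piece.
word_split = re.split(r'\W', text)

# Assumed strict reading: only ASCII letters and digits are alphanumeric,
# so the underscore becomes a separator as well.
strict_split = re.split(r'[^A-Za-z0-9]', text)

print(word_split)
print(strict_split)

# str.isalnum() agrees with the dictionary definition: '_' is not alphanumeric.
print('_'.isalnum())
```

Someone who reads "\w matches alphanumeric characters" literally would expect the second result, not the first.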

On a side note:
In the Python code base, we sometimes write "alphanumerics" and "underscores", and sometimes "alphanumeric characters" and "underscore characters". Which form is preferred?