Message 255463 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	zwol
Recipients	docs@python, zwol
Date	2015-11-27.15:50:58
Message-id	<1448639458.78.0.12264064003.issue25743@psf.upfronthosting.co.za>

Content

The `re` module documentation does not do a good job of explaining exactly what `\w` matches.  Quoting https://docs.python.org/3.5/library/re.html :

> \w
> For Unicode (str) patterns:
> Matches Unicode word characters; this includes most characters
> that can be part of a word in any language, as well as numbers
> and the underscore.

Empirically, this appears to mean "everything in Unicode general categories L* and N*, plus U+005F (underscore)".  That is a perfectly sensible definition, and the documentation should state it in those terms.  "Unicode word characters" could mean any number of different things; for instance, UTS #18 (Unicode Regular Expressions) gives a very different definition.
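One way to test the empirical claim is to compare `\w` against the proposed definition for every code point. The following is a minimal sketch that uses only the standard `re`, `sys` and `unicodedata` modules. The helper names `proposed_word_char`, `re_word_char` and `mismatches` are hypothetical and exist only for illustration. Results depend on the Unicode database bundled with whichever Python version runs the check.

```python
import re
import sys
import unicodedata

# Hypothetical helper: the definition proposed above, i.e. Unicode general
# categories L* and N*, plus U+005F LOW LINE (underscore).
def proposed_word_char(ch: str) -> bool:
    return ch == "_" or unicodedata.category(ch)[0] in ("L", "N")

# Compiled str pattern, so \w uses its Unicode (str) semantics.
WORD = re.compile(r"\w")

# Hypothetical helper: does re's \w match this single character?
def re_word_char(ch: str) -> bool:
    return WORD.fullmatch(ch) is not None

# Hypothetical helper: yield every code point where re's \w disagrees
# with the proposed definition (scans all of U+0000..U+10FFFF).
def mismatches():
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if re_word_char(ch) != proposed_word_char(ch):
            yield cp
```

For example, `[hex(cp) for cp in mismatches()]` lists any disagreeing code points. An empty list would support stating the definition in terms of L*, N* and the underscore, while any entries would show where that wording needs refinement.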

(Further reading: https://gist.github.com/zackw/3077f387591376c7bf67 plus links therefrom).

History
Date	User	Action	Args
2015-11-27 15:50:58	zwol	set	recipients: + zwol, docs@python
2015-11-27 15:50:58	zwol	set	messageid: <1448639458.78.0.12264064003.issue25743@psf.upfronthosting.co.za>
2015-11-27 15:50:58	zwol	link	issue25743 messages
2015-11-27 15:50:58	zwol	create