Message255463
The `re` module documentation does not do a good job of explaining exactly what `\w` matches. Quoting https://docs.python.org/3.5/library/re.html :
> \w
> For Unicode (str) patterns:
> Matches Unicode word characters; this includes most characters
> that can be part of a word in any language, as well as numbers
> and the underscore.
Empirically, this appears to mean "everything in Unicode general categories L* and N*, plus U+005F (underscore)". That is a perfectly sensible definition, and the documentation should state it in those terms. "Unicode word characters" could mean any number of different things; for instance, UTS #18 (Unicode Regular Expressions) gives a very different definition.
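A minimal sketch for re-checking the empirical claim above, assuming CPython 3.6 or later. The result depends on the interpreter version and the Unicode database it bundles, so no particular count of disagreements is claimed here. The script compares `re`'s `\w` (a `str` pattern without `re.ASCII`) against the "L* or N*, plus underscore" rule for every code point. It also prints a few characters that the UTS #18 Annex C definition of `\w` treats differently, since that definition includes marks, connector punctuation and Join_Control, and admits numbers only through Alphabetic (which covers Nl) or Nd. The names `hypothesis` and `disagreements` are hypothetical helpers for this illustration, not part of any library.

```python
import re
import sys
import unicodedata

# Python's \w for str patterns (no re.ASCII flag).
WORD = re.compile(r"\w")


def hypothesis(ch: str) -> bool:
    """Hypothetical rule under test: general category L* or N*, or U+005F."""
    return ch == "_" or unicodedata.category(ch)[0] in "LN"


def disagreements() -> list:
    """Return every code point where re's \\w and hypothesis() disagree."""
    return [
        cp
        for cp in range(sys.maxunicode + 1)
        if bool(WORD.fullmatch(chr(cp))) != hypothesis(chr(cp))
    ]


if __name__ == "__main__":
    # Output varies with the Python version and its Unicode database.
    print("Unicode database:", unicodedata.unidata_version)
    bad = disagreements()
    print(len(bad), "disagreeing code points")
    for cp in bad[:20]:
        ch = chr(cp)
        print(f"U+{cp:04X} {unicodedata.category(ch)} {unicodedata.name(ch, '?')}")

    # Characters where UTS #18 Annex C and Python's \w differ:
    # U+0301 (Mn), U+203F (Pc) and U+200D (Join_Control) are UTS #18 word
    # characters but are not matched by Python's \w; U+00B2 (No) is matched
    # by Python but is neither Alphabetic nor Nd, so UTS #18 excludes it.
    for ch in ("\u0301", "\u203f", "\u200d", "\u00b2"):
        matched = bool(WORD.fullmatch(ch))
        print(f"U+{ord(ch):04X} {unicodedata.category(ch)} matched={matched}")
```

Scanning all 1,114,112 code points takes on the order of a second. Any code points it reports are exactly the cases the documentation would need to spell out.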
(Further reading: https://gist.github.com/zackw/3077f387591376c7bf67 plus links therefrom.)

| Date | User | Action | Args |
|---|---|---|---|
| 2015-11-27 15:50:58 | zwol | set | recipients: + zwol, docs@python |
| 2015-11-27 15:50:58 | zwol | set | messageid: `<1448639458.78.0.12264064003.issue25743@psf.upfronthosting.co.za>` |
| 2015-11-27 15:50:58 | zwol | link | issue25743 messages |
| 2015-11-27 15:50:58 | zwol | create | |