
Author zwol
Recipients docs@python, zwol
Date 2015-11-27.15:50:58
Message-id <1448639458.78.0.12264064003.issue25743@psf.upfronthosting.co.za>
In-reply-to
Content
The `re` module documentation does not explain precisely what `\w` matches.  Quoting https://docs.python.org/3.5/library/re.html :

> \w
> For Unicode (str) patterns:
> Matches Unicode word characters; this includes most characters
> that can be part of a word in any language, as well as numbers
> and the underscore.

Empirically, this appears to mean "everything in Unicode general categories L* and N*, plus U+005F (underscore)".  That is a perfectly sensible definition and the documentation should state it in those terms.  "Unicode word characters" could mean any number of different things; note for instance that UTS#18 gives a very different definition.
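
One way to run this kind of empirical check is sketched below. It is not the script behind the linked gist; the helper names `categories_matched` and `exceptions_to_hypothesis` are made up for illustration, and the "L*, N*, or underscore" rule in the code is the hypothesis under test, not a statement of what the `re` module guarantees. Results over the full range depend on the Unicode database shipped with the interpreter.

```python
import re
import unicodedata
from collections import Counter

# A str (not bytes) pattern, so the Unicode definition of the class applies.
WORD = re.compile(r"\w")


def categories_matched(code_points):
    """Count, per Unicode general category, the code points the pattern matches."""
    matched = Counter()
    for cp in code_points:
        ch = chr(cp)
        if WORD.fullmatch(ch):
            matched[unicodedata.category(ch)] += 1
    return matched


def exceptions_to_hypothesis(code_points):
    """Return code points where the pattern disagrees with the hypothesis
    'general category L* or N*, or U+005F' (an assumption to be tested)."""
    disagreements = []
    for cp in code_points:
        ch = chr(cp)
        expected = ch == "_" or unicodedata.category(ch)[0] in "LN"
        if bool(WORD.fullmatch(ch)) != expected:
            disagreements.append(cp)
    return disagreements
```

Passing `range(sys.maxunicode + 1)` (after `import sys`) to either helper scans every code point; an empty list from `exceptions_to_hypothesis` would support the definition proposed above for that interpreter version, and any entries it returns are the characters the documentation would need to account for.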

(Further reading: https://gist.github.com/zackw/3077f387591376c7bf67 plus links therefrom).
History
Date User Action Args
2015-11-27 15:50:58 zwol set recipients: + zwol, docs@python
2015-11-27 15:50:58 zwol set messageid: <1448639458.78.0.12264064003.issue25743@psf.upfronthosting.co.za>
2015-11-27 15:50:58 zwol link issue25743 messages
2015-11-27 15:50:58 zwol create