
Author zwol
Recipients docs@python, zwol
Date 2015-11-27.15:50:58
The `re` module documentation does not do a good job of explaining exactly what `\w` matches.  Quoting the documentation:

> \w
> For Unicode (str) patterns:
> Matches Unicode word characters; this includes most characters
> that can be part of a word in any language, as well as numbers
> and the underscore.

Empirically, this appears to mean "everything in Unicode general categories L* and N*, plus U+005F (underscore)".  That is a perfectly sensible definition and the documentation should state it in those terms.  "Unicode word characters" could mean any number of different things; note for instance that UTS#18 gives a very different definition.
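
For anyone who wants to repeat the empirical check, here is a minimal sketch. It compares `\w` (on a `str` pattern) against the proposed definition for every code point and lists any disagreements. The helper names `in_proposed_definition` and `find_mismatches` are made up for this illustration; the result depends on the Unicode database shipped with the interpreter, so no particular output is claimed here.

```python
import re
import sys
import unicodedata

WORD = re.compile(r"\w")  # str pattern, so Unicode matching applies


def in_proposed_definition(ch):
    """True if ch is in general category L* or N*, or is U+005F (underscore)."""
    return ch == "_" or unicodedata.category(ch)[0] in ("L", "N")


def find_mismatches(limit=sys.maxunicode + 1):
    """Return the code points below `limit` where \\w and the proposal disagree."""
    mismatches = []
    for cp in range(limit):
        ch = chr(cp)
        if bool(WORD.fullmatch(ch)) != in_proposed_definition(ch):
            mismatches.append(cp)
    return mismatches


if __name__ == "__main__":
    # Full scan of all code points; may take a second or two.
    for cp in find_mismatches():
        print(f"U+{cp:04X} {unicodedata.category(chr(cp))} {unicodedata.name(chr(cp), '<unnamed>')}")
```

One concrete point of ambiguity: a combining mark such as U+0301 (category Mn) and a connector such as U+203F (category Pc) are not matched by `\w` here, although a reader could reasonably expect "Unicode word characters" to include them.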

(Further reading: plus links therefrom).
Date                 User  Action  Args
2015-11-27 15:50:58  zwol  set     recipients: + zwol, docs@python
2015-11-27 15:50:58  zwol  set     messageid: <>
2015-11-27 15:50:58  zwol  link    issue25743 messages
2015-11-27 15:50:58  zwol  create