
Author terry.reedy
Recipients ezio.melotti, l0nwlf, lemburg, loewis, mrabarnett, nathanlmiles, rsc, terry.reedy, timehorse, vstinner
Date 2018-03-15.00:32:39
Message-id <1521073959.39.0.467229070634.issue1693050@psf.upfronthosting.co.za>
Content
Whatever I may have said before, I favor supporting the Unicode standard for \w (UTS #18), which is closely related to the Unicode standard for identifiers (UAX #31).
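
For illustration, a minimal sketch of the gap (assuming CPython 3's re module; U+0308 COMBINING DIAERESIS, category Mn, is a valid identifier-continue character that \w does not match):

import re

name = "a\u0308"   # "ä" spelled as 'a' plus U+0308 COMBINING DIAERESIS

# Valid Python identifier: 'a' is XID_Start, U+0308 is XID_Continue.
print(name.isidentifier())                  # True

# But re's \w only covers alphanumerics (per str.isalnum) plus the
# underscore, so the combining mark is not a word character.
print(re.fullmatch(r"\w+", name) is None)   # True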

This is one of two issues about \w being defined too narrowly.  I am somewhat arbitrarily closing this one as a duplicate of #12731 (the issue with fewer digits ;-).

There are three issues about tokenize.tokenize failing on valid identifiers: sequences of identifier-continue characters whose first character is an identifier-start character.  In msg313814 of #32987, Serhiy lists which identifier start and continue characters are matched by \W in re and in regex.  I am leaving #24194 open as the tokenizer name issue.
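
A minimal sketch of the tokenize failure (version-dependent; on releases where tokenize.py builds its Name pattern from r'\w+', the combining mark is emitted as an ERRORTOKEN even though compile() accepts the same source, since the C tokenizer checks the XID properties):

import io
import tokenize

source = "a\u0308 = 1\n"            # a valid assignment to identifier "ä"
compile(source, "<test>", "exec")   # the C tokenizer accepts it

# The pure-Python tokenizer matches names with \w+, so on affected
# versions U+0308 falls out of the NAME and surfaces as an ERRORTOKEN.
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    print(tokenize.tok_name[tok.type], repr(tok.string))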