Message313849
Whatever I may have said before, I favor supporting the Unicode standard for \w, which is related to the standard for identifiers.
This is one of 2 issues about \w being defined too narrowly. I am somewhat arbitrarily closing this as a duplicate of #12731 (fewer digits ;-).
There are 3 issues about tokenize.tokenize failing on valid identifiers, defined as \w sequences whose first char is an identifier itself (and therefore a start char). In msg313814 of #32987, Serhiy indicates which start and continue identifier characters are matched by \W for re and regex. I am leaving #24194 open as the tokenizer name issue.
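For illustration, here is a minimal sketch of the mismatch (not taken from the issue itself; it assumes the CPython 3 behavior where re's \w tracks str.isalnum() plus underscore, and uses U+0301 COMBINING ACUTE ACCENT as an example continue character):

import re
import unicodedata

# 'a' followed by U+0301 COMBINING ACUTE ACCENT: the combining mark is in
# XID_Continue, so the whole string is a legal identifier, but its category
# (Mn) is not alphanumeric, so re's \w does not match it.
name = "a\u0301"

print(name.isidentifier())                     # True
print(unicodedata.category("\u0301"))          # 'Mn'
print(re.fullmatch(r"\w+", name) is not None)  # False with re as it ships today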
Date | User | Action | Args
2018-03-15 00:32:39 | terry.reedy | set | recipients: + terry.reedy, lemburg, loewis, vstinner, nathanlmiles, rsc, timehorse, ezio.melotti, mrabarnett, l0nwlf
2018-03-15 00:32:39 | terry.reedy | set | messageid: <1521073959.39.0.467229070634.issue1693050@psf.upfronthosting.co.za>
2018-03-15 00:32:39 | terry.reedy | link | issue1693050 messages
2018-03-15 00:32:39 | terry.reedy | create |