Message153882
I updated the patch to reflect Éric's comments on Rietveld, but there are also some other changes:
Previously, when punctuation chars were set, wordchars was augmented with '-' only. That was incomplete, so wordchars is now augmented with '~-./*?=', which allows for wildcards, filename characters and argument flags.
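To illustrate the wordchars change, here is a small sketch against the shlex module as it later shipped in Python 3.6, where this option is the punctuation_chars constructor argument. It assumes the shipped behaviour matches the patch described here; the command line being lexed is only an illustrative example.

```python
import shlex

text = 'ls --color=auto ~/docs/*.txt'

# Default lexer: '-', '=', '~', '/', '*' and '.' are not word characters,
# so flags, paths and globs get chopped into many small tokens.
plain = list(shlex.shlex(text))

# With punctuation_chars set, wordchars is extended with '~-./*?=',
# so argument flags, file paths and glob patterns each stay one token.
lexer = shlex.shlex(text, punctuation_chars=True)
augmented = list(lexer)

print(plain)
print(augmented)  # ['ls', '--color=auto', '~/docs/*.txt']
```

Only the second lexer keeps '--color=auto' and '~/docs/*.txt' intact, which is what a caller splitting shell-like command lines usually wants.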
I added a token_type attribute whose value is 'a' for alphanumeric tokens and 'c' for punctuation tokens. This token type is tracked internally anyway; we just expose it now. It is needed when multiple punctuation tokens have to be disambiguated, because two logically separate punctuation tokens may be returned as one if they are not separated by whitespace in the source being tokenised.
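As a rough sketch of why the type label matters, the code below lexes 'cmd>(x)' with the Python 3.6+ shlex, where '>' and '(' come back merged as a single '>(' token, and then splits only the punctuation tokens into known operators. It does not rely on any token_type attribute of the lexer; the helpers token_type, split_punctuation and the OPERATORS list are hypothetical, and the operator list is an assumption about what a shell-like caller might care about.

```python
import shlex

# Hypothetical operator list; longer operators come first so the greedy
# split prefers '&&' over '&' and '>>' over '>'.
OPERATORS = ('&&', '||', ';;', '>>', '<<', '>', '<', '|', '&', ';', '(', ')')


def token_type(token, punctuation_chars):
    """Hypothetical helper mimicking the patch's labels: 'c' or 'a'."""
    return 'c' if token and all(ch in punctuation_chars for ch in token) else 'a'


def split_punctuation(token):
    """Greedily break a run of punctuation into known operators."""
    parts = []
    i = 0
    while i < len(token):
        for op in OPERATORS:
            if token.startswith(op, i):
                parts.append(op)
                i += len(op)
                break
        else:
            # Unknown punctuation character: keep it as its own part.
            parts.append(token[i])
            i += 1
    return parts


def tokens_with_types(text):
    """Return (token, type) pairs, splitting merged punctuation runs."""
    lexer = shlex.shlex(text, punctuation_chars=True)
    result = []
    for tok in lexer:
        if token_type(tok, lexer.punctuation_chars) == 'c':
            # Only punctuation tokens are candidates for splitting.
            result.extend((part, 'c') for part in split_punctuation(tok))
        else:
            result.append((tok, 'a'))
    return result


raw = list(shlex.shlex('cmd>(x)', punctuation_chars=True))
print(raw)  # ['cmd', '>(', 'x', ')']
print(tokens_with_types('cmd>(x)'))
```

Knowing the type up front means an alphanumeric token such as 'x' is never run through the operator splitter.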
New attributes and the changes to wordchars have been documented, and a test has been added for token_type return values.

Date | User | Action | Args
2012-02-21 17:27:36 | vinay.sajip | set | recipients: + vinay.sajip, niemeyer, eric.smith, robodan, eric.araujo
2012-02-21 17:27:36 | vinay.sajip | set | messageid: <1329845256.46.0.643149543593.issue1521950@psf.upfronthosting.co.za>
2012-02-21 17:27:35 | vinay.sajip | link | issue1521950 messages
2012-02-21 17:27:35 | vinay.sajip | create |