Message270933
Looking at issue 2382, I agree that's a different problem (I'm seeing the current misbehaviour even though everything is consistently encoded as UTF-8).
The main case we're interested in here is the PyUnicode_IsIdentifier one, so if we wanted to do better than "start or end of the token", we could introduce a new internal "_PyUnicode_FindNonIdentifier" that reported the position of the first non-identifier character (or -1 if it's a valid identifier).
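To make the suggested semantics concrete, here is a rough Python sketch of what such a helper might report. It is only an illustration: the name find_non_identifier is hypothetical, the real helper would be C code alongside PyUnicode_IsIdentifier, and returning 0 for an empty string is an assumption rather than anything settled here. It relies on str.isidentifier() checking the first character as an identifier start and every later character as an identifier continuation.

```python
def find_non_identifier(s: str) -> int:
    """Return the index of the first character that stops `s` from being
    a valid identifier, or -1 if `s` is a valid identifier.

    Hypothetical pure-Python model of the proposed internal helper.
    """
    if not s:
        # Assumption: an empty string is reported as invalid at position 0.
        return 0
    # The first character must be a valid identifier *start* character.
    if not s[0].isidentifier():
        return 0
    # Later characters only need to be valid *continuation* characters,
    # so test each one behind a known-good start character.
    for i, ch in enumerate(s[1:], start=1):
        if not ("_" + ch).isidentifier():
            return i
    return -1


if __name__ == "__main__":
    for token in ("spam", "a\u20acb", "1abc"):
        print(repr(token), find_non_identifier(token))
```

With a position like this available, the error could point at the offending character rather than just the start or end of the token.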
Unfortunately, I'm not at all familiar with parsetok.c myself (my own work with the code generator has been from the AST on), so I don't have a ready answer for your other questions.
Date | User | Action | Args
2016-07-21 15:06:14 | ncoghlan | set | recipients: + ncoghlan, Rosuav, berker.peksag
2016-07-21 15:06:14 | ncoghlan | set | messageid: <1469113574.75.0.991325182412.issue27582@psf.upfronthosting.co.za>
2016-07-21 15:06:14 | ncoghlan | link | issue27582 messages
2016-07-21 15:06:14 | ncoghlan | create |