New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
string.Template should use re.ASCII flag #75853
Comments
Currently, strings.Template uses re.IGNORECASE without re.ASCII: idpattern = r'[_a-z][_a-z0-9]*'
flags = _re.IGNORECASE [a-z] matches against 'ı' (0x131, DOTLESS I) and 'ſ' (0x17f, LONG S). |
Technically it is an API change since
This means if someone does subclass string.Template and changes the pattern to accept Unicode identifiers, then with this change they will also have to modify flags, whereas before they didn't. It really wasn't ever the intention to allow non-ASCII identifiers, so this is probably safe in practice. OTOH, making the change for performance reasons might be questionable, given that the regular expressions are compiled by the Template's metaclass, so unlikely to contribute significantly to overall performance wins. |
Thank you for pointing it out.
original: import time: 2310 | 9589 | string We can save about 900 us. |
When optimizing, please don't make API changes. |
Yet one way -- make re.ASCII a local flag. Than we could just change the idpattern attribute to r'(?a:[_a-z][_a-z0-9]*)', without touching the flags attribute. |
This is not only optimization, but bugfix. Document of string.Template says:
So, missing re.ASCII flag is bug because non-ASCII alphabet can be matched. |
https://docs.python.org/3.6/library/re.html#regular-expression-syntax says Anyway, Template.idpattern is documented public API too. |
Current pull request override But I'm not sure it's right approach. |
On Oct 4, 2017, at 02:29, INADA Naoki report@bugs.python.org wrote:
Too bad, because I like that approach. How hard would it be to add support for ‘a’ as a local flag? (I’m kind of surprised that it isn’t already supported - it seems like it would be useful.) I would like this better than hacking Template.flags after the fact. It seems like the right way to align the code with the documentation (i.e. fix a bug). |
See bpo-31690. But this solution can be used only in 3.7. |
On Oct 4, 2017, at 10:05, Serhiy Storchaka <report@bugs.python.org> wrote:
That’s fine. I don’t think this is important enough to backport. |
Another solution (works in 3.6 too): set idpattern to r'(?-i:[_a-zA-Z][_a-zA-Z0-9]*)'. This looks pretty weird, setting the re.IGNORECASE flag and then unsetting it. But it works, and don't break the user code that changes idpattern without changing flags. |
On Oct 6, 2017, at 15:04, Serhiy Storchaka <report@bugs.python.org> wrote:
Oh, I think I like that :) |
With some comments to clarify of course. |
Is something left to do here? In 3.7 r'(?-i:[_a-zA-Z][_a-zA-Z0-9])' can be replaced with r'(?a:[_a-z][_a-z0-9])' if it will add clarity. |
@inada.naoki - I think changing this to ?a will make things more clear. I submitted a PR for that, and then once it lands, we can close this issue. |
PR 5099 reverts in 3.7 a subtle behavior change made by the fix for this issue. Before merging the fix setting flags to 0 in a Template subclass made the default pattern matching only lower case letters. Currently setting flags to 0 doesn't have any effect. |
Results. There are 4 cases:
The case 1 was the one that was broken when this issue was opened. The initial Inada's version fixed the case 1, but broke the case 2. His final version (applied also to 3.6) fixed the case 1, but broke the case 3. This is a win, because cases 1 and 2 look much more common than the case 3. And finally PR 5099 has fixed also the case 3. The case 4 obviously is not affected by any changes of default values. Now all four cases are correct in 3.7 and the only broken case in 3.6 is the uncommon case 3. |
re.A | re.I
flag for identifier pattern #3872Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: