Incorrect charset range handling with ignore case flag? #47761
While working on the regex code in sre_compile.py I came across the following code:

    for i in range(fixup(av[0]), fixup(av[1])+1):
        charmap[i] = 1

The function fixup converts the ends of the range to lower case if the ignore-case flag is set:

>>> import re
>>> print re.match(r'[9-A]', 'A')
<_sre.SRE_Match object at 0x00A78058>
>>> print re.match(r'[9-A]', 'a')
None
>>> print re.match(r'[9-A]', '_')
None
>>> print re.match(r'[9-A]', 'A', re.IGNORECASE)
<_sre.SRE_Match object at 0x00D0BFA8>
>>> print re.match(r'[9-A]', 'a', re.IGNORECASE)
<_sre.SRE_Match object at 0x00A78058>
>>> print re.match(r'[9-A]', '_', re.IGNORECASE)
<_sre.SRE_Match object at 0x00D0BFA8>

'_' doesn't lie between '9' and 'A', but it does lie between '9' and 'a'. Surely the ignore-case flag should not affect whether non-letters are matched?
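
The effect described above can be reproduced without the re module. The following is only a sketch of the idea, not the actual sre_compile.py code: the names `endpoint_fixup_charset` and `per_char_fixup_charset` are hypothetical, and plain `str.lower()` stands in for `fixup`. It compares lowercasing only the two ends of the range with lowercasing every character inside the original range, which is one possible way to avoid the problem.

```python
def lower_ord(code):
    """Stand-in for fixup: map a code point to its lowercase code point."""
    return ord(chr(code).lower())


def endpoint_fixup_charset(lo, hi):
    """Lowercase only the range ends, then fill everything in between.

    This mirrors the loop quoted in the report: [9-A] effectively
    becomes [9-a], which pulls in unrelated characters such as '_'.
    """
    return {chr(i) for i in range(lower_ord(ord(lo)), lower_ord(ord(hi)) + 1)}


def per_char_fixup_charset(lo, hi):
    """Lowercase each member of the original range individually."""
    return {chr(lower_ord(i)) for i in range(ord(lo), ord(hi) + 1)}


def matches_ignorecase(charset, ch):
    """Case-insensitive membership test against a lowercased charset."""
    return ch.lower() in charset


buggy = endpoint_fixup_charset('9', 'A')
fixed = per_char_fixup_charset('9', 'A')

for ch in 'Aa_':
    print(repr(ch), matches_ignorecase(buggy, ch), matches_ignorecase(fixed, ch))
```

With the endpoint-only approach all three characters match. With the per-character approach 'A' and 'a' match and '_' does not.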
I think this is even more complicated when you consider that ...

In a sense, I think it may only be safe to say that character class ...

In the end, I think it's just dangerous to define character group ranges ...

I do agree this is a problem, but as I see it, the solution may not be ...
I'd close this as "won't fix", because (IMHO) ranges like [9-A] ...

FWIW Perl doesn't seem to match the '_', even with the 'i' flag. Tested ...
If there's already a patch, then it's fine (and useful for ranges of ...
EM and MB seemed to agree on closing this.

Fixed in bpo-17381 (which has a more realistic example than [9-A]).
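
To check how a given interpreter behaves, the session from the original report can be rerun in Python 3 syntax. This is a small sketch assuming a CPython 3 release that includes the bpo-17381 fix. The helper name `matches` is hypothetical.

```python
import re


def matches(ch, flags=0):
    """Return True if the single character ch matches the charset [9-A]."""
    return re.match(r'[9-A]', ch, flags) is not None


# Same three probe characters as in the original report.
plain = {ch: matches(ch) for ch in 'Aa_'}
ignorecase = {ch: matches(ch, re.IGNORECASE) for ch in 'Aa_'}

print('no flags:  ', plain)
print('IGNORECASE:', ignorecase)
```

On a fixed interpreter, '_' should not match with or without re.IGNORECASE, while 'a' should match only when the flag is given.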