Message70799
While working on the regex code in sre_compile.py I came across the
following code in the handling of charset ranges in _optimize_charset:
    for i in range(fixup(av[0]), fixup(av[1])+1):
        charmap[i] = 1
The function fixup converts the ends of the range to lower case if the
ignore-case flag is present. The problem with this approach is
illustrated below:
>>> import re
>>> print re.match(r'[9-A]', 'A')
<_sre.SRE_Match object at 0x00A78058>
>>> print re.match(r'[9-A]', 'a')
None
>>> print re.match(r'[9-A]', '_')
None
>>> print re.match(r'[9-A]', 'A', re.IGNORECASE)
<_sre.SRE_Match object at 0x00D0BFA8>
>>> print re.match(r'[9-A]', 'a', re.IGNORECASE)
<_sre.SRE_Match object at 0x00A78058>
>>> print re.match(r'[9-A]', '_', re.IGNORECASE)
<_sre.SRE_Match object at 0x00D0BFA8>
>>>
'_' doesn't lie between '9' and 'A', but it does lie between '9' and 'a'.
Surely the ignore-case flag should not affect whether non-letters are
matched or not?
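
To make the arithmetic concrete, here is a minimal sketch of the behaviour described above, not the actual sre_compile.py code. The `fixup` here is a simplified stand-in, and `charset_lowered_endpoints` and `charset_per_character` are hypothetical names used only for illustration. The second function shows one possible alternative (case-folding each member of the original range instead of its endpoints); it is an assumption about how this could be handled, not a proposed patch.

```python
def fixup(ch, ignore_case):
    # Simplified stand-in for fixup in sre_compile.py (assumption):
    # return the code point, lower-cased when ignoring case.
    return ord(ch.lower()) if ignore_case else ord(ch)


def charset_lowered_endpoints(lo, hi, ignore_case):
    # Mimics the reported behaviour: only the two endpoints are
    # lower-cased, then every code point between them is included.
    start, stop = fixup(lo, ignore_case), fixup(hi, ignore_case)
    return {chr(i) for i in range(start, stop + 1)}


def charset_per_character(lo, hi, ignore_case):
    # Hypothetical alternative: build the range as written, then add
    # the other-case form of each member individually.
    members = {chr(i) for i in range(ord(lo), ord(hi) + 1)}
    if ignore_case:
        members |= {c.lower() for c in members} | {c.upper() for c in members}
    return members


# ord('9') == 57, ord('A') == 65, ord('_') == 95, ord('a') == 97
buggy = charset_lowered_endpoints('9', 'A', ignore_case=True)
fixed = charset_per_character('9', 'A', ignore_case=True)

print('_' in buggy, 'a' in buggy)   # True True
print('_' in fixed, 'a' in fixed)   # False True
```

With the endpoints lower-cased, [9-A] silently becomes the range 57..97, which sweeps in every character between 'A' and 'a', including '_'. Handling each member separately keeps the set at the original nine characters plus 'a'.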

Date | User | Action | Args
2008-08-06 19:42:15 | mrabarnett | set | recipients: + mrabarnett
2008-08-06 19:42:15 | mrabarnett | set | messageid: <1218051735.51.0.548930988583.issue3511@psf.upfronthosting.co.za>
2008-08-06 19:41:11 | mrabarnett | link | issue3511 messages
2008-08-06 19:41:10 | mrabarnett | create |