Message116252
Some more rather marginal findings; differences between regex and re:
>>> regex.findall(r"[\B]", "aBc")
['B']
>>> re.findall(r"[\B]", "aBc")
[]
(Python 2.7 ... on win32; regex - issue2636-20100912.zip)
I believe regex is more correct here, as uppercase \B doesn't have a special meaning within a set (unlike \b, which means backspace there), hence it should be treated as B, but I wanted to mention it as a difference, just in case it matters.
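For comparison, a minimal sketch of the \b case mentioned above, using only the stdlib re module (the variable names are mine, just for illustration). It should show that \b changes meaning inside a set, which is the special treatment \B does not get:

```python
import re

# Inside a set, \b stands for the backspace character (U+0008).
# Note the subject string is NOT raw, so "\b" in it is a real backspace.
in_set = re.findall(r"[\b]", "a\bc")

# Outside a set, \b is a zero-width word boundary.
boundaries = re.findall(r"\bc", "a c")

print(in_set)      # ['\x08']
print(boundaries)  # ['c']
```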
I also noticed another case where regex is more permissive:
>>> regex.findall(r"[\d-h]", "ab12c-h")
['1', '2', '-', 'h']
>>> re.findall(r"[\d-h]", "ab12c-h")
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "re.pyc", line 177, in findall
  File "re.pyc", line 245, in _compile
error: bad character range
>>>
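As a side note, the ambiguity can be avoided in either module by making the hyphen unmistakably literal. The following is only a sketch with the stdlib re module, assuming the intended meaning of the set is "a digit, a hyphen, or h"; the variable names are illustrative:

```python
import re

text = "ab12c-h"

# Two spellings that leave no doubt the hyphen is a literal:
escaped = re.findall(r"[\d\-h]", text)   # hyphen escaped
trailing = re.findall(r"[\dh-]", text)   # hyphen placed last in the set

print(escaped)   # ['1', '2', '-', 'h']
print(trailing)  # ['1', '2', '-', 'h']

# The ambiguous spelling; whether it compiles depends on the engine,
# so both outcomes are handled here rather than assumed.
try:
    ambiguous = re.findall(r"[\d-h]", text)
except re.error as exc:
    ambiguous = None
    print("re rejected the pattern:", exc)
```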
However, there might be an issue in regex with negated sets, where the negation seems to apply to the first shorthand literal only; the rest is taken positively:
>>> regex.findall(r"[^\d-h]", "a^b12c-h")
['-', 'h']
Compare also a simplified pattern, where re seems to work correctly:
>>> regex.findall(r"[^\dh]", "a^b12c-h")
['h']
>>> re.findall(r"[^\dh]", "a^b12c-h")
['a', '^', 'b', 'c', '-']
>>>
Or maybe, regardless of the order, in the presence of shorthand literals and normal characters in negated sets, the normal characters are matched positively:
>>> regex.findall(r"[^h\s\db]", "a^b 12c-h")
['b', 'h']
>>> re.findall(r"[^h\s\db]", "a^b 12c-h")
['a', '^', 'c', '-']
>>>
Also related to character sets, but possibly different: adding a (redundant) character that also belongs to the shorthand in a negated set seems to somehow confuse the parser:
>>> regex.findall(r"[^b\w]", "a b")
[]
>>> re.findall(r"[^b\w]", "a b")
[' ']
>>> regex.findall(r"[^b\S]", "a b")
[]
>>> re.findall(r"[^b\S]", "a b")
[' ']
>>> regex.findall(r"[^8\d]", "a 1b2")
[]
>>> re.findall(r"[^8\d]", "a 1b2")
['a', ' ', 'b']
>>>
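To make the expected behaviour explicit, here is a rough sketch of how I would read a negated set: a character matches only if it is none of the listed literals and belongs to none of the listed shorthand classes. The helper negated_set_matches is hypothetical (it is not part of re or regex), and it only cross-checks the patterns above that re can compile:

```python
import re

def negated_set_matches(text, excluded_chars="", excluded_classes=()):
    """Hypothetical reference: characters of `text` a negated set should match.

    A character is kept only if it is not one of `excluded_chars` and is
    not matched by any single-character pattern in `excluded_classes`.
    """
    result = []
    for ch in text:
        if ch in excluded_chars:
            continue
        if any(re.fullmatch(cls, ch) for cls in excluded_classes):
            continue
        result.append(ch)
    return result

# (pattern, subject, literal members, shorthand members)
cases = [
    (r"[^\dh]", "a^b12c-h", "h", (r"\d",)),
    (r"[^h\s\db]", "a^b 12c-h", "hb", (r"\s", r"\d")),
    (r"[^b\w]", "a b", "b", (r"\w",)),
    (r"[^b\S]", "a b", "b", (r"\S",)),
    (r"[^8\d]", "a 1b2", "8", (r"\d",)),
]

for pattern, text, chars, classes in cases:
    # The stdlib re results agree with the reference reading.
    assert re.findall(pattern, text) == negated_set_matches(text, chars, classes)
```

Under the same reading, and assuming the hyphen in [^\d-h] is meant as a literal, negated_set_matches("a^b12c-h", "-h", (r"\d",)) gives ['a', '^', 'b', 'c'].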
I didn't find any relevant tracker issues; sorry if I missed some.
I initially wanted to provide test code additions, but as I am not sure about the intended output in all cases, I am leaving it in this form.
vbr

Date | User | Action | Args
2010-09-12 23:34:28 | vbr | set | recipients: + vbr, loewis, georg.brandl, collinwinter, gregory.p.smith, jimjjewett, sjmachin, amaury.forgeotdarc, pitrou, nneonneo, giampaolo.rodola, rsc, timehorse, mark, ezio.melotti, mrabarnett, jaylogan, akitada, moreati, r.david.murray, jhalcrow
2010-09-12 23:34:28 | vbr | set | messageid: <1284334468.68.0.357269745287.issue2636@psf.upfronthosting.co.za>
2010-09-12 23:34:27 | vbr | link | issue2636 messages
2010-09-12 23:34:26 | vbr | create |