Author vbr
Recipients akitada, amaury.forgeotdarc, collinwinter, ezio.melotti, georg.brandl, giampaolo.rodola, gregory.p.smith, jaylogan, jhalcrow, jimjjewett, loewis, mark, moreati, mrabarnett, nneonneo, pitrou, r.david.murray, rsc, sjmachin, timehorse, vbr
Date 2010-09-12.23:34:26
SpamBayes Score 1.00919e-12
Marked as misclassified No
Message-id <>
Just another rather marginal findings; differences between regex and re:

>>> regex.findall(r"[\B]", "aBc")
>>> re.findall(r"[\B]", "aBc")

(Python 2.7 ... on win32; regex -
I believe, regex is more correct here, as uppercase \B doesn't have a special meaning within a set (unlike backspace \b), hence it should be treated as B, but I wanted to mention it as a difference, just in case it would matter.

I also noticed another case, where regex is more permissive:

>>> regex.findall(r"[\d-h]", "ab12c-h")
['1', '2', '-', 'h']
>>> re.findall(r"[\d-h]", "ab12c-h")
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "re.pyc", line 177, in findall
  File "re.pyc", line 245, in _compile
error: bad character range

howewer, there might be an issue in negated sets, where the negation seem to apply for the first shorthand literal only; the rest is taken positively

>>> regex.findall(r"[^\d-h]", "a^b12c-h")
['-', 'h']

cf. also a simplified pattern, where re seems to work correctly:

>>> regex.findall(r"[^\dh]", "a^b12c-h")
>>> re.findall(r"[^\dh]", "a^b12c-h")
['a', '^', 'b', 'c', '-']

or maybe regardless the order - in presence of shorthand literals and normal characters in negated sets, these normal characters are matched positively

>>> regex.findall(r"[^h\s\db]", "a^b 12c-h")
['b', 'h']
>>> re.findall(r"[^h\s\db]", "a^b 12c-h")
['a', '^', 'c', '-']

also related to character sets but possibly different - maybe adding a (reduntant) character also belonging to the shorthand in a negated set seem to somehow confuse the parser:

regex.findall(r"[^b\w]", "a b")
re.findall(r"[^b\w]", "a b")
[' ']

regex.findall(r"[^b\S]", "a b")
re.findall(r"[^b\S]", "a b")
[' ']

>>> regex.findall(r"[^8\d]", "a 1b2")
>>> re.findall(r"[^8\d]", "a 1b2")
['a', ' ', 'b']

I didn't find any relevant tracker issues, sorry if I missed some...
I initially wanted to provide test code additions, but as I am not sure about the intended output in all cases, I am leaving it in this form;

Date User Action Args
2010-09-12 23:34:28vbrsetrecipients: + vbr, loewis, georg.brandl, collinwinter, gregory.p.smith, jimjjewett, sjmachin, amaury.forgeotdarc, pitrou, nneonneo, giampaolo.rodola, rsc, timehorse, mark, ezio.melotti, mrabarnett, jaylogan, akitada, moreati, r.david.murray, jhalcrow
2010-09-12 23:34:28vbrsetmessageid: <>
2010-09-12 23:34:27vbrlinkissue2636 messages
2010-09-12 23:34:26vbrcreate