Author vbr
Recipients akitada, amaury.forgeotdarc, collinwinter, ezio.melotti, georg.brandl, giampaolo.rodola, gregory.p.smith, jaylogan, jhalcrow, jimjjewett, loewis, mark, moreati, mrabarnett, nneonneo, pitrou, r.david.murray, rsc, sjmachin, timehorse, vbr
Date 2010-09-12.23:34:26
Message-id <1284334468.68.0.357269745287.issue2636@psf.upfronthosting.co.za>
Content
Just a few more rather marginal findings: differences between regex and re:

>>> regex.findall(r"[\B]", "aBc")
['B']
>>> re.findall(r"[\B]", "aBc")
[]

(Python 2.7 ... on win32; regex - issue2636-20100912.zip)
I believe regex is more correct here: uppercase \B has no special meaning within a set (unlike \b, which means backspace there), so it should be treated as a literal B. I wanted to mention it as a difference anyway, just in case it matters.

I also noticed another case, where regex is more permissive:

>>> regex.findall(r"[\d-h]", "ab12c-h")
['1', '2', '-', 'h']
>>> re.findall(r"[\d-h]", "ab12c-h")
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "re.pyc", line 177, in findall
  File "re.pyc", line 245, in _compile
error: bad character range
>>> 
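For reference, the hyphen can be made unambiguously literal by escaping it or by placing it last in the set. A minimal sketch using only the standard re module (the helper name try_findall is made up for illustration and is not part of either module):

```python
import re

def try_findall(pattern, text):
    """Return re.findall(pattern, text), or None if the pattern does not compile."""
    try:
        return re.findall(pattern, text)
    except re.error:
        return None

text = "ab12c-h"
escaped = try_findall(r"[\d\-h]", text)    # hyphen escaped
trailing = try_findall(r"[\dh-]", text)    # hyphen placed last in the set
ambiguous = try_findall(r"[\d-h]", text)   # None wherever re rejects the range
print(escaped, trailing, ambiguous)
```

Both explicit spellings should give the same result that regex returns for the ambiguous pattern above.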

However, there might be an issue in negated sets, where the negation seems to apply to the first shorthand literal only; the rest is taken positively:

>>> regex.findall(r"[^\d-h]", "a^b12c-h")
['-', 'h']

cf. also a simplified pattern, where re seems to work correctly:

>>> regex.findall(r"[^\dh]", "a^b12c-h")
['h']
>>> re.findall(r"[^\dh]", "a^b12c-h")
['a', '^', 'b', 'c', '-']
>>> 

Or maybe, regardless of the order, in the presence of both shorthand literals and normal characters in a negated set, the normal characters are matched positively:

>>> regex.findall(r"[^h\s\db]", "a^b 12c-h")
['b', 'h']
>>> re.findall(r"[^h\s\db]", "a^b 12c-h")
['a', '^', 'c', '-']
>>> 
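To spell out the semantics I would expect from a negated set, here is a rough reference check in plain Python: a character should match only if it belongs to none of the members. The function name negated_set_reference is hypothetical, and str.isdigit / str.isspace are assumed to stand in for \d and \s, which holds for the ASCII sample text used here but not for all Unicode characters.

```python
import re

def negated_set_reference(text, literals, predicates):
    """Characters a negated set should match: in no literal, satisfying no predicate."""
    return [ch for ch in text
            if ch not in literals and not any(p(ch) for p in predicates)]

text = "a^b 12c-h"
# Members of [^h\s\db]: literals h and b, shorthands \s and \d.
expected = negated_set_reference(text, "hb", [str.isspace, str.isdigit])
actual = re.findall(r"[^h\s\db]", text)
print(expected, actual)
```

With this reading, the output of re above agrees with the reference list, and the regex result does not.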

Also related to character sets, but possibly different: adding a (redundant) character that also belongs to the shorthand in a negated set seems to somehow confuse the parser:

>>> regex.findall(r"[^b\w]", "a b")
[]
>>> re.findall(r"[^b\w]", "a b")
[' ']

>>> regex.findall(r"[^b\S]", "a b")
[]
>>> re.findall(r"[^b\S]", "a b")
[' ']

>>> regex.findall(r"[^8\d]", "a 1b2")
[]
>>> re.findall(r"[^8\d]", "a 1b2")
['a', ' ', 'b']
>>> 
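Since the extra character in each of these sets is already covered by the shorthand, I would assume each negated set to be equivalent to the complementary shorthand alone (\W, \s and \D respectively). A small sketch checking that assumption against the standard re module:

```python
import re

# (negated set with a redundant member, assumed equivalent shorthand, sample text)
cases = [
    (r"[^b\w]", r"\W", "a b"),
    (r"[^b\S]", r"\s", "a b"),
    (r"[^8\d]", r"\D", "a 1b2"),
]

# Pair up the result of the negated set with the result of the plain shorthand.
results = [(re.findall(with_extra, text), re.findall(shorthand, text))
           for with_extra, shorthand, text in cases]

for (with_extra, shorthand, text), (got, want) in zip(cases, results):
    print(with_extra, got, shorthand, want)
```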

I didn't find any relevant tracker issues; sorry if I missed some.
I initially wanted to provide test code additions, but as I am not sure about the intended output in all cases, I am leaving it in this form.
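
In case it is useful, the comparisons above could be collected into a small harness along these lines. This is only a sketch: compare_findall and CASES are names made up here, it asserts no "correct" output, and it merely lists the patterns on which two modules disagree. It assumes both arguments expose a findall(pattern, text) function, as re and regex do.

```python
import re

def compare_findall(module_a, module_b, cases):
    """Return (pattern, text, result_a, result_b) for each case where the results differ."""
    def run(module, pattern, text):
        try:
            return module.findall(pattern, text)
        except Exception as exc:
            # A pattern that fails to compile is recorded as a result as well.
            return "error: %s" % type(exc).__name__

    differences = []
    for pattern, text in cases:
        result_a = run(module_a, pattern, text)
        result_b = run(module_b, pattern, text)
        if result_a != result_b:
            differences.append((pattern, text, result_a, result_b))
    return differences

# Patterns and sample strings taken from this message.
CASES = [
    (r"[\B]", "aBc"),
    (r"[\d-h]", "ab12c-h"),
    (r"[^\d-h]", "a^b12c-h"),
    (r"[^\dh]", "a^b12c-h"),
    (r"[^h\s\db]", "a^b 12c-h"),
    (r"[^b\w]", "a b"),
    (r"[^b\S]", "a b"),
    (r"[^8\d]", "a 1b2"),
]

# Comparing re with itself is only a self-check; pass the regex module
# under test as the first argument to list the actual differences.
self_check = compare_findall(re, re, CASES)
print(self_check)
```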

vbr