
Author ezio.melotti
Recipients docs@python, ezio.melotti, georg.brandl, mrabarnett, sjmachin
Date 2012-01-29.15:32:26
Message-id <1327851147.32.0.752780075223.issue13899@psf.upfronthosting.co.za>
In-reply-to
Content
[\w] should definitely work, but [\B] doesn't seem to match anything useful: it fails silently, being equivalent neither to \B nor to [B]:
>>> re.match(r'foo\B', 'foobar')  # on a non-word-boundary -- matches fine
<_sre.SRE_Match object at 0xb76dd3a0>
>>> re.match(r'foo[B]', 'fooBar')  # same as r'fooB'
<_sre.SRE_Match object at 0xb76dd1e0>
>>> re.match(r'foo[\B]', 'foobar')  # not equivalent to \B
>>> re.match(r'foo[\B]', 'fooBar')  # not equivalent to [B]

The same is true for \Z and \A:
>>> re.match(r'foo\Z', 'foo')  # end of the string -- matches fine
<_sre.SRE_Match object at 0xb76dd3a0>
>>> re.match(r'foo[Z]', 'fooZ')  # same as r'fooZ'
<_sre.SRE_Match object at 0xb76dd1e0>
>>> re.match(r'foo[\Z]', 'foo')  # not equivalent to \Z
>>> re.match(r'foo[\Z]', 'fooZ')  # not equivalent to [Z]
>>>
>>> re.match(r'\Afoo', 'foo')  # beginning of the string -- matches fine
<_sre.SRE_Match object at 0xb76dd1e0>
>>> re.match(r'[A]foo', 'Afoo')  # same as r'Afoo'
<_sre.SRE_Match object at 0xb76dd3a0>
>>> re.match(r'[\A]foo', 'foo')  # not equivalent to \A
>>> re.match(r'[\A]foo', 'Afoo')  # not equivalent to [A]
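
The checks above can be repeated on any interpreter with a small script. This is only a sketch: `probe` is a hypothetical helper, not part of `re`, and the result for the bracketed patterns depends on the Python version (a release may reject the pattern outright rather than silently match nothing, so the helper also reports compile errors).

```python
import re

def probe(pattern, text):
    """Return 'error', 'match' or 'no match' for re.match(pattern, text)."""
    try:
        m = re.match(pattern, text)
    except re.error:
        # The pattern was rejected at compile time.
        return 'error'
    return 'match' if m else 'no match'

cases = [
    (r'foo\B', 'foobar'),    # plain \B, not on a word boundary
    (r'foo[B]', 'fooBar'),   # plain [B]
    (r'foo[\B]', 'foobar'),  # \B inside a class
    (r'foo[\B]', 'fooBar'),
    (r'foo[\Z]', 'foo'),     # \Z inside a class
    (r'[\A]foo', 'foo'),     # \A inside a class
]
for pattern, text in cases:
    print('%-10s %-8r %s' % (pattern, text, probe(pattern, text)))
```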

Inside [], \b switches from word boundary to backspace:
>>> re.match(r'foo\b', 'foobar')  # not on a word boundary -- no match
>>> re.match(r'foo\b', 'foo bar')  # on a word boundary  -- matches fine
<_sre.SRE_Match object at 0xb74a4ec8>
>>> re.match(r'foo[\b]', 'foo bar')  # not equivalent to \b
>>> re.match(r'foo[\b]', 'foo\bbar')  # matches backspace
<_sre.SRE_Match object at 0xb76dd3d8>
>>> re.match(r'foo([\b])', 'foo\bbar').group(1)
'\x08'
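
A compact way to see the two meanings of \b side by side is sketched below. The variable names are illustrative only; the script just restates the session above in a form that can be run as a file.

```python
import re

BACKSPACE = '\x08'

# Inside a character class, \b stands for the backspace character.
in_class = re.match(r'[\b]', BACKSPACE)

# Outside a class, \b is a zero-width word-boundary assertion; there is no
# word character next to the backspace, so there is no boundary to match.
outside = re.match(r'\b', BACKSPACE)

# [\b] and [\x08] accept the same single character.
as_hex = re.match(r'[\x08]', BACKSPACE)

print(in_class is not None, outside is None, as_hex is not None)
```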

Given that \b doesn't keep its word-boundary meaning inside [], \B, \A and \Z shouldn't keep their special meanings there either (I also can't see how having these inside [] would be of any use).
On the other hand, I'm not sure they should be equivalent to B, A, Z either.  There are several escape sequences of the form \X (where X is an upper- or lower-case letter) that are not equivalent to X (\a, \b, \d, \f, \s, \x, \w, \D, \S, \W, ...).
Raising an error that says something like "I don't think [\A] does what you think it does, use [A] instead." might be a better option (and in case anyone is wondering about re.escape, I just checked and it doesn't escape letters).  Even if this is technically backward-incompatible, any pattern that has \A, \B or \Z inside [] can be considered buggy IMHO (unless someone can come up with a valid use case where they do something useful).
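
To illustrate the kind of check such an error would rely on, here is a rough sketch of a scanner that flags \A, \B and \Z inside []. It is not how the re module parses patterns: `suspicious_class_escapes` is a hypothetical name, and the scanner deliberately ignores corner cases such as a literal ] placed first in a class. The last line re-checks the re.escape remark.

```python
import re

def suspicious_class_escapes(pattern):
    """Return the \\A, \\B and \\Z escapes that appear inside [] in pattern."""
    found = []
    in_class = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == '\\' and i + 1 < len(pattern):
            # Consume the escape as a unit so \[ and \] are not
            # mistaken for class delimiters.
            if in_class and pattern[i + 1] in 'ABZ':
                found.append(pattern[i:i + 2])
            i += 2
            continue
        if ch == '[' and not in_class:
            in_class = True
        elif ch == ']' and in_class:
            in_class = False
        i += 1
    return found

print(suspicious_class_escapes(r'foo[\B]'))  # flagged
print(suspicious_class_escapes(r'foo\B'))    # nothing to flag
print(re.escape('ABZ') == 'ABZ')             # letters are left unescaped
```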