Title: \b requires raw strings or to be escaped. Update docs with that hint?
Created on 2017-01-18 20:21 by Mike.Lissner, last changed 2017-01-18 21:11 by r.david.murray. This issue is now closed.

Messages (2)
msg285751 - Author: Mike Lissner (Mike.Lissner) Date: 2017-01-18 20:21
I just ran into a funny corner case that I imagine others are aware of. In an ordinary Python string literal, "\b" is a single character, the backspace "\x08". So if you try to write a regex like:

words = '\b(.*)\b'

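That won't work, because the pattern then contains literal backspace characters instead of word-boundary assertions. But using a raw string will: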

words = r'\b(.*)\b'

As will escaping it in this horrible fashion:

words = '\\b(.*)\\b'

I believe this doesn't affect any of the other regex special sequences, so I wonder if it's worth adding something to the docs to warn about it. I just spent a bunch of time trying to figure out why \b didn't seem to be working. A little tip in the docs would have gone a LONG way.
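
For anyone reproducing this, here is a minimal sketch of the three spellings side by side. The sample text and the variable names are made up for illustration; only the three pattern literals come from the report above.

```python
import re

text = "hello world"  # hypothetical sample input

plain = '\b(.*)\b'      # "\b" is converted to the backspace character "\x08"
raw = r'\b(.*)\b'       # the backslash and the "b" reach the regex engine intact
escaped = '\\b(.*)\\b'  # same characters as the raw string

# The plain literal is two characters shorter: each "\b" collapsed to one char.
print(len(plain), len(raw), raw == escaped)

# Looks for literal backspace characters, which the sample text does not contain.
plain_match = re.search(plain, text)

# Treats \b as a word-boundary assertion and captures everything in between.
raw_match = re.search(raw, text)

print(plain_match)
print(raw_match.group(1))
```

The raw and doubled-backslash forms produce the same string, so the choice between them is purely about readability.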
msg285755 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-01-18 21:11
One should always use raw strings for regular expressions, and this is already documented in the introduction to the re module docs. Further, in 3.5 using \ in front of characters that aren't special produces a warning, which should reduce the frequency of this mistake.

I don't see that there's anything to do here.
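
As a rough way to see which backslash sequences in a string literal trigger a warning, the sketch below compiles two tiny source snippets and records what the compiler reports. It assumes a recent interpreter; the exact version where the warning appears and its category (DeprecationWarning or SyntaxWarning) depend on the Python release, and the helper name literal_warnings is made up for this example.

```python
import warnings

def literal_warnings(source):
    """Compile a snippet of source code and return the warnings it triggers."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure nothing is filtered out
        compile(source, "<example>", "eval")
    return caught

# Source text is '\d+': \d is not a recognized string escape.
unknown = literal_warnings("'\\d+'")

# Source text is '\b': \b is a recognized string escape (backspace).
backspace = literal_warnings("'\\b'")

print(len(unknown), len(backspace))
for w in unknown:
    print(w.category.__name__, w.message)
```

Because "\b" is a valid string escape, it would not be expected to produce a warning here, so raw strings remain the dependable habit for patterns.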
