This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

classification
Title: Behavior of word boundaries in regexes unexpected
Type: behavior
Stage: resolved
Components: Regular Expressions
Versions: Python 2.7

process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: Wellington.Fan, ezio.melotti, mrabarnett, r.david.murray
Priority: normal
Keywords:

Created on 2014-05-21 15:38 by Wellington.Fan, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (3)
msg218879 - Author: Wellington Fan (Wellington.Fan) Date: 2014-05-21 15:38
Hello,

It seems that the word boundary sequence -- r'\b' -- is not behaving as expected using re.split(). The regex docs say:

  \b       Matches the empty string, but only at the start or end of a word.

My (failing) test:

> import re
> re.split(r'\b', 'A funky string')
['A funky string']


We get a one-element array returned; I would expect a seven-element array:
['', 'A', ' ', 'funky', ' ', 'string', '']

I have equivalent code in PHP that *does* work:
 php > print_r( preg_split('/\b/', 'A funny string') );
 Array
 (
     [0] =>
     [1] => A
     [2] =>
     [3] => funny
     [4] =>
     [5] => string
     [6] =>
 )
msg218881 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2014-05-21 16:02
"Note that split will never split a string on an empty pattern match"

You can get what you want this way:

>>> re.split(r'(\w*)', 'a funky string')
['', 'a', ' ', 'funky', ' ', 'string', '']

Or use r'(\W*)' if you don't actually want the leading and trailing empty strings.
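
For readers who want a result that does not depend on how re.split() treats empty matches, one possible alternative is to collect the runs of word and non-word characters with re.findall(). This is only a sketch and was not proposed in the thread; the helper name split_at_word_boundaries is made up for illustration.

```python
import re

def split_at_word_boundaries(text):
    # Hypothetical helper (not from the thread): return alternating runs of
    # word characters (\w+) and non-word characters (\W+). Every character
    # falls into exactly one run, so joining the pieces restores the input.
    return re.findall(r'\w+|\W+', text)

pieces = split_at_word_boundaries('A funky string')
print(pieces)  # ['A', ' ', 'funky', ' ', 'string']
```

Unlike the PHP output above, this returns no leading or trailing empty strings. Note also that, according to the re documentation, Python 3.7 and later do allow re.split() to split on patterns that can match the empty string, so the behavior reported here applies to older versions such as 2.7.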
msg218882 - Author: Matthew Barnett (mrabarnett) (Python triager) Date: 2014-05-21 16:19
See also issue #852532, issue #3262 and issue #988761.
History
Date                 User            Action  Args
2022-04-11 14:58:03  admin           set     github: 65750
2014-05-21 16:19:00  mrabarnett      set     messages: + msg218882
2014-05-21 16:02:43  r.david.murray  set     status: open -> closed; nosy: + r.david.murray; messages: + msg218881; resolution: not a bug; stage: resolved
2014-05-21 15:38:12  Wellington.Fan  create