classification
Title: (compiled RegEx).split gives unexpected results if () in pattern
Type: behavior
Stage:
Components: Regular Expressions
Versions: Python 3.3
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: SilentGhost, dnotmanj, ezio.melotti, mrabarnett
Priority: normal
Keywords:

Created on 2015-01-25 17:31 by dnotmanj, last changed 2015-01-25 18:10 by SilentGhost. This issue is now closed.

Messages (2)
msg234677 - Author: Dave Notman (dnotmanj) Date: 2015-01-25 17:31
# Python 3.3.1 (default, Sep 25 2013, 19:30:50)
# Linux 3.8.0-35-generic #50-Ubuntu SMP Tue Dec 3 01:25:33 UTC 2013 i686 i686 i686 GNU/Linux

import re

splitter = re.compile( r'(\s*[+/&;,]\s*)|(\s+and\s+)' )
ll = splitter.split( 'Dave & Sam, Jane and Zoe' )
print(repr(ll))

print( 'Try again with revised RegEx' )
splitter = re.compile( r'(?:(?:\s*[+/&;,]\s*)|(?:\s+and\s+))' )
ll = splitter.split( 'Dave & Sam, Jane and Zoe' )
print(repr(ll))

Results:
['Dave', ' & ', None, 'Sam', ', ', None, 'Jane', None, ' and ', 'Zoe']
Try again with revised RegEx
['Dave', 'Sam', 'Jane', 'Zoe']
msg234678 - Author: SilentGhost (SilentGhost) (Python triager) Date: 2015-01-25 18:10
Looks like it works exactly as the docs[1] describe:

>>> re.split(r'\s*[+/&;,]\s*|\s+and\s+', 'Dave & Sam, Jane and Zoe')
['Dave', 'Sam', 'Jane', 'Zoe']

You're using capturing groups (parentheses) in your original regex, which causes the text matched by the separators to be returned as part of the result list.

[1] https://docs.python.org/3/library/re.html#re.split
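
The behavior described above can be reproduced side by side with a short sketch. It reuses the input string and patterns from this issue; the variable names and the third pattern (one capturing group wrapped around the whole alternation) are illustrative additions, not something proposed in the thread.

```python
import re

text = 'Dave & Sam, Jane and Zoe'

# Two capturing groups: every match adds one list entry per group,
# and a group that did not take part in the match is reported as None.
capturing = re.compile(r'(\s*[+/&;,]\s*)|(\s+and\s+)')
with_separators = capturing.split(text)

# Non-capturing groups (?:...): only the text between separators is returned.
non_capturing = re.compile(r'(?:\s*[+/&;,]\s*|\s+and\s+)')
names_only = non_capturing.split(text)

# Illustrative variant: a single capturing group around the whole alternation
# keeps the separators in the result without any None entries.
single_group = re.compile(r'(\s*[+/&;,]\s*|\s+and\s+)')
names_and_separators = single_group.split(text)

print(with_separators)
print(names_only)
print(names_and_separators)
```

If the separators are not needed at all, dropping the parentheses or switching to (?:...) is the simplest option. The single-group form may be useful when the separators have to be kept, for example to rebuild the original string with ''.join().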
History
Date User Action Args
2015-01-25 18:10:22 SilentGhost set status: open -> closed; nosy: + SilentGhost; messages: + msg234678; resolution: not a bug
2015-01-25 17:31:37 dnotmanj create