
Author serhiy.storchaka
Recipients ezio.melotti, mrabarnett, rexdwyer, serhiy.storchaka
Date 2014-11-08.09:11:18
Message-id <1415437879.71.0.116422408869.issue22817@psf.upfronthosting.co.za>
In-reply-to
Content
It is possible to change this behavior (see example patch). With this patch:

>>> re.split(r'(?<=CA)(?=GCTG)', 'ACGTCAGCTGAAACCCCAGCTGACGTACGT')
['ACGTCA', 'GCTGAAACCCCA', 'GCTGACGTACGT']
>>> re.split(r'\b', "the quick, brown fox")
['', 'the', ' ', 'quick', ', ', 'brown', ' ', 'fox', '']

Unfortunately, this is a backward-incompatible change that will likely break existing code (it already breaks tests). Consider the following example: re.split('(:*)', 'ab'). Currently the result is ['ab'], but with the patch it is ['', '', 'a', '', 'b', '', ''].
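
To see where the extra empty strings come from, it may help to list where the pattern matches. The following is only an illustrative sketch (the variable names are mine, not from the patch): '(:*)' matches the empty string at every position of 'ab', so once empty matches count as split points, each one contributes a cut plus an empty captured group.

```python
import re

# '(:*)' can match zero colons, so it matches the empty string
# at every position of 'ab': before 'a', between 'a' and 'b', after 'b'.
spans = [m.span() for m in re.finditer('(:*)', 'ab')]
groups = [m.group(1) for m in re.finditer('(:*)', 'ab')]

print(spans)   # [(0, 0), (1, 1), (2, 2)]
print(groups)  # ['', '', '']
```

Three empty matches, each followed by its empty group, give exactly the seven-element list shown above.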

The third-party regex module [1] has a V1 flag that switches on this incompatible behavior change.

>>> regex.split('(:*)', 'ab')
['ab']
>>> regex.split('(?V1)(:*)', 'ab')
['', '', 'a', '', 'b', '', '']
>>> regex.split(r'(?<=CA)(?=GCTG)', 'ACGTCAGCTGAAACCCCAGCTGACGTACGT')
['ACGTCAGCTGAAACCCCAGCTGACGTACGT']
>>> regex.split(r'(?V1)(?<=CA)(?=GCTG)', 'ACGTCAGCTGAAACCCCAGCTGACGTACGT')
['ACGTCA', 'GCTGAAACCCCA', 'GCTGACGTACGT']
>>> regex.split(r'\b', "the quick, brown fox")
['the quick, brown fox']
>>> regex.split(r'(?V1)\b', "the quick, brown fox")
['', 'the', ' ', 'quick', ', ', 'brown', ' ', 'fox', '']

I don't know how to solve this issue without introducing such a flag (or adding a special boolean argument to re.split()).

As a workaround, I suggest using the regex module.
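
If a third-party dependency is not an option, the splits shown above can probably also be approximated with the standard library alone. A minimal sketch, assuming a hypothetical helper named split_on_matches (it is not part of re and not taken from the patch) that cuts the string at every match reported by re.finditer, including empty ones:

```python
import re

def split_on_matches(pattern, string):
    """Hypothetical helper: split `string` at every match of `pattern`,
    including zero-width matches. Captured groups are kept in the
    result, as re.split() does."""
    pieces = []
    last = 0
    for m in re.finditer(pattern, string):
        pieces.append(string[last:m.start()])  # text before this match
        pieces.extend(m.groups())              # captured groups, if any
        last = m.end()
    pieces.append(string[last:])               # remaining tail
    return pieces

dna = split_on_matches(r'(?<=CA)(?=GCTG)', 'ACGTCAGCTGAAACCCCAGCTGACGTACGT')
words = split_on_matches(r'\b', "the quick, brown fox")
print(dna)    # ['ACGTCA', 'GCTGAAACCCCA', 'GCTGACGTACGT']
print(words)  # ['', 'the', ' ', 'quick', ', ', 'brown', ' ', 'fox', '']
```

This sketch has no maxsplit argument and has only been checked against the patterns discussed in this message.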

[1] https://pypi.python.org/pypi/regex
History
Date                 User              Action  Args
2014-11-08 09:11:19  serhiy.storchaka  set     recipients: + serhiy.storchaka, ezio.melotti, mrabarnett, rexdwyer
2014-11-08 09:11:19  serhiy.storchaka  set     messageid: <1415437879.71.0.116422408869.issue22817@psf.upfronthosting.co.za>
2014-11-08 09:11:19  serhiy.storchaka  link    issue22817 messages
2014-11-08 09:11:19  serhiy.storchaka  create