Message230839
It is possible to change this behavior (see the example patch). With this patch:
>>> re.split(r'(?<=CA)(?=GCTG)', 'ACGTCAGCTGAAACCCCAGCTGACGTACGT')
['ACGTCA', 'GCTGAAACCCCA', 'GCTGACGTACGT']
>>> re.split(r'\b', "the quick, brown fox")
['', 'the', ' ', 'quick', ', ', 'brown', ' ', 'fox', '']
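The split points in the patched output are the positions where the pattern matches with zero width. The sketch below lists those positions using the standard re.finditer(). The helper name `zero_width_positions` is hypothetical and only illustrates the idea.

```python
import re

def zero_width_positions(pattern, string):
    """Return the start offsets of all empty (zero-width) matches of pattern.

    Hypothetical helper for illustration; it is not part of the re module.
    """
    return [m.start() for m in re.finditer(pattern, string) if m.start() == m.end()]

dna = 'ACGTCAGCTGAAACCCCAGCTGACGTACGT'
print(zero_width_positions(r'(?<=CA)(?=GCTG)', dna))       # [6, 18]
print(zero_width_positions(r'\b', "the quick, brown fox"))  # [0, 3, 4, 9, 11, 16, 17, 20]
```

Slicing the string at these offsets gives the pieces in the patched output. For example, `dna[:6]` is `'ACGTCA'` and `dna[6:18]` is `'GCTGAAACCCCA'`.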
Unfortunately, this is a backward-incompatible change that will likely break existing code (and it breaks tests). Consider the following example: re.split('(:*)', 'ab'). Currently the result is ['ab'], but with the patch it is ['', '', 'a', '', 'b', '', ''].
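A given call can only be affected if its pattern produces at least one empty match on the input. This assumes the patch only adds splits at empty matches and leaves splits at non-empty matches unchanged. The minimal check below rests on that assumption, and the name `split_would_change` is hypothetical.

```python
import re

def split_would_change(pattern, string):
    """Return True if pattern has at least one empty match in string.

    Hypothetical audit helper. Assumption: calls whose pattern never matches
    empty on this input get the same re.split() result with or without the patch.
    """
    return any(m.start() == m.end() for m in re.finditer(pattern, string))

print(split_would_change('(:*)', 'ab'))  # True: ':*' matches empty at offsets 0, 1 and 2
print(split_would_change(':', 'a:b'))    # False: every match consumes a ':'
```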
The third-party regex module [1] has a V1 flag that switches on this incompatible behavior change:
>>> regex.split('(:*)', 'ab')
['ab']
>>> regex.split('(?V1)(:*)', 'ab')
['', '', 'a', '', 'b', '', '']
>>> regex.split(r'(?<=CA)(?=GCTG)', 'ACGTCAGCTGAAACCCCAGCTGACGTACGT')
['ACGTCAGCTGAAACCCCAGCTGACGTACGT']
>>> regex.split(r'(?V1)(?<=CA)(?=GCTG)', 'ACGTCAGCTGAAACCCCAGCTGACGTACGT')
['ACGTCA', 'GCTGAAACCCCA', 'GCTGACGTACGT']
>>> regex.split(r'\b', "the quick, brown fox")
['the quick, brown fox']
>>> regex.split(r'(?V1)\b', "the quick, brown fox")
['', 'the', ' ', 'quick', ', ', 'brown', ' ', 'fox', '']
I don't know how to solve this issue without introducing such a flag (or adding a special boolean argument to re.split()).
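The boolean-argument alternative could look roughly like the pure-Python sketch below. Neither `split_ext` nor its `split_empty` parameter exists in the re module. Both are hypothetical, and the sketch is not the example patch. It is built on re.finditer() and, like re.split(), includes captured groups in the result. `maxsplit` and `flags` are omitted for brevity.

```python
import re

def split_ext(pattern, string, split_empty=False):
    """Hypothetical re.split() variant with an opt-in switch for empty matches.

    split_empty=False: never split on an empty match (the current behavior).
    split_empty=True: split on every match, empty or not (the patched / V1 behavior).
    """
    pieces = []
    start = 0
    for m in re.finditer(pattern, string):
        if not split_empty and m.start() == m.end():
            continue  # skip zero-width matches to keep the old results
        pieces.append(string[start:m.start()])
        pieces.extend(m.groups())  # captured groups, as re.split() does
        start = m.end()
    pieces.append(string[start:])
    return pieces

print(split_ext('(:*)', 'ab'))                    # ['ab']
print(split_ext('(:*)', 'ab', split_empty=True))  # ['', '', 'a', '', 'b', '', '']
```

A keyword argument keeps the default backward compatible at the cost of widening the API. The regex module's (?V1) flag instead selects the behavior inside the pattern itself.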
As a workaround, I suggest using the regex module.
[1] https://pypi.python.org/pypi/regex
Date | User | Action | Args
2014-11-08 09:11:19 | serhiy.storchaka | set | recipients: + serhiy.storchaka, ezio.melotti, mrabarnett, rexdwyer
2014-11-08 09:11:19 | serhiy.storchaka | set | messageid: <1415437879.71.0.116422408869.issue22817@psf.upfronthosting.co.za>
2014-11-08 09:11:19 | serhiy.storchaka | link | issue22817 messages
2014-11-08 09:11:19 | serhiy.storchaka | create |