Author serhiy.storchaka
Recipients Patrick Maupin, ezio.melotti, mrabarnett, serhiy.storchaka
Date 2015-06-13.15:51:36
Message-id <1434210696.72.0.35383346889.issue24426@psf.upfronthosting.co.za>
In-reply-to
Content
That is a reason to file a feature request against regex. In 3.3, re was slower than regex in some cases:

$ ./python -m timeit -s "import re; p = re.compile('\n\r'); s = ('a'*100 + '\n\r')*1000" -- "p.split(s)"
Python 3.3 re   : 1000 loops, best of 3: 952 usec per loop
Python 3.4 regex: 1000 loops, best of 3: 757 usec per loop
Python 3.4 re   : 1000 loops, best of 3: 323 usec per loop
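
The same measurement can be sketched as a script, which makes it easier to run under several interpreters. This is only an illustration: the helper name bench is made up here, the third-party regex module is assumed to be installed separately (it is skipped if missing), and the numbers will differ by machine and version.

```python
import re
import timeit

# Same pattern and subject string as in the command line above.
pattern = re.compile('\n\r')
subject = ('a' * 100 + '\n\r') * 1000


def bench(compiled, number=1000, repeat=3):
    """Return the best time per split() call, in microseconds.

    Hypothetical helper that mirrors timeit's "best of 3" report.
    """
    timings = timeit.repeat(lambda: compiled.split(subject),
                            number=number, repeat=repeat)
    return min(timings) / number * 1e6


if __name__ == '__main__':
    print('re   : %.0f usec per loop' % bench(pattern))
    try:
        import regex  # third-party; assumed installed, not part of the stdlib
    except ImportError:
        print('regex: not installed, skipped')
    else:
        print('regex: %.0f usec per loop' % bench(regex.compile('\n\r')))
```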

That optimization (issue18685 or others) could be applied to regex as well.

As for this particular issue, optimizing splitting with a 1-character capturing group requires changes to the C part of the re engine. The Python part of my patch is not needed for this; it is there only to generalize support for other corner cases. So this issue can't be fixed by patching only the Python code.
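
For reference, a minimal sketch of the two kinds of split being compared. The subject string and the helper name time_split are made up for illustration, and no particular timing result is implied; it only shows that the capturing form returns the separators as extra items and gives a way to time both forms locally.

```python
import re
import timeit

# Illustrative subject string, not taken from the original report.
s = ('a' * 100 + '\n') * 1000

plain = re.split('\n', s)       # separators are dropped
captured = re.split('(\n)', s)  # each separator is kept as its own item

# Every second item of the capturing result is a piece between separators,
# so the two results carry the same pieces.
same_pieces = captured[::2] == plain


def time_split(pattern, number=100):
    """Return seconds taken by `number` calls of re.split (hypothetical helper)."""
    compiled = re.compile(pattern)
    return timeit.timeit(lambda: compiled.split(s), number=number)


if __name__ == '__main__':
    print('same pieces:', same_pieces)
    print('plain    : %.4f s' % time_split('\n'))
    print('captured : %.4f s' % time_split('(\n)'))
```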