Author serhiy.storchaka
Recipients Patrick Maupin, ezio.melotti, mrabarnett, serhiy.storchaka
Date 2015-06-11.20:22:50
Message-id <1434054171.6.0.295017496904.issue24426@psf.upfronthosting.co.za>
In-reply-to
Content
Here is a patch that adds more optimizations for searching patterns that start with a literal string and groups. In particular, it covers the case where a pattern starts with a group containing a single character.

Examples:

$ ./python -m timeit -s "import re; p = re.compile('(\n)'); s = ('a'*100 + '\n')*1000" -- "p.split(s)"
Unpatched: 100 loops, best of 3: 4.58 msec per loop
Patched  : 1000 loops, best of 3: 562 usec per loop

$ ./python -m timeit -s "import re; p = re.compile('(\n\r)'); s = ('a'*100 + '\n\r')*1000" -- "p.split(s)"
Unpatched: 100 loops, best of 3: 3.1 msec per loop
Patched  : 1000 loops, best of 3: 663 usec per loop

For comparison:

$ ./python -m timeit -s "import re; p = re.compile('\n'); s = ('a'*100 + '\n')*1000" -- "p.split(s)"
1000 loops, best of 3: 329 usec per loop
$ ./python -m timeit -s "import re; p = re.compile('\n\r'); s = ('a'*100 + '\n\r')*1000" -- "p.split(s)"
1000 loops, best of 3: 338 usec per loop
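
The measurements above can also be reproduced from a script. The following is only a rough sketch using the standard timeit module; the helper name bench and the loop count are assumptions made for illustration, and absolute numbers will differ by machine and build.

```python
import re
import timeit

def bench(pattern, sep, number=20):
    """Return the best per-call time (in seconds) of p.split(s).

    `bench` and `number` are illustrative choices, not part of the patch.
    """
    p = re.compile(pattern)
    s = ('a' * 100 + sep) * 1000
    best = min(timeit.repeat(lambda: p.split(s), number=number, repeat=3))
    return best / number

# Same patterns and subject strings as in the command-line runs.
cases = [('(\n)', '\n'), ('(\n\r)', '\n\r'), ('\n', '\n'), ('\n\r', '\n\r')]
results = {pat: bench(pat, sep) for pat, sep in cases}

for pat, seconds in results.items():
    print('{!r:12} {:8.1f} usec per loop'.format(pat, seconds * 1e6))
```

Running it once with an unpatched and once with a patched build should show whether the grouped patterns move closer to the plain literal ones.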

More complex but rare cases, such as '\n()\r' or '((\n)(\r))', are also optimized.
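
To illustrate why these patterns belong to the same family, here is a small sketch (the sample string s is made up for the example). Every pattern below can only match at an occurrence of the literal text '\n\r'; the groups change what is captured, not where the match is found. It shows observable behaviour only and says nothing about how the patch is implemented.

```python
import re

# Hypothetical sample text containing the two-character separator twice.
s = 'ab\n\rcd\n\ref'

# Each pattern consists of the literal '\n\r', with or without groups.
patterns = ['\n\r', '(\n\r)', '\n()\r', '((\n)(\r))']

spans = {}
for pat in patterns:
    m = re.search(pat, s)
    spans[pat] = m.span()
    print(repr(pat), m.span(), m.groups())

# Captured groups are included in the result of split().
parts = re.split('(\n\r)', s)
print(parts)
```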

Fast searching can no longer be disabled.