Author serhiy.storchaka
Recipients azsorkin, pitrou, serhiy.storchaka, vstinner
Date 2015-11-13.22:14:43
Message-id <1447452884.31.0.563675010039.issue24821@psf.upfronthosting.co.za>
In-reply-to
Content
The proposed patch makes the degenerate case less costly while preserving the optimization for the common case.

Unpatched:
$ ./python -m timeit -s 's = "АБВГД"*10**5' -- 's.find("є")'
1000 loops, best of 3: 330 usec per loop
$ ./python -m timeit -s 's = "АБВГД"*10**5' -- 's.rfind("є")'
1000 loops, best of 3: 325 usec per loop
$ ./python -m timeit -s 's = "АБВГД"*10**5' -- 's.find("Є")'
100 loops, best of 3: 7.81 msec per loop
$ ./python -m timeit -s 's = "АБВГД"*10**5' -- 's.rfind("Є")'
100 loops, best of 3: 8.5 msec per loop

Patched:
$ ./python -m timeit -s 's = "АБВГД"*10**5' -- 's.find("є")'
1000 loops, best of 3: 317 usec per loop
$ ./python -m timeit -s 's = "АБВГД"*10**5' -- 's.rfind("є")'
1000 loops, best of 3: 327 usec per loop
$ ./python -m timeit -s 's = "АБВГД"*10**5' -- 's.find("Є")'
1000 loops, best of 3: 1.1 msec per loop
$ ./python -m timeit -s 's = "АБВГД"*10**5' -- 's.rfind("Є")'
1000 loops, best of 3: 964 usec per loop

The slowdown in the degenerate case (searching for "Є" compared with searching for "є") is reduced from about 25 times to about 3 times.
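The ratios can be checked directly from the timings quoted above. The snippet below is only an arithmetic check; it assumes the first group of runs is the unpatched build and the second is the patched one, and the names `before`, `after` and `slowdown` are made up for illustration.

```python
# Timings copied from the timeit runs above, in microseconds:
# (search for "є", search for "Є").
# Assumption: the first group of runs is unpatched, the second is patched.
before = {"find": (330, 7810), "rfind": (325, 8500)}
after = {"find": (317, 1100), "rfind": (327, 964)}


def slowdown(pair):
    """Ratio of the degenerate-case time to the common-case time."""
    fast, slow = pair
    return slow / fast


for name in ("find", "rfind"):
    print(name, round(slowdown(before[name]), 1), round(slowdown(after[name]), 1))
```

The unpatched ratios come out at roughly 24 and 26, the patched ones at roughly 3.5 and 2.9, which matches the rounded figures of 25 and 3.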

The idea is that after memchr finds a false positive, we run a few tens of iterations of a simple loop before calling memchr again. This spreads the cost of each memchr call over tens of characters.
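To illustrate the idea, here is a rough Python model of the search loop. It is a sketch and not the patch itself: the real code is C in stringlib, `bytes.find` stands in for memchr, the data is assumed to be 2-byte little-endian units of even total length, and `FALLBACK_RUN`, `find_unit` and `unit_at` are hypothetical names chosen for this example.

```python
FALLBACK_RUN = 40  # hypothetical: units scanned by hand after a false positive


def unit_at(data: bytes, i: int) -> int:
    """Return the 16-bit little-endian unit at unit index i."""
    return int.from_bytes(data[2 * i:2 * i + 2], "little")


def find_unit(data: bytes, ch: int) -> int:
    """Return the unit index of ch in data, or -1 if it is absent."""
    n = len(data) // 2
    needle = bytes([ch & 0xFF])  # a byte-wise scan only sees the low byte
    i = 0
    while i < n:
        # Fast path: byte-wise scan (bytes.find stands in for memchr).
        pos = data.find(needle, 2 * i)
        if pos < 0:
            return -1
        i = pos // 2  # align the hit down to a unit boundary
        if unit_at(data, i) == ch:
            return i
        # False positive: check the next few units with a plain loop,
        # so the next byte-wise call is paid for by many characters.
        i += 1
        stop = min(i + FALLBACK_RUN, n)
        while i < stop:
            if unit_at(data, i) == ch:
                return i
            i += 1
    return -1


haystack = ("\u0410\u0411\u0412\u0413\u0414" * 1000).encode("utf-16-le")  # "АБВГД" * 1000
print(find_unit(haystack, 0x0404))  # "Є": every unit is a false positive
print(find_unit(haystack, 0x0454))  # "є": the byte scan never hits
```

The benchmark is degenerate for "Є" because it is U+0404, and its low byte 0x04 equals the high byte of every character in "АБВГД" (U+0410 to U+0414), so a byte-wise scan stops at every character. The low byte of "є" (U+0454) is 0x54, which does not occur in the haystack at all, so a single scan finishes the search.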

The patch also does a little refactoring. STRINGLIB(fastsearch_memchr_1char) is renamed and split into two functions, STRINGLIB(find_char) and STRINGLIB(rfind_char), with a simpler interface. All precondition checks are moved into these functions, which are now used directly in other files.