classification
Title: Implement fastsearch algorithm for rfind/rindex
Type: performance Stage: committed/rejected
Components: Interpreter Core Versions: Python 3.2, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: pitrou Nosy List: effbot, ezio.melotti, flox, pitrou, rhettinger
Priority: normal Keywords: patch

Created on 2009-12-08 22:46 by flox, last changed 2010-01-04 09:26 by ezio.melotti. This issue is now closed.

Files
File name Uploaded Description Edit
issue7462_string_tests.diff flox, 2009-12-20 22:55 Patch, apply to trunk
issue7462_fastsearch_v3.diff flox, 2009-12-21 13:30 Patch, apply to trunk
stringbench_fixes_and_additions_v3.diff flox, 2009-12-21 21:17 Patch for sandbox/trunk/stringbench/
bench_rfind_algorithms_v3.diff flox, 2009-12-21 21:29 Benchmark Results (including rpartition and rsplit)
Messages (25)
msg96157 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2009-12-08 22:46
While looking at issue7458 I stopped on:

  /* XXX - create reversefastsearch helper! */

Here is the patch which is just the translation of the same algorithm
already implemented for "find/index".
http://effbot.org/zone/stringlib.htm

Note: it supersedes patch for 7458.
msg96163 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2009-12-09 00:24
(removed comment which should not be copied)
msg96724 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2009-12-20 22:55
Additional test cases for rfind.
msg96725 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2009-12-20 23:08
Bench results show the benefit.

And attached patch for stringbench tool.
msg96726 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2009-12-20 23:26
There's a typo in the patch for stringbench.

Updated patch (with rindex tests, too).
msg96727 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2009-12-20 23:55
Updated benchmarks.

 str    unicode
 (ms)   (ms)

 ========== late match, 100 characters
                s="ABC"*33; ("E"+s+("D"+s)*500).rfind("E"+s)
 32.89  15.65   rfind (classic)
 32.81  15.63   rindex (classic)
 11.77  13.27   rfind (fastsearch)
 11.78  13.40   rindex (fastsearch)

 ========== late match, two characters
 4.34   2.36    ("C"+"AB"*300).rfind("CA") (classic)
 4.44   2.36    ("C"+"AB"*300).rindex("CA") (classic)
 2.10   2.31    ("C"+"AB"*300).rfind("CA") (fastsearch)
 2.10   2.32    ("C"+"AB"*300).rindex("CA") (fastsearch)

 ========== no match, two characters
 14.12  13.46   ("AB"*1000).rfind("BC") (classic)
 7.67   8.26    ("AB"*1000).rfind("BC") (fastsearch)
msg96729 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2009-12-21 00:43
Need additional patch for rpartition and rsplit on string, unicode and
bytearray objects.
msg96738 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2009-12-21 11:32
Updated patch with optimization for:
 * rfind
 * rindex
 * rsplit
 * rpartition

And an optimization was implemented in r46398 but never used because
"#define USE_FAST" was removed 2 hours before: r46366.
==> I changed this to activate optimization on "S.split" too.

I propose to commit these changes before looking for further improvements.
(I plan to refactor some functions and macro to Objects/stringlib)
msg96739 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2009-12-21 11:33
Actually, the USE_FLAG macro was removed in r46367 (not 46366).

« needforspeed: remove remaining USE_FAST macros; if fastsearch was
broken, someone would have noticed by now ;-) »
msg96743 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2009-12-21 12:36
I reupload the patch fixed (sorry).
msg96747 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-12-21 12:51
Some things:
- is STRINGLIB_CMP still used? if not, it should be removed (use grep :-))
- please don't use "#if 1"
- if USE_FAST isn't used anymore, it should be remove as well
- did you check that rpartition, split and friends were sped up by the
patch?

Thanks for working on this!
msg96748 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2009-12-21 13:01
Actually I see different macros which do the same thing: I will consider
reusing STRINGLIB_CMP to re-define PyUNICODE_MATCH and PySTRING_MATCH
(or create aliases if they have the same signature).

I can prepare a "all-in-one" patch which fixes all these things.
But don't you think we should do this incrementally, i.e. commit the
current patch before refactoring more code?

I will remove the "#if 1  /* USE_FAST */".
msg96749 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-12-21 13:06
> Actually I see different macros which do the same thing: I will consider
> reusing STRINGLIB_CMP to re-define PyUNICODE_MATCH and PySTRING_MATCH
> (or create aliases if they have the same signature).

STRINGLIB_CMP, as the name implies, should only be used by stringlib.
(anything which doesn't start with Py* shouldn't be exported in the
official include files)

> But don't you think we should do this incrementally, i.e. commit the
> current patch before refactoring more code?

Well, if STRINGLIB_CMP isn't used anymore, removing it should be part of
the current issue. There's no reason to leave dead code in the source
tree. I agree that further cleanups can be part of another issue.
msg96750 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2009-12-21 13:30
Removing "slow" parts and unnused macros STRINGLIB_CMP and USE_FAST.
msg96757 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-12-21 16:12
I haven't reviewed the algorithm (I don't know if I will, honestly), but
at least on the principle this looks good.
Fredrik, do you want to take a look?
msg96771 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2009-12-21 21:29
New benchmarks (and patch for the benchmark tool).

Best improvement is reached with such expression:
s="ABC"*33; (s+"E"+("D"+s)*500).rfind(s+"E") (*100)

String (classic):  93.14 ms
String (fast):      8.78 ms

Unicode (classic): 78.62 ms
Unicode (fast):     9.42 ms
msg97039 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-12-30 16:38
Looking a bit more at the patch:

+                /* miss: check if previous character is part of pattern */
+                if (!(mask & (1 << (s[i-1] & 0x1F))))

From what I understand, this should be s[i-m]. Same here:

+                /* skip: check if previous character is part of pattern */
+                if (!(mask & (1 << (s[i-1] & 0x1F))))
msg97084 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2009-12-31 10:03
I checked the code, and it is the right thing:

Example 1 (fastsearch):

s, p = "ABCDEABCF", "BCD"
s.rfind(p)

# i = 6  is first candidate, but "BCF" != "BCD"
# then s[i-1] = "A" is not part of pattern
# so we skip 3 chars: next loop is i = 2
# etc...


Example 2 (msg97039):

s, p = "ABCDBCBCF", "BCB"
s.rfind(p)

# i = 6  is first candidate, but "BCF" != "BCB"
# then s[i-m] = "D" is not part of pattern
# so we skip 3 chars: next loop is i = 2.. and it fails!

I've tested many examples to understand and verify the algorithm.
msg97087 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-12-31 12:25
> I checked the code, and it is the right thing:

Indeed, you are right. My bad!
msg97103 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-12-31 16:50
I will take a last look at it and commit if I see nothing wrong.
msg97105 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-12-31 17:51
Are there any simple, common cases that are made slower by this patch? 
IIRC, that was the reason this wasn't implemented previously.
msg97106 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2009-12-31 17:57
The benchmark tests show significant improvements in most cases up to 10
times faster.

And I found no test case which show performance regression (using the
"stringbench" benchmark from GvR, and additional tests).

Moreover, the very same algorithm is already implemented for
find/index/partition.
msg97146 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-01-02 20:41
I've added a version number to stringbench and committed the changes in r77240.
msg97148 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-01-02 21:42
The main patch has been committed in r77241 (trunk) and r77246 (py3k).
I've ommitted the tests you had added for issue7458.
Thank you!
msg97200 - (view) Author: Fredrik Lundh (effbot) * (Python committer) Date: 2010-01-04 09:20
Thanks Florent!

> Are there any simple, common cases that are made slower by this patch? 

The original fastsearch implementation has a couple of special cases to make sure it's faster than the original code in all cases.  The reason it wasn't implemented for reverse search was more a question of developer time constraints; reverse search isn't nearly as common as forward search, and we had other low-hanging fruit to deal with.

(btw, while it's great that someone finally got around to fix this, it wouldn't surprise me if replacing the KMP implementation in SRE with a fastsearch would save as many CPU cycles worldwide as this patch :)
History
Date User Action Args
2010-01-04 09:26:26ezio.melottisetnosy: + ezio.melotti
2010-01-04 09:20:20effbotsetmessages: + msg97200
2010-01-02 21:42:27pitrousetstatus: open -> closed
resolution: fixed
messages: + msg97148

stage: commit review -> committed/rejected
2010-01-02 20:41:43pitrousetmessages: + msg97146
2009-12-31 17:57:57floxsetmessages: + msg97106
2009-12-31 17:51:08rhettingersetnosy: + rhettinger
messages: + msg97105
2009-12-31 16:50:30pitrousetpriority: normal
versions: - Python 2.6, Python 3.1
messages: + msg97103

assignee: pitrou
stage: patch review -> commit review
2009-12-31 12:25:14pitrousetmessages: + msg97087
2009-12-31 10:03:44floxsetmessages: + msg97084
2009-12-30 16:38:58pitrousetmessages: + msg97039
2009-12-21 21:29:27floxsetfiles: - bench_rfind_algorithms_v2.diff
2009-12-21 21:29:17floxsetfiles: + bench_rfind_algorithms_v3.diff

messages: + msg96771
2009-12-21 21:17:17floxsetfiles: + stringbench_fixes_and_additions_v3.diff
2009-12-21 21:16:59floxsetfiles: - stringbench_fixes_and_additions_v2.diff
2009-12-21 20:55:08floxsetfiles: + stringbench_fixes_and_additions_v2.diff
2009-12-21 18:53:15floxsetfiles: - stringbench_fixes_and_additions.diff
2009-12-21 18:38:18floxsetfiles: + stringbench_fixes_and_additions.diff
2009-12-21 18:37:51floxsetfiles: - stringbench_rfind_rindex.diff
2009-12-21 16:12:47pitrousetnosy: + effbot
messages: + msg96757
2009-12-21 13:31:06floxsetfiles: - issue7462_fastsearch_v2.diff
2009-12-21 13:30:58floxsetfiles: + issue7462_fastsearch_v3.diff

messages: + msg96750
2009-12-21 13:06:22pitrousetmessages: + msg96749
2009-12-21 13:01:45floxsetmessages: + msg96748
2009-12-21 12:51:44pitrousetnosy: + pitrou
messages: + msg96747
2009-12-21 12:36:34floxsetfiles: + issue7462_fastsearch_v2.diff

messages: + msg96743
2009-12-21 12:35:54floxsetfiles: - fastsearch_rfind_v2.patch
2009-12-21 11:49:50floxsetfiles: - fastsearch_rfind.patch
2009-12-21 11:33:48floxsetmessages: + msg96739
2009-12-21 11:32:03floxsetfiles: + fastsearch_rfind_v2.patch

messages: + msg96738
stage: needs patch -> patch review
2009-12-21 00:43:47floxsetmessages: + msg96729
stage: patch review -> needs patch
2009-12-20 23:55:43floxsetfiles: - bench_rfind_algorithms.diff
2009-12-20 23:55:08floxsetfiles: + bench_rfind_algorithms_v2.diff

messages: + msg96727
stage: patch review
2009-12-20 23:26:18floxsetfiles: - stringbench_rfind.diff
2009-12-20 23:26:06floxsetfiles: + stringbench_rfind_rindex.diff

messages: + msg96726
2009-12-20 23:08:35floxsetfiles: + bench_rfind_algorithms.diff
2009-12-20 23:08:14floxsetfiles: + stringbench_rfind.diff

messages: + msg96725
2009-12-20 22:55:34floxsetfiles: + issue7462_string_tests.diff

messages: + msg96724
2009-12-09 00:24:46floxsetfiles: - fastsearch_rfind.patch
2009-12-09 00:24:20floxsetfiles: + fastsearch_rfind.patch

messages: + msg96163
2009-12-08 22:46:43floxcreate