Author tim.peters
Recipients Dennis Sweeney, Zeturic, ammar2, corona10, gregory.p.smith, josh.r, pmpp, serhiy.storchaka, tim.peters, vstinner
Date 2020-10-17.20:51:12
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1602967872.95.0.892700676627.issue41972@roundup.psfhosted.org>
In-reply-to
Content
When I ran stringbench yesterday (or the day before - don't remember), almost all the benefit seemed to come from the "late match, 100 characters" tests.  Seems similar for your run.  Here are your results for just that batch, interleaving the two runs to make it easy to see:  first line from the "before" run, second line from the "after" (PR) run, then a blank line.  Lather, rinse, repeat.

These are dramatic speedups for the "search forward" cases.  But there _also_ seem to be real (but much smaller) benefits for the "search backward" cases, which I don't recall seeing when I tried it.  Do you have a guess as to why?

========== late match, 100 characters
bytes   unicode
(in ms) (in ms) ratio%=bytes/unicode*100
2.73    3.88    70.4    s="ABC"*33; ((s+"D")*500+s+"E").find(s+"E") (*100)
0.17    0.15    116.3   s="ABC"*33; ((s+"D")*500+s+"E").find(s+"E") (*100)

2.01    3.54    56.8    s="ABC"*33; ((s+"D")*500+"E"+s).find("E"+s) (*100)
0.89    0.87    101.8   s="ABC"*33; ((s+"D")*500+"E"+s).find("E"+s) (*100)

1.66    2.36    70.2    s="ABC"*33; (s+"E") in ((s+"D")*300+s+"E") (*100)
0.15    0.13    111.5   s="ABC"*33; (s+"E") in ((s+"D")*300+s+"E") (*100)

2.74    3.89    70.5    s="ABC"*33; ((s+"D")*500+s+"E").index(s+"E") (*100)
0.17    0.15    112.4   s="ABC"*33; ((s+"D")*500+s+"E").index(s+"E") (*100)

3.93    4.00    98.4    s="ABC"*33; ((s+"D")*500+s+"E").partition(s+"E") (*100)
0.30    0.27    108.0   s="ABC"*33; ((s+"D")*500+s+"E").partition(s+"E") (*100)

3.99    4.59    86.8    s="ABC"*33; ("E"+s+("D"+s)*500).rfind("E"+s) (*100)
3.13    2.51    124.5   s="ABC"*33; ("E"+s+("D"+s)*500).rfind("E"+s) (*100)

1.64    2.23    73.3    s="ABC"*33; (s+"E"+("D"+s)*500).rfind(s+"E") (*100)
1.54    1.82    84.5    s="ABC"*33; (s+"E"+("D"+s)*500).rfind(s+"E") (*100)

3.97    4.59    86.4    s="ABC"*33; ("E"+s+("D"+s)*500).rindex("E"+s) (*100)
3.18    2.53    125.8   s="ABC"*33; ("E"+s+("D"+s)*500).rindex("E"+s) (*100)

4.69    4.67    100.3   s="ABC"*33; ("E"+s+("D"+s)*500).rpartition("E"+s) (*100)
3.37    2.66    126.9   s="ABC"*33; ("E"+s+("D"+s)*500).rpartition("E"+s) (*100)

4.09    2.82    145.0   s="ABC"*33; ("E"+s+("D"+s)*500).rsplit("E"+s, 1) (*100)
3.39    2.62    129.5   s="ABC"*33; ("E"+s+("D"+s)*500).rsplit("E"+s, 1) (*100)

3.50    3.51    99.7    s="ABC"*33; ((s+"D")*500+s+"E").split(s+"E", 1) (*100)
0.30    0.28    106.0   s="ABC"*33; ((s+"D")*500+s+"E").split(s+"E", 1) (*100)

Just for contrast, doesn't make much difference for the "late match, two characters" tests:

========== late match, two characters
0.44    0.58    76.2    ("AB"*300+"C").find("BC") (*1000)
0.57    0.48    120.2   ("AB"*300+"C").find("BC") (*1000)

0.59    0.73    80.5    ("AB"*300+"CA").find("CA") (*1000)
0.56    0.72    77.5    ("AB"*300+"CA").find("CA") (*1000)

0.55    0.49    112.5   "BC" in ("AB"*300+"C") (*1000)
0.66    0.37    177.7   "BC" in ("AB"*300+"C") (*1000)

0.45    0.58    76.5    ("AB"*300+"C").index("BC") (*1000)
0.57    0.49    116.5   ("AB"*300+"C").index("BC") (*1000)

0.61    0.62    98.6    ("AB"*300+"C").partition("BC") (*1000)
0.72    0.52    137.2   ("AB"*300+"C").partition("BC") (*1000)

0.62    0.64    96.4    ("C"+"AB"*300).rfind("CA") (*1000)
0.49    0.49    101.6   ("C"+"AB"*300).rfind("CA") (*1000)

0.57    0.65    87.5    ("BC"+"AB"*300).rfind("BC") (*1000)
0.51    0.57    89.3    ("BC"+"AB"*300).rfind("BC") (*1000)

0.62    0.64    96.5    ("C"+"AB"*300).rindex("CA") (*1000)
0.50    0.49    101.2   ("C"+"AB"*300).rindex("CA") (*1000)

0.68    0.69    99.0    ("C"+"AB"*300).rpartition("CA") (*1000)
0.61    0.54    113.5   ("C"+"AB"*300).rpartition("CA") (*1000)

0.82    0.60    137.8   ("C"+"AB"*300).rsplit("CA", 1) (*1000)
0.63    0.57    112.0   ("C"+"AB"*300).rsplit("CA", 1) (*1000)

0.63    0.61    103.0   ("AB"*300+"C").split("BC", 1) (*1000)
0.74    0.54    138.2   ("AB"*300+"C").split("BC", 1) (*1000)
History
Date User Action Args
2020-10-17 20:51:12tim.peterssetrecipients: + tim.peters, gregory.p.smith, vstinner, pmpp, serhiy.storchaka, josh.r, ammar2, corona10, Dennis Sweeney, Zeturic
2020-10-17 20:51:12tim.peterssetmessageid: <1602967872.95.0.892700676627.issue41972@roundup.psfhosted.org>
2020-10-17 20:51:12tim.peterslinkissue41972 messages
2020-10-17 20:51:12tim.peterscreate