Author Dennis Sweeney
Recipients Dennis Sweeney, Zeturic, ammar2, josh.r, pmpp, serhiy.storchaka, tim.peters, vstinner
Date 2020-10-10.08:25:19
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1602318319.88.0.282647736333.issue41972@roundup.psfhosted.org>
In-reply-to
Content
I agree that skip could could do 1 better.

---

> I don't know whether it "should be" applied

I don't think I'm convinced: the second check fixes only the very specific case when s[len(p):].startswith(p).
Perturbations of reproducer.py are still very slow with the patch:

    - pattern2 = b"B" * BIG
    + pattern2 = b"B" * (BIG + 10)

In fact, it's even slower than before due to the double-checking.

---

I'd be curious how something like Crochemore and Perrin's "two-way" algorithm would do (constant space, compares < 2n-m):

    http://www-igm.univ-mlv.fr/~lecroq/string/node26.html#SECTION00260
    https://en.wikipedia.org/wiki/Two-way_string-matching_algorithm

Apparently it's been used as glibc's memmem() since 2008. There, it additionally computes a table, but only for long needles where it really pays off:

    https://code.woboq.org/userspace/glibc/string/str-two-way.h.html
    https://code.woboq.org/userspace/glibc/string/memmem.c.html

Although there certainly seems to be a complexity rabbithole here that would be easy to over-do.
I might look more into this.
History
Date User Action Args
2020-10-10 08:25:19Dennis Sweeneysetrecipients: + Dennis Sweeney, tim.peters, vstinner, pmpp, serhiy.storchaka, josh.r, ammar2, Zeturic
2020-10-10 08:25:19Dennis Sweeneysetmessageid: <1602318319.88.0.282647736333.issue41972@roundup.psfhosted.org>
2020-10-10 08:25:19Dennis Sweeneylinkissue41972 messages
2020-10-10 08:25:19Dennis Sweeneycreate