This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author tim.peters
Recipients Dennis Sweeney, Zeturic, ammar2, josh.r, pmpp, serhiy.storchaka, tim.peters, vstinner
Date 2020-10-09.14:56:04
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1602255365.34.0.522020934485.issue41972@roundup.psfhosted.org>
In-reply-to
Content
The attached fastsearch.diff removes the speed hit in the original test case and in the constructed one.

I don't know whether it "should be" applied, and really can't make time to dig into it.

The rationale:  when the last characters of the pattern string and the current search window don't match, this is what happens:  if the "Bloom filter" can prove that the character one beyond the search window isn't in the pattern at all, the search window is advanced by len(pattern) + 1 positions. Else it's advanced only 1 position.

What changes here: if the Bloom filter thinks the character one beyond the search window may be in the pattern, this goes on to see whether the Bloom filter thinks the last character in the search window may be in the pattern (by case assumption, we already know it's not the _last_ character in the pattern). If not, the search window can be advanced by len(pattern) positions.

So it adds an extra check in the last-characters-don't-match and next-character-may-be-in-the-pattern case:  if the last character is not in the pattern, it's irrelevant that the next character may be - there's no possible overall match including the last character, so we can move the search window beyond it.

The question is whether this costs more than it saves overall.  It saves literally hours of CPU time for one search in this report's case ;-)  But worst-case time remains O(m*n) regardless, just somewhat harder to provoke.
History
Date User Action Args
2020-10-09 14:56:05tim.peterssetrecipients: + tim.peters, vstinner, pmpp, serhiy.storchaka, josh.r, ammar2, Dennis Sweeney, Zeturic
2020-10-09 14:56:05tim.peterssetmessageid: <1602255365.34.0.522020934485.issue41972@roundup.psfhosted.org>
2020-10-09 14:56:05tim.peterslinkissue41972 messages
2020-10-09 14:56:04tim.peterscreate