Author tim.peters
Recipients Dennis Sweeney, Zeturic, ammar2, corona10, gregory.p.smith, josh.r, pmpp, serhiy.storchaka, tim.peters, vstinner
Date 2020-10-16.21:08:45
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1602882525.7.0.0118454158623.issue41972@roundup.psfhosted.org>
In-reply-to
Content
I don't think Rabin-Karp is worth trying here. The PR is provably worst-case linear time, and the constant factor is already reasonably small. Its only real weakness I can see is that it can be significantly (but seemingly not dramatically) slower than the status quo, in what are exceptionally good cases for the status quo, and perhaps most often due to the increased preprocessing cost of the new code (which is relatively most damaging if the needle is found early in the haystack - in which cases preprocessing can be close to a pure waste of time).

The status quo and the PR both enjoy sublinear (O(len(haystack) / len(needle)) cases too, but R-K doesn't.

Where R-K is a "go to" choice is when an app needs to search the same large text for several strings (Wikipedia has a decent description of R-K can be adapted to that) - but Python has no string API that does such a thing (closest is joining the search-for strings via "|" for a regexp search).
History
Date User Action Args
2020-10-16 21:08:45tim.peterssetrecipients: + tim.peters, gregory.p.smith, vstinner, pmpp, serhiy.storchaka, josh.r, ammar2, corona10, Dennis Sweeney, Zeturic
2020-10-16 21:08:45tim.peterssetmessageid: <1602882525.7.0.0118454158623.issue41972@roundup.psfhosted.org>
2020-10-16 21:08:45tim.peterslinkissue41972 messages
2020-10-16 21:08:45tim.peterscreate