This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author tim.peters
Recipients Dennis Sweeney, Zeturic, ammar2, corona10, gregory.p.smith, gvanrossum, josh.r, pmpp, serhiy.storchaka, tim.peters, vstinner
Date 2020-10-18.01:04:29
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1602983069.42.0.896218504553.issue41972@roundup.psfhosted.org>
In-reply-to
Content
I don't think we have any idea how the OP stumbled into this. Looks like it "just happened".

The case you construted is quadratic-time, but not quite as bad:

BBBBBaaaaaBBBBB
BBBBBBBBBB

Fails at once, because 'a' doesn't match the trailing needle 'B'. The Bloom filter is useless because the next haystack 'B' _is_ in the needle.  So shift right 1:

BBBBBaaaaaBBBBB
xBBBBBBBBBB

Last character matches, so goes on to compare 4 Bs and an aB mismatch. 6 in all. Bloom filter useless again, and another shift by 1:

BBBBBaaaaaBBBBB
xxBBBBBBBBBB

Now there's 1 less compare, because only 3 Bs match.

And so on.  After the next shift, only 2 Bs match, and after the shift following that, only 1 B.  Getting to:

BBBBBaaaaaBBBBB
xxxxxBBBBBBBBBB

Now after matching the trailing Bs, aB mismatches at once.  And we're done, because another shift by 1 moves the end of the needle beyond the end of the haystack (although I bet the Bloom filter reads up the trailing NUL byte from the haystack and actually makes a "giant" shift).

You're right that two-way yawns at this ;-)  'B' * (K*2) is split into

u = "" # empty string!
v = 'B' * (K*2)

so only the "match the right half" part of the searching algorithm comes into play.

BBBBBaaaaaBBBBB
BBBBBBBBBB

first mismatches at the 6th character (1-based counting - at index 5), so it shifts right by 6:

BBBBBaaaaaBBBBB
xxxxxxBBBBBBBBBB

And it's done, because the needle has moved beyond the end of the haystack.

The brainpower that went into making this "look trivial" is quite impressive :-)
History
Date User Action Args
2020-10-18 01:04:29tim.peterssetrecipients: + tim.peters, gvanrossum, gregory.p.smith, vstinner, pmpp, serhiy.storchaka, josh.r, ammar2, corona10, Dennis Sweeney, Zeturic
2020-10-18 01:04:29tim.peterssetmessageid: <1602983069.42.0.896218504553.issue41972@roundup.psfhosted.org>
2020-10-18 01:04:29tim.peterslinkissue41972 messages
2020-10-18 01:04:29tim.peterscreate