This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author Dennis Sweeney
Recipients Dennis Sweeney, Zeturic, ammar2, corona10, gregory.p.smith, gvanrossum, josh.r, pmpp, serhiy.storchaka, tim.peters, vstinner
Date 2020-10-19.04:55:36
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1603083336.89.0.256887911904.issue41972@roundup.psfhosted.org>
In-reply-to
Content
> Dennis, I think that's expected, right? Two-way on its own can exploit nothing about individual characters - it only preprocesses the needle to break the possibility for quadratic-time behavior due to periods in the needle.

Yes, that's correct.

> It sounds like you switched the current code's Sunday-ish approach to a more Boyer-Moore-ish approach. That probably hurt, and especially for shorter needles (where Sunday can yield a maximum shift of len(needle)+1, not len(needle) - the "+1" is more significant the smaller len(needle)).

I'm definitely open to switching things up, but I'm not totally sure there would be a difference. In Sunday, the pattern is somehow scanned at each step, and then upon finding a mismatch, we look up in the table the first character outside the window. With the PR as written, when looking up in the table the last character of the window, we haven't actually produced a mismatch yet; the 'continue' statements jump ahead *until* the last characters match (mod 128), and then the two-way scanning begins, which determines how the window gets shifted after that. But after this two-way-determined shift, the last character in the new window immediately gets looked up again in the table at the top of the loop, effectively letting the two-way shift immediately be extended to a longer shift. I suppose that the line `j += i - suffix + 1;` (which is very often an increment by 1) could be changed to a table lookup of the character just beyond the window, but I don't quite understand why that would be better than incrementing j and then doing the table lookups at the top of the loop.
History
Date User Action Args
2020-10-19 04:55:36Dennis Sweeneysetrecipients: + Dennis Sweeney, gvanrossum, tim.peters, gregory.p.smith, vstinner, pmpp, serhiy.storchaka, josh.r, ammar2, corona10, Zeturic
2020-10-19 04:55:36Dennis Sweeneysetmessageid: <1603083336.89.0.256887911904.issue41972@roundup.psfhosted.org>
2020-10-19 04:55:36Dennis Sweeneylinkissue41972 messages
2020-10-19 04:55:36Dennis Sweeneycreate