Author Dennis Sweeney
Recipients Dennis Sweeney, Zeturic, ammar2, josh.r, pmpp, serhiy.storchaka, tim.peters, vstinner
Date 2020-10-08.11:06:59
Message-id <1602155219.38.0.293310549875.issue41972@roundup.psfhosted.org>
Indeed, this is just a very unlucky case.

    >>> n = len(longer)
    >>> from collections import Counter
    >>> Counter(s[:n])
    Counter({0: 9056995, 255: 6346813})
    >>> s[n-30:n+30].replace(b'\x00', b'.').replace(b'\xff', b'@')
    b'..............................@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@'
    >>> Counter(s[n:])
    Counter({255: 18150624})


When checking "base", we're in this situation:

    pattern:     @@@@@@@@
     string:     .........@@@@@@@@
    Algorithm says:     ^ these last characters don't match.
                         ^ this next character is not in the pattern.
                         Therefore, skip ahead a bunch:

     pattern:              @@@@@@@@
      string:     .........@@@@@@@@

     This is a match!
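The decision in this diagram can be modeled in a few lines of Python. This is a simplified sketch, assuming the rule is exactly "if the character just past the window is absent from the pattern, shift by len(pattern) + 1, otherwise shift by 1"; the C code uses a bloom-filter mask and an extra skip value instead. The helper name `shift_after_mismatch` is hypothetical, and the `.`/`@` strings are toy stand-ins for the `\x00`/`\xff` bytes above.

```python
def shift_after_mismatch(s: str, p: str, i: int) -> int:
    """Return the next alignment to try after the last characters mismatch at i.

    Hypothetical helper: models only the "next character" check from the
    diagram, not CPython's exact implementation.
    """
    m = len(p)
    assert s[i + m - 1] != p[-1], "only models the last-character mismatch case"
    nxt = i + m  # index of the character just past the current window
    if nxt < len(s) and s[nxt] not in p:
        # No match can cover s[nxt], so every alignment up to nxt is ruled out.
        return i + m + 1
    # s[nxt] might be part of a match, so only advance by one.
    return i + 1


s = "." * 9 + "@" * 8
base = "@" * 8
j = shift_after_mismatch(s, base, 0)
print(j, s[j:j + len(base)] == base)  # 9 True
```

With the 9-character pattern, the same call sees `@` just past the window and returns 1.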


Whereas when checking "longer", we're in this situation:

    pattern:     @@@@@@@@@
     string:     .........@@@@@@@@
    Algorithm says:      ^ these last characters don't match.
                          ^ this next character *is* in the pattern.
                          We can't jump forward.

     pattern:      @@@@@@@@@
      string:     .........@@@@@@@@

     Start comparing at every single alignment...
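To compare the two situations end to end, the sketch below runs a simplified model of the whole search loop and counts how many alignments it inspects. It is loosely modeled on the skip loop in CPython's `Objects/stringlib/fastsearch.h` as it stood when this issue was filed, under stated assumptions: an exact `set` stands in for the bloom-filter mask, the skip refinement applied after a last-character match is omitted, and `model_find` is a hypothetical name. The toy string has a longer `@` tail so that the 9-character pattern can also match.

```python
def model_find(s: str, p: str) -> tuple[int, int]:
    """Hypothetical simplified model of the "next character" skip search.

    Returns (index of the first match or -1, number of alignments inspected).
    """
    n, m = len(s), len(p)
    in_pattern = set(p)  # stand-in for CPython's bloom-filter mask
    i = checked = 0
    while i <= n - m:
        checked += 1
        if s[i:i + m] == p:
            return i, checked
        nxt = i + m  # character just past the current window
        if nxt < n and s[nxt] not in in_pattern:
            i += m + 1  # jump past a character that cannot be in any match
        else:
            i += 1  # fall back to trying the very next alignment
    return -1, checked


s = "." * 9 + "@" * 20
print(model_find(s, "@" * 8))  # (9, 2)
print(model_find(s, "@" * 9))  # (9, 10)
```

In this toy model, the 8-character pattern reaches the match in one jump, while the 9-character pattern is advanced one position at a time through every alignment near the boundary.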


I'm attaching reproducer.py, which replicates this from scratch without loading data from a file.