Author Zeturic
Recipients Zeturic
Date 2020-10-07.23:31:59
Message-id <1602113519.53.0.297229304628.issue41972@roundup.psfhosted.org>
Content
Sorry for the vague title. I'm not sure how to succinctly describe this issue.

The following code:

```
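# data.bin is a 32 MiB file containing only 0x00 and 0xFF bytes.
with open("data.bin", "rb") as f:
    data = f.read()

base = 15403807 * b'\xff'  # run of 15,403,807 0xFF bytes
longer = base + b'\xff'    # the same run, one byte longer

print(data.find(base))    # completes and prints almost instantly
print(data.find(longer))  # hangs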
```

always hangs on the second call to find.

It might complete eventually, but I've left it running and have never seen it finish. Because of the structure of data.bin, the second call should return the same position as the first.

The first call to find completes and prints near instantly, which makes the pathological performance of the second (which is only searching for one b"\xff" more than the first) even more mystifying.
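To put numbers on the difference between the two calls, each one could be timed separately. The following is only a sketch: `timed_find` is a hypothetical helper, and `sample` is a small synthetic buffer standing in for data.bin (an assumption; it is not expected to reproduce the hang, only to show the measurement).

```python
import time


def timed_find(haystack: bytes, needle: bytes):
    """Return (position, seconds elapsed) for haystack.find(needle)."""
    start = time.perf_counter()
    position = haystack.find(needle)
    elapsed = time.perf_counter() - start
    return position, elapsed


# Small synthetic stand-in for data.bin (assumption: NOT the real file layout).
sample = b"\x00" * 1000 + b"\xff" * 2000 + b"\x00" * 1000

base = b"\xff" * 1500
longer = base + b"\xff"

for name, needle in (("base", base), ("longer", longer)):
    position, elapsed = timed_find(sample, needle)
    print(f"{name}: position={position}, elapsed={elapsed:.6f}s")
```

With the real file, `sample` would be replaced by the contents of data.bin and the needles by the original `base` and `longer`; the second call would then need to be interrupted manually if it does not return.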

I attempted to upload the data.bin file I was working with as an attachment here, but it failed multiple times. I assume it's too large for an attachment; it's a 32MiB file consisting only of 00 bytes and FF bytes.
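Since the file contains only 00 and FF bytes, its structure could also be described compactly as a list of runs instead of attaching the whole file. A minimal sketch, assuming a hypothetical helper `byte_runs` (not part of the original report) and shown here on a tiny made-up buffer rather than the real data.bin:

```python
import re


def byte_runs(data: bytes):
    """Yield (byte_value, start_offset, run_length) for each run of 0x00 or 0xFF."""
    for match in re.finditer(rb"\x00+|\xff+", data):
        start, end = match.span()
        # data[start] is an int (0 or 255); avoids copying the whole run.
        yield data[start], start, end - start


# Tiny made-up buffer for illustration only (assumption: not the real file).
sample_runs = list(byte_runs(b"\x00" * 3 + b"\xff" * 5 + b"\x00" * 2))
for value, start, length in sample_runs:
    print(f"0x{value:02X} x {length} at offset {start}")
```

Run against the real file, the output would show where the long FF runs sit relative to the 00 runs, which may help in building a smaller reproducer.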

Since I couldn't attach it, I uploaded it to a gist. I hope that's okay.

https://gist.github.com/Zeturic/7d0480a94352968c1fe92aa62e8adeaf

I wasn't able to trigger the pathological runtime with other byte sequences, which is why I uploaded the file in the first place. For example, randomly generated data doesn't trigger it.

I've verified that this happens on multiple versions of CPython (as well as PyPy) and on multiple computers / operating systems.