Author Esa.Peuha
Recipients Esa.Peuha, josh.r, nadeem.vawda, serhiy.storchaka, vnummela
Date 2014-06-28.13:05:57
Message-id <1403960757.78.0.184680440641.issue21872@psf.upfronthosting.co.za>
In-reply-to
Content
This code:

import _lzma
with open('22h_ticks_bad.bi5', 'rb') as f:
    infile = f.read()
for i in range(8191, 8195):
    decompressor = _lzma.LZMADecompressor()
    first_out = decompressor.decompress(infile[:i])
    first_len = len(first_out)
    last_out = decompressor.decompress(infile[i:])
    last_len = len(last_out)
    print(i, first_len, first_len + last_len, decompressor.eof)

prints this:

8191 36243 45480 True
8192 36251 45473 False
8193 36253 45475 False
8194 36260 45480 True

It seems to me that this is a subtle bug in liblzma: if the input to the incremental decompressor is split at the wrong place, the decompressor's internal state is corrupted. For this particular file, that happens when the split falls after 8192 or 8193 bytes (the total output then comes up short of the full 45480 bytes and eof stays False), and lzma.py happens to use a buffer of 8192 bytes. There is nothing wrong with the compressed file itself, since lzma.py decompresses it correctly if the buffer size is set to almost any other value.
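
To look for other affected split points, the same check can be run over a whole range of offsets and compared against one-shot decompression. The following is only a rough sketch: the helper name find_bad_splits is made up for illustration, it uses the public lzma module rather than _lzma, and the payload is stand-in data generated in memory, not the file from this report.

```python
import lzma

def find_bad_splits(compressed, split_points):
    """Return the split points where two-step decompression differs from one-shot.

    Hypothetical helper for illustration; not part of the lzma module.
    """
    expected = lzma.decompress(compressed)
    bad = []
    for i in split_points:
        decompressor = lzma.LZMADecompressor()
        # Feed the stream in two pieces, split after i bytes.
        out = decompressor.decompress(compressed[:i])
        out += decompressor.decompress(compressed[i:])
        # A healthy decompressor yields the full data and reaches end of stream.
        if out != expected or not decompressor.eof:
            bad.append(i)
    return bad

# Stand-in data (an assumption); substitute the bytes of the real file to test it.
payload = bytes(range(256)) * 200
compressed = lzma.compress(payload)
bad = find_bad_splits(compressed, range(1, len(compressed)))
print(bad)
```

An empty list means every split reproduced the one-shot result. For the file in this report, I would expect 8192 and 8193 to show up in the list, going by the output above.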