Message221784
This code
import _lzma

with open('22h_ticks_bad.bi5', 'rb') as f:
    infile = f.read()

for i in range(8191, 8195):
    decompressor = _lzma.LZMADecompressor()
    first_out = decompressor.decompress(infile[:i])
    first_len = len(first_out)
    last_out = decompressor.decompress(infile[i:])
    last_len = len(last_out)
    print(i, first_len, first_len + last_len, decompressor.eof)
prints this
8191 36243 45480 True
8192 36251 45473 False
8193 36253 45475 False
8194 36260 45480 True
It seems to me that this is a subtle bug in liblzma: if the input stream to the incremental decompressor is split at the wrong place, the internal state of the decompressor is corrupted. For this particular file, that happens when the split falls after 8192 or 8193 bytes (the total output comes up short and eof is never set), and lzma.py happens to use a buffer of 8192 bytes. There is nothing wrong with the compressed file itself, since lzma.py decompresses it correctly if the buffer size is set to almost any other value.
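The same kind of check can be run over any set of split points. Below is a minimal sketch, assuming the public lzma module rather than _lzma; the helper name check_split_points and the synthetic payload are illustrative only and are not part of the original report. It feeds the compressed data to a fresh decompressor in two pieces and records the total output length and the eof flag for each split.

```python
import lzma


def check_split_points(data, split_points):
    """Decompress `data` in two pieces at each split point.

    Returns a list of (split, total_output_length, eof) tuples.
    (Hypothetical helper, written for illustration.)
    """
    results = []
    for i in split_points:
        decompressor = lzma.LZMADecompressor()
        first = decompressor.decompress(data[:i])
        # Calling decompress() after end-of-stream raises EOFError,
        # so only feed the remainder if the stream is not finished yet.
        rest = b"" if decompressor.eof else decompressor.decompress(data[i:])
        results.append((i, len(first) + len(rest), decompressor.eof))
    return results


# Synthetic data standing in for the real file (an assumption for the demo).
payload = bytes(range(256)) * 200
compressed = lzma.compress(payload)

for split, total, eof in check_split_points(compressed, range(1, 5)):
    print(split, total, eof)
```

If the decompressor behaves correctly, every split point should report the same total length and eof set to True; a split point that reports a shorter total or eof False would show the behaviour described above.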
Date | User | Action | Args
2014-06-28 13:05:57 | Esa.Peuha | set | recipients: + Esa.Peuha, nadeem.vawda, serhiy.storchaka, josh.r, vnummela
2014-06-28 13:05:57 | Esa.Peuha | set | messageid: <1403960757.78.0.184680440641.issue21872@psf.upfronthosting.co.za>
2014-06-28 13:05:57 | Esa.Peuha | link | issue21872 messages
2014-06-28 13:05:57 | Esa.Peuha | create |