Author nadeem.vawda
Recipients Arfrever, christian.heimes, eric.araujo, nadeem.vawda, pitrou, serhiy.storchaka
Date 2012-12-09.13:11:54
Message-id <1355058715.45.0.527712147983.issue15955@psf.upfronthosting.co.za>
In-reply-to
Content
>>     # Using zlib's interface
>>     while not d.eof:
>>         compressed = d.unconsumed_tail or f.read(8192)
>>         if not compressed:
>>             raise ValueError('End-of-stream marker not found')
>>         output = d.decompress(compressed, 8192)
>>         # <process output>
>
> This is not usable with bzip2. Bzip2 uses a large block size, so unconsumed_tail
> can be non-empty while decompress() still returns b''. With zlib you can possibly
> see the same effect on some inputs when reading one byte at a time.

I don't see how this is a problem. If (for some strange reason) the
application-specific processing code can't handle empty blocks properly, you can
just stick "if not output: continue" before it.
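
For illustration, here is a rough, runnable sketch of that loop with the empty-block check in place. It assumes Python 3.3+ (where zlib decompressor objects expose eof); the generator name decompress_stream, the read_size/max_length parameters and the in-memory test stream are made up for the example and are not part of any existing API.

```python
import io
import zlib

def decompress_stream(f, read_size=8192, max_length=8192):
    """Yield non-empty decompressed blocks from a zlib stream in file object f."""
    d = zlib.decompressobj()
    while not d.eof:
        # Finish the input we already hold before reading more from the file.
        compressed = d.unconsumed_tail or f.read(read_size)
        if not compressed:
            raise ValueError('End-of-stream marker not found')
        output = d.decompress(compressed, max_length)
        if not output:
            # Empty block: nothing to process yet, go around again.
            continue
        yield output

# Illustrative usage: read one byte at a time, which makes empty results likely.
payload = b'spam and eggs ' * 10000
stream = io.BytesIO(zlib.compress(payload))
assert b''.join(decompress_stream(stream, read_size=1)) == payload
```

The processing code only ever sees non-empty blocks here, regardless of how small the reads are.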


> Actually it should be:
>
>     # Using zlib's interface
>     while not d.eof:
>         output = d.decompress(d.unconsumed_tail, 8192)
>         while not output and not d.eof:
>             compressed = f.read(8192)
>             if not compressed:
>                 raise ValueError('End-of-stream marker not found')
>             output = d.decompress(d.unconsumed_tail + compressed, 8192)
>         # <process output>
>
> Note that you should use d.unconsumed_tail + compressed as input, and therefore
> do an unnecessary copy of the data.

Why is this necessary? If unconsumed_tail is b'', then there's no need to
prepend it (and the concatenation would be a no-op anyway). If unconsumed_tail
does contain data, then we don't need to read additional compressed data from
the file until we've finished decompressing the data we already have.
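
As a sketch of this argument for zlib specifically, the nested loop can be written without the concatenation. The assert encodes my assumption that, when a max_length is given, zlib only returns an empty result (short of end-of-stream) after it has consumed all the input it was handed; the function name decompress_nested is hypothetical, and this says nothing about how a bz2 object would behave.

```python
import io
import zlib

def decompress_nested(f, chunk_size=8192):
    """Nested-loop variant that never concatenates unconsumed_tail with new data."""
    d = zlib.decompressobj()
    while not d.eof:
        # First drain whatever input is left over from the previous call.
        output = d.decompress(d.unconsumed_tail, chunk_size)
        while not output and not d.eof:
            # Assumption (zlib): empty output means no input is left over,
            # so there is nothing to prepend to the freshly read data.
            assert not d.unconsumed_tail
            compressed = f.read(chunk_size)
            if not compressed:
                raise ValueError('End-of-stream marker not found')
            output = d.decompress(compressed, chunk_size)
        yield output

# Illustrative usage with an in-memory stream.
payload = b'spam and eggs ' * 10000
stream = io.BytesIO(zlib.compress(payload))
assert b''.join(decompress_nested(stream)) == payload
```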


> Without an explicit unconsumed_tail you can write input data into an internal
> mutable buffer; it will be more efficient for a large buffer (hundreds of KB)
> and small input chunks (several KB).

Are you proposing that the decompressor object maintain its own buffer, and
copy the input data into it before passing it to the decompression library?
Doesn't that just duplicate work that the library is already doing for us?