Message 212342 - Python tracker

Author	nadeem.vawda
Recipients	James.Dominy, nadeem.vawda, serhiy.storchaka
Date	2014-02-27.09:24:03
Message-id	<1393493046.53.0.2553506959.issue20781@psf.upfronthosting.co.za>
In-reply-to

Content

> How does one create a multi-stream bzip2 file in the first place?

If you didn't do so deliberately, I would guess that you used a parallel
compression tool like pbzip2 or lbzip2 to create your bz2 file. These tools work
by splitting the input into chunks, compressing each chunk as a separate stream,
and then concatenating these streams afterward.

Another possibility is that you just concatenated two existing bz2 files, e.g.:

    $ cat first.bz2 second.bz2 >multi.bz2
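
The same concatenation can be sketched in Python with the standard bz2 module.
The payloads and variable names below are made up for illustration; the sketch
assumes Python 3.3 or later, where bz2.decompress() accepts multi-stream input.

```python
import bz2

# Two independently compressed payloads; each one is a complete bz2 stream.
first = bz2.compress(b"hello ")
second = bz2.compress(b"world")

# Joining the streams byte-for-byte is what `cat first.bz2 second.bz2` does.
multi = first + second

# bz2.decompress() (Python 3.3+) decodes every stream in the input in turn.
result = bz2.decompress(multi)
```

A single BZ2Decompressor, by contrast, stops at the end of the first stream,
which is what the detection approach described further down relies on.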


> And how do I tell it's multi-stream.

I don't know of any pre-existing tools to do this, but you can write a script
for it yourself, by feeding the file's data through a BZ2Decompressor. Once the
decompressor's eof attribute is true (a further decompress() call would raise
EOFError), you're at the end of the first stream. If the decompressor's
unused_data attribute is non-empty, or there is data that has
not yet been read from the input file, then it is either (a) a multi-stream bz2
file or (b) a bz2 file with other metadata tacked on to the end.

To distinguish between cases (a) and (b), take unused_data + rest_of_input_file
and feed it into a new BZ2Decompressor. If you don't get an IOError, then you've
got a multi-stream bz2 file.

(If you *do* get an IOError, then that's case (b) - someone's appended non-bz2
 data to the end of a bz2 file. For example, Gentoo and Sabayon Linux packages
 are bz2 files with package metadata appended, according to issue 19839.)
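
The steps above could be sketched roughly as follows. This is only an
illustration, not a tested tool: classify_bz2 and its return strings are
hypothetical names, it assumes Python 3.3 or later (for the eof attribute, and
where IOError is an alias of OSError), and it takes the whole file as one bytes
object, so there is no separate rest_of_input_file to append. It also only
checks that the leftover bytes start out as valid bz2 data, not that they form
a complete stream.

```python
import bz2

def classify_bz2(data):
    """Return 'single-stream', 'multi-stream' or 'trailing-data' for bz2 bytes."""
    decomp = bz2.BZ2Decompressor()
    # Raises OSError if the data is not bz2 at all.
    decomp.decompress(data)
    if not decomp.eof:
        raise ValueError("input ends before the first bz2 stream is complete")

    # Everything after the first stream ends up in unused_data.
    rest = decomp.unused_data
    if not rest:
        return "single-stream"

    # Feed the leftover bytes to a fresh decompressor to tell (a) from (b).
    try:
        bz2.BZ2Decompressor().decompress(rest)
    except OSError:
        return "trailing-data"   # case (b): non-bz2 data appended
    return "multi-stream"        # case (a): another bz2 stream follows
```

For example, classify_bz2(open("multi.bz2", "rb").read()) should return
'multi-stream' for a file produced by the cat command shown earlier.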
