classification
Title: bz2.BZ2File doesn't support multiple streams
Type: feature request Stage:
Components: Library (Lib) Versions: Python 3.2, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: niemeyer Nosy List: akuchling, dbonner, niemeyer, pitrou, r.david.murray, therve, thomas.lee (7)
Priority: normal Keywords: patch

Created on 2007-12-14 09:20 by therve, last changed 2009-10-26 18:53 by pitrou.

Files
File name Uploaded Description
bz2_patch.tar.bz2 dbonner, 2009-09-29 20:07 issue1625 - bz2 multiple stream patch v1
py3k_bz2.patch dbonner, 2009-10-01 14:21 issue1625 - bz2 multiple stream patch v2
py3k_bz2.patch dbonner, 2009-10-07 20:36 issue1625 - bz2 multiple stream patch v3
py3k_bz2.patch dbonner, 2009-10-21 20:09 issue1625 - bz2 multiple stream patch v4
Messages (17)
msg58619 - Author: Thomas Herve (therve) Date: 2007-12-14 09:20
The BZ2File class only supports one stream per file. It is possible to have
multiple streams concatenated in one file, and the resulting data should
be the concatenation of all the streams. That is what the bunzip2 program
produces, for example. It's also supported by the gzip module.
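
For illustration, here is a minimal Python sketch of the situation described above. It only uses bz2.compress and bz2.BZ2Decompressor; the sample byte strings are made up. It shows that one decompressor object stops at the end of the first stream and leaves the rest of the input in unused_data.

```python
import bz2

# Two independent bz2 streams placed back to back, as concatenating
# two .bz2 files would produce.
first = bz2.compress(b"first stream\n")
second = bz2.compress(b"second stream\n")
payload = first + second

# A single decompressor object handles exactly one stream.
decomp = bz2.BZ2Decompressor()
head = decomp.decompress(payload)

# Everything after the end of the first stream is left untouched here.
leftover = decomp.unused_data
```

A reader that wants the concatenated output has to feed leftover to a fresh decompressor.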

Once this is done, it would add the ability to open a file for appending,
by adding another stream to the file.

I'll probably try to do this, but the fact it's done in C (unlike gzip)
makes it harder, so if someone beats me to it, etc.
msg59897 - Author: Thomas Lee (thomas.lee) Date: 2008-01-14 13:31
If you're referring to an 'append' mode for bz2file objects, it may be a
limitation of the underlying library: my version of bzlib.h only
provides BZ2_bzWriteOpen and BZ2_bzReadOpen - it's not immediately clear
how you would open a BZ2File in append mode looking at this API.

It may be possible to implement r/w/a using the lower-level
bzCompress/bzDecompress functions, but I doubt that's going to happen
unless somebody (such as yourself? :)) cares deeply about this.
msg60236 - Author: A.M. Kuchling (akuchling) Date: 2008-01-19 22:00
Like gzip, you can concatenate two bzip2 files:

bzip2 -c /etc/passwd >/tmp/pass.bz2

bzip2 -c /etc/passwd >>/tmp/pass.bz2

bunzip2 will output both parts, generating two copies of the file.

So nothing needs to be done on compression, but uncompression needs to
look for another chunk of compressed data after finishing one chunk.
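
The "look for another chunk" step could be sketched in Python roughly as follows. This is an illustration built on the existing bz2.BZ2Decompressor class, not the approach taken by any patch on this issue; decompress_all_streams is a hypothetical helper name, and the sketch assumes the eof attribute found in current Python 3 releases.

```python
import bz2

def decompress_all_streams(data):
    """Decompress every bz2 stream found back to back in `data`."""
    parts = []
    while data:
        # Each stream needs its own decompressor object.
        decomp = bz2.BZ2Decompressor()
        parts.append(decomp.decompress(data))
        if not decomp.eof:
            # Input ran out before the end-of-stream marker was seen.
            raise ValueError("truncated bz2 stream")
        # Bytes after the end-of-stream marker belong to the next stream.
        data = decomp.unused_data
    return b"".join(parts)

# Mimics the two `bzip2 -c` commands above: the same content twice.
payload = bz2.compress(b"copy\n") + bz2.compress(b"copy\n")
result = decompress_all_streams(payload)
```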
msg60268 - Author: Thomas Herve (therve) Date: 2008-01-20 09:12
The gzip module supports reopening an existing file to add another
stream. I think the bz2 module should do the same.
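
As a rough sketch of what appending by adding another stream amounts to, the Python below writes each block of data as its own bz2 stream to a file opened in binary append mode, then reads all streams back chunk by chunk. It deliberately avoids BZ2File; append_stream and read_all_streams are hypothetical helper names, the file name is made up, and the eof attribute of current Python 3 releases is assumed.

```python
import bz2
import os
import tempfile

def append_stream(path, data):
    """Append `data` to `path` as a new, self-contained bz2 stream."""
    with open(path, "ab") as f:
        f.write(bz2.compress(data))

def read_all_streams(path, chunk_size=64 * 1024):
    """Read back every stream stored in `path`, one chunk at a time."""
    out = []
    decomp = bz2.BZ2Decompressor()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            while chunk:
                out.append(decomp.decompress(chunk))
                if decomp.eof:
                    # Current stream is finished; the remainder of this
                    # chunk starts a new stream.
                    chunk = decomp.unused_data
                    decomp = bz2.BZ2Decompressor()
                else:
                    # The whole chunk was consumed by the current stream.
                    chunk = b""
    return b"".join(out)

path = os.path.join(tempfile.mkdtemp(), "log.bz2")
append_stream(path, b"line 1\n")
append_stream(path, b"line 2\n")
result = read_all_streams(path)
```

Because every append is a complete stream, nothing already in the file has to be rewritten.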
msg93323 - Author: David Bonner (dbonner) Date: 2009-09-29 19:36
I've got a patch that fixes this.  It allows BZ2File to read
multi-stream files as generated by pbzip2, allows BZ2File to open files
in append mode, and also updates bz2.decompress to allow it to handle
multi-stream chunks of data.

We originally wrote it against 2.5, but I've updated the patch to py3k
trunk, and attached it here.  If there's interest in a patch against 2.7
trunk, please let me know.
msg93326 - Author: David Bonner (dbonner) Date: 2009-09-29 20:07
sorry, the previous patch was from an old version.  attaching the
correct version now.  apologies for the noise.
msg93405 - Author: Antoine Pitrou (pitrou) Date: 2009-10-01 13:42
Some notes about posting patches:
- you should post the patch alone, not in an archive
- generally you should post patches against the 2.7 trunk, we take care
of merging them to py3k ourselves (but in this case the difference
should be minimal anyway)
- I'm not sure it's ok to add legal boilerplate at the top of files, we
never do that usually (and if everyone did it would become unreadable).
Does your company require you to do so?

I'll look at the patch itself another day, I don't have the time right
now. But thanks for posting it!
msg93407 - Author: David Bonner (dbonner) Date: 2009-10-01 14:21
Thanks for the reply.

My company's legal dept. told me that we needed to put the boilerplate
into the files as part of releasing it under the apache license.  I used
a tarball because they also recommended including a full copy of the
license with the patch.

I'm reattaching just the patch to the bug now.  I'll check with legal
and see if they'd have a problem with removing the boilerplate.
msg93408 - Author: R. David Murray (r.david.murray) Date: 2009-10-01 14:29
If the patch is substantial enough that legal boilerplate is even an
issue, then I'm pretty sure a contributor agreement will be required for
patch acceptance, at which point I think the boilerplate won't be
needed.  The Apache license is certainly acceptable.  I'm obviously not
the authority on this, though.  That would be Van Lindberg.
msg93721 - Author: David Bonner (dbonner) Date: 2009-10-07 20:36
I can remove the boilerplate from the code as long as I add the
following to the submittal:

VMware, Inc. is providing this bz2 module patch to you under the terms
of the Apache License 2.0 with the understanding that you plan to
re-license this under the terms and conditions of the Python License.
This patch is provided as is, with no warranties or support. VMware
disclaims all liability in connection with the use/inability to use this
patch. Any use of the attached is considered acceptance of the above.
msg93841 - Author: Antoine Pitrou (pitrou) Date: 2009-10-10 20:11
As far as I can tell, the patch looks mostly good.
I just wonder, in Util_HandleBZStreamEnd(), why you don't set self->mode
to MODE_CLOSED if BZ2_bzReadOpen() fails.

As a sidenote, the bz2 module implementation seems to have changed quite
a bit between trunk and py3k, so if you want it to be backported to
trunk (2.7), you'll have to provide a separate patch.
msg94316 - Author: David Bonner (dbonner) Date: 2009-10-21 18:02
Hrm...yeah, I should probably be setting it to closed as soon as
BZ2_bzReadClose() returns, and then back to open once BZ2_bzReadOpen
succeeds.  Wasn't intentional...thanks for the catch.  You guys need a
new patch with that change in it?

I'll try and get a 2.7 patch done and uploaded in a day or two.
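
The ordering discussed here (mark the object closed as soon as the old stream is closed, and reopen it only after the next open succeeds) can be modelled in a few lines of Python. This is a toy model, not the C code from the patch: StreamReaderState and the open_next callable are hypothetical, and only the MODE_CLOSED name comes from the discussion.

```python
MODE_CLOSED, MODE_READ = "closed", "read"

class StreamReaderState:
    """Toy model of the mode bookkeeping around an end-of-stream event."""

    def __init__(self, open_next):
        # open_next is a hypothetical stand-in for BZ2_bzReadOpen():
        # it returns a handle, or raises OSError on failure.
        self._open_next = open_next
        self.handle = open_next()
        self.mode = MODE_READ

    def handle_stream_end(self):
        # Stand-in for BZ2_bzReadClose(): drop the finished stream and
        # mark the object closed before attempting anything else.
        self.handle = None
        self.mode = MODE_CLOSED
        # Switch back to read mode only once the reopen has succeeded,
        # so a failure leaves the object in a consistent closed state.
        self.handle = self._open_next()
        self.mode = MODE_READ
```

If open_next raises, the exception propagates and the object stays in MODE_CLOSED with no dangling handle.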
msg94318 - Author: R. David Murray (r.david.murray) Date: 2009-10-21 18:28
A new patch will make it more likely that it will actually get applied :)

Thanks for your work on this.
msg94321 - Author: David Bonner (dbonner) Date: 2009-10-21 20:09
Understandable.  New patch attached.
msg94434 - Author: Antoine Pitrou (pitrou) Date: 2009-10-24 18:44
I'm not comfortable with the following change (which appears twice in
the patch):

-			BZ2_bzReadClose(&bzerror, self->fp);
+			if (self->fp)
+				BZ2_bzReadClose(&bzerror, self->fp);
 			break;
 		case MODE_WRITE:
-			BZ2_bzWriteClose(&bzerror, self->fp,
-					 0, NULL, NULL);
+			if (self->fp)
+				BZ2_bzWriteClose(&bzerror, self->fp,
+						 0, NULL, NULL);


If you need to test for the file pointer, perhaps there's a logic flaw
in your patch. Also, it might be dangerous in write mode: could it occur
that the file isn't closed but the problem isn't reported?
msg94446 - Author: David Bonner (dbonner) Date: 2009-10-25 03:37
That was mostly just out of paranoia, since the comments mentioned 
multiple calls to close being legal.  Looking at it again, that particular 
case isn't an issue, since we don't hit that call when the mode is 
MODE_CLOSED.  The testsuite runs happily with those changes reverted.  
Should I upload a new patch?
msg94499 - Author: Antoine Pitrou (pitrou) Date: 2009-10-26 18:53
> That was mostly just out of paranoia, since the comments mentioned 
> multiple calls to close being legal.  Looking at it again, that particular 
> case isn't an issue, since we don't hit that call when the mode is 
> MODE_CLOSED.  The testsuite runs happily with those changes reverted.  
> Should I upload a new patch?

You don't need to, but on the other hand I forgot to ask you to update
the documentation :-) (see Doc/library/bz2.rst)
History
Date User Action Args
2009-10-26 18:53:02 pitrou set messages: + msg94499
2009-10-25 03:37:07 dbonner set messages: + msg94446
2009-10-24 18:44:19 pitrou set messages: + msg94434
2009-10-21 20:09:10 dbonner set files: + py3k_bz2.patch; messages: + msg94321
2009-10-21 18:28:25 r.david.murray set messages: + msg94318
2009-10-21 18:02:40 dbonner set messages: + msg94316
2009-10-10 20:11:34 pitrou set messages: + msg93841
2009-10-07 20:36:51 dbonner set files: + py3k_bz2.patch; messages: + msg93721
2009-10-01 14:29:38 r.david.murray set nosy: + r.david.murray; messages: + msg93408
2009-10-01 14:21:18 dbonner set files: + py3k_bz2.patch; keywords: + patch; messages: + msg93407
2009-10-01 13:42:41 pitrou set nosy: + pitrou; messages: + msg93405; versions: + Python 2.7, - Python 2.6, Python 2.5
2009-09-29 20:07:06 dbonner set files: + bz2_patch.tar.bz2; messages: + msg93326
2009-09-29 20:06:34 dbonner set files: - bz2_patch.tar.bz2
2009-09-29 19:36:03 dbonner set files: + bz2_patch.tar.bz2; versions: + Python 2.5, Python 3.2; nosy: + dbonner; messages: + msg93323
2008-03-18 16:55:02 jafo set priority: normal; assignee: niemeyer; nosy: + niemeyer
2008-01-20 09:12:38 therve set messages: + msg60268
2008-01-19 22:00:08 akuchling set nosy: + akuchling; messages: + msg60236
2008-01-14 13:31:59 thomas.lee set nosy: + thomas.lee; messages: + msg59897
2007-12-14 09:20:30 therve create