
Author James.Dominy
Recipients James.Dominy
Date 2014-02-26.11:59:20
Message-id <1393415960.86.0.411484359826.issue20781@psf.upfronthosting.co.za>
In-reply-to
Content
bz2.BZ2File does not decompress the attached file correctly. The same file can be decompressed and recompressed with the standard Unix tools (bzip2 and bunzip2) without any change.

Consider ...

$ python
Python 2.7.6 (default, Dec  7 2013, 22:49:16) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import bz2
>>> import hashlib
>>> len(bz2.BZ2File("example-file.csv.bz2", "r", 0).read())
900000
>>> hashlib.md5(bz2.BZ2File("example-file.csv.bz2", "r", 0).read()).hexdigest()
'e2d4ce212a040c879cb256f88c9faab9'
>>> len(bz2.BZ2File("example-file.csv.bz2", "rb", 0).read())
900000
>>> hashlib.md5(bz2.BZ2File("example-file.csv.bz2", "rb", 0).read()).hexdigest()
'e2d4ce212a040c879cb256f88c9faab9'
>>> 

It looks like bz2 stops after the first block and never handles the second one: read() returns only 900000 bytes. This is not the first file I have come across with this problem; initially I thought the file was at fault, not the module. I've attached a copy of the file.
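
One possible explanation, which is only an assumption and has not been confirmed for the attached file, is that the file consists of several bzip2 streams concatenated together and that the reader stops at the end of the first one. Under that assumption, a minimal workaround sketch is to feed the leftover bytes of each finished stream into a fresh bz2.BZ2Decompressor. The helper name decompress_all_streams and the in-memory sample data are hypothetical and used only for illustration.

```python
import bz2


def decompress_all_streams(data):
    """Decompress every concatenated bzip2 stream found in `data` (bytes).

    Hypothetical helper: assumes the input is one or more complete
    bzip2 streams placed back to back.
    """
    chunks = []
    while data:
        decompressor = bz2.BZ2Decompressor()
        chunks.append(decompressor.decompress(data))
        # Bytes after the end-of-stream marker belong to the next stream.
        data = decompressor.unused_data
    return b"".join(chunks)


# Build a two-stream payload in memory instead of relying on a file on disk.
payload = bz2.compress(b"first,row\n") + bz2.compress(b"second,row\n")

# A single decompressor stops at the end of the first stream.
single = bz2.BZ2Decompressor().decompress(payload)

# Looping over the leftover data recovers both streams.
everything = decompress_all_streams(payload)
```

For a real file, the bytes could be read with open("example-file.csv.bz2", "rb").read() and passed to the helper; whether that recovers the full contents depends on the multi-stream assumption actually holding for the file.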

I am running Gentoo on a 64-bit Intel Core i5.
History
Date                 User          Action  Args
2014-02-26 11:59:20  James.Dominy  set     recipients: + James.Dominy
2014-02-26 11:59:20  James.Dominy  set     messageid: <1393415960.86.0.411484359826.issue20781@psf.upfronthosting.co.za>
2014-02-26 11:59:20  James.Dominy  link    issue20781 messages
2014-02-26 11:59:20  James.Dominy  create