classification
Title: bz2 module doesn't write end-of-stream marker
Type: behavior
Stage: resolved
Components:
Versions: Python 3.9, Python 3.8
process
Status: closed
Resolution: works for me
Dependencies:
Superseder:
Assigned To:
Nosy List: Dobatymo, Jeffrey.Kintscher, matrixise, serhiy.storchaka, twouters
Priority: normal
Keywords:

Created on 2019-05-22 03:36 by Dobatymo, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (5)
msg343143 - Author: Dobatymo (Dobatymo) Date: 2019-05-22 03:36
According to https://en.wikipedia.org/wiki/Bzip2, the reference implementation of bzip2 writes an end-of-stream marker (also called the stream footer), containing a magic number and a stream checksum, to the file.

Python does not do so. The files can still be read by all the bzip2-compatible software I tried. For completeness and better error detection, however, writing this marker (perhaps optionally) would be useful.
msg343157 - Author: Stéphane Wirtel (matrixise) * (Python committer) Date: 2019-05-22 08:41
Thank you. I have added Serhiy and Thomas to this issue and removed versions 3.7, 3.6, 3.5 and 2.7.

This may be a new feature for a future version of Python.
msg344342 - Author: Jeffrey Kintscher (Jeffrey.Kintscher) * Date: 2019-06-03 00:04
The bz2 library and the bzip2 utility generate identical files. For example:

>>> import bz2
>>> with bz2.open('bz2test.txt.bz2', mode='wb') as f:
...     f.write(b'foobar')
... 
6

generates this file:

$ hexdump bz2test.txt.bz2
0000000 42 5a 68 39 31 41 59 26 53 59 52 c0 3d c1 00 00
0000010 01 01 80 31 00 90 00 20 00 21 83 41 9a 09 88 1c
0000020 5d c9 14 e1 42 41 4b 00 f7 04                  
000002a

and

$ echo -n 'foobar' | bzip2 -c > bzip2test.txt.bz2

generates this file:

$ hexdump bzip2test.txt.bz2 
0000000 42 5a 68 39 31 41 59 26 53 59 52 c0 3d c1 00 00
0000010 01 01 80 31 00 90 00 20 00 21 83 41 9a 09 88 1c
0000020 5d c9 14 e1 42 41 4b 00 f7 04                  
000002a

The StreamFooter is there; it is just difficult to see because it isn't byte-aligned. It starts at byte 0x20 bit 6. If you take the six bytes starting at offset 0x20 (0x5dc914e14241) and shift them right two bits, you get the StreamFooter FooterMagic value: 0x177245385090. It is followed by a 32-bit CRC and two zero bits of padding to byte-align the end of the file.
msg344344 - Author: Jeffrey Kintscher (Jeffrey.Kintscher) * Date: 2019-06-03 00:14
Sorry, the StreamFooter starts at byte 0x1f bit 1, not 0x20 bit 6.
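
The footer position can also be checked with a short script. This is only a sketch: it hardcodes the 42 bytes from the hexdump in this thread, and the names DUMP, FOOTER_MAGIC and find_footer_bit_offset are made up for illustration and are not part of the bz2 module. It looks for the 48-bit FooterMagic at every bit offset, then reads the 32-bit CRC and counts the padding bits that follow.

```python
# Bytes copied from the hexdump in this thread (compressed b'foobar').
DUMP = bytes.fromhex(
    "42 5a 68 39 31 41 59 26 53 59 52 c0 3d c1 00 00 "
    "01 01 80 31 00 90 00 20 00 21 83 41 9a 09 88 1c "
    "5d c9 14 e1 42 41 4b 00 f7 04"
)

# 48-bit StreamFooter magic quoted in the message above.
FOOTER_MAGIC = 0x177245385090


def find_footer_bit_offset(data: bytes) -> int:
    """Return the bit offset (counted from the first bit of data,
    most significant bit first) of the footer magic, or -1."""
    total_bits = len(data) * 8
    value = int.from_bytes(data, "big")
    mask = (1 << 48) - 1
    # Scan backwards: the footer is the last structure in the stream.
    for start in range(total_bits - 48, -1, -1):
        shift = total_bits - 48 - start
        if (value >> shift) & mask == FOOTER_MAGIC:
            return start
    return -1


offset = find_footer_bit_offset(DUMP)
byte_index, bit_from_msb = divmod(offset, 8)

# The 32-bit CRC follows the magic; whatever is left is padding.
total_bits = len(DUMP) * 8
padding_bits = total_bits - (offset + 48 + 32)
crc = (int.from_bytes(DUMP, "big") >> padding_bits) & 0xFFFFFFFF

print(f"footer starts at byte {byte_index:#x}, "
      f"bit {7 - bit_from_msb} (LSB = bit 0)")
print(f"stream CRC: {crc:#010x}, padding bits: {padding_bits}")
```

For this dump the search lands 6 bits into byte 0x1f, which is bit 1 when counting from the least significant bit. The CRC that follows is 0x52c03dc1, the same four bytes that appear at offset 0x0a near the start of the file, and two padding bits remain.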
msg344376 - Author: Dobatymo (Dobatymo) Date: 2019-06-03 03:52
Sorry, I forgot about the bit alignment when I was checking for the footer. This can be closed then.
History
Date User Action Args
2022-04-11 14:59:15 admin set github: 81186
2019-06-05 09:46:58 Dobatymo set status: open -> closed; resolution: works for me; stage: resolved
2019-06-03 03:52:54 Dobatymo set messages: + msg344376
2019-06-03 00:14:50 Jeffrey.Kintscher set messages: + msg344344
2019-06-03 00:05:00 Jeffrey.Kintscher set messages: + msg344342
2019-05-28 07:49:04 Jeffrey.Kintscher set nosy: + Jeffrey.Kintscher
2019-05-22 08:41:15 matrixise set nosy: + serhiy.storchaka, twouters, matrixise; messages: + msg343157; versions: - Python 2.7, Python 3.5, Python 3.6, Python 3.7
2019-05-22 03:36:08 Dobatymo create