Add support for bzip2 compression to the zipfile module #58579
The ZIP File Format Specification (http://www.pkware.com/documents/casestudies/APPNOTE.TXT) has supported bzip2 compression since at least 2003. Since bzip2 is included in the Python standard library, it would be nice to add support for this method to zipfile. This would make it possible to process more zip files created by other tools and to build more compact distribution archives. The proposed patch adds a new method, ZIP_BZIP2, which is detected automatically when unpacking and can be used when packing.
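For reference, a minimal sketch of how the method looks from user code, assuming the `zipfile` API as it exists today (Python 3.3 and later, where `zipfile.ZIP_BZIP2` is available). The archive name and payload are illustrative only. It writes one bzip2-compressed member into an in-memory archive, then reads it back without telling the reader which method was used.

```python
import io
import zipfile

payload = b"bzip2 inside a zip archive\n" * 100

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    # Choose bzip2 for this member only; other members keep the default.
    zf.writestr("readme.txt", payload, compress_type=zipfile.ZIP_BZIP2)

buf.seek(0)
with zipfile.ZipFile(buf) as zf:
    info = zf.getinfo("readme.txt")
    # The method is recorded per member and detected automatically on read.
    method = info.compress_type
    data = zf.read("readme.txt")

print(method == zipfile.ZIP_BZIP2)  # True
```

Because the method is stored per member, an archive can mix bzip2 entries with deflate or stored ones.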
Can you please submit a contributor form? http://python.org/psf/contrib/contrib-form/
The patch looks good. Can you also provide a test case?
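A minimal sketch of the kind of test requested here, assuming today's `zipfile` and `unittest` APIs. The class and method names are hypothetical and this is not the test that was actually added to test_zipfile. It round-trips a bzip2 member and reads it back in deliberately tiny pieces, which is the access pattern that exposes early-EOF bugs in block-based decompressors.

```python
import io
import unittest
import zipfile

class Bzip2ChunkedReadTest(unittest.TestCase):
    # Hypothetical test case, written in the spirit of test_zipfile.
    def test_small_reads_return_all_data(self):
        payload = b"zipfile bzip2 test line\n" * 2000
        buf = io.BytesIO()
        with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_BZIP2) as zf:
            zf.writestr("data.txt", payload)
        buf.seek(0)
        with zipfile.ZipFile(buf) as zf:
            with zf.open("data.txt") as member:
                parts = []
                while True:
                    chunk = member.read(7)  # deliberately tiny reads
                    if not chunk:
                        break
                    parts.append(chunk)
        self.assertEqual(b"".join(parts), payload)
```

Such a class can be run with `python -m unittest` like any other test module.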
I am working on this. Should I add the tests to test_zipfile.py or create a new test_zipfile_bzip2.py? Should the documentation add a note that bzip2 compression is not understood by all programs (including older versions of Python), although Info-ZIP unzip does understand it? My English is not good enough to write the documentation.
Please add it to test_zipfile. As for the documentation, I propose the wording "bzip2 compression was added to the zip file format in 2001. However, even more recent tools (including older Python releases) may not support it, causing either refusal to process the zip file altogether, or failure to extract individual files." I'm not a native speaker of English, either. Feel free to put things through Google Translate; a native speaker will pick up the text and correct it.
Thanks to the tests, I found a bug. Since bzip2 is a block algorithm, the decompressor has to consume a certain amount of input before it starts returning output. As a result, reading in small chunks currently ends with a premature end of data. I'm working on a fix.
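The behavior described above can be reproduced with the standard `bz2` module alone. The sketch below uses a hypothetical helper, `read_some`, which is not zipfile's actual code. It shows that a single small `decompress()` call can legitimately return `b""`, so a reader must keep feeding input until it gets output or the stream truly ends, rather than treating the first empty result as end of data.

```python
import bz2
import io

def read_some(decomp, raw, chunk_size=16):
    # Hypothetical helper: keep feeding compressed input until the
    # decompressor produces output. An empty result from one
    # decompress() call does NOT mean the data has ended.
    while not decomp.eof:
        chunk = raw.read(chunk_size)
        if not chunk:
            break  # compressed input exhausted
        data = decomp.decompress(chunk)
        if data:
            return data
    return b""  # true end of stream

payload = b"hello bzip2 " * 1000
compressed = bz2.compress(payload)

# One 16-byte chunk cannot complete a bzip2 block, so there is no output yet.
first = bz2.BZ2Decompressor().decompress(compressed[:16])

# Feeding 16-byte chunks through the retry loop recovers everything.
decomp = bz2.BZ2Decompressor()
raw = io.BytesIO(compressed)
parts = []
while True:
    data = read_some(decomp, raw)
    if not data:
        break
    parts.append(data)
result = b"".join(parts)
```

A reader that stopped at `first` would report an empty file even though the input is perfectly valid.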
All bugs are fixed and all tests pass. Unfortunately, the patch turned out larger than expected. The extra changes are needed to handle large bzip2 buffers correctly and efficiently (other codecs may benefit as well).
[Adding Alan McIntyre, who is listed as zipfile's maintainer.] I haven't yet had a chance to properly familiarize myself with the Martin:
How about this? "The zip format specification has included support for bzip2 compression
You made a mess. The existing code uses
Thank you. Could you offer a variant that covers both bzip2 and lzma?
My mistake; I confused the bodies of read() and read1().
"The zip format specification has included support for bzip2 compression
Fixed a regression in decompression. Nadeem Vawda, we were both wrong.
What's the status of your contrib form?
Perhaps because your system's memory allocator is extremely good (or buf is always very small), but b''.join() is far more robust. |
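The two accumulation patterns being compared can be sketched as follows; the function names are hypothetical. In CPython, `bytes` is immutable, so `buf += chunk` generally allocates a new object and copies everything accumulated so far. How fast that is in practice depends on the allocator, but in the worst case it is quadratic. Collecting the pieces in a list and calling `b"".join()` once copies each byte a single time.

```python
def collect_concat(chunks):
    # Repeated concatenation: each += may copy the whole buffer built so far.
    buf = b""
    for chunk in chunks:
        buf += chunk
    return buf

def collect_join(chunks):
    # Gather the pieces first, then build the final object in one pass.
    parts = []
    for chunk in chunks:
        parts.append(chunk)
    return b"".join(parts)

chunks = [bytes([i % 256]) * 100 for i in range(1000)]
joined = collect_join(chunks)
```

Both functions return the same bytes; only the amount of copying differs.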
Oops. I set this aside to study in detail and then forgot about it. I will send the form as soon as I get to a printer and a scanner.
I thought it relied on the special optimization mentioned in the … In this particular case, the bytes concatenation is performed only once (and
New changeset 028e8e0b03e8 by Martin v. Löwis in branch 'default':
Thanks for the patch!