Title: Expand zipimport to include other compression methods
Type: enhancement
Stage: needs patch
Components: Library (Lib)
Versions: Python 3.8
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: brian.curtin, eric.snow, gregory.p.smith, nadeem.vawda, pitrou, rhettinger, serhiy.storchaka, superluser, yan12125
Priority: normal
Keywords:

Created on 2013-01-20 18:41 by rhettinger, last changed 2020-03-06 20:01 by brett.cannon.

Messages (11)
msg180307 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-01-20 18:41
Only a little of the existing logic is tied to the zipfile format.  Consider adding support for xz, tar, tar.gz, tar.bz2, etc.

In particular, xz has better compression, resulting in both space savings and faster load times.
msg180310 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-20 20:19
tar.* is not a good choice because it doesn't allow random access. Bare tar is better than zip only when you need to preserve additional file attributes (Unix file access mode, times, owner, group, links). The ZIP format supports all of this too, but the zipfile module does not yet.

Adding bz2 or lzma compression to ZIP files shouldn't be too hard.
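
As a rough illustration of bz2 or lzma compression inside a ZIP container, here is a minimal sketch using the standard zipfile module, which has exposed the ZIP_BZIP2 and ZIP_LZMA constants since Python 3.3. The helper names (write_archive, member_methods, read_member) are hypothetical and only for illustration. This covers writing and reading such archives with zipfile only; it says nothing about whether zipimport can load from them.

```python
import zipfile


def write_archive(archive_path, members, method=zipfile.ZIP_LZMA):
    """Write a dict of {member name: text or bytes} with the given method."""
    with zipfile.ZipFile(archive_path, "w", compression=method) as zf:
        for name, data in members.items():
            zf.writestr(name, data)


def member_methods(archive_path):
    """Return {member name: compression method constant} for an archive."""
    with zipfile.ZipFile(archive_path) as zf:
        return {info.filename: info.compress_type for info in zf.infolist()}


def read_member(archive_path, name):
    """Read one member directly; ZIP allows this without scanning the rest."""
    with zipfile.ZipFile(archive_path) as zf:
        return zf.read(name)
```

Passing method=zipfile.ZIP_BZIP2 selects bzip2 instead. Both methods need the corresponding bz2 or lzma module to be available in the running interpreter.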
msg180311 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-20 20:32
Here are some tests.

time 7z a -tzip -mx=0 $(find Lib -type f -name '*.py') >/dev/null
time 7z a -tzip $(find Lib -type f -name '*.py') >/dev/null
time 7z a -tzip -mx=9 $(find Lib -type f -name '*.py') >/dev/null
time 7z a -tzip -mm=bzip2 $(find Lib -type f -name '*.py') >/dev/null
time 7z a -tzip -mm=bzip2 -mx=9 $(find Lib -type f -name '*.py') >/dev/null
time 7z a -tzip -mm=lzma $(find Lib -type f -name '*.py') >/dev/null
time 7z a -tzip -mm=lzma -mx=9 $(find Lib -type f -name '*.py') >/dev/null
time 7z t >/dev/null
time 7z t >/dev/null
time 7z t >/dev/null
time 7z t >/dev/null
time 7z t >/dev/null
time 7z t python-lzma >/dev/null
time 7z t >/dev/null
time 7z t >/dev/null
wc -c python*.zip


             pack* unpack   size
             time   time    (MB)
store         0.5    0.2   19.42
deflate         6    0.4    4.59
deflate-max    40    0.4    4.52
bzip2           6    2.1    4.45
bzip2-max      79    2.0    4.39
lzma           37    0.7    4.42
lzma-max       62    0.7    4.39

*) For pack time I report user time, because 7-zip parallelizes deflate and bzip2 compression well.

As you can see, the size difference between maximal compression with the different methods is only 3%. lzma decompresses almost twice as slowly as deflate, and bzip2 decompresses 5 times slower. Python files are too small to benefit from advanced compression.
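
For anyone who wants to get a feel for this without 7-zip, the following is a minimal sketch that compares the three codecs on a single in-memory payload using the standard zlib, bz2 and lzma modules. It is not a reproduction of the measurements above: the payload is synthetic, the function names are hypothetical, and raw codec output is measured rather than ZIP members.

```python
import bz2
import lzma
import zlib


def synthetic_source(functions=200):
    """Build a small Python-like payload (an assumption, not real stdlib code)."""
    lines = [f"def func_{i}(x):\n    return x + {i}\n" for i in range(functions)]
    return "".join(lines).encode("utf-8")


def compressed_sizes(payload):
    """Return the size in bytes of the payload under each method."""
    return {
        "store": len(payload),
        "deflate": len(zlib.compress(payload, 9)),
        "bzip2": len(bz2.compress(payload, 9)),
        "lzma": len(lzma.compress(payload, preset=9)),
    }


if __name__ == "__main__":
    for method, size in compressed_sizes(synthetic_source()).items():
        print(f"{method:8} {size:8d}")
```

The exact numbers depend heavily on the payload, so results from this sketch should not be read as confirming or contradicting the table.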
msg180313 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-01-20 20:54
> Here are some tests.

I think you want to put pyc files in the zip file as well.
msg180314 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-01-20 21:09
xz will likely be the best win -- it is purported to compress smaller than bz2 while retaining the decompression speed of zip.

As Antoine says, the usual practice is to add py, pyc, and pyo files to the compressed library; otherwise, there is an added cost when Python tries to write a missing pyc/pyo file.
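
One possible way to bundle source and bytecode together is sketched below with zipfile.PyZipFile, whose writepy() method compiles a .py file and stores the resulting .pyc in the archive. The function name bundle_with_bytecode is hypothetical, and the sketch handles a single module only. Note that .pyo files no longer exist in Python 3.5 and later (PEP 488), so only .py and .pyc are shown.

```python
import os
import zipfile


def bundle_with_bytecode(archive_path, source_path):
    """Store one module as both source and compiled bytecode in a ZIP file."""
    with zipfile.PyZipFile(
        archive_path, "w", compression=zipfile.ZIP_DEFLATED
    ) as zf:
        # Add the source under its bare file name at the archive root.
        zf.write(source_path, arcname=os.path.basename(source_path))
        # Compile the source if needed and add the .pyc next to it.
        zf.writepy(source_path)
        return zf.namelist()
```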
msg180323 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-20 21:55

./python -m compileall $(find Lib -type f -name '*.py')
./python -O -m compileall $(find Lib -type f -name '*.py')


FILES="$(find Lib -name '*.py' -o -name '*.py[co]')"
time 7z a -tzip -mx=0 $FILES >/dev/null
time 7z a -tzip $FILES >/dev/null
time 7z a -tzip -mx=9 $FILES >/dev/null
time 7z a -tzip -mm=bzip2 $FILES >/dev/null
time 7z a -tzip -mm=bzip2 -mx=9 $FILES >/dev/null
time 7z a -tzip -mm=lzma $FILES >/dev/null
time 7z a -tzip -mm=lzma -mx=9 $FILES >/dev/null
time 7z t >/dev/null
time 7z t >/dev/null
time 7z t >/dev/null
time 7z t >/dev/null
time 7z t >/dev/null
time 7z t >/dev/null
time 7z t >/dev/null
wc -c python*.zip


             pack  unpack   size
             time   time    (MB)
store         1.6    0.5    65.4
deflate        19    0.9    17.5
deflate-max   134    0.9    17.2
bzip2          21    4.2    16.5
bzip2-max     294    4.1    16.3
lzma          120    2.3    15.9
lzma-max      204    2.3    15.8

All numbers are about 3x larger. lzma-max is 8% smaller than deflate-max but unpacks 2.5 times slower. Bzip2 is out of the game.
msg180324 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-01-20 21:58
Agreed it doesn't look very promising.
msg180347 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2013-01-21 18:00
So this seems like a confluence of two things: supporting compressed files for loading source code, and supporting new archive formats (e.g. xz vs. tar); zip just happens to do both implicitly. There is also the question of whether you plan to do this in C or in pure Python, as I plan to introduce a pure Python version of zipimport into importlib for 3.4 so that it can use zipfile directly and thus get full support for everything zipfile can do.

And there doesn't have to be any performance cost in trying to write bytecode files; it's very simple to have a loader which simply skips that step entirely.
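
A minimal sketch of such a loader, assuming a plain source file on disk: it subclasses importlib.machinery.SourceFileLoader and turns set_data(), the hook used to write the bytecode cache, into a no-op. The names NoBytecodeLoader and load_without_bytecode are hypothetical; setting sys.dont_write_bytecode is the simpler, process-wide alternative.

```python
import importlib.machinery
import importlib.util


class NoBytecodeLoader(importlib.machinery.SourceFileLoader):
    """Source loader that never writes a bytecode cache file."""

    def set_data(self, path, data, **kwargs):
        # Silently skip the write; the module is still compiled in memory.
        return None


def load_without_bytecode(name, path):
    """Load the module at `path` under `name` without caching bytecode."""
    loader = NoBytecodeLoader(name, path)
    spec = importlib.util.spec_from_file_location(name, path, loader=loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module
```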
msg220589 - Author: Eric Snow (eric.snow) * (Python committer) Date: 2014-06-14 22:19
related: issue #17630 and issue #5950
msg267527 - Author: Chih-Hsuan Yen (yan12125) * Date: 2016-06-06 12:58
+1 for that. I would like XZ support so that our application size can be reduced.
msg325729 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-09-19 07:53
zipimport has been rewritten in pure Python (issue25711). It is now easier to add support for other compression methods, although I don't think that reducing the size by 3-8% is worth complicating the code.

If you still need this, I think the simplest way is to import the zipfile module and monkey patch the simple ZIP file implementation in the zipimport module with a zipfile-based implementation. This can be done only after importing zipfile itself, i.e. when zipping the stdlib, the zipfile module and its dependencies should be stored uncompressed or with deflate compression.
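
The monkey patch itself would depend on private zipimport internals, so it is not shown here. As a different technique with a similar effect, the sketch below installs a separate meta path finder that reads modules through zipfile, which lets it handle any compression method zipfile supports. The class name ZipfileModuleFinder is hypothetical, and the sketch is deliberately limited: top-level .py modules only, no packages, no bytecode. It also assumes zipfile and its dependencies are already importable by other means, which is the same constraint described above.

```python
import importlib.abc
import importlib.util
import zipfile


class ZipfileModuleFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Find and load top-level .py modules from a ZIP archive via zipfile."""

    def __init__(self, archive_path):
        self.archive_path = archive_path
        with zipfile.ZipFile(archive_path) as zf:
            self._members = set(zf.namelist())

    def find_spec(self, fullname, path=None, target=None):
        member = fullname + ".py"
        # Only top-level modules (path is None) present in the archive.
        if path is not None or member not in self._members:
            return None
        origin = f"{self.archive_path}/{member}"
        return importlib.util.spec_from_loader(fullname, self, origin=origin)

    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        member = module.__name__ + ".py"
        # zipfile picks the right decompressor (deflate, bzip2, lzma).
        with zipfile.ZipFile(self.archive_path) as zf:
            source = zf.read(member)
        code = compile(source, module.__spec__.origin, "exec")
        exec(code, module.__dict__)
```

A finder like this would be activated with sys.meta_path.append(ZipfileModuleFinder("lib.zip")), where "lib.zip" stands in for whatever archive is being used.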
Date                 User              Action  Args
2020-03-06 20:01:35  brett.cannon      set     nosy: - brett.cannon
2018-09-19 07:53:50  serhiy.storchaka  set     messages: + msg325729
                                               versions: + Python 3.8, - Python 3.6
2016-06-06 12:58:29  yan12125          set     nosy: + yan12125
                                               messages: + msg267527
2015-08-05 15:58:39  eric.snow         set     nosy: + gregory.p.smith, superluser
                                               versions: + Python 3.6, - Python 3.4
2014-06-14 22:19:35  eric.snow         set     nosy: + eric.snow
                                               messages: + msg220589
2014-06-14 08:47:51  serhiy.storchaka  link    issue21751 superseder
2013-01-21 18:00:53  brett.cannon      set     nosy: + brett.cannon
                                               messages: + msg180347
2013-01-20 21:58:08  pitrou            set     messages: + msg180324
2013-01-20 21:55:39  serhiy.storchaka  set     messages: + msg180323
2013-01-20 21:09:12  rhettinger        set     messages: + msg180314
2013-01-20 20:54:26  pitrou            set     nosy: + pitrou
                                               messages: + msg180313
2013-01-20 20:32:22  serhiy.storchaka  set     messages: + msg180311
2013-01-20 20:19:58  serhiy.storchaka  set     nosy: + serhiy.storchaka, nadeem.vawda
                                               messages: + msg180310
                                               stage: needs patch
2013-01-20 18:45:44  brian.curtin      set     nosy: + brian.curtin
                                               components: + Library (Lib)
2013-01-20 18:41:42  rhettinger        create