msg230576 |
Author: Raymond Hettinger (rhettinger) * |
Date: 2014-11-04 05:00 |
Save space and reduce I/O time (reading and writing) by compressing the marshaled code in files.
In my code tree for Python 3, there was a nice space savings, from 19M down to 7M. Here's some of the output from my test:
8792 -> 4629 ./Tools/scripts/__pycache__/reindent.cpython-35.pyc
1660 -> 1063 ./Tools/scripts/__pycache__/rgrep.cpython-35.pyc
1995 -> 1129 ./Tools/scripts/__pycache__/run_tests.cpython-35.pyc
1439 -> 973 ./Tools/scripts/__pycache__/serve.cpython-35.pyc
727 -> 498 ./Tools/scripts/__pycache__/suff.cpython-35.pyc
3240 -> 1808 ./Tools/scripts/__pycache__/svneol.cpython-35.pyc
74866 -> 23611 ./Tools/scripts/__pycache__/texi2html.cpython-35.pyc
5562 -> 2870 ./Tools/scripts/__pycache__/treesync.cpython-35.pyc
1492 -> 970 ./Tools/scripts/__pycache__/untabify.cpython-35.pyc
1414 -> 891 ./Tools/scripts/__pycache__/which.cpython-35.pyc
19627963 -> 6976410 Total
I haven't measured it yet, but I believe this will improve Python's start-up time (because fewer bytes get transferred from disk).
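A minimal sketch of how such a measurement could be reproduced (the directory walk, the default zlib compression level, and the output format are assumptions, not the actual script used for the numbers above):

import os
import zlib

# Walk a checkout and report zlib-compressed vs. original .pyc sizes.
total_before = total_after = 0
for dirpath, dirnames, filenames in os.walk('.'):
    for name in filenames:
        if not name.endswith('.pyc'):
            continue
        path = os.path.join(dirpath, name)
        with open(path, 'rb') as f:
            data = f.read()
        compressed = zlib.compress(data)
        total_before += len(data)
        total_after += len(compressed)
        print('%d -> %d %s' % (len(data), len(compressed), path))
print('%d -> %d Total' % (total_before, total_after))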
|
msg230581 |
Author: Raymond Hettinger (rhettinger) * |
Date: 2014-11-04 06:16 |
Looking into this further, I suspect that the cleanest way to implement this would be to add zlib compression and decompression steps to marshal.c (bumping the version number to 5).
|
msg230600 |
Author: Antoine Pitrou (pitrou) * |
Date: 2014-11-04 09:41 |
This is similar to the idea of loading the stdlib from a zip file (but less intrusive and more debugging-friendly). The time savings will depend on whether the filesystem cache is cold or hot. In the latter case, my intuition is that decompression will slow things down a bit :-)
Quick decompression benchmark on a popular stdlib module, and a fast CPU:
$ ./python -m timeit -s "import zlib; data = zlib.compress(open('Lib/__pycache__/threading.cpython-35.pyc', 'rb').read())" "zlib.decompress(data)"
10000 loops, best of 3: 180 usec per loop
|
msg230607 |
Author: Marc-Andre Lemburg (lemburg) * |
Date: 2014-11-04 10:25 |
On 04.11.2014 10:41, Antoine Pitrou wrote:
>
> Antoine Pitrou added the comment:
>
> This is similar to the idea of loading the stdlib from a zip file (but less intrusive and more debugging-friendly). The time savings will depend on whether the filesystem cache is cold or hot. In the latter case, my intuition is that decompression will slow things down a bit :-)
>
> Quick decompression benchmark on a popular stdlib module, and a fast CPU:
>
> $ ./python -m timeit -s "import zlib; data = zlib.compress(open('Lib/__pycache__/threading.cpython-35.pyc', 'rb').read())" "zlib.decompress(data)"
> 10000 loops, best of 3: 180 usec per loop
zlib is rather slow when it comes to decompression. Something like
snappy or lz4 could work out, though:
https://code.google.com/p/snappy/
https://code.google.com/p/lz4/
Those were designed to be fast on decompression.
|
msg230610 |
Author: Antoine Pitrou (pitrou) * |
Date: 2014-11-04 10:39 |
Ok, comparison between zlib/snappy/lz4:
$ python3.4 -m timeit -s "import zlib; data = zlib.compress(open('Lib/__pycache__/threading.cpython-35.pyc', 'rb').read()); print(len(data))" "zlib.decompress(data)"
10000 loops, best of 3: 181 usec per loop
$ python3.4 -m timeit -s "import snappy; data = snappy.compress(open('Lib/__pycache__/threading.cpython-35.pyc', 'rb').read()); print(len(data))" "snappy.decompress(data)"
10000 loops, best of 3: 35 usec per loop
$ python3.4 -m timeit -s "import lz4; data = lz4.compress(open('Lib/__pycache__/threading.cpython-35.pyc', 'rb').read()); print(len(data))" "lz4.decompress(data)"
10000 loops, best of 3: 21.3 usec per loop
Compressed sizes for threading.cpython-35.pyc (the file used above):
- zlib: 14009 bytes
- snappy: 20573 bytes
- lz4: 21038 bytes
- uncompressed: 38973 bytes
Packages used:
https://pypi.python.org/pypi/lz4/0.7.0
https://pypi.python.org/pypi/python-snappy/0.5
|
msg230611 |
Author: Antoine Pitrou (pitrou) * |
Date: 2014-11-04 10:42 |
lz4 also has a "high compression" mode which improves the compression ratio (-> 17091 bytes compressed), for a similar decompression speed.
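For reference, with the current python-lz4 bindings (a newer API than the 0.7.0 package listed above), the high-compression mode can be selected roughly like this; the file path is just the one used in the benchmarks:

import lz4.block  # third-party python-lz4 package (block API)

with open('Lib/__pycache__/threading.cpython-35.pyc', 'rb') as f:
    data = f.read()

fast = lz4.block.compress(data)                          # default (fast) mode
hc = lz4.block.compress(data, mode='high_compression')   # LZ4HC: better ratio, similar decompression speed
print(len(data), len(fast), len(hc))

# Decompression is the same call regardless of which mode produced the data.
assert lz4.block.decompress(hc) == data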
|
msg230615 |
Author: Georg Brandl (georg.brandl) * |
Date: 2014-11-04 10:49 |
Both lz4 and snappy are BSD-licensed, but snappy is written in C++.
|
msg230631 |
Author: Brett Cannon (brett.cannon) * |
Date: 2014-11-04 15:35 |
Just FYI, this can easily be added into importlib since it works through marshal's API to unmarshal the module's data. There are also two startup benchmarks in the benchmark suite to help measure possible performance gains/losses, which should also ferret out whether cache warmth plays a significant role in the performance impact.
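A hedged sketch of that hook point from pure Python, assuming a hypothetical file containing a zlib-compressed marshal dump of a code object (no .pyc header handling, no cache invalidation, and not how importlib itself would actually grow the feature):

import importlib.machinery
import importlib.util
import marshal
import zlib

class CompressedBytecodeLoader(importlib.machinery.SourcelessFileLoader):
    """Load a module from a zlib-compressed marshal dump (hypothetical .pycz format)."""
    def get_code(self, fullname):
        with open(self.path, 'rb') as f:
            return marshal.loads(zlib.decompress(f.read()))

# Usage, given 'spam.pycz' written elsewhere as zlib.compress(marshal.dumps(code)):
loader = CompressedBytecodeLoader('spam', 'spam.pycz')
spec = importlib.util.spec_from_file_location('spam', 'spam.pycz', loader=loader)
module = importlib.util.module_from_spec(spec)
loader.exec_module(module)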
|
msg230730 |
Author: Antoine Pitrou (pitrou) * |
Date: 2014-11-06 09:15 |
FWIW, I personally doubt this would actually reduce startup time. Disk I/O cost is in the first access, not in the transfer size (unless we're talking hundreds of megabytes). But in any case, someone interested has to do measurements :-)
|
msg230756 |
Author: Stefan Behnel (scoder) * |
Date: 2014-11-06 18:59 |
FWIW, LZ4HC compression sounds like an obvious choice for write-once-read-many data like .pyc files to me. Blosc shows that you can achieve a pretty major performance improvement just by stuffing more data into less space (although it does it for RAM and CPU cache, not disk). And even if it ends up not being substantially faster for the specific case of .pyc files, there is really no reason why they should take more space on disk than necessary, so it's a sure win in any case.
|
msg230838 |
Author: Raymond Hettinger (rhettinger) * |
Date: 2014-11-08 06:32 |
> there is really no reason why they should take more space on disk
> than necessary, so it's a sure win in any case.
That is a nice summary.
> FWIW, LZ4HC compression sounds like an obvious choice for
> write-once-read-many data like .pyc files to me.
+1
|
msg230840 |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2014-11-08 09:28 |
Compressing pyc files one by one wouldn't save much space because disk space is allocated in blocks (up to 32 KiB on FAT32). If the size of a pyc file is less than the block size, we will not gain anything. A ZIP file has an advantage due to more compact packing of files. In addition, it can have lower access time due to less fragmentation. Unfortunately ZIP doesn't support LZ4 compression, but we can store LZ4-compressed files in a ZIP file without additional compression.
An uncompressed TAR file has the same advantages but needs a longer initialization time (for building the index).
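A small sketch of the "LZ4 inside a ZIP, stored uncompressed" idea (the archive name and the .lz4 suffix are made up for illustration; lz4 is the third-party binding mentioned above):

import zipfile
import lz4.block  # third-party python-lz4 binding

# Pre-compress the .pyc with LZ4, then add it with ZIP_STORED so zipfile
# does not deflate it a second time.
pyc_path = 'Lib/__pycache__/threading.cpython-35.pyc'
with open(pyc_path, 'rb') as f:
    payload = lz4.block.compress(f.read())

with zipfile.ZipFile('pyc_bundle.zip', 'w', compression=zipfile.ZIP_STORED) as zf:
    zf.writestr('threading.cpython-35.pyc.lz4', payload)

# Reading back: one lookup in the ZIP index, then a fast LZ4 decompress.
with zipfile.ZipFile('pyc_bundle.zip') as zf:
    data = lz4.block.decompress(zf.read('threading.cpython-35.pyc.lz4'))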
|
msg230842 |
Author: Marc-Andre Lemburg (lemburg) * |
Date: 2014-11-08 10:08 |
On 08.11.2014 10:28, Serhiy Storchaka wrote:
> Compressing pyc files one by one wouldn't save much space because disk space is allocated in blocks (up to 32 KiB on FAT32). If the size of a pyc file is less than the block size, we will not gain anything. A ZIP file has an advantage due to more compact packing of files. In addition, it can have lower access time due to less fragmentation. Unfortunately ZIP doesn't support LZ4 compression, but we can store LZ4-compressed files in a ZIP file without additional compression.
>
> An uncompressed TAR file has the same advantages but needs a longer initialization time (for building the index).
The aim is to reduce file load time, not really to save disk space.
By having less data to read from the disk, it may be possible
to achieve a small startup speedup.
However, you're right in that using a single archive with many PYC files
would be more efficient, since it lowers the number of stat() calls.
The trick of storing LZ4-compressed data in a ZIP file would enable this.
BTW: We could add optional LZ4 compression to the marshal format to
make all this work transparently and without having to change the
import mechanism itself:
We'd just need to add a new flag or type code indicating that the rest
of the stream is LZ4 compressed. The PYC writer could then enable this
flag or type code by default (or perhaps have it enabled via some env var
or command line flag) and everything would then just work with both
LZ4-compressed byte code as well as non-compressed byte code.
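A pure-Python sketch of that idea (the actual proposal is a change inside marshal.c; the marker byte here is hypothetical and zlib stands in for whichever codec would be chosen):

import marshal
import zlib

# Hypothetical marker; a real implementation would pick a byte that cannot
# start a valid marshal stream.
COMPRESSED_MARKER = b'\xfa'

def dump_maybe_compressed(code_obj, compress=True):
    payload = marshal.dumps(code_obj)
    return COMPRESSED_MARKER + zlib.compress(payload) if compress else payload

def load_maybe_compressed(data):
    # Old, uncompressed streams keep working: only the marker triggers decompression.
    if data[:1] == COMPRESSED_MARKER:
        return marshal.loads(zlib.decompress(data[1:]))
    return marshal.loads(data)

code = compile("print('hello')", '<demo>', 'exec')
exec(load_maybe_compressed(dump_maybe_compressed(code)))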
|
msg404622 |
Author: Guido van Rossum (gvanrossum) * |
Date: 2021-10-21 16:53 |
The space savings are nice, but I doubt that it will matter for startup time -- startup is most relevant in situations where it's *hot* (e.g. a shell script that repeatedly calls out to utilities written in Python).
|
Date | User | Action | Args
2022-04-11 14:58:09 | admin | set | github: 66978
2021-10-26 04:56:26 | barry | set | nosy: + barry
2021-10-23 16:55:49 | FFY00 | set | nosy: + FFY00
2021-10-21 16:53:10 | gvanrossum | set | nosy: + gvanrossum; messages: + msg404622
2020-03-18 18:02:23 | brett.cannon | set | nosy: - brett.cannon
2014-11-08 10:08:17 | lemburg | set | messages: + msg230842
2014-11-08 09:28:55 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg230840
2014-11-08 06:32:04 | rhettinger | set | messages: + msg230838
2014-11-06 18:59:59 | scoder | set | nosy: + scoder; messages: + msg230756
2014-11-06 09:15:20 | pitrou | set | messages: + msg230730
2014-11-04 15:59:21 | christian.heimes | set | nosy: + christian.heimes
2014-11-04 15:35:47 | brett.cannon | set | messages: + msg230631
2014-11-04 11:58:01 | Arfrever | set | nosy: + Arfrever
2014-11-04 10:49:49 | georg.brandl | set | nosy: + georg.brandl; messages: + msg230615
2014-11-04 10:42:44 | pitrou | set | messages: + msg230611
2014-11-04 10:39:59 | pitrou | set | messages: + msg230610
2014-11-04 10:25:08 | lemburg | set | nosy: + lemburg; messages: + msg230607
2014-11-04 09:41:05 | pitrou | set | nosy: + tim.peters; messages: + msg230600
2014-11-04 06:16:55 | rhettinger | set | messages: - msg230580
2014-11-04 06:16:46 | rhettinger | set | messages: + msg230581
2014-11-04 06:09:19 | rhettinger | set | nosy: + brett.cannon, pitrou; messages: + msg230580; components: + Interpreter Core
2014-11-04 05:00:45 | rhettinger | create |