Author lemburg
Recipients BTaskaya, Mark.Shannon, brandtbucher, brett.cannon, eric.snow, gvanrossum, larry, lemburg, nascheme, ronaldoussoren
Date 2021-08-28.11:17:44
Message-id <b2926872-11ae-8811-9eea-b088b9bfe81d@egenix.com>
In-reply-to <1630123603.14.0.639670821622.issue45020@roundup.psfhosted.org>
Content
On 28.08.2021 06:06, Guido van Rossum wrote:
> 
>> With that in place, it'd be great to pre-cache all the .py files automatically read in at startup.
> 
> *All* the .py files? I think the binary bloat caused by deep-freezing the entire stdlib would be excessive. In fact, Eric's approach freezes everything in the encodings package, which turns out to be a lot of files and a lot of code (lots of simple data tables expressed in code), and I found that for basic startup time, it's best not to deep-freeze the encodings module except for __init__.py, aliases.py and utf_8.py.

Eric's approach, as I understand it, is pretty much what PyRun does.
It freezes almost the entire stdlib. The main aim was to save space
and create a Python runtime with very few files for easy installation and
shipment of products written in Python.

For Python 3.8 (I haven't ported it to more recent Python versions yet),
the uncompressed stripped binary is 15MB. UPX compressed, it's only 5MB:

-rwxr-xr-x 1 lemburg lemburg  15M May 19 15:26 pyrun3.8
-rwxr-xr-x 1 lemburg lemburg  32M Aug 26  2020 pyrun3.8-debug
-rwxr-xr-x 1 lemburg lemburg 5.0M May 19 15:26 pyrun3.8-upx

There's no bloat, since you don't need the .py/.pyc files for the stdlib
anymore. In fact, you save quite a bit of disk space compared to a
full Python installation and additionally benefit from the memory
mapping the OS does for sharing access to the marshal'ed byte code
between processes.
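
As a rough illustration of what "marshal'ed byte code" means here, the
following is a minimal sketch and not PyRun's actual build process. The
module name, the source text and the helper functions freeze_source() and
load_frozen() are made up for the example; only compile(), marshal and
types.ModuleType are real stdlib APIs.

```python
import marshal
import types

# Hypothetical module source standing in for a stdlib .py file.
SOURCE = "def greet(name):\n    return 'Hello, ' + name\n"


def freeze_source(source, name):
    """Compile source text and serialize the code object with marshal."""
    code = compile(source, "<frozen " + name + ">", "exec")
    return marshal.dumps(code)


def load_frozen(blob, name):
    """Rebuild a module from marshal'ed byte code without reading a file."""
    module = types.ModuleType(name)
    exec(marshal.loads(blob), module.__dict__)
    return module


# The bytes in `blob` are the kind of data a freezing tool embeds in the binary.
blob = freeze_source(SOURCE, "demo")
demo = load_frozen(blob, "demo")
```

In a frozen build the equivalent of `blob` lives in the executable's
read-only data, which is what lets the OS share those pages between
processes.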

That said, some things don't work with such an approach, e.g.
a few packages include additional data files which they expect to
find on disk. Since those are not available anymore, they fail.

For PyRun I have patched some of those packages to include the
data in the form of Python modules instead, so that it gets frozen
as well, e.g. the Python grammar files.
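
The general idea of turning a data file into a Python module can be
sketched as follows. This is only an assumption about how such a
conversion might look, not the actual PyRun patches: the helper
data_to_module_source(), the module name grammar_data and the sample
grammar line are all hypothetical.

```python
import types


def data_to_module_source(data, var_name="DATA"):
    """Return Python source that defines var_name as the given bytes."""
    header = "# Generated from a data file; do not edit.\n"
    return header + "%s = %r\n" % (var_name, data)


# Hypothetical content standing in for a data file such as a grammar.
grammar_bytes = b"file_input: (NEWLINE | stmt)* ENDMARKER\n"

source = data_to_module_source(grammar_bytes)

# Load the generated source as a module instead of opening a file on disk.
# Once it is ordinary Python source, it can be frozen like any other module.
grammar_data = types.ModuleType("grammar_data")
exec(compile(source, "<generated grammar_data>", "exec"), grammar_data.__dict__)
```

The package then reads grammar_data.DATA where it previously opened the
file, so nothing has to exist on disk at runtime.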

Whether this is a good approach for Python in general is a different
question, though. PyRun is built on top of the existing released
Python distribution, so it isn't optimized for working with
frozen code. In fact, early versions did not even have a REPL,
since the main point was to run a single released app.
History
Date                 User     Action  Args
2021-08-28 11:17:45  lemburg  set     recipients: + lemburg, gvanrossum, brett.cannon, nascheme, ronaldoussoren, larry, Mark.Shannon, eric.snow, brandtbucher, BTaskaya
2021-08-28 11:17:45  lemburg  link    issue45020 messages
2021-08-28 11:17:44  lemburg  create