Message 400469 - Python tracker



Author	lemburg
Recipients	BTaskaya, Mark.Shannon, brandtbucher, brett.cannon, eric.snow, gvanrossum, larry, lemburg, nascheme, ronaldoussoren
Date	2021-08-28.11:17:44
Message-id	<b2926872-11ae-8811-9eea-b088b9bfe81d@egenix.com>
In-reply-to	<1630123603.14.0.639670821622.issue45020@roundup.psfhosted.org>

Content

On 28.08.2021 06:06, Guido van Rossum wrote:
> 
>> With that in place, it'd be great to pre-cache all the .py files automatically read in at startup.
> 
> *All* the .py files? I think the binary bloat caused by deep-freezing the entire stdlib would be excessive. In fact, Eric's approach freezes everything in the encodings package, which turns out to be a lot of files and a lot of code (lots of simple data tables expressed in code), and I found that for basic startup time, it's best not to deep-freeze the encodings module except for __init__.py, aliases.py and utf_8.py.

Eric's approach, as I understand it, is pretty much what PyRun does.
It freezes almost the entire stdlib. The main aim was to save space
and create a Python runtime with very few files for easy installation and
shipment of products written in Python.

For Python 3.8 (I haven't ported it to more recent Python versions yet),
the uncompressed stripped binary is 15MB. UPX compressed, it's only 5MB:

-rwxr-xr-x 1 lemburg lemburg  15M May 19 15:26 pyrun3.8
-rwxr-xr-x 1 lemburg lemburg  32M Aug 26  2020 pyrun3.8-debug
-rwxr-xr-x 1 lemburg lemburg 5.0M May 19 15:26 pyrun3.8-upx

There's no bloat, since you don't need the .py/.pyc files for the stdlib
anymore. In fact, you save quite a bit of disk space compared to a
full Python installation and additionally benefit from the memory
mapping the OS does for sharing access to the marshal'ed byte code
between processes.
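
To illustrate what "marshal'ed byte code" means here, the following is a
minimal sketch, not PyRun's actual build code. It assumes only the stdlib
marshal module; the helper names freeze_source and load_frozen are made up
for this example. A freeze tool would typically embed bytes like these in the
binary instead of keeping them in a Python variable.

```python
import marshal


def freeze_source(source, name):
    """Compile source text and return the marshalled code object as bytes.

    Hypothetical helper for illustration only.
    """
    code = compile(source, f"<frozen {name}>", "exec")
    return marshal.dumps(code)


def load_frozen(blob):
    """Unmarshal the bytes and execute the code object in a fresh namespace."""
    namespace = {}
    exec(marshal.loads(blob), namespace)
    return namespace


# Round trip: source -> bytes -> executed module namespace.
blob = freeze_source("ANSWER = 6 * 7\n", "demo")
namespace = load_frozen(blob)
print(namespace["ANSWER"])
```

Since the marshalled bytes are read-only data, a binary that embeds them can
have those pages shared between processes by the OS, which is the benefit
described above.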

That said, some things don't work with such an approach, e.g.
a few packages include additional data files which they expect to
find on disk. Since those are not available anymore, they fail.

For PyRun I have patched some of those packages to include the
data in the form of Python modules instead, so that it gets frozen
as well, e.g. the Python grammar files.
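
The idea of turning a data file into a Python module can be sketched roughly
as follows. This is an assumption about the general technique, not the actual
PyRun patch; the function name data_to_module_source, the variable name DATA
and the sample bytes are all hypothetical.

```python
def data_to_module_source(data, var_name="DATA"):
    """Return Python source that defines var_name as the given bytes.

    Hypothetical helper: the generated module can be frozen like any
    other module, so no separate data file is needed on disk.
    """
    return (
        "# Generated from a data file; do not edit.\n"
        f"{var_name} = {data!r}\n"
    )


# Made-up sample payload standing in for the contents of a data file.
sample = b"example data\x00\xff\n"
source = data_to_module_source(sample)

# Executing the generated source gives back the original bytes.
module_namespace = {}
exec(compile(source, "<generated data module>", "exec"), module_namespace)
print(module_namespace["DATA"] == sample)
```

The package then imports the generated module and reads DATA instead of
opening a file, which is why it keeps working when nothing is on disk.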

Whether this is a good approach for Python in general is a different
question, though. PyRun is created on top of the existing released
Python distribution, so it doesn't optimize for being able to
work with the frozen code. In fact, early versions did not
even have a REPL, since the main point was to run a
single released app.
