Message 400459 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	gvanrossum
Recipients	BTaskaya, Mark.Shannon, brandtbucher, brett.cannon, eric.snow, gvanrossum, larry, lemburg, nascheme, ronaldoussoren
Date	2021-08-28.04:06:42
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1630123603.14.0.639670821622.issue45020@roundup.psfhosted.org>
In-reply-to

Content
> Since nobody's said so in so many words (so far in this thread anyway): the prototype from Jeethu Rao in 2018 was a different technology than what Eric is doing. The "Programs/_freeze_importlib.c" Eric's playing with essentially inlines a .pyc file as C static data. The Jeethu Rao approach is more advanced: instead of serializing the objects, it stores the objects from the .pyc file as pre-initialized C static objects. So it saves the un-marshalling step, and therefore should be faster. To import the module you still need to execute the module body code object though--that seems unavoidable. Yes, I know. We're discussing two separate ideas -- Eric's approach, which is doing the same we're doing for importlib for more stdlib modules; and "my" approach, dubbed "deep-freeze", which is similar to Jeethu's approach (details in https://github.com/faster-cpython/ideas/issues/84). What the two approaches have in common is that they require rebuilding the python binary whenever you edit any of the changed modules. I heard somewhere (I'm sorry, I honestly don't recall who said it first, possibly Eric himself) that Jeethu's approach was rejected because of that. FWIW in my attempts to time this, it looks like the perf benefits of Eric's approach are close to those of deep-freezing. And deep-freezing causes much more bloat of the source code and of the resulting binary. (At runtime the binary size is made up by matching heap savings, but to some people binary size is important too.) > The python-dev thread covers nearly everything I remember about this. The one thing I guess I never mentioned is that building and working with the prototype was frightful; it had both Python code and C code, and it was fragile and hard to get working. My hunch at the time was that it shouldn't be so fragile; it should be possible to write the converter in Python: read in .pyc file, generate .c file. It might have to make assumptions about the internal structure of the CPython objects it instantiates as C static data, but since we'd ship the tool with CPython this should be only a minor maintenance issue. Deep-freezing doesn't seem frightful to work with, to me at least. :-) Maybe the foundational work by Eric (e.g. generating sections of Makefile.pre.in) has helped. I don't understand entirely why Jeethu's prototype had part written in C. I never ran it so I don't know what the generated code looked like, but I have a feeling that for objects that don't reference other objects, it would generate a byte array containing the exact contents of the object structure (which it would get from constructing the object in memory and copying the bytes) which was then put together with the object header (containing the refcount and type) and cast to (PyObject ). In contrast, for deep-freeze I just reverse engineered what the structures look like and wrote a Python script to generate C code for an initialized instance of those structures. You can look at some examples here: https://github.com/gvanrossum/cpython/blob/codegen/Python/codegen__collections_abc.c . It's verbose but the C compiler handles it just fine (C compilers have evolved to handle very* large generated programs). > In experimenting with the prototype, I observed that simply calling stat() to ensure the frozen .py file hadn't changed on disk lost us about half the performance win from this approach. I'm not much of a systems programmer, but I wonder if there are (system-proprietary?) library calls one could make to get the stat info for all files in a single directory all at once that might be faster overall. (Of course, caching this information at startup might make for a crappy experience for people who edit Lib/.py files while the interpreter is running.) I think the only solution here was hinted at in the python-dev thread from 2018: have a command-line flag to turn it on or off (e.g. -X deepfreeze=1/0) and have a policy for what the default for that flag should be (e.g. on by default in production builds, off by default in developer builds -- anything that doesn't use --enable-optimizations). > One more observation about the prototype: it doesn't know how to deal with any mutable types. marshal.c can deal with list, dict, and set. Does this matter? ISTM the tree of objects under a code object will never have a reference to one of these mutable objects, so it's probably already fine. Correct, marshal supports things that you will never see in a code object. Perhaps the reason is that when marshal was invented, it wasn't so clear that code objects should be immutable -- that realization came later, when Greg Stein proposed making them ROM-able. That didn't work out, but the notion that code objects should be strictly mutable (to the python user, at least) was born and is now ingrained. > Not sure what else I can tell you. It gave us a measurable improvement in startup time, but it seemed fragile, and it was annoying to work with/on, so after hacking on it for a week (at the 2018 core dev sprint in Redmond WA) I put it aside and moved on to other projects. I'm not so quick to give up. I do believe I have seen similar startup time improvements. But Eric's version (i.e. this issue) is nearly as good, and the binary bloat is much less -- marshal is way more compact than in-memory objects. (Second message) > There should be a boolean flag that enables/disables cached copies of .py files from Lib/. You should be able to turn it off with either an environment variable or a command-line option, and when it's off it skips all the internal cached stuff and uses the normal .py / .pyc machinery. Yeah. > With that in place, it'd be great to pre-cache all the .py files automatically read in at startup. All* the .py files? I think the binary bloat cause by deep-freezing the entire stdlib would be excessive. In fact, Eric's approach freezes everything in the encodings package, which turns out to be a lot of files and a lot of code (lots of simple data tables expressed in code), and I found that for basic startup time, it's best not to deep-freeze the encodings module except for __init__.py, aliases.py and utf_8.py. > As for changes to the build process: the most analogous thing we have is probably Argument Clinic. For what it's worth, Clinic hasn't been very well integrated into the CPython build process. There's a pseudotarget that runs it for you in the Makefile, but it's only ever run manually, and I'm not sure there's any build automation for Windows developers. AFAIK it hasn't really been a problem. But then I'm not sure this is a very good analogy--the workflow for making Clinic changes is very different from people hacking on Lib/*.py. I think we've got reasonably good automation for both Eric's approach and the deep-freeze approach -- all you need to do is run "make" when you've edited one of the (deep-)frozen modules. > It might be sensible to add a mechanism that checks whether or not the pre-cached modules are current. Store a hash for each cached module and check that they all match. This could then be part of the release process, run from a GitHub hook, etc. I think the automation that Eric developed is already good enough. (He even generates Windows project files.) See https://github.com/python/cpython/pull/27980 .

> Since nobody's said so in so many words (so far in this thread anyway): the prototype from Jeethu Rao in 2018 was a different technology than what Eric is doing.  The "Programs/_freeze_importlib.c" Eric's playing with essentially inlines a .pyc file as C static data.  The Jeethu Rao approach is more advanced: instead of serializing the objects, it stores the objects from the .pyc file as pre-initialized C static objects.  So it saves the un-marshalling step, and therefore should be faster.  To import the module you still need to execute the module body code object though--that seems unavoidable.

Yes, I know. We're discussing two separate ideas -- Eric's approach, which is doing the same we're doing for importlib for more stdlib modules; and "my" approach, dubbed "deep-freeze", which is similar to Jeethu's approach (details in https://github.com/faster-cpython/ideas/issues/84).

What the two approaches have in common is that they require rebuilding the python binary whenever you edit any of the changed modules. I heard somewhere (I'm sorry, I honestly don't recall who said it first, possibly Eric himself) that Jeethu's approach was rejected because of that.

FWIW in my attempts to time this, it looks like the perf benefits of Eric's approach are close to those of deep-freezing. And deep-freezing causes much more bloat of the source code and of the resulting binary. (At runtime the binary size is made up by matching heap savings, but to some people binary size is important too.)

> The python-dev thread covers nearly everything I remember about this.  The one thing I guess I never mentioned is that building and working with the prototype was frightful; it had both Python code and C code, and it was fragile and hard to get working.  My hunch at the time was that it shouldn't be so fragile; it should be possible to write the converter in Python: read in .pyc file, generate .c file.  It might have to make assumptions about the internal structure of the CPython objects it instantiates as C static data, but since we'd ship the tool with CPython this should be only a minor maintenance issue.

Deep-freezing doesn't seem frightful to work with, to me at least. :-) Maybe the foundational work by Eric (e.g. generating sections of Makefile.pre.in) has helped.

I don't understand entirely why Jeethu's prototype had part written in C. I never ran it so I don't know what the generated code looked like, but I have a feeling that for objects that don't reference other objects, it would generate a byte array containing the exact contents of the object structure (which it would get from constructing the object in memory and copying the bytes) which was then put together with the object header (containing the refcount and type) and cast to (PyObject *).

In contrast, for deep-freeze I just reverse engineered what the structures look like and wrote a Python script to generate C code for an initialized instance of those structures. You can look at some examples here: https://github.com/gvanrossum/cpython/blob/codegen/Python/codegen__collections_abc.c . It's verbose but the C compiler handles it just fine (C compilers have evolved to handle *very* large generated programs).

> In experimenting with the prototype, I observed that simply calling stat() to ensure the frozen .py file hadn't changed on disk lost us about half the performance win from this approach.  I'm not much of a systems programmer, but I wonder if there are (system-proprietary?) library calls one could make to get the stat info for all files in a single directory all at once that might be faster overall.  (Of course, caching this information at startup might make for a crappy experience for people who edit Lib/*.py files while the interpreter is running.)

I think the only solution here was hinted at in the python-dev thread from 2018: have a command-line flag to turn it on or off (e.g. -X deepfreeze=1/0) and have a policy for what the default for that flag should be (e.g. on by default in production builds, off by default in developer builds -- anything that doesn't use --enable-optimizations).

> One more observation about the prototype: it doesn't know how to deal with any mutable types.  marshal.c can deal with list, dict, and set.  Does this matter?  ISTM the tree of objects under a code object will never have a reference to one of these mutable objects, so it's probably already fine.

Correct, marshal supports things that you will never see in a code object. Perhaps the reason is that when marshal was invented, it wasn't so clear that code objects should be immutable -- that realization came later, when Greg Stein proposed making them ROM-able. That didn't work out, but the notion that code objects should be strictly mutable (to the python user, at least) was born and is now ingrained.

> Not sure what else I can tell you.  It gave us a measurable improvement in startup time, but it seemed fragile, and it was annoying to work with/on, so after hacking on it for a week (at the 2018 core dev sprint in Redmond WA) I put it aside and moved on to other projects.

I'm not so quick to give up. I do believe I have seen similar startup time improvements. But Eric's version (i.e. this issue) is nearly as good, and the binary bloat is much less -- marshal is way more compact than in-memory objects.

(Second message)

> There should be a boolean flag that enables/disables cached copies of .py files from Lib/.  You should be able to turn it off with either an environment variable or a command-line option, and when it's off it skips all the internal cached stuff and uses the normal .py / .pyc machinery.

Yeah.

> With that in place, it'd be great to pre-cache all the .py files automatically read in at startup.

*All* the .py files? I think the binary bloat cause by deep-freezing the entire stdlib would be excessive. In fact, Eric's approach freezes everything in the encodings package, which turns out to be a lot of files and a lot of code (lots of simple data tables expressed in code), and I found that for basic startup time, it's best not to deep-freeze the encodings module except for __init__.py, aliases.py and utf_8.py.

> As for changes to the build process: the most analogous thing we have is probably Argument Clinic.  For what it's worth, Clinic hasn't been very well integrated into the CPython build process.  There's a pseudotarget that runs it for you in the Makefile, but it's only ever run manually, and I'm not sure there's *any* build automation for Windows developers.  AFAIK it hasn't really been a problem.  But then I'm not sure this is a very good analogy--the workflow for making Clinic changes is very different from people hacking on Lib/*.py.

I think we've got reasonably good automation for both Eric's approach and the deep-freeze approach -- all you need to do is run "make" when you've edited one of the (deep-)frozen modules.

> It might be sensible to add a mechanism that checks whether or not the pre-cached modules are current.  Store a hash for each cached module and check that they all match.  This could then be part of the release process, run from a GitHub hook, etc.

I think the automation that Eric developed is already good enough. (He even generates Windows project files.) See https://github.com/python/cpython/pull/27980 .

History
Date	User	Action	Args
2021-08-28 04:06:43	gvanrossum	set	recipients: + gvanrossum, lemburg, brett.cannon, nascheme, ronaldoussoren, larry, Mark.Shannon, eric.snow, brandtbucher, BTaskaya
2021-08-28 04:06:43	gvanrossum	set	messageid: <1630123603.14.0.639670821622.issue45020@roundup.psfhosted.org>
2021-08-28 04:06:43	gvanrossum	link	issue45020 messages
2021-08-28 04:06:42	gvanrossum	create