
Author eric.snow
Recipients BTaskaya, Mark.Shannon, brandtbucher, brett.cannon, eric.snow, gvanrossum, indygreg, larry, lemburg, methane, nascheme, ronaldoussoren
Date 2021-08-30.17:21:57
Message-id <CALFfu7ARWNCcBWCrSsxW_wHOBAu_wDw4gq8pYdx0rXibMuhKuw@mail.gmail.com>
In-reply-to <1630127679.66.0.551195032564.issue45020@roundup.psfhosted.org>
Content
On Fri, Aug 27, 2021 at 11:14 PM Larry Hastings <report@bugs.python.org> wrote:
> [snip] On the other hand: if we made a viable tool that could consume some arbitrary
> set of .py files and produce a C file, and said C file could then be compiled into a
> shared library, end users could enjoy this speedup over the subset of the standard
> library their program used, and perhaps even their own source tree(s).

Yeah, that would be interesting to investigate.
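
To make the idea concrete, here is a minimal sketch of the ".py to C" step: compile the source, marshal the code object, and emit the bytes as a C array. It is only an illustration, not the actual freezing tool; the function names and the "frozen_" symbol prefix are made up for this example.

```python
import marshal


def freeze_to_c(module_name: str, source: str) -> str:
    """Return C source text holding the marshalled code of `source`.

    The "frozen_" symbol prefix is hypothetical, chosen for this sketch.
    """
    code = compile(source, f"<frozen {module_name}>", "exec")
    data = marshal.dumps(code)
    symbol = "frozen_" + module_name.replace(".", "_")
    lines = [f"const unsigned char {symbol}[] = {{"]
    # Emit 16 byte values per line to keep the C file readable.
    for i in range(0, len(data), 16):
        chunk = data[i:i + 16]
        lines.append("    " + ",".join(str(b) for b in chunk) + ",")
    lines.append("};")
    return "\n".join(lines) + "\n"


def load_from_c(c_text: str):
    """Round-trip helper: parse the array back and unmarshal the code."""
    body = c_text[c_text.index("{") + 1:c_text.rindex("}")]
    tokens = body.replace("\n", " ").split(",")
    data = bytes(int(tok) for tok in tokens if tok.strip())
    return marshal.loads(data)


if __name__ == "__main__":
    text = freeze_to_c("hello", "X = 6 * 7\n")
    namespace = {}
    exec(load_from_c(text), namespace)
    print(namespace["X"])
```

Note that marshalled bytecode is specific to the Python version that produced it, so a real tool would have to regenerate the C file for each interpreter version.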

On Sat, Aug 28, 2021 at 5:17 AM Marc-Andre Lemburg
<report@bugs.python.org> wrote:
> Eric's approach, as I understand it, is pretty much what PyRun does.
> [further details]

It's reassuring to hear that the approach is known to be viable. :)

> In fact, you save quite a bit of disk space compared to a full Python installation and
> additionally benefit from the memory mapping the OS does for sharing access to the
> marshal'ed byte code between processes.

That's a good point.

> That said, some things don't work with such an approach, e.g. a few packages
> include additional data files which they expect to find on disk. Since those are
> not available anymore, they fail.
>
> For PyRun I have patched some of those packages to include the data in form of
> Python modules instead, so that it gets frozen as well, e.g. the Python grammar files.

For stdlib modules it wouldn't be a big problem to set __file__ on
frozen modules.  Would that be enough to solve the problem?
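
For reference, the workaround described in the quote (shipping data as a Python module so it can be frozen too) might look roughly like the sketch below. This is not PyRun's actual code; the helper names and the DATA attribute are assumptions made for the example.

```python
import types


def data_to_module_source(data: bytes) -> str:
    """Generate Python source that embeds `data` as a bytes literal."""
    # repr() of a bytes object is a valid Python literal, so the
    # generated module evaluates back to exactly the same bytes.
    return (
        "# Generated file: embedded data, do not edit.\n"
        f"DATA = {data!r}\n"
    )


def load_embedded(module_name: str, source: str) -> types.ModuleType:
    """Stand-in for importing the generated (and later frozen) module."""
    module = types.ModuleType(module_name)
    code = compile(source, f"<frozen {module_name}>", "exec")
    exec(code, module.__dict__)
    return module


if __name__ == "__main__":
    grammar = b"file_input: (NEWLINE | stmt)* ENDMARKER\n"
    module = load_embedded("grammar_data", data_to_module_source(grammar))
    print(module.DATA == grammar)
```

A package that reads its data from such a module no longer depends on __file__ or on anything being present on disk.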

On Sat, Aug 28, 2021 at 5:41 PM Gregory Szorc <report@bugs.python.org> wrote:
> When I investigated freezing the standard library for PyOxidizer, I ran into a rash
> of problems. The frozen importer doesn't behave like PathFinder. It doesn't
> (didn't?) set some common module-level attributes

This is mostly fixable for stdlib modules.  Which attributes would
need to be added?  Are there other missing behaviors?
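
One quick way to answer the attribute question for a given module is to check which commonly expected attributes are absent. The sketch below assumes a particular list of attributes worth checking; that list is my guess, not something from the report, and the results vary by Python version.

```python
import types

# Attributes that code often expects on imported modules (an assumed
# list).  Note that __path__ is only expected on packages.
COMMON_ATTRS = (
    "__name__",
    "__file__",
    "__cached__",
    "__path__",
    "__loader__",
    "__spec__",
)


def missing_attrs(module: types.ModuleType, names=COMMON_ATTRS) -> list:
    """Return the names from `names` that `module` does not define."""
    return [name for name in names if not hasattr(module, name)]


if __name__ == "__main__":
    import _frozen_importlib  # a module that is always frozen
    import json               # normally loaded from source

    print("_frozen_importlib:", missing_attrs(_frozen_importlib))
    print("json:", missing_attrs(json))
```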

> Also, when I last looked at the CPython source, the frozen importer performed
> a linear scan of its indexed C array performing strcmp() on each entry until it
> found what it was looking for. So adding hundreds of modules could result in
> sufficient overhead and justify using a more efficient lookup algorithm.
> (PyOxidizer uses Rust's HashMap to index modules by name.)

Yeah, we noticed this too.  I wasn't sure it was something to worry
about at first because we're not freezing the entire stdlib.  We're
freezing on the order of 10 modules, plus all the (80+) encoding
modules.  I figured we could look at an alternative to that linear
search afterward if it made sense.
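
The trade-off can be sketched in Python as follows: a linear scan over a table of (name, payload) entries versus a one-time index keyed by name. The table contents here are placeholders, and the real lookup lives in C, so this only illustrates the shape of the two approaches.

```python
# Placeholder table standing in for the array of frozen modules.
FROZEN_TABLE = [
    ("_frozen_importlib", b"<code 1>"),
    ("_frozen_importlib_external", b"<code 2>"),
    ("zipimport", b"<code 3>"),
    ("encodings.utf_8", b"<code 4>"),
]


def find_linear(table, name):
    """Compare against each entry in turn: O(n) string comparisons."""
    for entry_name, payload in table:
        if entry_name == name:
            return payload
    return None


def build_index(table):
    """Build a hash index once; later lookups are O(1) on average."""
    return {entry_name: payload for entry_name, payload in table}


if __name__ == "__main__":
    index = build_index(FROZEN_TABLE)
    for wanted in ("zipimport", "not_frozen"):
        assert find_linear(FROZEN_TABLE, wanted) == index.get(wanted)
    print("linear scan and index agree")
```

With roughly 100 entries the scan is unlikely to dominate startup, but the index (or a sorted table with binary search) scales better if many more modules get frozen.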

> * Make sure you run unit tests against the frozen modules. If you don't do this,
>   subtle differences in how the different importers behave will lead to problems.

We'll do what we already do with importlib: run the tests against both
the frozen and the source modules.  Thanks for the reminder to do this
though!
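
The general pattern is a shared test mixin with one concrete TestCase per implementation. The sketch below uses two stand-in modules built in memory (hypothetical names) instead of real frozen and source imports, so it runs anywhere; it is not the actual test-suite code.

```python
import types
import unittest


def make_impl(label: str) -> types.ModuleType:
    """Build a stand-in module; a real suite would import the module
    once through the frozen importer and once from its source file."""
    module = types.ModuleType(f"demo_{label}")
    exec("def double(x):\n    return 2 * x\n", module.__dict__)
    return module


class DoubleTests:
    """Shared tests; `impl` is supplied by the concrete subclasses."""

    impl = None

    def test_double(self):
        self.assertEqual(self.impl.double(21), 42)


class FrozenDoubleTests(DoubleTests, unittest.TestCase):
    impl = make_impl("frozen")


class SourceDoubleTests(DoubleTests, unittest.TestCase):
    impl = make_impl("source")


def run_suite() -> unittest.TestResult:
    """Run both variants and return the combined result."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    for case in (FrozenDoubleTests, SourceDoubleTests):
        suite.addTests(loader.loadTestsFromTestCase(case))
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_suite()
```

Because the mixin itself does not inherit from unittest.TestCase, it is never collected on its own; only the two bound variants run.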

On Sat, Aug 28, 2021 at 5:53 PM Gregory Szorc <report@bugs.python.org> wrote:
> Oh, PyOxidizer also ran into more general issues with the frozen importer in that
> it broke various importlib APIs. e.g. because the frozen importer only supports
> bytecode, you can't use .__loader__.get_source() to obtain the source of a module.
> This makes tracebacks more opaque and breaks legitimate API consumers relying
> on these importlib interfaces.

Good point.  Supporting more of the FileLoader API on the frozen
loader is something to look into, at least for stdlib modules.
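
To illustrate what API consumers run into, here is a small sketch of a defensive helper that asks a module's loader for source and treats "no source" as a normal outcome. The two loader classes are stand-ins written for this example (one bytecode-only, one with source); they are not the real frozen or file loaders.

```python
import importlib.machinery
import types


class BytecodeOnlyLoader:
    """Stand-in that mimics a loader with no source available."""

    def get_source(self, fullname):
        return None


class InMemorySourceLoader:
    """Stand-in that mimics a loader able to return source text."""

    def __init__(self, source):
        self._source = source

    def get_source(self, fullname):
        return self._source


def make_module(name, loader):
    """Create a module whose spec points at the given loader."""
    module = types.ModuleType(name)
    module.__spec__ = importlib.machinery.ModuleSpec(name, loader)
    module.__loader__ = loader
    return module


def get_source_or_none(module):
    """Return the module's source text, or None if it is unavailable."""
    spec = getattr(module, "__spec__", None)
    loader = getattr(spec, "loader", None) or getattr(module, "__loader__", None)
    get_source = getattr(loader, "get_source", None)
    if get_source is None:
        return None
    try:
        return get_source(module.__name__)
    except (ImportError, OSError):
        return None


if __name__ == "__main__":
    frozen_like = make_module("frozen_like", BytecodeOnlyLoader())
    source_like = make_module("source_like", InMemorySourceLoader("x = 1\n"))
    print(get_source_or_none(frozen_like))
    print(get_source_or_none(source_like))
```

Tools that format tracebacks or introspect modules need a fallback like this today; if the frozen loader could locate the stdlib source, they would get real text back instead of None.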

> The fundamental limitations with the frozen importer are why I implemented my
> own meta path importer (implemented in pure Rust), which is more fully featured,
> like the PathFinder importer that most people rely on today. That importer is
> available on PyPI (https://pypi.org/project/oxidized-importer/) and has its own API
> to facilitate PyOxidizer-like functionality
> (https://pyoxidizer.readthedocs.io/en/stable/oxidized_importer.html) if anyone
> wants to experiment with it.

Awesome!  I'll take a look.

On Sat, Aug 28, 2021 at 6:14 PM Guido van Rossum <report@bugs.python.org> wrote:
> I agree that we should shore up the frozen importer -- probably in a separate PR though.
> (@Eric: do you think this is worth its own bpo issue?)

Yeah.

-eric