Message 153596 - Python tracker


Author	brett.cannon
Recipients	brett.cannon, pitrou
Date	2012-02-17.20:24:21
Message-id	<CAP1=2W5Soph9v775fW+YHRQiD7wETJ8TPCoJMv=1hgt7-Dr0Hg@mail.gmail.com>
In-reply-to	<1329506921.3678.22.camel@localhost.localdomain>

Content

On Fri, Feb 17, 2012 at 14:31, Antoine Pitrou <report@bugs.python.org> wrote:

>
> Antoine Pitrou <pitrou@free.fr> added the comment:
>
> > Why pre-calculate everything? In the most common case any single
> > module will be imported once, if at all. And once it is imported it
> > will get cached in sys.modules, alleviating the need to hit the finder
> > again. So from a performance standpoint wouldn't it be better not to
> > do all of the pre-calculation and instead do that as needed assuming
> > that sys.modules will shield the finder from having to do repetitive
> > things like figuring out what loader is needed?
>
> I figured it would avoid repetitive tests for all 10 suffixes.
> That said, I have now tried the alternative: find_module() is around 50%
> slower, but building the cache is 10x faster. Perhaps this is a winner.
>
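
To make the sys.modules point concrete, here is a minimal sketch that counts how often a lookup actually reaches the finders. It assumes a modern Python where meta path finders implement find_spec() rather than the find_module() discussed in this thread; CountingFinder is a hypothetical helper and colorsys is just an arbitrary small stdlib module.

```python
import importlib
import importlib.abc
import sys


class CountingFinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder that records every lookup and then declines it."""

    def __init__(self):
        self.calls = []

    def find_spec(self, fullname, path=None, target=None):
        self.calls.append(fullname)
        return None  # decline, so the regular finders still do the real work


finder = CountingFinder()
sys.meta_path.insert(0, finder)
try:
    # Make sure the first import really has to go through the finders.
    sys.modules.pop("colorsys", None)
    importlib.import_module("colorsys")
    first = finder.calls.count("colorsys")

    # The second import is answered from sys.modules and never reaches a finder.
    importlib.import_module("colorsys")
    second = finder.calls.count("colorsys")
finally:
    sys.meta_path.remove(finder)

print(first, second)
```

The count stays the same after the second import, which is why per-module work done ahead of time in the finder only pays off once per module per process.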

What is the time increase for find_module() vs. the speed-up of building
the cache? I.e. how many imports are needed before doing the full
calculation is a benefit? And would it make sense to have a hybrid of
caching the contents for fast start-up but then caching full details after
a successful find? That would mean no work is ever simply tossed out and
forgotten.
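
A rough sketch of that hybrid, under stated assumptions: the directory listing is cached cheaply up front, and the full details (path and matching suffix) are stored only after a successful find. HybridDirCache and the three-entry SUFFIXES tuple are hypothetical names for illustration, not code from either patch.

```python
import os
import tempfile

# Hypothetical suffix list; the real finder checks more suffixes than this.
SUFFIXES = (".py", ".pyc", ".so")


class HybridDirCache:
    """Cheap listing cache up front, full details only after a successful find."""

    def __init__(self, path):
        self.path = path
        self._entries = None  # raw directory contents, filled on first use
        self._found = {}      # name -> (full path, suffix), filled on success

    def _listing(self):
        if self._entries is None:
            self._entries = set(os.listdir(self.path))
        return self._entries

    def find(self, name):
        # A previous successful find is reused, so no suffix tests are repeated.
        if name in self._found:
            return self._found[name]
        entries = self._listing()
        for suffix in SUFFIXES:
            filename = name + suffix
            if filename in entries:
                result = (os.path.join(self.path, filename), suffix)
                self._found[name] = result
                return result
        return None


with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "spam.py"), "w").close()
    cache = HybridDirCache(tmp)
    hit = cache.find("spam")
    miss = cache.find("eggs")
```

Failed lookups are deliberately not remembered here; whether to cache misses as well is a separate design choice.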

> It would depend on the situation (short or long sys.path, few or many
> imports, etc.). Perhaps you can try both patches on your bootstrap repo?
>

Yep, that's not hard (and it will only get faster as I replace the bodies
of __import__() and _gcd_import() with C code so that sys.modules is C-fast
again). Question is what to benchmark against? I should probably get the
standard benchmarks up and running and see how those are affected
(especially the start-up ones).
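
Once the benchmarks produce real timings, the break-even question from earlier in this message reduces to simple arithmetic. The sketch below reads "10x faster" as the build time divided by 10 and "50% slower" as the find time multiplied by 1.5; the function name and the sample timings are made up for illustration and are not measurements.

```python
def break_even_imports(eager_build, eager_find,
                       build_speedup=10.0, find_slowdown=1.5):
    """Finds per cache build above which full pre-calculation is cheaper.

    Solves eager_build + n * eager_find == lazy_build + n * lazy_find for n.
    """
    lazy_build = eager_build / build_speedup
    lazy_find = eager_find * find_slowdown
    return (eager_build - lazy_build) / (lazy_find - eager_find)


# Hypothetical timings in microseconds, chosen only to show the calculation.
n = break_even_imports(eager_build=1000.0, eager_find=10.0)
print(n)
```

With those two ratios the result simplifies to 1.8 * eager_build / eager_find, so the answer depends entirely on how expensive a full cache build is relative to a single find.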

>
> > Plus if the finder gets its cache invalidated frequently it will
> > simply be wasting its time.
>
> Well, in real-world situations I don't think the cache will ever get
> invalidated: because imports are mostly done at startup, and because
> invalidating the cache means you are installing new libraries or
> updating existing ones while a running program is about to import
> something.
>

I agree, but it was just something to consider.
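
For illustration only, one way a per-directory cache could notice that it is stale is to compare the directory's modification time on each lookup. This thread does not say how either patch detects invalidation, so the mtime check and the MtimeCheckedListing name are assumptions.

```python
import os
import tempfile


class MtimeCheckedListing:
    """Hypothetical listing cache that rebuilds when the directory mtime changes."""

    def __init__(self, path):
        self.path = path
        self._mtime = None
        self._entries = frozenset()
        self.rebuilds = 0

    def entries(self):
        mtime = os.stat(self.path).st_mtime_ns
        if mtime != self._mtime:
            # Directory changed since the last look: throw the old listing away.
            self._entries = frozenset(os.listdir(self.path))
            self._mtime = mtime
            self.rebuilds += 1
        return self._entries


with tempfile.TemporaryDirectory() as tmp:
    listing = MtimeCheckedListing(tmp)
    listing.entries()
    listing.entries()
    rebuilds_when_unchanged = listing.rebuilds

    open(os.path.join(tmp, "new_module.py"), "w").close()
    # Force a distinct mtime so the demo does not depend on the
    # filesystem's timestamp resolution.
    st = os.stat(tmp)
    os.utime(tmp, ns=(st.st_atime_ns, st.st_mtime_ns + 1_000_000_000))
    has_new = "new_module.py" in listing.entries()
    rebuilds_after_change = listing.rebuilds
```

In the common case described above, where nothing is installed while the program runs, the listing is built once and every later lookup costs a single stat() call.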

>
> > Otherwise it's good to know three of us now have independently come up
> > with fundamentally the same idea for speeding up imports. =)
>
> Yup :)
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue14043>
> _______________________________________
>

History
Date	User	Action	Args
2012-02-17 20:24:22	brett.cannon	set	recipients: + brett.cannon, pitrou
2012-02-17 20:24:21	brett.cannon	link	issue14043 messages
2012-02-17 20:24:21	brett.cannon	create