classification
Title: Speed-up importlib's _FileFinder
Type: performance Stage: resolved
Components: Interpreter Core, Library (Lib) Versions: Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: brett.cannon, eric.snow, pitrou, python-dev
Priority: normal Keywords: patch

Created on 2012-02-17 16:38 by pitrou, last changed 2012-02-20 00:57 by pitrou. This issue is now closed.

Files
File name Uploaded Description Edit
find_module_cache.patch pitrou, 2012-02-17 16:54
find_module_cache2.patch pitrou, 2012-02-17 17:17
find_module_cache3.patch pitrou, 2012-02-17 17:45 review
listdir_cache.patch pitrou, 2012-02-17 19:32 review
listdir_cache2.patch pitrou, 2012-02-17 21:48 review
listdir_cache3.patch pitrou, 2012-02-18 20:52 review
Messages (24)
msg153566 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-02-17 16:38
This patch makes importlib's _FileFinder more than 10x faster by keeping a cache of directory entries. It's actually faster than imp.find_module()!

-> imp.find_module:

$ ./python -m timeit -s "import imp" "imp.find_module('logging', None)"
10000 loops, best of 3: 69.9 usec per loop
$ ./python -m timeit -s "import imp" "imp.find_module('bisect', None)"
10000 loops, best of 3: 108 usec per loop

-> unpatched importlib:

$ ./python -m timeit -s "from importlib._bootstrap import _DefaultPathFinder as Finder" "Finder.find_module('logging', None)"
1000 loops, best of 3: 431 usec per loop
$ ./python -m timeit -s "from importlib._bootstrap import _DefaultPathFinder as Finder" "Finder.find_module('bisect', None)"
1000 loops, best of 3: 468 usec per loop

-> patched importlib:

$ ./python -m timeit -s "from importlib._bootstrap import _DefaultPathFinder as Finder" "Finder.find_module('logging', None)"
10000 loops, best of 3: 37.5 usec per loop
$ ./python -m timeit -s "from importlib._bootstrap import _DefaultPathFinder as Finder" "Finder.find_module('bisect', None)"
10000 loops, best of 3: 36.9 usec per loop
msg153572 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-02-17 17:13
Note that the cost of filling the cache itself isn't trivial:

$ ./python -m timeit -s "import sys; from importlib import _bootstrap" "_bootstrap._file_path_hook('Lib')._fill_cache()"
1000 loops, best of 3: 1.88 msec per loop
msg153575 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-02-17 17:17
Updated patch with faster cache filling.
msg153582 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2012-02-17 17:37
I have not had a chance to deep-dive in the patch, but I just wanted to double-check that this (a) implemented the idea PJE and I discussed on python-dev, and (b) you re-generate the patch as I pushed some changes to importlib today.
msg153583 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-02-17 17:41
> I have not had a chance to deep-dive in the patch, but I just wanted
> to double-check that this (a) implemented the idea PJE and I discussed
> on python-dev

I haven't followed this discussion.

> , and (b) you re-generate the patch as I pushed some changes to
> importlib today.

Will do.
msg153584 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-02-17 17:45
Regenerated patch.
msg153587 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2012-02-17 17:59
Just to fill you in, the discussion centred on the idea of doing a listdir() of the directory the FileFinder was in charge of watching and caching that. Then, when it had to look up a file all it had to do was stat the directory to look for a change before it simply looked at a frozenset of directory contents. That does away with having to stat for every file type (directory, module, extension, etc.) and replaces it with one upfront listdir() and then a single directory stat per lookup.
msg153588 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-02-17 18:02
> Just to fill you in, the discussion centred on the idea of doing a
> listdir() of the directory the FileFinder was in charge of watching
> and caching that. Then, when it had to look up a file all it had to do
> was stat the directory to look for a change before it simply looked at
> a frozenset of directory contents. That does away with having to stat
> for every file type (directory, module, extension, etc.) and replaces
> it with one upfront listdir() and then a single directory stat per
> lookup.

Ah, then it's basically what my patch does (except that it also computes
all potential module names instead of dumping the os.listdir() result in
a frozenset).
msg153590 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2012-02-17 18:57
OK, I have gone ahead and done a review over on rietveld for the code as-is. But I do have a design question.

Why pre-calculate everything? In the most common case any single module will be imported once, if at all. And once it is imported it will get cached in sys.modules, alleviating the need to hit the finder again. So from a performance standpoint wouldn't it be better not to do all of the pre-calculation and instead do that as needed assuming that sys.modules will shield the finder from having to do repetitive things like figuring out what loader is needed? You will have to do the calculation regardless for the first import, but you might not need to import every module in a directory. Plus if the finder gets its cache invalidated frequently it  will simply be wasting its time.

I'm not going to argue from the perspective that most modules won't get imported as that's purely a stdlib thing; I am willing to bet most projects import nearly all of their modules in each execution.

Otherwise it's good to know three of us now have independently come up with fundamentally the same idea for speeding up imports. =)
msg153592 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-02-17 19:31
> Why pre-calculate everything? In the most common case any single
> module will be imported once, if at all. And once it is imported it
> will get cached in sys.modules, alleviating the need to hit the finder
> again. So from a performance standpoint wouldn't it be better not to
> do all of the pre-calculation and instead do that as needed assuming
> that sys.modules will shield the finder from having to do repetitive
> things like figuring out what loader is needed?

I figured it would avoid repetitive tests for all 10 suffixes.
That said, I have now tried the alternative: find_module() is around 50%
slower, but building the cache is 10x faster. Perhaps this is a winner.
It would depend on the situation (short or long sys.path, few or many
imports, etc.). Perhaps you can try both patches on your bootstrap repo?

>  Plus if the finder gets its cache invalidated frequently it  will
> simply be wasting its time.

Well, in real-world situations I don't think the cache will ever get
invalidated: because imports are mostly done at startup, and because
invalidating the cache means you are installing new libraries or
updating existing ones while a running program is about to import
something.

> Otherwise it's good to know three of us now have independently come up
> with fundamentally the same idea for speeding up imports. =)

Yup :)
msg153593 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-02-17 19:32
Patch with light-weight alternative.
msg153596 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2012-02-17 20:24
On Fri, Feb 17, 2012 at 14:31, Antoine Pitrou <report@bugs.python.org>wrote:

>
> Antoine Pitrou <pitrou@free.fr> added the comment:
>
> > Why pre-calculate everything? In the most common case any single
> > module will be imported once, if at all. And once it is imported it
> > will get cached in sys.modules, alleviating the need to hit the finder
> > again. So from a performance standpoint wouldn't it be better not to
> > do all of the pre-calculation and instead do that as needed assuming
> > that sys.modules will shield the finder from having to do repetitive
> > things like figuring out what loader is needed?
>
> I figured it would avoid repetitive tests for all 10 suffixes.
> That said, I have now tried the alternative: find_module() is around 50%
> slower, but building the cache is 10x faster. Perhaps this is a winner.
>

What is the time increase for find_module() vs. the speed-up of building
the cache? I.e. how many imports are needed before doing the full
calculation is a benefit? And would it make sense to have a hybrid of
caching the contents for fast start-up but then caching full details after
a successful find? That would mean no work is ever simply tossed out and
forgotten.

> It would depend on the situation (short or long sys.path, few or many
> imports, etc.). Perhaps you can try both patches on your bootstrap repo?
>

Yep, that's not hard (and it will only get faster as I replace the bodies
of __import__() and _gcd_import() with C code so that sys.modules is C-fast
again). Question is what to benchmark against? I should probably get the
standard benchmarks up and running and see how those are affected
(especially the start-up ones).

>
> >  Plus if the finder gets its cache invalidated frequently it  will
> > simply be wasting its time.
>
> Well, in real-world situations I don't think the cache will ever get
> invalidated: because imports are mostly done at startup, and because
> invalidating the cache means you are installing new libraries or
> updating existing ones while a running program is about to import
> something.
>

I agree, but it was just something to consider.

>
> > Otherwise it's good to know three of us now have independently come up
> > with fundamentally the same idea for speeding up imports. =)
>
> Yup :)
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue14043>
> _______________________________________
>
msg153597 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2012-02-17 20:28
And while I'm thinking about it, the hybrid caching approach of caching just the directory contents to start and then caching full details on successful searches (and failures) would also let you cache the full file path as the strings will be interned and used again by the loaders for setting __file__.
msg153600 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-02-17 21:41
> And while I'm thinking about it, the hybrid caching approach of
> caching just the directory contents to start and then caching full
> details on successful searches (and failures) would also let you cache
> the full file path as the strings will be interned and used again by
> the loaders for setting __file__.

Well, you shouldn't call find_module() on the same module twice (except
for benchmarks), so I'm not sure what caching the results would bring.
msg153601 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-02-17 21:48
Updated light-weight caching patch (now checks PYTHONCASEOK dynamically).
msg153605 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2012-02-17 22:01
The lastest patch (listdir_cache2) LGTM. Have you by any chance patched it in and run ./python -m importlib.test.regrtest (runs the entire test suite with builtins.__import__() substituted with importlib.__import__())? If not I can do it on my ride home tonight to make sure semantics didn't surprisingly change (I don't see how). And then I can benchmark against the standard benchmark suite to see if it speeds things up in my bootstrap branch.
msg153607 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-02-17 22:36
> The lastest patch (listdir_cache2) LGTM. Have you by any chance
> patched it in and run ./python -m importlib.test.regrtest (runs the
> entire test suite with builtins.__import__() substituted with
> importlib.__import__())?

There are a couple of test failures (in test_import, test_runpy,
test_reprlib). Apparently this is due to false positives when comparing
directory modification times. If I sleep() enough between tests, the
failures disappear...
I wonder if this could be a problem in real-life (say, someone compiling
some DSL to Python code on-the-fly and expecting import to pick up
without any timestamp problems).
msg153608 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2012-02-17 22:45
Well, it's the reason you added file size to .pyc files. =) I assume if you store the system clock as well to see if no time has changed it would kill the performance gain. We could also have two versions of _FileFinder such that people could choose which they preferred.
msg153609 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-02-17 22:50
> Well, it's the reason you added file size to .pyc files. =)

pyc files only store the integer timestamp. Here I store the original
floating-point timestamp, and I assumed it would be sufficient :)

>  I assume if you store the system clock as well to see if no time has
> changed it would kill the performance gain.

I don't know. I might give it a try but it might still be fragile.
msg153655 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-02-18 19:55
Another possibility would be to include an explicit invalidate_caches() function in importlib.
msg153658 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-02-18 20:52
This is a patch demonstrating invalidate_caches().
As for test_runpy, it seems runpy uses imp.get_loader(), so I don't see how it can remain synchronized with importlib's state.
msg153660 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2012-02-18 23:42
I wouldn't worry too much about test_runpy as it has always been a touchy test for me.
msg153747 - (view) Author: Roundup Robot (python-dev) Date: 2012-02-20 00:55
New changeset bbaab666e6c7 by Antoine Pitrou in branch 'default':
Issue #14043: Speed up importlib's _FileFinder by at least 8x, and add a new importlib.invalidate_caches() function.
http://hg.python.org/cpython/rev/bbaab666e6c7
msg153749 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-02-20 00:57
Committed now (with invalidate_caches() as well).
History
Date User Action Args
2012-02-20 00:57:35pitrousetstatus: open -> closed
resolution: fixed
messages: + msg153749

stage: commit review -> resolved
2012-02-20 00:55:39python-devsetnosy: + python-dev
messages: + msg153747
2012-02-18 23:42:20brett.cannonsetmessages: + msg153660
2012-02-18 20:52:52pitrousetfiles: + listdir_cache3.patch

messages: + msg153658
2012-02-18 19:55:43pitrousetmessages: + msg153655
2012-02-18 03:47:28eric.snowsetnosy: + eric.snow
2012-02-17 22:50:35pitrousetmessages: + msg153609
2012-02-17 22:45:59brett.cannonsetmessages: + msg153608
2012-02-17 22:36:31pitrousetmessages: + msg153607
2012-02-17 22:01:41brett.cannonsetmessages: + msg153605
stage: patch review -> commit review
2012-02-17 21:48:51pitrousetfiles: + listdir_cache2.patch
2012-02-17 21:48:43pitrousetmessages: + msg153601
2012-02-17 21:41:19pitrousetmessages: + msg153600
2012-02-17 20:28:57brett.cannonsetmessages: + msg153597
2012-02-17 20:24:21brett.cannonsetmessages: + msg153596
2012-02-17 19:32:17pitrousetfiles: + listdir_cache.patch

messages: + msg153593
2012-02-17 19:31:55pitrousetmessages: + msg153592
2012-02-17 18:57:22brett.cannonsetmessages: + msg153590
2012-02-17 18:02:34pitrousetmessages: + msg153588
2012-02-17 17:59:16brett.cannonsetmessages: + msg153587
2012-02-17 17:45:37pitrousetfiles: + find_module_cache3.patch

messages: + msg153584
2012-02-17 17:41:29pitrousetmessages: + msg153583
2012-02-17 17:37:46brett.cannonsetmessages: + msg153582
2012-02-17 17:17:04pitrousetfiles: + find_module_cache2.patch

messages: + msg153575
2012-02-17 17:13:53pitrousetmessages: + msg153572
2012-02-17 16:54:44pitrousetfiles: + find_module_cache.patch
2012-02-17 16:54:37pitrousetfiles: - find_module_cache.patch
2012-02-17 16:38:26pitroucreate