Title: Make importlib.find_spec load packages as needed
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.4
Status: closed Resolution: fixed
Dependencies: 19700 Superseder:
Assigned To: Nosy List: Arfrever, brett.cannon, eric.snow, ncoghlan, python-dev
Priority: normal Keywords: patch

Created on 2013-12-10 12:29 by ncoghlan, last changed 2014-01-25 22:50 by eric.snow. This issue is now closed.

File name Uploaded Description Edit
issue19944-find-spec-mirror-import-module-direct.diff eric.snow, 2014-01-07 05:21 review
Messages (24)
msg205804 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-12-10 12:29
In implementing the runpy module updates for PEP 451 I ran into the fact that importlib.find_spec had copied the find_loader behaviour where it doesn't handle dotted names automatically - you have to explicitly pass in the __path__ from the parent module for it to work.

For find_loader, runpy used pkgutil.get_loader instead. The patch on issue 19700 currently includes a runpy._fixed_find_spec function that handles walking the module name components, loading packages as necessary.

I believe it would be better to move the current thin wrapper around importlib._bootstrap._find_spec to importlib.find_spec_on_path (which will only search for the specified module name directly on the given path without breaking it up into components), with find_spec itself changed to *not* take a "path" argument, and instead always doing the full lookup (as implemented in the issue 19700 patch _fixed_find_spec function)
msg205826 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2013-12-10 15:21
Another option would be to add a keyword-only use_parent_path flag to importlib.find_spec() which is True by default. If use_parent is True, 'path' is provided, and there is no parent, then 'path' can act as a fallback.

Which makes me wonder if also adding a keyword-only import_parents keyword to implicitly import any parent packages as necessary would be helpful. It would imply use_parent_path as true. Might also be useful to add to import_module() to get rid of that annoyance inherited from __import__.
msg205868 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-12-10 23:07
They're fundamentally different operations though, with one being a single
step of the other. I just think "do the full lookup" should have the more
obvious name, with the "do a single level lookup" still as a public API,
but with the less obvious name.
msg205883 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2013-12-11 05:59
As a testament to counter-intuitive API, I did not even realize this about either find_loader() or find_spec() until the bug the other day with reloading submodules.  So I'm on board!

As to the best approach, I'll argue for keeping just 1 function.  I tried out both ways and in the 2 function approach they simply didn't seem different enough.  I've attached a patch (w/o tests, etc.) to show what I'm thinking.

Basically, let's make the "path" parameter keyword-only and rename it to "parent_path" (maybe even "precomputed_parent_path").  The patch takes cues from pkgutil.find_loader().
msg205903 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-12-11 12:27
One function (the current one) promises not to import anything. The other
(the new one) may have side effects, even if it fails to find the module
(any successfully imported parent modules will remain imported).

IIRC, that "no side effects" guarantee was the original reason for the
find_loader/get_loader split and it's a distinction I think is worth
preserving. I just think the obvious name should be closer to
importlib.import_module in semantics, with the "no side effects" version
being the more obscure one.
msg205906 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2013-12-11 14:40
I'm not saying get rid of the ability to guarantee no side-effects. All I'm saying is that the point of the function is to find a spec, whether that includes importing or implicitly using a parent package from sys.modules is just technical detail (and thus flags on the function). Put another way, the return value is no different and the basic arguments are the same, it's just internal details of how that return value is found that shifts. For me that's enough to control with keyword-only arguments instead of uniquely named functions.

And I'm willing to argue that import_module() not importing parent packages is annoying and should be controlled with a flag to make life easier since chances are you will want to import the parent packages anyway.
msg205927 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-12-11 21:31
Look at it from a conceptual point of view, though, there are two quite
different operations:

- find a spec at the current level, using the specified path (which may be
empty, implying a top level import)
- find a spec according to the full import cycle, including any parent
module imports

When you call it, you're going to want one behaviour or the other, since
the two operations are not slight variants, they have major conceptual
differences (with one being a substep of the other).
msg205937 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2013-12-11 22:42
It looks like there are two concepts at play though:

1. side-effect-free vs. may-have-side-effects
2. just-find-the-spec-dangit vs. find-the-spec-relative-to-submodule-search-locations-I-already-know

In the case of #1, providing the path is just a means to an end.  In contrast, for #2 providing the path is the whole point and side effects are something to which you may not have given any thought (but you probably aren't expecting them).

In both cases, it still boils down to (module name -> module spec).  To me, handling both with a single function and a keyword-only "path" parameter seems sufficient, as long as people don't miss the fact that there are side effects in the default case.

The tricky part is that this is not a high-use API, so I'm trying not to think in terms of common case.  (I'm tempted to say things like "in the common case you just want the spec and you don't care about that other stuff".)
msg205943 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-12-11 23:41
Actually, I think you make a good point. importlib.find_spec should use the
*import_module* signature (no path parameter, but with support for relative
imports). If it is exposed at all, the "one level only" single step version
should appear in importlib.machinery, not at the top level.

The current API exposes an implementation detail most users aren't going to
care about, but a "let me know if it exists but don't import it yet" API
that parallels the full dynamic import API is far more intuitive.
msg206024 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2013-12-13 07:30
Nick: that sounds good to me.  I like the idea of find_spec() being the same as import_module(), minus actually loading the module.

In that spirit, here's a rough patch that accomplishes that.  It refactors things a bit to get there.  Considering where we are in the release cycle, I'd rather punt on a proper rendition like this this until 3.5.  In the meantime we could duplicate some code for the sake of find_spec() in the 3.4 timeframe.
msg206026 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2013-12-13 07:49
Here's a version that I'd feel more comfortable with for 3.4 (with the intent of refactoring in 3.5).
msg206209 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-12-15 01:49
Adding a dependency on issue 19700, as I'm about to commit that fix and runpy should be updated to use whatever we come up with here.
msg206631 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2013-12-19 19:12
I've closed #16492 in favor of this ticket.
msg206986 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2013-12-27 08:45
Any thoughts on the latest patch?
msg206995 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-12-27 14:14
The "simple" patch actually looks like a good way to end up with find_spec specific bugs because it diverges from the more thoroughly tested main import path (e.g. it looks to me like it doesn't release the import lock properly)

So the _FoundSpec version actually looks better to me, because it keeps find_spec more inline with actual imports.
msg207211 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2014-01-03 06:31
Good points, Nick.  I was trying to limit touches on for 3.4, but that "simple" patch is mostly just a hacky band-aid.  Here's a patch that hopefully stands on its own and still limits touches.  (The patch is the bare-bones changes only.)

There are 2 things in particular with this patch that I'd like to be sure others are comfortable with:

* an import statement (for resolve_name) inside find_spec()
* the use of builtins.__import__() instead of importlib.import_module()

Regarding the "nested" import, I didn't want to import the submodule at the top level due to possible future circular imports (there aren't any now with importlib.util, but still...).  At the same time, to me using something out of a submodule in the parent module in this way is a bit of a bad code smell.  I'd be more inclined to move the resolve_name() implementation into to resolve it, but even better may be to move it to as a private name and then import it into importlib.util.  As it stands, the patch simply uses the "nested" import.

As to using builtins.__import__() when importing the parent module, that seems like it would result in less surprising results in the (uncommon) case that someone is using a custom __import__() that would yield a different result than importlib.import_module().  In the case of find_spec() that makes more sense to me.
msg207212 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-01-03 07:01
Actually, why *is* find_spec at package level, rather than in util with resolve_name? I know we said it was at package level in the PEP, but we never really gave the question serious thought.

Also, why use builtins.__import__ rather than using __import__ directly? (A comment as to why you're using __import__ over import_module would also be good - I assume it's to get the automatic check for the __path__ attribute)
msg207213 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2014-01-03 07:59
find_spec() is at package level because find_module() is and for no other good reason I'm aware of.  I'd be just fine with moving it to util.  I don't expect it to be used enough to warrant that top-level placement.

Regarding builtins.__import__(), I'm using it in the event that someone "installed" their own __import__() which gives a different result than import_module().  Thus the parent used by find_spec() will be the expected one.  I agree that comment about this would be good.
msg207365 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2014-01-05 07:59
Brett: What do you think about moving importlib.find_spec() to importlib.util?
msg207402 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2014-01-05 21:59
Moving to importlib.util is fine by me. I put find_loader() to mirror import_module(), but honestly the former is much less used compared to the latter, so moving it to importlib.util makes sense.
msg207507 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2014-01-07 05:21
Here's an update to the patch.  It includes doc and test changes.  It also moves find_spec() into importlib.util.

(I'm removing the other patches since I don't consider them valid approaches any longer).
msg207508 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2014-01-07 05:22
(...and to reduce any confusion on which patch is under consideration.)
msg207511 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2014-01-07 05:35
If nothing else comes up, this will be the last PEP-451 functional change for 3.4.
msg209246 - (view) Author: Roundup Robot (python-dev) Date: 2014-01-25 22:37
New changeset 665f1ba77b57 by Eric Snow in branch 'default':
Issue 19944: Fix importlib.find_spec() so it imports parents as needed.
Date User Action Args
2014-01-25 22:50:44eric.snowsetstatus: pending -> closed
2014-01-25 22:39:44eric.snowsetstatus: open -> pending
type: enhancement
resolution: fixed
stage: resolved
2014-01-25 22:37:48python-devsetnosy: + python-dev
messages: + msg209246
2014-01-07 05:35:04eric.snowsetmessages: + msg207511
2014-01-07 05:22:56eric.snowsetmessages: + msg207508
2014-01-07 05:22:24eric.snowsetfiles: - issue19944-find-spec-parent-module.diff
2014-01-07 05:22:14eric.snowsetfiles: - issue19944-find-spec-mirror-import-module.diff
2014-01-07 05:22:08eric.snowsetfiles: - issue19944-find-spec-mirror-import-module-simple.diff
2014-01-07 05:21:58eric.snowsetfiles: - issue19944-find-spec-mirror-import-module-direct.diff
2014-01-07 05:21:49eric.snowsetfiles: + issue19944-find-spec-mirror-import-module-direct.diff

messages: + msg207507
2014-01-05 21:59:41brett.cannonsetmessages: + msg207402
2014-01-05 07:59:33eric.snowsetmessages: + msg207365
2014-01-03 07:59:56eric.snowsetmessages: + msg207213
2014-01-03 07:01:04ncoghlansetmessages: + msg207212
2014-01-03 06:31:15eric.snowsetfiles: + issue19944-find-spec-mirror-import-module-direct.diff

messages: + msg207211
2013-12-27 14:14:09ncoghlansetmessages: + msg206995
2013-12-27 08:45:46eric.snowsetmessages: + msg206986
2013-12-19 19:12:28eric.snowsetmessages: + msg206631
2013-12-19 19:11:44eric.snowlinkissue16492 superseder
2013-12-15 01:49:28ncoghlansetdependencies: + Update runpy for PEP 451
messages: + msg206209
2013-12-13 07:49:14eric.snowsetfiles: + issue19944-find-spec-mirror-import-module-simple.diff

messages: + msg206026
2013-12-13 07:30:20eric.snowsetfiles: + issue19944-find-spec-mirror-import-module.diff

messages: + msg206024
2013-12-11 23:41:59ncoghlansetmessages: + msg205943
2013-12-11 22:42:05eric.snowsetmessages: + msg205937
2013-12-11 21:31:09ncoghlansetmessages: + msg205927
2013-12-11 14:40:50brett.cannonsetmessages: + msg205906
2013-12-11 12:27:37ncoghlansetmessages: + msg205903
2013-12-11 05:59:24eric.snowsetfiles: + issue19944-find-spec-parent-module.diff

nosy: + eric.snow
messages: + msg205883

keywords: + patch
2013-12-10 23:07:05ncoghlansetmessages: + msg205868
2013-12-10 21:40:27Arfreversetnosy: + Arfrever
2013-12-10 15:21:02brett.cannonsetnosy: + brett.cannon
messages: + msg205826
2013-12-10 12:29:01ncoghlancreate