Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make importlib.find_spec load packages as needed #64143

Closed
ncoghlan opened this issue Dec 10, 2013 · 26 comments
Closed

Make importlib.find_spec load packages as needed #64143

ncoghlan opened this issue Dec 10, 2013 · 26 comments
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@ncoghlan
Copy link
Contributor

BPO 19944
Nosy @brettcannon, @terryjreedy, @ncoghlan, @ericsnowcurrently
Dependencies
  • bpo-19700: Update runpy for PEP 451
  • Files
  • issue19944-find-spec-mirror-import-module-direct.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2014-01-25.22:50:44.859>
    created_at = <Date 2013-12-10.12:29:01.845>
    labels = ['type-feature', 'library']
    title = 'Make importlib.find_spec load packages as needed'
    updated_at = <Date 2016-04-04.19:43:34.330>
    user = 'https://github.com/ncoghlan'

    bugs.python.org fields:

    activity = <Date 2016-04-04.19:43:34.330>
    actor = 'eric.snow'
    assignee = 'none'
    closed = True
    closed_date = <Date 2014-01-25.22:50:44.859>
    closer = 'eric.snow'
    components = ['Library (Lib)']
    creation = <Date 2013-12-10.12:29:01.845>
    creator = 'ncoghlan'
    dependencies = ['19700']
    files = ['33336']
    hgrepos = []
    issue_num = 19944
    keywords = ['patch']
    message_count = 26.0
    messages = ['205804', '205826', '205868', '205883', '205903', '205906', '205927', '205937', '205943', '206024', '206026', '206209', '206631', '206986', '206995', '207211', '207212', '207213', '207365', '207402', '207507', '207508', '207511', '209246', '262862', '262866']
    nosy_count = 6.0
    nosy_names = ['brett.cannon', 'terry.reedy', 'ncoghlan', 'Arfrever', 'python-dev', 'eric.snow']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue19944'
    versions = ['Python 3.4']

    @ncoghlan
    Copy link
    Contributor Author

    In implementing the runpy module updates for PEP-451 I ran into the fact that importlib.find_spec had copied the find_loader behaviour where it doesn't handle dotted names automatically - you have to explicitly pass in the __path__ from the parent module for it to work.

    For find_loader, runpy used pkgutil.get_loader instead. The patch on bpo-19700 currently includes a runpy._fixed_find_spec function that handles walking the module name components, loading packages as necessary.

    I believe it would be better to move the current thin wrapper around importlib._bootstrap._find_spec to importlib.find_spec_on_path (which will only search for the specified module name directly on the given path without breaking it up into components), with find_spec itself changed to *not* take a "path" argument, and instead always doing the full lookup (as implemented in the bpo-19700 patch _fixed_find_spec function)

    @ncoghlan ncoghlan added the stdlib Python modules in the Lib dir label Dec 10, 2013
    @brettcannon
    Copy link
    Member

    Another option would be to add a keyword-only use_parent_path flag to importlib.find_spec() which is True by default. If use_parent is True, 'path' is provided, and there is no parent, then 'path' can act as a fallback.

    Which makes me wonder if also adding a keyword-only import_parents keyword to implicitly import any parent packages as necessary would be helpful. It would imply use_parent_path as true. Might also be useful to add to import_module() to get rid of that annoyance inherited from __import__.

    @ncoghlan
    Copy link
    Contributor Author

    They're fundamentally different operations though, with one being a single
    step of the other. I just think "do the full lookup" should have the more
    obvious name, with the "do a single level lookup" still as a public API,
    but with the less obvious name.

    @ericsnowcurrently
    Copy link
    Member

    As a testament to counter-intuitive API, I did not even realize this about either find_loader() or find_spec() until the bug the other day with reloading submodules. So I'm on board!

    As to the best approach, I'll argue for keeping just 1 function. I tried out both ways and in the 2 function approach they simply didn't seem different enough. I've attached a patch (w/o tests, etc.) to show what I'm thinking.

    Basically, let's make the "path" parameter keyword-only and rename it to "parent_path" (maybe even "precomputed_parent_path"). The patch takes cues from pkgutil.find_loader().

    @ncoghlan
    Copy link
    Contributor Author

    One function (the current one) promises not to import anything. The other
    (the new one) may have side effects, even if it fails to find the module
    (any successfully imported parent modules will remain imported).

    IIRC, that "no side effects" guarantee was the original reason for the
    find_loader/get_loader split and it's a distinction I think is worth
    preserving. I just think the obvious name should be closer to
    importlib.import_module in semantics, with the "no side effects" version
    being the more obscure one.

    @brettcannon
    Copy link
    Member

    I'm not saying get rid of the ability to guarantee no side-effects. All I'm saying is that the point of the function is to find a spec, whether that includes importing or implicitly using a parent package from sys.modules is just technical detail (and thus flags on the function). Put another way, the return value is no different and the basic arguments are the same, it's just internal details of how that return value is found that shifts. For me that's enough to control with keyword-only arguments instead of uniquely named functions.

    And I'm willing to argue that import_module() not importing parent packages is annoying and should be controlled with a flag to make life easier since chances are you will want to import the parent packages anyway.

    @ncoghlan
    Copy link
    Contributor Author

    Look at it from a conceptual point of view, though, there are two quite
    different operations:

    • find a spec at the current level, using the specified path (which may be
      empty, implying a top level import)
    • find a spec according to the full import cycle, including any parent
      module imports

    When you call it, you're going to want one behaviour or the other, since
    the two operations are not slight variants, they have major conceptual
    differences (with one being a substep of the other).

    @ericsnowcurrently
    Copy link
    Member

    It looks like there are two concepts at play though:

    1. side-effect-free vs. may-have-side-effects
    2. just-find-the-spec-dangit vs. find-the-spec-relative-to-submodule-search-locations-I-already-know

    In the case of #1, providing the path is just a means to an end. In contrast, for #2 providing the path is the whole point and side effects are something to which you may not have given any thought (but you probably aren't expecting them).

    In both cases, it still boils down to (module name -> module spec). To me, handling both with a single function and a keyword-only "path" parameter seems sufficient, as long as people don't miss the fact that there are side effects in the default case.

    The tricky part is that this is not a high-use API, so I'm trying not to think in terms of common case. (I'm tempted to say things like "in the common case you just want the spec and you don't care about that other stuff".)

    @ncoghlan
    Copy link
    Contributor Author

    Actually, I think you make a good point. importlib.find_spec should use the
    *import_module* signature (no path parameter, but with support for relative
    imports). If it is exposed at all, the "one level only" single step version
    should appear in importlib.machinery, not at the top level.

    The current API exposes an implementation detail most users aren't going to
    care about, but a "let me know if it exists but don't import it yet" API
    that parallels the full dynamic import API is far more intuitive.

    @ericsnowcurrently
    Copy link
    Member

    Nick: that sounds good to me. I like the idea of find_spec() being the same as import_module(), minus actually loading the module.

    In that spirit, here's a rough patch that accomplishes that. It refactors things a bit to get there. Considering where we are in the release cycle, I'd rather punt on a proper rendition like this this until 3.5. In the meantime we could duplicate some code for the sake of find_spec() in the 3.4 timeframe.

    @ericsnowcurrently
    Copy link
    Member

    Here's a version that I'd feel more comfortable with for 3.4 (with the intent of refactoring in 3.5).

    @ncoghlan
    Copy link
    Contributor Author

    Adding a dependency on bpo-19700, as I'm about to commit that fix and runpy should be updated to use whatever we come up with here.

    @ericsnowcurrently
    Copy link
    Member

    I've closed bpo-16492 in favor of this ticket.

    @ericsnowcurrently
    Copy link
    Member

    Any thoughts on the latest patch?

    @ncoghlan
    Copy link
    Contributor Author

    The "simple" patch actually looks like a good way to end up with find_spec specific bugs because it diverges from the more thoroughly tested main import path (e.g. it looks to me like it doesn't release the import lock properly)

    So the _FoundSpec version actually looks better to me, because it keeps find_spec more inline with actual imports.

    @ericsnowcurrently
    Copy link
    Member

    Good points, Nick. I was trying to limit touches on _bootstrap.py for 3.4, but that "simple" patch is mostly just a hacky band-aid. Here's a patch that hopefully stands on its own and still limits touches. (The patch is the bare-bones changes only.)

    There are 2 things in particular with this patch that I'd like to be sure others are comfortable with:

    • an import statement (for resolve_name) inside find_spec()
    • the use of builtins.__import__() instead of importlib.import_module()

    Regarding the "nested" import, I didn't want to import the submodule at the top level due to possible future circular imports (there aren't any now with importlib.util, but still...). At the same time, to me using something out of a submodule in the parent module in this way is a bit of a bad code smell. I'd be more inclined to move the resolve_name() implementation into _bootstrap.py to resolve it, but even better may be to move it to __init__.py as a private name and then import it into importlib.util. As it stands, the patch simply uses the "nested" import.

    As to using builtins.__import__() when importing the parent module, that seems like it would result in less surprising results in the (uncommon) case that someone is using a custom __import__() that would yield a different result than importlib.import_module(). In the case of find_spec() that makes more sense to me.

    @ncoghlan
    Copy link
    Contributor Author

    ncoghlan commented Jan 3, 2014

    Actually, why *is* find_spec at package level, rather than in util with resolve_name? I know we said it was at package level in the PEP, but we never really gave the question serious thought.

    Also, why use builtins.__import__ rather than using __import__ directly? (A comment as to why you're using __import__ over import_module would also be good - I assume it's to get the automatic check for the __path__ attribute)

    @ericsnowcurrently
    Copy link
    Member

    find_spec() is at package level because find_module() is and for no other good reason I'm aware of. I'd be just fine with moving it to util. I don't expect it to be used enough to warrant that top-level placement.

    Regarding builtins.__import__(), I'm using it in the event that someone "installed" their own __import__() which gives a different result than import_module(). Thus the parent used by find_spec() will be the expected one. I agree that comment about this would be good.

    @ericsnowcurrently
    Copy link
    Member

    Brett: What do you think about moving importlib.find_spec() to importlib.util?

    @brettcannon
    Copy link
    Member

    Moving to importlib.util is fine by me. I put find_loader() to mirror import_module(), but honestly the former is much less used compared to the latter, so moving it to importlib.util makes sense.

    @ericsnowcurrently
    Copy link
    Member

    Here's an update to the patch. It includes doc and test changes. It also moves find_spec() into importlib.util.

    (I'm removing the other patches since I don't consider them valid approaches any longer).

    @ericsnowcurrently
    Copy link
    Member

    (...and to reduce any confusion on which patch is under consideration.)

    @ericsnowcurrently
    Copy link
    Member

    If nothing else comes up, this will be the last PEP-451 functional change for 3.4.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jan 25, 2014

    New changeset 665f1ba77b57 by Eric Snow in branch 'default':
    bpo-19944: Fix importlib.find_spec() so it imports parents as needed.
    http://hg.python.org/cpython/rev/665f1ba77b57

    @ericsnowcurrently ericsnowcurrently added the type-feature A feature request or enhancement label Jan 25, 2014
    @terryjreedy
    Copy link
    Member

    pyclbr.py has this comment
    # XXX This will change once bpo-19944 lands.
    above the line that was changed by the patch applied. Am I correct in thinking that the comment is obsolete and should be removed?

    @ericsnowcurrently
    Copy link
    Member

    Yeah, I'm pretty sure that TODO is out of date.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants