classification
Title: Import machinery documentation
Type:
Stage: patch review
Components: Documentation
Versions: Python 3.3
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: barry
Nosy List: barry, brett.cannon, eric.araujo, eric.snow, georg.brandl, larry, ncoghlan, pitrou, pje, python-dev, skrah
Priority: release blocker
Keywords: patch

Created on 2012-07-08 13:44 by brett.cannon, last changed 2012-08-02 22:16 by ncoghlan. This issue is now closed.

Files
File name Uploaded Description
__import__.pdf brett.cannon, 2012-07-30 21:41
issue15295_glossary_refactor.diff eric.snow, 2012-07-31 05:13
Repositories containing patches
http://hg.python.org/features/pep-420#importdocs
Messages (40)
msg165017 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2012-07-08 13:44
I believe Barry said he was going to handle the documentation for PEP 420.
msg165018 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2012-07-08 13:48
One request: while the docs are being written, please look at importlib.find_loader() and let me know if the name no longer applies (it's new in Python 3.3 so it can easily be renamed).
msg166028 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2012-07-21 14:23
Ping. Barry? (It's not strictly necessary to have the docs for b2, but could you give me a rough estimate when you'll do this?)
msg166056 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2012-07-21 19:53
On Jul 21, 2012, at 02:23 PM, Georg Brandl wrote:

>Ping. Barry? (It's not strictly necessary to have the docs for b2, but could
>you give me a rough estimate when you'll do this?)

Unfortunately, I lost a bunch of work with a disk crash, but I might have
salvaged enough in the `importdocs` branch of the features/pep-420 clone.
I'll see what I can come up with.
msg166208 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-07-23 06:07
From the import-sig discussions, this wasn't just about documenting PEP 420, it was about finally bringing the full import system specification into the language reference, now that it doesn't need to be loaded with caveats about the old default import mechanisms.
msg166617 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2012-07-27 23:00
First draft is complete, along with updates to the importlib abcs for the new protocols.  You'll see the language reference has a new importmachinery.rst file which describes finding and loading modules.  You'll see that the import statement docs have been simplified to point to this for step (1), and now only describe the name binding operations, i.e. step (2).  Various other documentation updates are made, including new glossary terms.

Everything lives in features/pep-420 in the importdocs branch.  I don't know if it's possible to just attach that branch to this tracker issue.  I'd rather not post a patch right now since that's much less convenient for the inevitable deluge of comments I'm sure I'll get.

Off to email python-dev now.
msg166630 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2012-07-28 06:54
Awesome addition, Barry!  Bless you for slogging through this.  Here are some thoughts (prepare one grain of salt for each):

* (glossary.rst) finder: should also reference PEP 420.
* (glossary.rst) module: s/contain/containing/
* (glossary.rst) path importer: I like that you pointed out this specific metapath importer, but aren't path importers something else? [2]  Perhaps the metapath importer doesn't need to be in the glossary.  Then again I like the entry, though I'd change ":term:`finder` / :term:`loader`" to "metapath importer".  Maybe just a different term would work, like "sys path subsystem".  Regardless, it is certainly the big dog in the import machinery and deserves special attention. 
* (glossary.rst) sys path finder: having "sys" is a nice touch, making it more distinct and more explicit.

* (importlib.rst) I could have sworn that find_loader() and resolve_name() were public...
* (importlib.rst) module_repr() is nice.
* (importlib.rst) SourceFileLoader.load_module(): What about when the name does not match?

* (import_machinery.rst) import machinery: really nice intro!
* (import_machinery.rst) import machinery, end of first paragraph: "Note that importlib.import_module() is the recommended method of calling the import machinery."
* (import_machinery.rst) import machinery, third paragraph: though there are the side effects of the module getting added to sys.modules, and of parent modules getting imported (if not bound).
* (import_machinery.rst) package, second paragraph: "generally" implies further explanation which doesn't materialize.  Perhaps s/modules generally do not contain other modules or packages/modules do not naturally contain other modules or packages/ or something like that?
* (import_machinery.rst) I like that you make it clear that even packages are not strictly a FS-based construct.
* (import_machinery.rst) how about a section devoted just to the attributes of modules and packages, perhaps expanding upon or supplanting the related entries in the data model reference page?
* (import_machinery.rst) Namespace packages: while "provided by a separate vendor installed container" does convey the broad possibilities, it's nearly equivalent to "separate sys.path entries" in practice (and in the example).  Regardless, "separate vendor installed container" could be clarified.
* (import_machinery.rst) Searching, paragraph 1: don't forget importlib.import_module()!  :)
* (import_machinery.rst) The module cache: A gotcha snuck in under the old machinery that may or may not be worth noting. [3]
* (import_machinery.rst) nice point about messing around with sys.modules.
* (import_machinery.rst) I like the sound of "import protocol".
* (import_machinery.rst) Meta path loaders, end of paragraph 2: "The finder could also be a classmethod that returns an instance of the class."
* (import_machinery.rst) Meta path loaders: reload() is no longer a builtin function.
* (import_machinery.rst) Meta path loaders: "If the load fails, the loader needs to remove any modules..." is a pretty exceptional case, since the module is not in charge of its parent or children, nor of import statements executed for it.  Is this a new requirement?
* (import_machinery.rst) Meta path loaders: too bad there isn't something like "__origin__" for the case where __file__ doesn't make semantic sense, but you still want to record where the module came from.
* (import_machinery.rst) I'm surprised __name__ isn't required.
* (import_machinery.rst) __loader__ is finally getting the respect it deserves (after nearly 10 long years)!
* (import_machinery.rst) Meta path loaders: what should __package__ be set to for a top-level module?
* (import_machinery.rst) Meta path loaders: s/it should execute the module's code/the loader should execute the module's code/.
* (import_machinery.rst) Module reprs: perhaps s/``loader.module_repr(module)``/``module.__loader__.module_repr(module)``/
* (import_machinery.rst) Module reprs: how does module.__qualname__ fit in?
* (import_machinery.rst) module.__path__: s/are consulted/is consulted/ ?
* (import_machinery.rst) The Path Importer: as noted above, this seems like a new usage of "path importer", a term which carries other meaning already.  It's an important and distinct thing though, worthy of its own name. 
* (import_machinery.rst) sys path finders, third paragraph: maybe put a reference to the site module?
* (import_machinery.rst) sys path finders, last paragraph: s/it is used to load/that's what the import machinery uses to load/.
* (import_machinery.rst) NullImporter (issue15473)?  I thought Brett had a plan for taking it to the gallows...
* (import_machinery.rst) Diagrams?  Brett again.  :)  He put together some nice ones a few years back.

* (simple_stmts.rst) a wonderful improvement!
* (simple_stmts.rst) the from list can include submodules which must be imported separately, implying a step 1b
* (simple_stmts.rst) is __all__ "considered public" in any technical way or is it just convention?  The description somewhat implies that "from module import *" is asking for the module's public API.  That's fine with me.  Explicitly describing it as such would make that connection even more concrete.

Whew.  All in all, Barry, nice work on a difficult and tedious project!  This is such an improvement and long overdue.

Other notes:

[1] A package doesn't necessarily have to correspond to a directory, does it?  Meta path importers should be able to generate packages just as well as path importers.  issue1644818 hints at this.

[2] Doesn't "path importer" refer to the callables on sys.path_hooks that decide the path-based finder to use for a module during filesystem-based imports?  In light of "sys path finder", maybe "sys path importer" is more appropriate.  So "sys path finder" refers to the specialized finder used during FS-based imports.

[3] A module may replace itself in sys.modules.  This came up during the importlib integration when several people pointed out that they relied on this previously unspecified side-effect of the old import machinery (and importlib didn't cooperate).  Django was involved, if I recall correctly.  You've alluded to the situation in the footnote on import_machinery.rst.

I'm pretty sure this still isn't specified, and I'm not sure that it should be.  And yet...  Anyway, this would somewhat imply that all module attributes set by the loader should likewise be set before the module's code is executed (which _is_ already at least vaguely specified).
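
The situation described in [3] can be sketched as follows. This is only an illustration, not a specification: the module name issue15295_selfswap and the Replacement class are made up for the example, and it assumes a current CPython, where the import statement binds whatever object is found in sys.modules once the module's code has run.

```python
import importlib
import pathlib
import sys
import tempfile

# Hypothetical module source: it swaps itself for an arbitrary object.
SOURCE = (
    "import sys\n"
    "class Replacement:\n"
    "    answer = 42\n"
    "sys.modules[__name__] = Replacement()\n"
)

with tempfile.TemporaryDirectory() as tmp:
    pathlib.Path(tmp, "issue15295_selfswap.py").write_text(SOURCE)
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        # The name is bound to the object left in sys.modules,
        # not to the module object the loader originally created.
        import issue15295_selfswap
    finally:
        sys.path.remove(tmp)

result_type = type(issue15295_selfswap).__name__
result_answer = issue15295_selfswap.answer
```

Because the replacement happens while the module's code is executing, anything the loader wants visible to that code has to be set on the module beforehand.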
msg166716 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-07-29 05:10
I would title the new section "Import system" rather than "Import machinery" as it is meant to be a specification documentation rather than an implementation description.

Import statement:

The statement that "from X import A, B, C" only performs a single import lookup is incorrect. The trick is that if A, B or C refers to a submodule of X then that submodule will be imported as well.

I'll use a couple of examples from the logging package to make this clear:

# Attribute access will fail for submodules that haven't been imported yet
>>> import logging
>>> logging.DEBUG
10
>>> logging.handlers
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'handlers'

# Direct imports will fail for attributes that are not submodules
>>> import logging.DEBUG
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'logging.DEBUG'
>>> import logging.handlers

# From imports check for an existing attribute first, but check for a submodule if the attribute is missing
>>> import sys
>>> del sys.modules["logging"]
>>> del sys.modules["logging.handlers"]
>>> from logging import DEBUG
>>> from logging import handlers

Aside from this flaw, the new content in the import statement looks good. More on the import system section in a subsequent comment.
msg166718 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-07-29 06:09
General comment:

runpy, pkgutil, et al should all get "See Also" links at the top pointing to the new import system section.

Import system intro:

As noted above, I suggest changing the name :)

Opening section should refer to importlib.import_module(). Any mentions of __import__ should point out that its API is designed for the convenience of the interpreter, and thus it's a pain to use directly. However, we should also document explicitly that, unlike the import statement and calling __import__ directly, importlib.import_module will ignore any module level replacements of __import__.

Replacing builtins.__import__ won't reliably override the entire import system (due to module level __import__ definitions, most notably importlib.__import__) and other tools that work with the process global import state directly (e.g. pkgutil, runpy).
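
A rough sketch of the difference, assuming a current CPython: the tracing_import wrapper and the calls list are names invented for this example, and json merely stands in for any importable module.

```python
import builtins
import importlib

calls = []  # names passed to the replacement __import__
original_import = builtins.__import__


def tracing_import(name, globals=None, locals=None, fromlist=(), level=0):
    # Record the request, then defer to the real implementation.
    calls.append(name)
    return original_import(name, globals, locals, fromlist, level)


builtins.__import__ = tracing_import
try:
    import json  # the statement form looks up builtins.__import__

    seen_by_statement = "json" in calls

    calls.clear()
    importlib.import_module("json")  # bypasses builtins.__import__
    seen_by_import_module = "json" in calls
finally:
    builtins.__import__ = original_import
```

Here the import statement is routed through the replacement, while importlib.import_module() talks to the import system directly and never calls it.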

5.1 Packages:

Don't tease readers, just tell them: the defining characteristic of a package is that it is a module object with a __path__ attribute.
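
As a minimal sketch of that definition (is_package is a helper name made up here, and the json package from the standard library is used purely as a convenient example):

```python
import importlib


def is_package(module):
    # A package is simply a module object that has a __path__ attribute.
    return hasattr(module, "__path__")


json_pkg = importlib.import_module("json")              # a package
json_decoder = importlib.import_module("json.decoder")  # a plain submodule

package_check = is_package(json_pkg)
submodule_check = is_package(json_decoder)
```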

Since we have the privilege of defining *the* standard terminology for old-style packages, I suggest we use the term "initialised" packages (since having an __init__.py is what makes them special). We should also note explicitly that an initialised package can also behave as a namespace package, by setting __path__ appropriately in __init__.py

Also, I suggest adding a 5.1.3 Package Example subheading - currently you define an initialised package under the namespace package heading

Finally, I think this section needs to explicitly define the terms *import path* and *path entry*. The meta path docs later refer to find_module() accepting a module name and path, and the reader could be forgiven for thinking that meant a filesystem path, when it is actually an import path (which is a sequence of path entries, which may themselves be filesystem paths).

5.2.2 Finders and loaders:

The term "sys path finder" is incorrect as registered path hooks are invoked for both sys.path entries *and* package __path__ entries. I suggest "path entry finder". (I agree a longer name is needed to better distinguish them from metapath finders)

5.2.3 Import hooks:

While it does get cleared up in 5.2.4, this section could be clearer that the hooks *cannot* override the initial check of the module cache.

5.2.4 Metapath:

See above comment about clarifying that an import path is passed to find_module() rather than a filesystem path.

The description of the path importer is incorrect. It only knows how to scan an import path and interrogate the path hooks. It's the individual path entry finders that know how to do things like load modules from the filesystem or zip files.


5.2.5 Meta path loaders

I don't like the title here. There's no such thing as a meta path loader. There are only module loaders. Once they're created, it doesn't matter how you found them.

Clarify that the loader only has to remove the modules it inserted itself. Other modules that were loaded *successfully* as a side effect of the code execution will remain in the cache.
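
The cache behaviour can be sketched like this. Note the assumptions: the example uses the find_spec()/exec_module() API of later Python versions, where the import system itself (rather than the loader) removes the failed module; with load_module() that cleanup is the loader's job. InMemoryFinder and both module names are hypothetical.

```python
import importlib.abc
import importlib.util
import sys

# Hypothetical in-memory sources: one module that works, one that fails
# after successfully importing the first.
SOURCES = {
    "issue15295_dep": "VALUE = 1\n",
    "issue15295_broken": "import issue15295_dep\nraise RuntimeError('boom')\n",
}


class InMemoryFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    def find_spec(self, fullname, path=None, target=None):
        if fullname in SOURCES:
            return importlib.util.spec_from_loader(fullname, self)
        return None

    def exec_module(self, module):
        exec(SOURCES[module.__name__], module.__dict__)


finder = InMemoryFinder()
sys.meta_path.insert(0, finder)
try:
    try:
        import issue15295_broken
    except RuntimeError:
        pass
    # Only the module whose load failed is dropped from the cache.
    broken_cached = "issue15295_broken" in sys.modules
    dep_cached = "issue15295_dep" in sys.modules
finally:
    sys.meta_path.remove(finder)
```

The dependency stays in sys.modules because its own load completed successfully before the failure occurred.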

5.3 The Path Importer

As noted above, the path importer is *NOT* restricted to filesystem imports. All it cares about is arbitrary text sequences and path hooks. With the right path hook, you could use URLs or database connection strings as path entries.
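
To make that concrete, here is a hedged sketch of a path hook that accepts a non-filesystem path entry. Everything named here is invented for illustration (the memory:// scheme, MemoryPathEntryFinder, MemoryLoader, the issue15295_demo module), and it uses the find_spec() API of later Python versions rather than the find_module()/find_loader() methods under discussion in this issue.

```python
import importlib
import importlib.abc
import importlib.util
import sys

ENTRY = "memory://demo"  # hypothetical non-filesystem path entry
SOURCES = {"issue15295_demo": "VALUE = 42\n"}  # hypothetical module sources


class MemoryLoader(importlib.abc.Loader):
    def __init__(self, source):
        self.source = source

    def exec_module(self, module):
        exec(self.source, module.__dict__)


class MemoryPathEntryFinder:
    """Path entry finder for a single memory:// path entry."""

    def __init__(self, entry):
        self.entry = entry

    def find_spec(self, fullname, target=None):
        source = SOURCES.get(fullname)
        if source is None:
            return None
        return importlib.util.spec_from_loader(
            fullname, MemoryLoader(source), origin=self.entry)


def memory_path_hook(entry):
    # A path hook raises ImportError for entries it cannot handle.
    if not isinstance(entry, str) or not entry.startswith("memory://"):
        raise ImportError("not a memory:// path entry")
    return MemoryPathEntryFinder(entry)


sys.path_hooks.append(memory_path_hook)
sys.path.append(ENTRY)
try:
    demo = importlib.import_module("issue15295_demo")
finally:
    sys.path.remove(ENTRY)
    sys.path_hooks.remove(memory_path_hook)
    sys.path_importer_cache.pop(ENTRY, None)
```

The path importer never interprets the string itself. It only asks each registered hook whether it can handle the entry, then delegates to the finder the hook returns.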

5.5 References

I'd also point to PEP 328 (absolute imports by default and explicit relative import syntax) and PEP 338 (using the import system to find __main__)
msg166719 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-07-29 06:10
Great start here Barry, I'll switch my checkout over to read/write access and start contributing fixes.
msg166720 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-07-29 06:39
Pushed the import machinery -> import system change (which hopefully won't break Barry's world)

Also merged in a more recent version of trunk. This probably screwed up the default branch in this clone, but the clone should be done after these docs updates.
msg166749 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-07-29 12:19
Updated the statement docs to accurately describe the from X import Y case.

I also noted that unlike the statement form, importlib.import_module ignores module level __import__ overrides.
msg166836 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2012-07-29 23:33
Just so this doesn't get lost and in case it is important enough to block
on: it might be worth having separate ABCs for meta path finders and the
finders pathfinder uses since they have different APIs now (if one ignores
find_module for pathfinder finders). Just need to come up with names.

-brett

msg166896 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-07-30 13:37
Ah, the perils of email readers with quote folding and issue trackers without it. The important part of Brett's email is that PEP 420 has started splitting the meta path finder and path entry finder APIs, but importlib still uses a single ABC for both of them. That's probably a mistake, and something we want to address prior to the release of 3.3. I'll create a separate issue for that.

I just pushed a docs update to the PEP 420 repo that should address all of my comments. I went ahead with the "regular package" -> "initialized package" and "sys path finder" -> "path entry finder" name changes - they just make more sense given the way the components are used.

I wanted to avoid "regular package" as I expect namespace packages to eventually become the norm and initialized packages the more unusual case.

"sys path finder" was simply misleading, as those finders are used for *all* path entries, including those in package __path__ attributes.

I haven't reviewed Eric's comments in detail, so I don't know if I also picked up all of those.
msg166898 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-07-30 13:40
#15502 records Brett's concern about the merged ABC.
msg166923 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2012-07-30 20:55
Thanks for the review Eric.  I'm slogging through these and many other
comments, but I now have the docs integrated with trunk, and will probably
land them for better or worse in the next day or so.

I'll respond just to a few of your comments.  Whatever I omit, you can consider
"fixed"!

On Jul 28, 2012, at 06:54 AM, Eric Snow wrote:

>* (glossary.rst) path importer: I like that you pointed out this specific
>* metapath importer, but aren't path importers something else? [2] Perhaps
>* the metapath importer doesn't need to be in the glossary.  Then again I
>* like the entry, though I'd change ":term:`finder` / :term:`loader`" to
>* "metapath importer".  Maybe just a different term would work, like "sys
>* path subsystem".  Regardless, it is certainly the big dog in the import
>* machinery and deserves special attention.

I certainly struggled with this term.  I almost picked PathFinder (or "path
finder") since that's the name of the actual class used in the implementation,
but then I thought this might be too implementation specific.

You ask in [2] whether "path importer" refers specifically to the callables on
sys.path_hooks.  Can you cite a reference for this?  I found one reference in
PEP 302 to "path importer" but it's hard to tell exactly what that is
referring to.  The sys module doesn't use that term.

If we agree that "path importer" is the name of the things on sys.path_hooks,
then we need a name for the thing on sys.meta_path that implements the things
on sys.path_hooks. :)

>* (glossary.rst) sys path finder: having "sys" is a nice touch, making it more distinct and more explicit.

TBH, I'm not crazy about the term "sys path finder" either but I couldn't
think of anything better.

Keep the suggestions coming for both of these terms.  I'll ruminate on it too
and leave XXX's in the docs for now.  (Maybe as I work through the rest of the
comments, something better has already been suggested.)

>* (importlib.rst) I could have sworn that find_loader() and resolve_name()
>* were public...

There's importlib.find_loader() and importlib.util.resolve_name(), but OTOH,
this is not intended to be the importlib library documentation.  So I'm happy
to leave out such details and add a reference to those docs (done).

In a subsequent comment, Nick suggests this whole chapter be called the
"Import System" instead of the "Import Machinery", but I've been thinking
"Import Protocol" might be good too.  The intent is really to describe the
hooks, methods, and attributes used by Python to accomplish import, as well as
allow Python code to extend or modify the import machinery.  The background at
the top of the chapter is really just there to set the stage (and because
afaict, nothing like that existed before :).

I'm still thinking about this.

>* (importlib.rst) SourceFileLoader.load_module(): What about when the name
>* does not match?

An ImportError gets raised?  Were you suggesting that some additional
documentation should be added for this?

>* (import_machinery.rst) import machinery, end of first paragraph: "Note that
>* importlib.import_module() is the recommended method of calling the import
>* machinery."

I rewrote the introductory paragraphs, and added a mention of
import_module(), as well as a section on importlib.  Hopefully this will
provide enough information for people to figure things out. :)

>* (import_machinery.rst) how about a section devoted just to the attributes
>* of modules and packages, perhaps expanding upon or supplanting the related
>* entries in the data model reference page?

I've added an XXX for this.  I think the right thing to do is to update the
data model chapter, and add a link from here to there.

>* (import_machinery.rst) Meta path loaders, end of paragraph 2: "The finder
>* could also be a classmethod that returns an instance of the class."

I don't understand what you're suggesting here.

>* (import_machinery.rst) Meta path loaders: "If the load fails, the loader
>* needs to remove any modules..." is a pretty exceptional case, since the
>* module is not in charge of its parent or children, nor of import
>* statements executed for it.  Is this a new requirement?

I don't think so.  I lifted it from somewhere (hard to remember exactly where
now ;).  PEP 302?

>* (import_machinery.rst) Meta path loaders: too bad there isn't something
>* like "__origin__" for the case where __file__ doesn't make semantic sense,
>* but you still want to record where the module came from.

Yeah.

>* (import_machinery.rst) I'm surprised __name__ isn't required.

Indeed!  AFAICT, it was only required by the module object's default repr, but
I fixed that when I added module_repr(). :)

>* (import_machinery.rst) Meta path loaders: what should __package__ be set to
>* for a top-level module?

Great question.  I see no official recommendation in anything I've consulted,
and I think CPython is all over the map.  Some set it to None, some to the
empty string.  I added a footnote about this, and recommend the empty string,
but that's me making an executive decision. ;)

>* (import_machinery.rst) Module reprs: how does module.__qualname__ fit in?

Currently, it doesn't afaict.  The timing of the related PEPs was such that I
don't think PEP 420 ever considered __qualname__.  I'm staying silent on this
for now.

>* (import_machinery.rst) NullImporter (issue15473)?  I thought Brett had a
>* plan for taking it to the gallows...

Let's cheer him on!  I have added a footnote about this based on the
discussion in that issue.

>* (import_machinery.rst) Diagrams?  Brett again.  :) He put together some
>* nice ones a few years back.

C'mon Brett, let's see 'em!

>* (simple_stmts.rst) the from list can include submodules which must be
>* imported separately, implying a step 1b

I couldn't figure out what to say differently here.

>* (simple_stmts.rst) is __all__ "considered public" in any technical way or
>* is it just convention?  The description somewhat implies that "from module
>* import *" is asking for the module's public API.  That's fine with me.
>* Explicitly describing it as such would make that connection even more
>* concrete.

I think __all__ should be considered a public, official API.  Certainly it's a
de facto standard.  However, I didn't change much of substance here, so I've
pretty much left it as is.

>[1] A package doesn't necessarily have to correspond to a directory, does it?
>Meta path importers should be able to generate packages just as well as path
>importers.  issue1644818 hints at this.

I've rewritten some of this, so I think the distinction is clearer.  More
clarification can perhaps come after I land this in trunk.

>[3] A module may replace itself in sys.modules.  This came up during the
>importlib integration when several people pointed out that they relied on
>this previously unspecified side-effect of the old import machinery (and
>importlib didn't cooperate).  Django was involved, if I recall correctly.
>You've alluded to the situation in the footnote on import_machinery.rst.

Crazy!  But I've added a footnote about this.
msg166924 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-07-30 21:05
A small note in passing: “protocol” is used for things like the sequence protocol, the iterator protocol, or closer to home the finder and loader protocols, so it would sound weird or potentially confusing to me.  Import system is how I’ve always thought about it (probably took that term from the docs).
msg166926 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2012-07-30 21:41
To answer a couple of Barry's comments in reply to Eric...

__package__ should be set to the empty string if I'm reading PEP 366 correctly (and importlib isn't broken): "When the import system encounters an explicit relative import in a module without __package__ set (or with it set to None), it will calculate and store the correct value (__name__.rpartition('.')[0] for normal modules and __name__ for package initialisation modules)". If someone sets __package__ to None, then importlib fills it in as necessary.
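
The calculation quoted above can be written out as a small sketch. calculate_package is a hypothetical helper for illustration only; it is not an importlib function.

```python
def calculate_package(name, is_package):
    # Packages use their own name; plain modules use the parent's name.
    if is_package:
        return name
    return name.rpartition(".")[0]


submodule_package = calculate_package("logging.handlers", is_package=False)
package_package = calculate_package("logging", is_package=True)
top_level_package = calculate_package("sys", is_package=False)
```

For a top-level module, rpartition('.') finds no separator, so the computed value is the empty string rather than None.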

As for the diagram(s), I have attached the overall PDF that I still have from my original OmniGraffle file (which I don't have a license to anymore) that I built my PyCon 2008 presentation with. It's probably outdated at this point. I will have to redo them for my PyCon Argentina/Brasil (maybe US?) import talks anyway.
msg166939 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2012-07-31 00:28
> I certainly struggled with this term.  I almost picked PathFinder (or "path
> finder") since that's the name of the actual class used in the implementation,
> but then I thought this might be too implementation specific.

Considering that the goal is for importlib to be the common import machinery for the various Python implementations, this might not be inappropriate.

> You ask in [2] whether "path importer" refers specifically to the callables on
> sys.path_hooks.  Can you cite a reference for this?  I found one reference in
> PEP 302 to "path importer" but it's hard to tell exactly what that is
> referring to.

Unfortunately not.  There aren't many people that use import hook terminology and I already have a terrible memory. :)  Regardless, I find "path importer" a little too ambiguous.

>>* (glossary.rst) sys path finder: having "sys" is a nice touch, making it more distinct and more explicit.
>
> TBH, I'm not crazy about the term "sys path finder" either but I couldn't
> think of anything better.

What don't you like about the "sys path thingee" names?  I find them to be nice and explicit.  I'll mull this over some more.

> In a subsequent comment, Nick suggests this whole chapter be called the
> "Import System" instead of the "Import Machinery", but I've been thinking
> "Import Protocol" might be good too.

I agree with Nick.

> The background at
> the top of the chapter is really just there to set the stage (and because
> afaict, nothing like that existed before :).

And it does a good job of it.

>>* (importlib.rst) SourceFileLoader.load_module(): What about when the name
>>* does not match?
>
> An ImportError gets raised?  Were you suggesting that some additional
> documentation should be added for this?

I guess I was just noting a possible hole in the Import System (sounds nice, doesn't it <wink>) specification.  Since importlib is a complete reference implementation, it's not critical to have every detail spelled out (at least, that seems to be the status quo).

>>* (import_machinery.rst) how about a section devoted just to the attributes
>>* of modules and packages, perhaps expanding upon or supplanting the related
>>* entries in the data model reference page?
>
> I've added an XXX for this.  I think the right thing to do is to update the
> data model chapter, and add a link from here to there.

Perfect.

>>* (import_machinery.rst) Meta path loaders, end of paragraph 2: "The finder
>>* could also be a classmethod that returns an instance of the class."
>
> I don't understand what you're suggesting here.

Yeah, that was poorly worded.  I'd meant to suggest that you could document the alternative to find_module() and load_module() being regular methods of the same object.  For instance:

class MyMetaHook:
    @classmethod
    def find_module(cls, name, path=None):
        return cls()
    def load_module(self, name):
        raise ImportError("You lose!")

Thus, the "finder" is the class, and the "loader" is the instance.
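For readers on current Python versions, where find_spec(), create_module() and exec_module() have replaced find_module() and load_module(), the same idea might look roughly like the sketch below. The module name "hypothetical_demo_module" and the try_import() helper are made up for illustration; this is not code from the patch.

```python
import importlib
import importlib.util
import sys


class MyMetaHook:
    """The class acts as the meta path finder; its instances act as loaders."""

    @classmethod
    def find_spec(cls, name, path=None, target=None):
        # Only claim the made-up demo module; defer everything else.
        if name != "hypothetical_demo_module":
            return None
        return importlib.util.spec_from_loader(name, cls())

    def create_module(self, spec):
        # Returning None asks the import system to create a default module.
        return None

    def exec_module(self, module):
        raise ImportError("You lose!")


def try_import(name):
    """Import `name` with MyMetaHook temporarily first on sys.meta_path.

    Returns the ImportError message, or None if the import succeeded.
    """
    sys.meta_path.insert(0, MyMetaHook)
    try:
        importlib.import_module(name)
    except ImportError as exc:
        return str(exc)
    finally:
        sys.meta_path.remove(MyMetaHook)
    return None


print(try_import("hypothetical_demo_module"))
```

Because the loader fails, the import system also drops the half-initialised module from sys.modules again.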

>>* (import_machinery.rst) Meta path loaders: "If the load fails, the loader
>>* needs to remove any modules..." is a pretty exceptional case, since the
>>* module is not in charge of its parent or children, nor of import
>>* statements executed for it.  Is this a new requirement?
>
> I don't think so.  I lifted it from somewhere (hard to remember exactly where
> now ;).  PEP 302?

Nick made the point more clearly.  :)

>>* (simple_stmts.rst) the from list can include submodules which must be
>>* imported separately, implying a step 1b
>
> I couldn't figure out what to say differently here.

No, you've got it covered.
msg166944 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-07-31 01:53
As far as the path importer goes, it's important to keep in mind there are *four* different pieces in play:

1. The path importer itself

This is a meta path finder installed on sys.meta_path, which implements the find_module API. It scans the supplied search path (or sys.path) for path entries, using sys.path_importer_cache and sys.path_hooks to find the locate path entry finders. "Path importer" is an eminently appropriate name as it is responsible for *all* of the standard semantics of sys.path and package __path__ attribute processing. It could potentially be qualified with "standard path importer" or "default path importer" to distinguish it from other cases.

2. The path hooks

These are installed in sys.path_hooks, and are simply callables that accept a path entry and return an appropriate path entry handler or else raise ImportError. The specification is designed to make it easy to use the classes for path entry handlers directly as path hooks (since __init__ can throw ImportError, but it can't return None). For these, "path hook" is just fine as a name.

3. The path entry handlers

These are the objects returned by the path hooks. Historically, they implemented find_module() (without the second "search path" parameter), and now they can implement the "find_loader()" API instead.

The reason I don't like "sys path finder" for these is that it misses their essential role in handling package __path__ attributes. I have previously suggested "path entry finder", but that's a little ambiguous (since it suggests they're tools for *finding* path entries, rather than tools for finding module loaders *given* a path entry). Thus, my new suggestion here: "path entry handler". They're objects that handle particular path entries on behalf of the path importer, so the name is perfectly appropriate, and better distinguishes them from the meta path finder objects.

4. The module loaders

As with any import, the module loaders implement the load_module() API to create, cache, initialise and return a loaded module object.
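A rough sketch of pieces 2 and 3, assuming a current Python where the finder method is find_spec() rather than find_module()/find_loader(). DemoEntryHandler, its "demo-entry:" prefix and handler_for() are hypothetical names; handler_for() only approximates what the path importer does when it consults sys.path_hooks, and it leaves sys.path_importer_cache out entirely.

```python
import sys
import tempfile


class DemoEntryHandler:
    """Hypothetical path entry handler; the class itself serves as the path hook."""

    PREFIX = "demo-entry:"  # made-up marker for the entries this handler accepts

    def __init__(self, path_entry):
        # __init__ cannot return None, so "not mine" is signalled with ImportError.
        if not (isinstance(path_entry, str) and path_entry.startswith(self.PREFIX)):
            raise ImportError("not a demo path entry: %r" % (path_entry,))
        self.path_entry = path_entry

    def find_spec(self, fullname, target=None):
        # This sketch never finds anything; a real handler would return a spec.
        return None


def handler_for(path_entry, hooks):
    """Return the first handler a hook produces for path_entry, or None."""
    for hook in hooks:
        try:
            return hook(path_entry)
        except ImportError:
            continue
    return None


# Use a private copy of the hook list so the real import state is untouched.
hooks = [DemoEntryHandler] + list(sys.path_hooks)
demo_handler = handler_for("demo-entry:anything", hooks)
dir_handler = handler_for(tempfile.gettempdir(), hooks)
print(type(demo_handler).__name__, type(dir_handler).__name__)
```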
msg166945 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-07-31 01:54
s/locate path entry finders/appropriate path entry handlers/
msg166952 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2012-07-31 03:21
Sounds good to me.  As I understood them:

1. default path importer (a.k.a PathFinder),
2. path hook (lives on sys.path_hooks),
3. path entry handler (finder look-alike that a path hook returns),
4. module loader (business as usual).

A "path entry handler" would stand in contrast to a "meta path finder".  These two would also map well to ABCs for issue15502.
msg166958 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2012-07-31 05:13
More on import-related terms.

Given Nick's recommendation, here's a broader view, as related to the import state:

sys.meta_path:
  "meta path finder" -> "module loader"
sys.meta_path[-1] (initially):
  "default path importer"
sys.path_hooks:
  "path hook" -> "path entry handler"
sys.path_importer_cache:
  "path entry handler" -> "module loader"
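To see this state on a running interpreter, something like the following can help. describe_import_state() is a hypothetical helper, and the exact contents will differ between installations and Python versions.

```python
import importlib.machinery
import sys


def describe_import_state():
    """Summarise the process-global import state discussed above."""
    path_importer = importlib.machinery.PathFinder
    return {
        # Finders may be classes or instances, so fall back to the type name.
        "meta_path": [getattr(f, "__name__", type(f).__name__) for f in sys.meta_path],
        "path_importer_registered": path_importer in sys.meta_path,
        "path_hooks": len(sys.path_hooks),
        "cached_path_entries": len(sys.path_importer_cache),
    }


state = describe_import_state()
for key, value in state.items():
    print(key, "->", value)
```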

One unfortunate name is "sys.path_importer_cache", which implies either a cache of "path importers" or a cache belonging to "path importer", both of which are still rather ambiguous.

In light of all the above, I've attached an updated patch just for the glossary.  The import system reference then goes further into the protocols that the different objects implement and so forth.
msg166997 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2012-07-31 13:08
While saying "default path importer" vs. "meta path finder" somewhat muddles the term "importer", it definitely gets the point across that PathFinder does a lot more than any other default meta path finder. While _we_ might know that import does nothing more than call a method on sys.meta_path and has no concept of sys.path and friends, most people will consider the default path importer as part of import's semantics and thus not make the distinction.

IOW I like Nick's suggestion.
msg166998 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2012-07-31 14:03
On Jul 29, 2012, at 05:10 AM, Nick Coghlan wrote:

>I would title the new section "Import system" rather than "Import machinery"
>as it is meant to be a specification documentation rather than an
>implementation description.

"Import system" it is.

>The statement that "from X import A" only performs a single import lookup is
>incorrect. The trick is that if A, B or C refers to a submodule of X then it
>will be imported.

I think I see where you and Eric are coming from on this.  Actually, I don't
think I changed the existing text in this regard, but probably once I
refactored out all the details, it reads in such a way as to be confusing.
I've tweaked the text under the import statement to hopefully be more clear.
It could probably still use improvement.
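To make the submodule case concrete, here is a small self-contained sketch. The package name fromlist_demo_pkg and its contents are made up; it is written to a temporary directory that is only on sys.path for the duration of the import.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package: fromlist_demo_pkg/__init__.py and fromlist_demo_pkg/sub.py
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "fromlist_demo_pkg")
os.mkdir(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()  # empty, so no "sub" attribute
with open(os.path.join(pkg_dir, "sub.py"), "w") as f:
    f.write("VALUE = 1\n")

sys.path.insert(0, root)
importlib.invalidate_caches()
try:
    # "sub" is not an attribute of the package, so it is imported as a submodule.
    from fromlist_demo_pkg import sub
finally:
    sys.path.remove(root)

loaded = sorted(n for n in sys.modules if n.startswith("fromlist_demo_pkg"))
print(loaded)
```

Both the package and the submodule end up in sys.modules, i.e. the statement performed more than one import.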
msg167006 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2012-07-31 14:56
Part of the problem with the import nomenclature is that PEP 302 doesn't really nail it down and mixes the terms up a bit.  This is understandable considering it broke ground in some regard.  However, at this point we have a more comfortable relationship with the import system.  Would it be feasible to lightly update PEP 302 to have a more concrete and consistent use of import terminology?
msg167008 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2012-07-31 15:22
On Jul 29, 2012, at 06:09 AM, Nick Coghlan wrote:

>runpy, pkgutil, et al should all get "See Also" links at the top pointing to
>the new import system section.

I've put an XXX in the import.rst file for this, but I probably won't get to
adding all the cross references.  Others can take that on once this lands.

>Opening section should refer to importlib.import_module(). Any mentions of
>__import__ should point out that its API is designed for the convenience of
>the interpreter, and thus it's a pain to use directly. However, we should
>also document explicitly that, unlike the import statement and calling
>__import__ directly, importlib.import_module will ignore any module level
>replacements of __import__.
>
>Replacing builtins.__import__ won't reliably override the entire import
>system (due to module level __import__ definitions, most notably
>importlib.__import__) and other tools that work with the process global
>import state directly (e.g. pkgutil, runpy).

While I've added a mention of import_module() in several places, I don't think
the above detail is appropriate for the introduction.  I don't want to
overload folks with all those details before they understand the basics of
Python's import model.

I would much rather add a section that goes into more detail about coarsely
overriding the import system, and there we can discuss replacing built-in
__import__() along with its implications and caveats, including any behavior
changes in Python 3.3 with the adoption of importlib.  I probably won't get to
that so feel free to add such a section later.
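As a small illustration of why __import__() is awkward to call directly, compare what the two calls return for a dotted name. This sketch uses the standard library's xml.dom package and does not show the __import__ replacement behaviour Nick describes.

```python
import importlib

# Without a fromlist, __import__ returns the top-level package.
top = __import__("xml.dom")

# With a non-empty fromlist, __import__ returns the named module itself.
leaf_via_fromlist = __import__("xml.dom", fromlist=["minidom"])

# import_module always returns the module that was asked for.
leaf = importlib.import_module("xml.dom")

print(top.__name__, leaf_via_fromlist.__name__, leaf.__name__)
```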

>Since we have the privilege of defining *the* standard terminology for
>old-style packages, I suggest we use the term "initialised" packages (since
>having an __init__.py is what makes them special). We should also note
>explicitly that an initialised package can also behave as a namespace
>package, by setting __path__ appropriately in __init__.py

I don't like the term "initialized package" (even with the Americanized
spelling :), because the term "initialized" means "set to the value or put in
the condition appropriate to the start of an operation", which clearly refers
to both types of packages.

What about "concrete package"?  In a sense, namespace packages are virtual, so
the opposite of that would be concrete.  OTOH, while "regular package" may
still not be the right term, it might be good enough.  The bike shed is
already looking pretty tie-dyed.

>Finally, I think this section needs to explicitly define the terms *import
>path* and *path entry*. The meta path docs later refer to find_module()
>accepting a module name and path, and the reader could be forgiven for
>thinking that meant a filesystem path, when it is actually an import path
>(which is a sequence of path entries, which may themselves be filesystem
>paths).

This is getting somewhere.  I like using the term "path importer" for the
thing that PathFinder is.  ("path finder" doesn't quite do it for me, but
maybe I'm clouded by the same term used as a car model. ;)

What we have are several default finders, one that knows how to locate frozen
modules, one that knows how to locate built-in modules, and one that knows how
to search an "import path" (which consists of "path entries").  This latter
finder is the "path importer" and it has further extensibility so that new
types of path entries can be used.  A path entry is a location to search for
modules, and sequences of path entries exist on import paths.  When a search
occurs, typically the import path is taken from sys.path, but for subpackages,
it is taken from the __path__ attribute of its parent package.
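A quick way to poke at this, assuming a current Python where the path importer exposes find_spec() (the find_module() API discussed in this issue was later superseded): passing no path makes PathFinder search sys.path, while passing a package's __path__ makes it search that import path instead. The stdlib modules json and xml.dom are used purely as convenient examples.

```python
import importlib.machinery
import xml

PathFinder = importlib.machinery.PathFinder

# path=None: the import path defaults to sys.path.
top_spec = PathFinder.find_spec("json")

# For a submodule, the import path is the parent package's __path__.
sub_spec = PathFinder.find_spec("xml.dom", path=xml.__path__)

print(top_spec.name, top_spec.origin)
print(sub_spec.name, sub_spec.origin)
```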

This seems to make for much better reading, and while I've worded it
differently to fit better in the flow of the documentation, it's terminology
that feels more right to me.  Thanks!
msg167029 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2012-07-31 19:06
On Jul 30, 2012, at 09:41 PM, Brett Cannon wrote:

>As for the diagram(s), I have attached the overall PDF that I still have from
>my original OmniGraffle file (which I don't have a license to anymore) that
>I built my PyCon 2008 presentation with. It's probably outdated at this
>point. I will have to redo them for my PyCon Argentina/Brasil (maybe US?)
>import talks anyway.

Thanks.  This isn't quite the level I was looking for, but we can add a
diagram later.

(I think I've improved the discussion on __package__ based on your feedback
and PEP 366.  Thanks!)
msg167030 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2012-07-31 19:14
On Jul 31, 2012, at 12:28 AM, Eric Snow wrote:

>> You ask in [2] whether "path importer" refers specifically to the callables
>> on sys.path_hooks.  Can you cite a reference for this?  I found one
>> reference in PEP 302 to "path importer" but it's hard to tell exactly what
>> that is referring to.
>
>Unfortunately not.  There aren't many people that use import hook terminology
>and I already have a terrible memory. :) Regardless, I find "path importer" a
>little too ambiguous.

Dang.  I've grown to really like "path importer" for the thing on
sys.meta_path that provides sys.path and related functionality.  It seems
appropriate given the observation that we're talking about sys.path or
__path__ and what this thing does is manage that corner of the import
subsystem.

Thinking about Nick's suggestion then, the callables on sys.path_hooks would
be "path entry finders" since they are given a chance to find modules for each
entry on the import path (be it sys.path or __path__).

I think this terminology holds together well, and I think I'm going to land it
as such.  Then we can promote this terminology as we talk about the import
system in other documentation.

>>>* (glossary.rst) sys path finder: having "sys" is a nice touch, making it more distinct and more explicit.
>>
>> TBH, I'm not crazy about the term "sys path finder" either but I couldn't
>> think of anything better.
>
>What don't you like about the "sys path thingee" names?  I find them to be
>nice and explicit.  I'll mull this over some more.

Nick put his finger on it.  "sys path" implies that only sys.path is involved,
whereas __path__ is also involved.

>>>* (import_machinery.rst) Meta path loaders, end of paragraph 2: "The finder
>>>* could also be a classmethod that returns an instance of the class."
>>
>> I don't understand what you're suggesting here.
>
>Yeah, that was poorly worded.  I'd meant to suggest that you could document
>the alternative to find_module() and load_module() being regular methods of
>the same object.  For instance:
>
>class MyMetaHook:
>    @classmethod
>    def find_module(cls, name, path=None):
>        return cls()
>    def load_module(self, name):
>        raise ImportError("You lose!")
>
>Thus, the "finder" is the class, and the "loader" is the instance.

While true, it's not required in the specification, so I'd like to leave this
out.  Smart Pythonistas can figure details like that out.
msg167032 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2012-07-31 19:21
Well, I'm more -0 than -1 on "path importer", though I do like "default path importer" better.  As to the rest,  sounds good to me.
msg167040 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2012-07-31 19:52
I think I was unclear in my previous follow up.  Here are the objects
involved, taken from the glossary.

   import path
      A list of locations (or :term:`path entries <path entry>`) that are
      searched by the :term:`path importer` for modules to import.  During
      import, this list of locations usually comes from :data:`sys.path`, but
      for subpackages it may also come from the parent package's ``__path__``
      attribute.

   meta path finder
      A finder returned by a search of :data:`sys.meta_path`.  Meta path
      finders are related to, but different from :term:`path entry finders
      <path entry finder>`.

   path entry
      A single location on the :term:`import path` which the :term:`path
      importer` consults to find modules for importing.

   path entry finder
      A :term:`finder` returned by a callable on :data:`sys.path_hooks`
      (i.e. a :term:`path entry hook`) which knows how to locate modules given
      a :term:`path entry`.

   path entry hook
      A callable on the :data:`sys.path_hooks` list which returns a :term:`path
      entry finder` if it knows how to find modules on a specific :term:`path
      entry`.

   path importer
      One of the default :term:`meta path finders <meta path finder>` which
      searches an :term:`import path` for modules.
msg167041 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-07-31 19:57
Shouldn't it be committed already? I don't see the point of refining documentation in a separate repo rather than in the main repo.
msg167045 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2012-07-31 20:08
On Jul 31, 2012, at 03:21 AM, Eric Snow wrote:

>1. default path importer (a.k.a PathFinder),

+1, although currently I am refraining from using "default" when describing
this thing.

>2. path hook (lives on sys.path_hooks),

I have called these "path entry hooks"

>3. path entry handler (finder look-alike that a path hook returns),

I still call these "path entry finders".  I understand the ambiguity, and
despite supporting a slightly different protocol than meta path finders, they
still serve the role of finding a loader for a module.  So for now, I'm
keeping "path entry finder", though I'll leave the door slightly open to
persuasion. :)

>4. module loader (business as usual).

I've pulled "Loaders" out into a separate higher level section because as you
say, the loader API is the same for the things returned by both meta path
finders and path entry finders.
msg167046 - (view) Author: Roundup Robot (python-dev) Date: 2012-07-31 20:10
New changeset c933ec7cafcf by Barry Warsaw in branch 'default':
Address substantially all of Eric Snow's comments in issue #15295, except for
http://hg.python.org/cpython/rev/c933ec7cafcf

New changeset d5317b8f455a by Barry Warsaw in branch 'default':
- Issue #15295: Reorganize and rewrite the documentation on the import system.
http://hg.python.org/cpython/rev/d5317b8f455a
msg167048 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2012-07-31 20:30
The import path definition is a little misleading as sys.path is only inferred when 'path' has None passed in. Otherwise 'path' is what __path__ in a package is set to, so technically sys.path never even comes into play except by choice from PathFinder as it just chooses to treat None to mean sys.path.
msg167049 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2012-07-31 20:43
On Jul 31, 2012, at 08:30 PM, Brett Cannon wrote:

>The import path definition is a little misleading as sys.path is only
>inferred when 'path' has None passed in. Otherwise 'path' is what __path__ in
>a package is set to, so technically sys.path never even comes into play
>except by choice from PathFinder as it just chooses to treat None to mean
>sys.path.

Do you think the glossary entry needs to be so precise?  It may be difficult
to explain all that in a concise definition.  Maybe it's best to just remove
the "During import...attribute" bit?
msg167051 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2012-07-31 20:46
I guess just saying it can be None depending on context would be enough.
msg167052 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2012-07-31 21:04
On Jul 31, 2012, at 02:56 PM, Eric Snow wrote:

>Part of the problem with the import nomenclature is that PEP 302 doesn't
>really nail it down and mixes the terms up a bit.  This is understandable
>considering it broke ground in some regard.  However, at this point we have
>a more comfortable relationship with the import system.  Would it be feasible
>to lightly update PEP 302 to have a more concrete and consistent use of
>import terminology?

Maybe not an update to PEP 302, but probably a big red warning that the
terminology is out of date, with a reference to the import system
documentation in the reference manual.

This also points out an interesting, more general problem with PEPs that get
out of date, doesn't it?
msg167251 - (view) Author: PJ Eby (pje) * (Python committer) Date: 2012-08-02 20:28
Hope I'm not too late to the bikeshed painting party; just wanted to chip in with the suggestion of "self-contained package" for non-namespace packages.  (i.e., a self-contained package is one that cannot be split across different sys.path entries due to its use of an __init__ module).

Also, technically, namespace portions do not only contribute subpackages; they can contribute modules as well.

Another point of possible confusion: meta finders and path finders are both described as "hooks", but this terminology seems in conflict with the use of "hook" as describing a callable on path_hooks.  Perhaps we could drop the term "hook" from this section, and retitle it "Import System Extensions" and say you can extend the system by writing meta finders and path entry finders.  This would let the term "hook" be the exclusive property of path_hooks, which is how you extend the import system to use your custom finders.

The statement about __path__ being a list is also out-of-date; as of PEP 420, it can be an immutable, iterable object.  Specification-wise, __path__ need only be a re-iterable object, and people reading its value must NOT assume that it's a list or even indexable.
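This is easy to check with a throwaway namespace package. The name hypothetical_ns_demo is made up, and the concrete __path__ type is an implementation detail that should not be relied upon; the sketch only iterates it.

```python
import importlib
import os
import sys
import tempfile

# A directory without __init__.py becomes a namespace package portion.
root = tempfile.mkdtemp()
name = "hypothetical_ns_demo"
os.mkdir(os.path.join(root, name))

sys.path.insert(0, root)
importlib.invalidate_caches()
try:
    ns = importlib.import_module(name)
    # Iterate twice: __path__ is re-iterable, but not necessarily a list.
    portions = list(ns.__path__)
    portions_again = list(ns.__path__)
finally:
    sys.path.remove(root)

print(type(ns.__path__).__name__, portions)
```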

The term "sys path finder" should also be replaced by "path entry finder".  The former term is both incorrect and misleading, as it both implies that such a finder actually searches sys.path, and that it is exclusive to sys.path.  Path entry finders are used to look for modules within a location specified by a "path entry" - a string found in sys.path or in a __path__ attribute.

The term "path importer" is also horribly confusing in context...  after some reflection, I suggest the following additional terminology changes:

1. Replace "meta path finder" with "import handler"
2. Replace "path importer" with "sys.path import handler"

Now we can say that you extend the import system by adding import handlers to sys.meta_path, and that one of the default handlers is the sys.path import handler, which processes imports using sys.path and module __path__ attributes.

The sys.path import handler can of course in turn be extended by adding path hooks to sys.path_hooks, which are used to create module finder objects for the path entry strings found in sys.path and module __path__ attributes.  A path hook must return a finder object, which implements similar methods to those of an import handler, but with some important differences.

Whew.  It's a bit of a mouthful, but I think that this set of terms would keep all the roles and functions clear, along with their relationships to one another.  In addition, I think it provides greater clarity as to which pieces you need to extend when, why, and how.

What do you think?
msg167265 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-08-02 22:16
We changed quite a bit already as we tried to make everything consistent,
including the importlib ABCs. Current version is on trunk, current
discussion is in #15502
History
Date User Action Args
2012-08-02 22:16:47 ncoghlan set messages: + msg167265
2012-08-02 20:28:26 pje set nosy: + pje; messages: + msg167251
2012-07-31 21:04:23 barry set messages: + msg167052
2012-07-31 20:46:58 brett.cannon set messages: + msg167051
2012-07-31 20:43:24 barry set messages: + msg167049
2012-07-31 20:30:12 brett.cannon set messages: + msg167048
2012-07-31 20:11:09 barry set status: open -> closed; resolution: fixed
2012-07-31 20:10:22 python-dev set nosy: + python-dev; messages: + msg167046
2012-07-31 20:08:10 barry set messages: + msg167045
2012-07-31 19:57:24 pitrou set nosy: + pitrou; messages: + msg167041
2012-07-31 19:52:52 barry set messages: + msg167040
2012-07-31 19:21:21 eric.snow set messages: + msg167032
2012-07-31 19:14:37 barry set messages: + msg167030
2012-07-31 19:06:45 barry set messages: + msg167029
2012-07-31 15:22:59 barry set messages: + msg167008
2012-07-31 14:56:49 eric.snow set messages: + msg167006
2012-07-31 14:03:06 barry set messages: + msg166998
2012-07-31 13:08:04 brett.cannon set messages: + msg166997
2012-07-31 05:13:58 eric.snow set files: + issue15295_glossary_refactor.diff; keywords: + patch; messages: + msg166958
2012-07-31 03:21:09 eric.snow set messages: + msg166952
2012-07-31 01:54:18 ncoghlan set messages: + msg166945
2012-07-31 01:53:07 ncoghlan set messages: + msg166944
2012-07-31 00:28:03 eric.snow set messages: + msg166939
2012-07-30 21:41:17 brett.cannon set files: + __import__.pdf; messages: + msg166926
2012-07-30 21:05:11 eric.araujo set messages: + msg166924
2012-07-30 20:55:50 barry set messages: + msg166923
2012-07-30 18:40:58 brett.cannon link issue15502 dependencies
2012-07-30 13:40:13 ncoghlan set messages: + msg166898
2012-07-30 13:37:08 ncoghlan set messages: + msg166896
2012-07-29 23:33:40 brett.cannon set messages: + msg166836
2012-07-29 12:19:38 ncoghlan set messages: + msg166749
2012-07-29 06:39:32 ncoghlan set messages: + msg166720
2012-07-29 06:10:28 ncoghlan set messages: + msg166719
2012-07-29 06:09:42 ncoghlan set messages: + msg166718
2012-07-29 05:10:47 ncoghlan set messages: + msg166716
2012-07-29 04:39:15 ncoghlan set hgrepos: + hgrepo142
2012-07-28 06:54:40 eric.snow set nosy: + eric.snow; messages: + msg166630
2012-07-27 23:11:23 barry set priority: deferred blocker -> release blocker; title: Document PEP 420 namespace packages -> Import machinery documentation; stage: needs patch -> patch review
2012-07-27 23:00:57 barry set messages: + msg166617
2012-07-24 10:25:16 georg.brandl set priority: release blocker -> deferred blocker
2012-07-23 17:31:57 skrah set nosy: + skrah
2012-07-23 06:07:24 ncoghlan set messages: + msg166208
2012-07-23 06:06:08 ncoghlan set nosy: + ncoghlan, larry
2012-07-21 19:53:50 barry set messages: + msg166056
2012-07-21 14:23:38 georg.brandl set nosy: + georg.brandl; messages: + msg166028
2012-07-09 05:01:50 eric.araujo set nosy: + eric.araujo
2012-07-08 13:48:28 brett.cannon set messages: + msg165018
2012-07-08 13:44:43 brett.cannon create