classification
Title: Make pkgutil.iter_modules() yield built-in modules
Type: enhancement
Stage: patch review
Components: Library (Lib)
Versions: Python 3.6
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: brett.cannon, martin.panter, ncoghlan, python-dev, r.david.murray
Priority: normal
Keywords: patch

Created on 2015-11-02 11:14 by martin.panter, last changed 2016-05-20 14:23 by martin.panter.

Files
File name | Uploaded | Description
iter-builtin.patch | martin.panter, 2015-11-02 11:14 |
iter-builtin-frozen.patch | martin.panter, 2015-11-19 03:08 | Work on including frozen modules
iter-builtins-flag.patch | martin.panter, 2016-05-20 14:23 | builtins flag; no frozen support
Messages (11)
msg253905 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-11-02 11:14
When no specific list of paths is given, pkgutil.iter_modules() and walk_packages() will list modules found on sys.path. But they don’t list built-in modules, which don’t live in a particular directory. So many users of these APIs (such as the ModuleScanner class in pydoc) have to separately iterate over sys.builtin_module_names.

I think it would be good to change the pkgutil module to also yield the builtin modules. Attached is a patch which does this.

However I had second thoughts on blindly changing the existing function to yield the extra modules, because this will hurt backward compatibility for people already working around the problem. For example, if I didn’t also update pydoc in my patch, a module search would list the built-in modules twice. Perhaps we could overcome this with an opt-in flag like iter_modules(builtins=True)? I’m interested if anyone else has an opinion on this.

Adding support for builtins could also help with proposals such as listing the entire library (Issue 20506) and autocompletion of module names (Issue 25419).
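
For illustration, the workaround described above (combining pkgutil.iter_modules() with sys.builtin_module_names) might look roughly like the following. This is a minimal sketch, not the code in the attached patch, and the helper name all_top_level_module_names is hypothetical.

```python
import pkgutil
import sys


def all_top_level_module_names():
    """Return sorted names of built-in modules plus modules found on sys.path.

    Hypothetical helper for illustration; it is not part of pkgutil.
    """
    # Built-in modules are compiled into the interpreter, so the path-based
    # finders consulted by iter_modules() never report them.
    names = set(sys.builtin_module_names)
    # With no arguments, iter_modules() lists top-level modules on sys.path.
    for _finder, name, _ispkg in pkgutil.iter_modules():
        names.add(name)
    return sorted(names)
```

Collecting into a set also hides the double-listing problem mentioned above, at the cost of losing the finder and ispkg information that iter_modules() normally provides.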
msg253914 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-11-02 14:04
IMO, definitely this should not be changed by default. An alternative to a boolean flag would be a new function (iter_all_modules?).
msg253932 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2015-11-02 17:07
How would this handle frozen modules?
msg253959 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-11-03 02:51
My patch doesn’t handle frozen modules. Maybe we need a new sys.frozen_module_names API, or a sys.list_frozen_module_names() function if it is meant to be dynamic. All I could find was <https://docs.python.org/dev/library/ctypes.html#accessing-values-exported-from-dlls> (which needs updating for Python 3’s byte strings).

A related point: Are built-in packages possible? My patch doesn’t anticipate them either.

If adding a new iter_all_modules() function, there should probably be a matching walk_all_packages() function. But under the hood they would all probably use the same internal functions with a flag.
msg254000 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2015-11-03 16:21
The lack of a way to list frozen modules was why I brought it up. :)

And built-in packages are not currently supported, but they theoretically could be if someone put in the work (I think there's an issue for that somewhere).
msg254876 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-11-19 03:08
I did some work on adding support for frozen modules, but I got stuck. The low level is fine:

>>> pprint(sys.get_frozen_modules())  # (name, ispkg) pairs
(('_frozen_importlib', False),
 ('_frozen_importlib_external', False),
 ('__hello__', False),
 ('__phello__', True),
 ('__phello__.spam', False))
>>> print("\n".join(map(repr, pkgutil.iter_modules(builtins=True))))
(<class '_frozen_importlib.BuiltinImporter'>, '_ast', False)
. . .
(<class '_frozen_importlib.BuiltinImporter'>, 'zipimport', False)
(<class '_frozen_importlib.FrozenImporter'>, '_frozen_importlib', False)
(<class '_frozen_importlib.FrozenImporter'>, '_frozen_importlib_external', False)
(<class '_frozen_importlib.FrozenImporter'>, '__hello__', False)
(<class '_frozen_importlib.FrozenImporter'>, '__phello__', True)
. . .
(FileFinder('.'), 'python-config', False)
. . .

But the current __hello__ and __phello__ modules print stuff when you import them, which messes with walk_packages(), pydoc, etc:

$ ./python -m pydoc -k pkgutil
Hello world!
Hello world!
Hello world!
pkgutil - Utilities to support packages.
test.test_pkgutil 

When I stopped these frozen modules from printing on import (as in my current patch), I found this broke the test suite. In particular, test_importlib.frozen.test_loader relies on the printouts to test what gets executed when importing frozen modules. So I am not sure of the best way to continue if I am to add support for frozen modules to iter_modules().

Another problem was that there is no way to list submodules of a frozen package unless you know the package’s name. Currently, iter_modules() only sees a path list, which is empty for __phello__. However, I was able to add a special case to walk_packages(None) to include frozen submodules.

Some questions:

1. Do people think the general idea of enhancing iter_modules() is worthwhile?

2. Should I try harder to solve the problem with running frozen modules, or is it sensible to just leave out the frozen module support?
msg254917 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2015-11-19 18:13
I say let it go and make sure the docs clearly document that only modules found off of sys.path are supported. Otherwise I would look at why walk_packages() and pydoc feel the need to import every module and if import simply needs to be tweaked to support this use case better (if at all).
msg265562 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-05-15 00:34
It looks like Issue 1644818 is the one for built-in _packages_.

I might wind back the frozen module stuff and go back to just builtins and sys.path searching. The two problems were the existing frozen packages that print stuff out for the test suite, and the lack of search “paths” for frozen submodules. I feel that these might be solved without too much effort, so if people want this feature or have suggestions, I am happy to revisit this.

For the record, I think walk_packages() would import every _package_ (not non-packages), in order to be able to properly find their submodules. Also, “pydoc -k” would have to import every module to be able to search its defined classes, functions, doc strings, etc.
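
The point about walk_packages() importing packages but not plain modules can be checked with a small experiment along these lines. It is only a sketch: the module names are made up, and the temporary directory is put on sys.path because walk_packages() imports packages by name.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path


def walk_and_report_imports():
    """Walk a throwaway tree and report which of its modules got imported.

    All module names here are made up for the demonstration.
    """
    candidates = ("demo25533_pkg", "demo25533_pkg.leaf", "demo25533_mod")
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "demo25533_pkg").mkdir()
        (root / "demo25533_pkg" / "__init__.py").write_text("")
        (root / "demo25533_pkg" / "leaf.py").write_text("")
        (root / "demo25533_mod.py").write_text("")

        # walk_packages() imports each package by name, so the tree must be
        # importable while the walk runs.
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            walked = sorted(
                name for _finder, name, _ispkg in pkgutil.walk_packages([tmp])
            )
            imported = sorted(name for name in candidates if name in sys.modules)
        finally:
            # Undo the global side effects of the experiment.
            sys.path.remove(tmp)
            for name in candidates:
                sys.modules.pop(name, None)
    return walked, imported
```

In this sketch all three names are listed by the walk, but only the package ends up in sys.modules; the plain module and the submodule are reported without being imported.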
msg265565 - Author: Roundup Robot (python-dev) (Python triager) Date: 2016-05-15 01:13
New changeset 2f19766d4b20 by Martin Panter in branch '3.5':
Issue #25533: Update documentation regarding the frozen modules table
https://hg.python.org/cpython/rev/2f19766d4b20

New changeset b20b580bc186 by Martin Panter in branch 'default':
Issue #25533: Merge frozen module docs from 3.5
https://hg.python.org/cpython/rev/b20b580bc186
msg265573 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2016-05-15 03:44
To answer Brett's question about "Why does walk_packages import parent modules?", the answer is "Because __init__ can modify __path__, so if you don't import the parent module, you may miss things that actually doing the import would find".

The classic example of this is pre-3.3 namespace packages: those work by calling pkgutil.extend_path() or pkg_resources.declare_namespace() from the package's __init__ file, and they can be made to work regardless of whether or not the parent module is implemented as "foo.py" or "foo/__init__.py". 

My recollection is that the pkgutil APIs treat that capability as a basic operating assumption: they import everything they find and check it for a __path__ attribute on the grounds that arbitrary modules *might* set __path__ dynamically.
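
The pre-3.3 namespace package case can be sketched as follows: one package split across two directories, each __init__.py calling pkgutil.extend_path(). The package name nsdemo25533 and the helper name are made up for illustration, and both directories are put on sys.path because extend_path() searches there.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path

# The classic pre-3.3 idiom: each portion's __init__.py widens __path__.
INIT_SOURCE = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)


def walk_split_package():
    """Compare a no-import listing with walk_packages() for a split package."""
    pkg_name = "nsdemo25533"  # made-up name for the demonstration
    with tempfile.TemporaryDirectory() as first, \
            tempfile.TemporaryDirectory() as second:
        for root, module in ((first, "a.py"), (second, "b.py")):
            pkg_dir = Path(root) / pkg_name
            pkg_dir.mkdir()
            (pkg_dir / "__init__.py").write_text(INIT_SOURCE)
            (pkg_dir / module).write_text("")

        sys.path[:0] = [first, second]
        importlib.invalidate_caches()
        try:
            # Listing one directory without importing sees only that portion.
            portion = str(Path(first) / pkg_name)
            static = sorted({name for _f, name, _p in pkgutil.iter_modules([portion])})
            # Walking imports the package, which runs extend_path() and so
            # adds the second directory to __path__ before submodules are listed.
            walked = sorted({name for _f, name, _p in pkgutil.walk_packages([first])})
        finally:
            sys.path.remove(first)
            sys.path.remove(second)
            for name in [n for n in sys.modules
                         if n == pkg_name or n.startswith(pkg_name + ".")]:
                del sys.modules[name]
    return static, walked
```

Here the no-import listing reports only the submodule that lives in the first directory, while the walk also finds the one in the second directory, even though only the first directory was passed to walk_packages().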


It would potentially be worthwhile introducing side-effect free variants of these two APIs: "pkgutil.iter_modules_static()" and "pkgutil.walk_packages_static()" (suggested suffix inspired by "inspect.getattr_static()").

The idea with those would be to report all packages that can be found whilst assuming that no module dynamically adds a __path__ attribute to itself, or alters a __path__ attribute calculated by a standard import hook.

Actually doing that in a generic fashion would require either expanding the APIs for meta_path importers and path import hooks or else using functools.simplegeneric to allow new walkers to be registered for unknown importers and hooks, with the latter approach being closer to the way pkgutil currently works.
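
One possible shape for the registration-based approach is sketched below, using functools.singledispatch, the standard-library generic-function decorator (to my understanding pkgutil imports it under the name simplegeneric). The function names static_subpackage_path and walk_packages_static are hypothetical, and the sketch assumes that a package found by a FileFinder has exactly one search location, its own directory.

```python
import os
import pkgutil
from functools import singledispatch
from importlib.machinery import FileFinder


@singledispatch
def static_subpackage_path(finder, name):
    """Return the search path of package `name` without importing it.

    Default for unknown finders: nothing can be said without an import.
    """
    return []


@static_subpackage_path.register(FileFinder)
def _file_finder_subpackage_path(finder, name):
    # Assumption: the package does not alter __path__ at import time, so its
    # only search location is its own directory.
    return [os.path.join(finder.path, name)]


def walk_packages_static(path=None, prefix=""):
    """Yield (finder, name, ispkg) recursively without importing anything."""
    for finder, name, ispkg in pkgutil.iter_modules(path, prefix):
        yield finder, name, ispkg
        if ispkg:
            short_name = name.rpartition(".")[2]
            sub_path = static_subpackage_path(finder, short_name)
            if sub_path:
                yield from walk_packages_static(sub_path, name + ".")
```

Support for another finder type would be added by registering one more function on static_subpackage_path. As noted above, any package that changes __path__ in its __init__ would be reported incompletely by a walker like this.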
msg265947 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-05-20 14:23
iter-builtins-flag.patch has support for built-in modules via the builtins=True flag, but I removed the support for frozen modules.
History
Date | User | Action | Args
2016-05-20 14:23:24 | martin.panter | set | files: + iter-builtins-flag.patch; messages: + msg265947; stage: patch review
2016-05-15 03:44:30 | ncoghlan | set | nosy: + ncoghlan; messages: + msg265573
2016-05-15 01:13:20 | python-dev | set | nosy: + python-dev; messages: + msg265565
2016-05-15 00:34:56 | martin.panter | set | messages: + msg265562
2015-11-19 18:13:42 | brett.cannon | set | messages: + msg254917
2015-11-19 03:08:11 | martin.panter | set | files: + iter-builtin-frozen.patch; messages: + msg254876
2015-11-03 16:21:23 | brett.cannon | set | messages: + msg254000
2015-11-03 02:51:13 | martin.panter | set | messages: + msg253959
2015-11-02 17:07:52 | brett.cannon | set | messages: + msg253932
2015-11-02 14:05:22 | r.david.murray | set | nosy: + brett.cannon
2015-11-02 14:04:34 | r.david.murray | set | nosy: + r.david.murray; messages: + msg253914
2015-11-02 11:14:13 | martin.panter | create |