classification
Title: pkgutil.walk_packages returns extra modules
Type: behavior Stage:
Components: Documentation, Library (Lib) Versions: Python 3.4, Python 3.5, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: Andrey Nehaychik, Arfrever, chris.jerdonek, docs@python, eric.araujo, eric.smith, faassen, gennad, ncoghlan, scorphus
Priority: normal Keywords:

Created on 2012-05-12 09:01 by chris.jerdonek, last changed 2020-01-29 00:16 by brett.cannon.

Messages (11)
msg160464 - Author: Chris Jerdonek (chris.jerdonek) (Python committer) Date: 2012-05-12 09:01
pkgutil.walk_packages(paths) seems to return incorrect results when a subpackage found under one of the given paths has the same name as a package in the standard library. It both excludes modules it should include and includes modules it should exclude. Here is an example:

> mkdir temp
> touch temp/__init__.py
> touch temp/foo.py
> mkdir temp/logging
> touch temp/logging/__init__.py
> touch temp/logging/bar.py
> python
Python 3.2.3 (default, Apr 29 2012, 01:19:06) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

>>> from pkgutil import walk_packages
>>> for info in walk_packages(['temp']):
...   print(info[1], info[0].path)
... 
foo temp
logging temp
logging.config /opt/local/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/logging
logging.handlers /opt/local/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/logging
>>> 

Observe that logging.bar is missing from the output, while logging.config and logging.handlers (from the standard library's logging package) are included.
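The shell and REPL steps above can be folded into one self-contained script. This is a sketch assuming Python 3 (it relies on tempfile.TemporaryDirectory); the helper names build_example_tree and walk_names are hypothetical. No expected output is given because it depends on the interpreter version. On a build affected by this bug, the printed names should match the session above.

```python
import os
import pkgutil
import tempfile


def build_example_tree(root):
    """Recreate the report's layout under *root* (hypothetical helper):
    temp/__init__.py, temp/foo.py, temp/logging/__init__.py and
    temp/logging/bar.py. Returns the path of the "temp" directory."""
    temp = os.path.join(root, "temp")
    os.makedirs(os.path.join(temp, "logging"))
    for rel in ("__init__.py", "foo.py",
                os.path.join("logging", "__init__.py"),
                os.path.join("logging", "bar.py")):
        open(os.path.join(temp, rel), "w").close()  # create an empty file
    return temp


def walk_names(path):
    """Return the dotted names pkgutil.walk_packages() yields for [path]."""
    return [name for _finder, name, _ispkg in pkgutil.walk_packages([path])]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for name in walk_names(build_example_tree(root)):
            print(name)
```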
msg160469 - Author: Gennadiy Zlobin (gennad) Date: 2012-05-12 13:32
I can confirm this behavior in 2.7 and 3.2. In my 3.3.0a3+ build it outputs nothing at all.
Also note that if you rename logging to logging2, you get:

foo temp
logging2 temp
msg165094 - Author: Brett Cannon (brett.cannon) (Python committer) Date: 2012-07-09 16:25
So the lack of output in 3.3 is not surprising: walk_packages() doesn't work with the new import implementation, because it relies on a non-standard method on loaders that the import system does not provide.
msg165537 - Author: Chris Jerdonek (chris.jerdonek) (Python committer) Date: 2012-07-15 17:20
For the record, this issue is still present after Nick's pkgutil changes documented in issue 15343 (not that I expected it to be resolved since this issue is a bit different).
msg165605 - Author: Nick Coghlan (ncoghlan) (Python committer) Date: 2012-07-16 13:53
Right, this is a separate bug in pkgutil. Specifically, when it goes to import a package in order to check it for submodules, it invokes the global import system via __import__() rather than constraining the import to the path argument supplied to walk_packages.

This means that it will only find the right package if the path being walked is already on sys.path. In your example it isn't ('temp' is a subdirectory, not a sys.path entry).

The reason my new tests didn't pick this up is that they're built on the test_runpy infrastructure, and one of the steps in that infrastructure is to add the new package path to sys.path so it can be imported.

This isn't an easy one to fix - you basically need something along the lines of a PEP 406 style import engine API in order to do the import without having potentially adverse effects on the state in the sys module.
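The __import__() step described above can be observed without walking anything. The sketch below uses importlib.util.find_spec() to ask where the bare name "logging" resolves, which approximates what walk_packages() does when it imports a package and reads its __path__. The helper resolved_package_dirs is hypothetical, and Python 3 is assumed.

```python
import importlib.util
import os
import tempfile


def resolved_package_dirs(name):
    """Return the directories that would become package *name*'s __path__.

    Like the __import__() call inside walk_packages(), this consults only
    sys.modules and the import path (sys.meta_path / sys.path); the
    directory currently being walked is never passed in.
    """
    spec = importlib.util.find_spec(name)
    if spec is None or spec.submodule_search_locations is None:
        return []  # not found, or a plain module rather than a package
    return list(spec.submodule_search_locations)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Same shape as temp/logging in the report.
        walked = os.path.join(root, "temp", "logging")
        os.makedirs(walked)
        open(os.path.join(walked, "__init__.py"), "w").close()
        dirs = resolved_package_dirs("logging")
        print(dirs)            # normally the standard library's logging dir
        print(walked in dirs)  # False: the walked copy is never found
```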
msg165612 - Author: Nick Coghlan (ncoghlan) (Python committer) Date: 2012-07-16 14:22
At the very least, the pkgutil docs need to state clearly that walk_packages only works properly with sys.path entries, and the constraint feature may not descend into packages correctly if an entry is shadowed by a sys.modules entry or an entry earlier on sys.meta_path or sys.path.
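One way to stay within the supported case is the idiom the pkgutil docs show for walk_packages(): import the package from sys.path first, then pass its own __path__ together with its name plus a dot as the prefix. Every name walk_packages() then imports is fully qualified, so a subpackage called logging is imported as walkdemo.logging instead of being resolved to the standard library (as long as the top-level package itself is not shadowed). In this sketch, walk_importable_package and the package name walkdemo are hypothetical, and the package's parent directory is put on sys.path only for the duration of the demo.

```python
import importlib
import os
import pkgutil
import sys
import tempfile


def walk_importable_package(package_name):
    """List every module under an importable package, fully qualified.

    Passing the package's own __path__ plus the prefix "<name>." means each
    package that walk_packages() imports is imported by its full dotted name,
    so it is not resolved to a same-named top-level package.
    """
    package = importlib.import_module(package_name)
    return [name for _finder, name, _ispkg
            in pkgutil.walk_packages(package.__path__, package.__name__ + ".")]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical package "walkdemo" with the same layout as "temp".
        pkg = os.path.join(root, "walkdemo")
        os.makedirs(os.path.join(pkg, "logging"))
        for rel in ("__init__.py", "foo.py",
                    os.path.join("logging", "__init__.py"),
                    os.path.join("logging", "bar.py")):
            open(os.path.join(pkg, rel), "w").close()
        sys.path.insert(0, root)  # make "walkdemo" importable for the demo
        try:
            print(walk_importable_package("walkdemo"))
        finally:
            sys.path.remove(root)
```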
msg165618 - Author: Nick Coghlan (ncoghlan) (Python committer) Date: 2012-07-16 14:35
I just realised this is going to behave strangely with namespace packages as well: the __import__ step will pick up *every* portion of the namespace package, not just those defined in the identified subset of sys.path.
msg165627 - Author: Chris Jerdonek (chris.jerdonek) (Python committer) Date: 2012-07-16 15:39
> This isn't an easy one to fix - you basically need something along the lines of a PEP 406 style import engine API in order to do the import without having potentially adverse effects on the state in the sys module.

By adverse, do you just mean side effects? If so, since the documentation doesn't say otherwise, is there any reason for a user to expect no side effects? For example, I tried this in Python 2.7:

>>> import os, sys, pkgutil, unittest
>>> len(sys.modules)
86
>>> g = pkgutil.walk_packages([os.path.dirname(unittest.__file__)])
>>> len(sys.modules)
86
>>> for i in g:
...   pass
... 
>>> len(sys.modules)
95

Or maybe this isn't what you mean. If not, can you provide an example?
msg205021 - Author: Martijn Faassen (faassen) Date: 2013-12-02 15:58
I just ran into this bug myself with namespace packages (in Python 2.7). When you have multiple packages (ns.a, ns.b) under a namespace package (ns), and constrain the paths in walk_packages so it should only pick up ns.a, it will pick up ns.b as well.

Any hope for a fix or workaround?
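As a workaround, a walker can descend into subpackages by listing their directories on disk instead of importing them, which keeps the results confined to the paths passed in and imports nothing. walk_packages_no_import below is a hypothetical helper, not part of pkgutil; it is a sketch assuming Python 3.3+ and plain filesystem directories (no zip archives). Because pkgutil.iter_modules() skips directories without an __init__ file, it only descends into regular packages (including namespace packages that still ship an __init__.py), not PEP 420 implicit namespace packages.

```python
import os
import pkgutil


def walk_packages_no_import(paths, prefix=""):
    """Yield (dotted_name, ispkg) for every module found under *paths*.

    Hypothetical helper. Where pkgutil.walk_packages() imports each package
    and follows its __path__, this recurses into the package's directory on
    disk, so sys.path and sys.modules play no part in what is found.
    """
    for directory in paths:
        for _finder, name, ispkg in pkgutil.iter_modules([directory], prefix):
            yield name, ispkg
            if ispkg:
                # iter_modules() prepends *prefix*; strip it to recover the
                # subpackage's directory name, then walk that directory.
                subdir = os.path.join(directory, name[len(prefix):])
                yield from walk_packages_no_import([subdir], name + ".")


if __name__ == "__main__":
    # "temp" is the directory from the original report, relative to the cwd.
    for name, ispkg in walk_packages_no_import(["temp"]):
        print(name, ispkg)
```

On the tree from the original report this should yield foo, logging and logging.bar, with no logging.config or logging.handlers.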
msg221986 - Author: Mark Lawrence (BreamoreBoy) Date: 2014-06-30 21:32
Note that this issue is referenced from #15358.
msg261589 - Author: Andrey Nehaychik (Andrey Nehaychik) Date: 2016-03-11 18:14
Is there any chance of adding a warning about this problem to the pkgutil docs?

For example:
Warning: walk_packages() imports each package it finds by name, so nested packages are resolved through sys.path rather than through the paths passed in. If you pass a path that is not on sys.path, the results will not be correct.
History
Date                 User              Action  Args
2020-01-29 00:16:41  brett.cannon      set     nosy: - brett.cannon
2016-03-11 21:04:33  BreamoreBoy       set     nosy: - BreamoreBoy
2016-03-11 18:14:45  Andrey Nehaychik  set     nosy: + Andrey Nehaychik
                                               messages: + msg261589
2014-12-09 17:45:54  scorphus          set     nosy: + scorphus
2014-06-30 21:32:45  BreamoreBoy       set     nosy: + BreamoreBoy
                                               messages: + msg221986
                                               versions: + Python 3.4, Python 3.5, - Python 3.2, Python 3.3
2013-12-02 15:58:02  faassen           set     nosy: + faassen
                                               messages: + msg205021
2012-07-16 15:39:22  chris.jerdonek    set     messages: + msg165627
2012-07-16 14:35:17  ncoghlan          set     messages: + msg165618
2012-07-16 14:22:20  ncoghlan          set     nosy: + docs@python
                                               messages: + msg165612
                                               assignee: docs@python
                                               components: + Documentation
2012-07-16 13:53:43  ncoghlan          set     messages: + msg165605
2012-07-15 17:20:26  chris.jerdonek    set     messages: + msg165537
2012-07-09 16:25:06  brett.cannon      set     nosy: brett.cannon, ncoghlan, eric.smith, eric.araujo, Arfrever, chris.jerdonek, gennad
                                               messages: + msg165094
2012-07-07 15:51:42  Arfrever          set     nosy: + Arfrever
2012-05-18 20:25:38  eric.araujo       set     nosy: + brett.cannon, ncoghlan, eric.smith
2012-05-18 20:25:23  eric.araujo       set     nosy: + eric.araujo
2012-05-12 13:32:23  gennad            set     nosy: + gennad
                                               messages: + msg160469
                                               versions: + Python 3.3
2012-05-12 09:01:10  chris.jerdonek    create