classification
Title: pkgutil.get_data() doesn't add subpackages to parent packages when importing
Type: behavior
Stage:
Components: Library (Lib)
Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: brett.cannon, eric.snow, godlygeek, ncoghlan, pablogsal
Priority: normal
Keywords:

Created on 2021-10-29 19:26 by godlygeek, last changed 2021-11-29 20:57 by godlygeek.

Messages (4)
msg405331 - Author: Matt Wozniski (godlygeek) Date: 2021-10-29 19:26
If a module hasn't yet been imported, `pkgutil.get_data(pkg_name, data_file)` will import it, but when it does, it doesn't add the submodule to its parent package when the parent package is a PEP 420 implicit namespace package.

```
$ mkdir -p namespace/package
$ touch namespace/package/__init__.py
$ echo data >namespace/package/data_file
$ python3.10 -c 'import namespace.package, pkgutil; print(pkgutil.get_data("namespace.package", "data_file")); import namespace; print(namespace.package)'
b'data\n'
<module 'namespace.package' from '/tmp/namespace/package/__init__.py'>
$ python3.10 -c 'import pkgutil; print(pkgutil.get_data("namespace.package", "data_file")); import namespace.package; import namespace; print(namespace.package)'
b'data\n'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: module 'namespace' has no attribute 'package'
$
```

In this reproducer, we've got an implicit namespace package called "namespace" and a regular package inside it called "namespace.package". The regular package has a data file called "data_file".

If we import the regular package and then call pkgutil.get_data() to access the data file, it successfully retrieves the data file, and the module object for the namespace package winds up with an attribute referencing the module object for the regular package.

If we call pkgutil.get_data() to access the data file before importing the regular package, it successfully retrieves the data file, but the module object for the namespace package doesn't have an attribute referencing the module object for the regular package, even if we later do a normal import for the regular package.

It looks like pkgutil is importing the module when it hasn't already been imported (which I would expect) and adding it and any parent packages to sys.modules (which I would also expect), but then not adding submodules as attributes to their parent modules like `import` would (which seems like a bug).
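
The first command in the reproducer suggests a way to sidestep the missing attribute: import the package through the normal import machinery before calling `pkgutil.get_data()`. The following is a minimal sketch of that ordering, not a fix for pkgutil itself. The helper name `get_data_after_import` and the package names `demo_ns` and `pkg` are hypothetical and chosen only for illustration; the layout is built in a temporary directory so the example is self-contained.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path


def get_data_after_import(package, resource):
    """Hypothetical helper: import `package` normally, then read `resource`.

    importlib.import_module() goes through the regular import machinery,
    which binds each submodule as an attribute on its parent package.
    """
    importlib.import_module(package)
    return pkgutil.get_data(package, resource)


# Recreate the reproducer's layout: an implicit namespace package
# ("demo_ns", no __init__.py) containing a regular package ("pkg").
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_ns" / "pkg"
pkg_dir.mkdir(parents=True)
(pkg_dir / "__init__.py").touch()
(pkg_dir / "data_file").write_bytes(b"data\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

data = get_data_after_import("demo_ns.pkg", "data_file")
parent = sys.modules["demo_ns"]

print(data)                                       # b'data\n'
print(parent.pkg is sys.modules["demo_ns.pkg"])   # True
```

With this ordering the parent's attribute is set by the import itself, so the result does not depend on how `pkgutil.get_data()` loads a package that has not been imported yet.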
msg405332 - Author: Matt Wozniski (godlygeek) Date: 2021-10-29 19:42
The original case where I encountered this was with a namespace package, but the behavior appears to be the same for a subpackage of a regular package.
msg405827 - Author: Brett Cannon (brett.cannon) (Python committer) Date: 2021-11-05 20:08
FYI the long-term plan is to deprecate pkgutil, so I would use newer APIs as provided by importlib.
msg407324 - Author: Matt Wozniski (godlygeek) Date: 2021-11-29 20:57
I wondered if it would be backwards compatible to make `pkgutil.get_data()` delegate to `importlib.resources.read_binary()`. It isn't, because `pkgutil.get_data()` accepts a relative path for the resource, and `importlib.resources.read_binary()` accepts only a filename. That is, you can do:

    pkgutil.get_data(__name__, "subdir/some_file")

but not:

    importlib.resources.read_binary(__name__, "subdir/some_file")

The latter fails with:

      File "/opt/bb/lib/python3.10/importlib/_common.py", line 34, in normalize_path
        raise ValueError(f'{path!r} must be only a file name')
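
For comparison, a resource in a subdirectory can be reached on Python 3.9+ through `importlib.resources.files()`, which returns a traversable object that can be joined with path segments. The sketch below is only an illustration of that alternative and does not claim to be a drop-in replacement for `pkgutil.get_data()`. The package name `demo_res_pkg` is hypothetical, and the example assumes a package that lives on the filesystem.

```python
import importlib
import importlib.resources
import pkgutil
import sys
import tempfile
from pathlib import Path

# Build a throwaway regular package with a resource in a subdirectory.
root = Path(tempfile.mkdtemp())
sub_dir = root / "demo_res_pkg" / "subdir"
sub_dir.mkdir(parents=True)
(root / "demo_res_pkg" / "__init__.py").touch()
(sub_dir / "some_file").write_bytes(b"payload\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# pkgutil.get_data() accepts a '/'-separated relative path.
via_pkgutil = pkgutil.get_data("demo_res_pkg", "subdir/some_file")

# importlib.resources.files() is navigated one segment at a time.
resource = importlib.resources.files("demo_res_pkg") / "subdir" / "some_file"
via_files = resource.read_bytes()

print(via_pkgutil == via_files)  # True
```

Because `files()` is given the package name as a string here, the package is imported through the usual import machinery before the resource is read.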
History
Date User Action Args
2021-11-29 20:57:45 godlygeek set messages: + msg407324
2021-11-05 20:08:42 brett.cannon set nosy: brett.cannon, ncoghlan, eric.snow, pablogsal, godlygeek; messages: + msg405827
2021-11-05 18:30:06 eric.araujo set nosy: + brett.cannon, ncoghlan, eric.snow; versions: + Python 3.11, - Python 3.6, Python 3.7, Python 3.8
2021-10-29 19:42:44 godlygeek set messages: + msg405332; title: pkgutil.get_data() doesn't add subpackages to namespaces when importing -> pkgutil.get_data() doesn't add subpackages to parent packages when importing
2021-10-29 19:26:00 godlygeek create