classification
Title: importlib.resources.path() raises TypeError for packages without __file__
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: William.Schwartz, barry, jaraco, miss-islington
Priority: normal Keywords: patch

Created on 2020-12-02 06:31 by William.Schwartz, last changed 2021-01-16 17:54 by jaraco. This issue is now closed.

Files
File name Uploaded Description Edit
path-test.py William.Schwartz, 2020-12-02 06:31
Pull Requests
URL Status Linked Edit
PR 23611 merged William.Schwartz, 2020-12-02 06:50
PR 24229 closed miss-islington, 2021-01-16 17:23
Messages (8)
msg382297 - (view) Author: William Schwartz (William.Schwartz) * Date: 2020-12-02 06:31
Suppose pkg is a package, it contains a resource r, pkg.__spec__.origin is None, and p = importlib.resources.path(pkg, r). Then p.__enter__() raises a TypeError in Python 3.7 and 3.8. (The problem has been fixed in 3.9). The error can be demonstrated by running the attached path-test.py. The tracebacks in 3.7 and 3.8 are nearly identical, so I'll just show the 3.8 traceback.

3.8.6 (default, Nov 20 2020, 18:29:40) 
[Clang 12.0.0 (clang-1200.0.32.27)]
Traceback (most recent call last):
  File "path-test.py", line 19, in <module>
    p.__enter__()  # Kaboom
  File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/importlib/resources.py", line 196, in path
    package_directory = Path(package.__spec__.origin).parent
  File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/pathlib.py", line 1041, in __new__
    self = cls._from_parts(args, init=False)
  File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/pathlib.py", line 682, in _from_parts
    drv, root, parts = self._parse_args(args)
  File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/pathlib.py", line 666, in _parse_args
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType

The fix is super simple, as shown below. I'll submit this as a PR as well.

diff --git a/Lib/importlib/resources.py b/Lib/importlib/resources.py
index fc3a1c9cab..8d37d52cb8 100644
--- a/Lib/importlib/resources.py
+++ b/Lib/importlib/resources.py
@@ -193,9 +193,11 @@ def path(package: Package, resource: Resource) -> Iterator[Path]:
         _check_location(package)
     # Fall-through for both the lack of resource_path() *and* if
     # resource_path() raises FileNotFoundError.
-    package_directory = Path(package.__spec__.origin).parent
-    file_path = package_directory / resource
-    if file_path.exists():
+    file_path = None
+    if package.__spec__.origin is not None:
+        package_directory = Path(package.__spec__.origin).parent
+        file_path = package_directory / resource
+    if file_path is not None and file_path.exists():
         yield file_path
     else:
         with open_binary(package, resource) as fp:
msg382350 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2020-12-02 22:25
Thanks William for the detailed problem description.

If the issue has been fixed on Python 3.9 but not on 3.8, then it was likely a redesign that enabled the improved behavior, a redesign that won't be ported back to Python 3.8 and earlier. In these situations, the best recommendation is often to just rely on importlib_resources (the backport) for those older Python versions.

In this case, you've identified a fairly surgical fix that may be suitable to apply to these older affected Pythons, so I'll consider that, but before I do, have you considered using importlib_resources for Python 3.8 and earlier? That would likely also address the issue and you could adopt it sooner.

To some extent, the behavior you've described could be considered a bug or could be considered a feature request (add support for path on packages that have no __spec__.origin). I am not aware whether this limitation was by design or incidental.
msg382423 - (view) Author: William Schwartz (William.Schwartz) * Date: 2020-12-03 16:07
> If the issue has been fixed on Python 3.9 but not on 3.8, then it was likely a redesign that enabled the improved behavior

That appears to be the case: path() shares code with files().

> a redesign that won't be ported back to Python 3.8 and earlier.

Nor should it.

> In these situations, the best recommendation is often to just rely on importlib_resources (the backport) for those older Python versions.

I do not need the files() API or anything else added in 3.9 at this time. I just need the existing 3.7/3.8 functionality to work as documented.

> have you considered using importlib_resources for Python 3.8 and earlier? That would likely also address the issue and you could adopt it sooner.

My application is sensitive to the size of the installed site-packages both in bytes and in just the number of packages. Yes, importlib_resources would very likely solve the problem reported in the OP. However I don't need the files() API, so adding an extra requirement feels like a heavy solution.

> To some extent, the behavior you've described could be considered a bug or could be considered a feature request (add support for path on packages that have no __spec__.origin). I am not aware whether this limitation was by design or incidental.

I agree there should be a high bar for patching old versions, but I posit that the behavior is an unintentional bug. In particular, I believe the behavior contradicts the documentation. Below I link and quote relevant documentation.

The function in question:

> importlib.resources.path(package, resource)¶
>     ...package is either a name or a module object which conforms to the
>     Package requirements.
https://docs.python.org/3.8/library/importlib.html#importlib.resources.path

So we jump to Package:

> importlib.resources.Package
>     The Package type is defined as Union[str, ModuleType]. This means that
>     where the function describes accepting a Package, you can pass in either a
>     string or a module. Module objects must have a resolvable
>     __spec__.submodule_search_locations that is not None.
https://docs.python.org/3.8/library/importlib.html#importlib.resources.Package


The Package type restricts the types of modules based on __spec__.submodule_search_locations. This suggests to me that the original author thought about which __spec__s to accept and which to reject but chose not to say anything about __spec__.origin, which is documented as possibly being None:

> class importlib.machinery.ModuleSpec(...)
>     ...module.__spec__.origin == module.__file__.... Normally “origin” should
>     be set, but it may be None (the default) which indicates it is unspecified
>     (e.g. for namespace packages).
https://docs.python.org/3.8/library/importlib.html#importlib.machinery.ModuleSpec.origin

In particular, __spec__.origin *should* be None in certain situations:

> __file__
>     __file__ is optional.... The import system may opt to leave __file__ unset
>     if it has no semantic meaning (e.g. a module loaded from a database).
https://docs.python.org/3.8/reference/import.html#__file__

Taken together, the foregoing passages describe an `import` API in which path() works for all modules that are packages (i.e., __spec__.submodule_search_locations is not None), and in which some packages' __spec__.origin is None. That path() fails in practice to support some packages is therefore a bug not a the absence of a feature.

Regardless of whether PR 23611 is accepted, the test that it adds should be added to Python master to guard against regressions. I can submit that as a separate PR. Before I do that, do I need to create a new bpo ticket, or can I just reference this one?
msg384418 - (view) Author: William Schwartz (William.Schwartz) * Date: 2021-01-05 17:30
@jaraco Did you have any other questions after my comments in msg382423? Do you think you or someone else could review PR 23611? Thanks!
msg384744 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2021-01-09 22:52
> Regardless of whether PR 23611 is accepted, the test that it adds should be added to Python master to guard against regressions. I can submit that as a separate PR. Before I do that, do I need to create a new bpo ticket, or can I just reference this one?

For that, please submit a PR to importlib_resources and it will get synced to CPython later.
msg384745 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2021-01-09 23:21
Can you tell me more about the use-case that exhibited this undesirable behavior? That is, what loader is it that supports `open_binary` but not `is_resource` and doesn't have a `__origin__`?
msg384864 - (view) Author: William Schwartz (William.Schwartz) * Date: 2021-01-11 21:00
> For that, please submit a PR to importlib_resources and it will get synced to CPython later.

Will do once PR 23611 gets in shape. 

> Can you tell me more about the use-case that exhibited this undesirable behavior?

Using the [PyOxidizer] "freezer". It compiles Python together with frozen Python code or byte code.

> That is, what loader is it that supports `open_binary` but not `is_resource` and doesn't have a `__origin__`?

PyOxidizer's [OxidizedImporter] importer. It [does not set `__file__`] (i.e., `__spec__.origin__ is None`). Its maintainer has resolved some ambiguities in the resources API contract (#36128) [differently from CPython], but I don't think that's related to the issue I ran into. The resource-related functionality of the importer is implemented here (extension module written in Rust): https://github.com/indygreg/PyOxidizer/blob/e86b2f46ed6b449bdb912900b0ac83576ad5ebe9/pyembed/src/importer.rs#L1078-L1269

[PyOxidizer]: https://pyoxidizer.readthedocs.io
[OxidizedImporter]: https://pyoxidizer.readthedocs.io/en/v0.10.3/oxidized_importer.html
[does not set `__file__`]: https://pyoxidizer.readthedocs.io/en/v0.10.3/oxidized_importer_behavior_and_compliance.html#file-and-cached-module-attributes
[unsure about `ResourceReader` semantics]: https://pyoxidizer.readthedocs.io/en/v0.10.3/oxidized_importer_resource_files.html#resource-reader-support
msg385145 - (view) Author: miss-islington (miss-islington) Date: 2021-01-16 17:22
New changeset f08c66467d56c71f75c6659ede9177449fe9d2e6 by William Schwartz in branch '3.8':
[3.8] bpo-42531: Teach importlib.resources.path to handle packages without __file__ (GH-23611)
https://github.com/python/cpython/commit/f08c66467d56c71f75c6659ede9177449fe9d2e6
History
Date User Action Args
2021-01-16 17:54:28jaracosetversions: - Python 3.7
2021-01-16 17:24:48jaracosetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2021-01-16 17:23:09miss-islingtonsetpull_requests: + pull_request23052
2021-01-16 17:22:30miss-islingtonsetnosy: + miss-islington
messages: + msg385145
2021-01-11 21:00:15William.Schwartzsetmessages: + msg384864
2021-01-09 23:21:01jaracosetmessages: + msg384745
2021-01-09 22:52:14jaracosetmessages: + msg384744
2021-01-05 17:30:19William.Schwartzsetmessages: + msg384418
2020-12-03 16:07:46William.Schwartzsetmessages: + msg382423
2020-12-02 22:25:09jaracosetmessages: + msg382350
2020-12-02 17:53:19brett.cannonsetnosy: + barry, jaraco, - brett.cannon
2020-12-02 06:50:34William.Schwartzsetkeywords: + patch
stage: patch review
pull_requests: + pull_request22477
2020-12-02 06:31:42William.Schwartzcreate