Issue42531
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2020-12-02 06:31 by William.Schwartz, last changed 2022-04-11 14:59 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
path-test.py | William.Schwartz, 2020-12-02 06:31 |
Pull Requests | |||
---|---|---|---|
URL | Status | Linked | Edit |
PR 23611 | merged | William.Schwartz, 2020-12-02 06:50 | |
PR 24229 | closed | miss-islington, 2021-01-16 17:23 |
Messages (8) | |||
---|---|---|---|
msg382297 - (view) | Author: William Schwartz (William.Schwartz) * | Date: 2020-12-02 06:31 | |
Suppose pkg is a package, it contains a resource r, pkg.__spec__.origin is None, and p = importlib.resources.path(pkg, r). Then p.__enter__() raises a TypeError in Python 3.7 and 3.8. (The problem has been fixed in 3.9). The error can be demonstrated by running the attached path-test.py. The tracebacks in 3.7 and 3.8 are nearly identical, so I'll just show the 3.8 traceback. 3.8.6 (default, Nov 20 2020, 18:29:40) [Clang 12.0.0 (clang-1200.0.32.27)] Traceback (most recent call last): File "path-test.py", line 19, in <module> p.__enter__() # Kaboom File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/contextlib.py", line 113, in __enter__ return next(self.gen) File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/importlib/resources.py", line 196, in path package_directory = Path(package.__spec__.origin).parent File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/pathlib.py", line 1041, in __new__ self = cls._from_parts(args, init=False) File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/pathlib.py", line 682, in _from_parts drv, root, parts = self._parse_args(args) File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/pathlib.py", line 666, in _parse_args a = os.fspath(a) TypeError: expected str, bytes or os.PathLike object, not NoneType The fix is super simple, as shown below. I'll submit this as a PR as well. diff --git a/Lib/importlib/resources.py b/Lib/importlib/resources.py index fc3a1c9cab..8d37d52cb8 100644 --- a/Lib/importlib/resources.py +++ b/Lib/importlib/resources.py @@ -193,9 +193,11 @@ def path(package: Package, resource: Resource) -> Iterator[Path]: _check_location(package) # Fall-through for both the lack of resource_path() *and* if # resource_path() raises FileNotFoundError. - package_directory = Path(package.__spec__.origin).parent - file_path = package_directory / resource - if file_path.exists(): + file_path = None + if package.__spec__.origin is not None: + package_directory = Path(package.__spec__.origin).parent + file_path = package_directory / resource + if file_path is not None and file_path.exists(): yield file_path else: with open_binary(package, resource) as fp: |
|||
msg382350 - (view) | Author: Jason R. Coombs (jaraco) * ![]() |
Date: 2020-12-02 22:25 | |
Thanks William for the detailed problem description. If the issue has been fixed on Python 3.9 but not on 3.8, then it was likely a redesign that enabled the improved behavior, a redesign that won't be ported back to Python 3.8 and earlier. In these situations, the best recommendation is often to just rely on importlib_resources (the backport) for those older Python versions. In this case, you've identified a fairly surgical fix that may be suitable to apply to these older affected Pythons, so I'll consider that, but before I do, have you considered using importlib_resources for Python 3.8 and earlier? That would likely also address the issue and you could adopt it sooner. To some extent, the behavior you've described could be considered a bug or could be considered a feature request (add support for path on packages that have no __spec__.origin). I am not aware whether this limitation was by design or incidental. |
|||
msg382423 - (view) | Author: William Schwartz (William.Schwartz) * | Date: 2020-12-03 16:07 | |
> If the issue has been fixed on Python 3.9 but not on 3.8, then it was likely a redesign that enabled the improved behavior That appears to be the case: path() shares code with files(). > a redesign that won't be ported back to Python 3.8 and earlier. Nor should it. > In these situations, the best recommendation is often to just rely on importlib_resources (the backport) for those older Python versions. I do not need the files() API or anything else added in 3.9 at this time. I just need the existing 3.7/3.8 functionality to work as documented. > have you considered using importlib_resources for Python 3.8 and earlier? That would likely also address the issue and you could adopt it sooner. My application is sensitive to the size of the installed site-packages both in bytes and in just the number of packages. Yes, importlib_resources would very likely solve the problem reported in the OP. However I don't need the files() API, so adding an extra requirement feels like a heavy solution. > To some extent, the behavior you've described could be considered a bug or could be considered a feature request (add support for path on packages that have no __spec__.origin). I am not aware whether this limitation was by design or incidental. I agree there should be a high bar for patching old versions, but I posit that the behavior is an unintentional bug. In particular, I believe the behavior contradicts the documentation. Below I link and quote relevant documentation. The function in question: > importlib.resources.path(package, resource)¶ > ...package is either a name or a module object which conforms to the > Package requirements. https://docs.python.org/3.8/library/importlib.html#importlib.resources.path So we jump to Package: > importlib.resources.Package > The Package type is defined as Union[str, ModuleType]. This means that > where the function describes accepting a Package, you can pass in either a > string or a module. Module objects must have a resolvable > __spec__.submodule_search_locations that is not None. https://docs.python.org/3.8/library/importlib.html#importlib.resources.Package The Package type restricts the types of modules based on __spec__.submodule_search_locations. This suggests to me that the original author thought about which __spec__s to accept and which to reject but chose not to say anything about __spec__.origin, which is documented as possibly being None: > class importlib.machinery.ModuleSpec(...) > ...module.__spec__.origin == module.__file__.... Normally “origin” should > be set, but it may be None (the default) which indicates it is unspecified > (e.g. for namespace packages). https://docs.python.org/3.8/library/importlib.html#importlib.machinery.ModuleSpec.origin In particular, __spec__.origin *should* be None in certain situations: > __file__ > __file__ is optional.... The import system may opt to leave __file__ unset > if it has no semantic meaning (e.g. a module loaded from a database). https://docs.python.org/3.8/reference/import.html#__file__ Taken together, the foregoing passages describe an `import` API in which path() works for all modules that are packages (i.e., __spec__.submodule_search_locations is not None), and in which some packages' __spec__.origin is None. That path() fails in practice to support some packages is therefore a bug not a the absence of a feature. Regardless of whether PR 23611 is accepted, the test that it adds should be added to Python master to guard against regressions. I can submit that as a separate PR. Before I do that, do I need to create a new bpo ticket, or can I just reference this one? |
|||
msg384418 - (view) | Author: William Schwartz (William.Schwartz) * | Date: 2021-01-05 17:30 | |
@jaraco Did you have any other questions after my comments in msg382423? Do you think you or someone else could review PR 23611? Thanks! |
|||
msg384744 - (view) | Author: Jason R. Coombs (jaraco) * ![]() |
Date: 2021-01-09 22:52 | |
> Regardless of whether PR 23611 is accepted, the test that it adds should be added to Python master to guard against regressions. I can submit that as a separate PR. Before I do that, do I need to create a new bpo ticket, or can I just reference this one? For that, please submit a PR to importlib_resources and it will get synced to CPython later. |
|||
msg384745 - (view) | Author: Jason R. Coombs (jaraco) * ![]() |
Date: 2021-01-09 23:21 | |
Can you tell me more about the use-case that exhibited this undesirable behavior? That is, what loader is it that supports `open_binary` but not `is_resource` and doesn't have a `__origin__`? |
|||
msg384864 - (view) | Author: William Schwartz (William.Schwartz) * | Date: 2021-01-11 21:00 | |
> For that, please submit a PR to importlib_resources and it will get synced to CPython later. Will do once PR 23611 gets in shape. > Can you tell me more about the use-case that exhibited this undesirable behavior? Using the [PyOxidizer] "freezer". It compiles Python together with frozen Python code or byte code. > That is, what loader is it that supports `open_binary` but not `is_resource` and doesn't have a `__origin__`? PyOxidizer's [OxidizedImporter] importer. It [does not set `__file__`] (i.e., `__spec__.origin__ is None`). Its maintainer has resolved some ambiguities in the resources API contract (#36128) [differently from CPython], but I don't think that's related to the issue I ran into. The resource-related functionality of the importer is implemented here (extension module written in Rust): https://github.com/indygreg/PyOxidizer/blob/e86b2f46ed6b449bdb912900b0ac83576ad5ebe9/pyembed/src/importer.rs#L1078-L1269 [PyOxidizer]: https://pyoxidizer.readthedocs.io [OxidizedImporter]: https://pyoxidizer.readthedocs.io/en/v0.10.3/oxidized_importer.html [does not set `__file__`]: https://pyoxidizer.readthedocs.io/en/v0.10.3/oxidized_importer_behavior_and_compliance.html#file-and-cached-module-attributes [unsure about `ResourceReader` semantics]: https://pyoxidizer.readthedocs.io/en/v0.10.3/oxidized_importer_resource_files.html#resource-reader-support |
|||
msg385145 - (view) | Author: miss-islington (miss-islington) | Date: 2021-01-16 17:22 | |
New changeset f08c66467d56c71f75c6659ede9177449fe9d2e6 by William Schwartz in branch '3.8': [3.8] bpo-42531: Teach importlib.resources.path to handle packages without __file__ (GH-23611) https://github.com/python/cpython/commit/f08c66467d56c71f75c6659ede9177449fe9d2e6 |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:59:38 | admin | set | github: 86697 |
2021-01-16 17:54:28 | jaraco | set | versions: - Python 3.7 |
2021-01-16 17:24:48 | jaraco | set | status: open -> closed resolution: fixed stage: patch review -> resolved |
2021-01-16 17:23:09 | miss-islington | set | pull_requests: + pull_request23052 |
2021-01-16 17:22:30 | miss-islington | set | nosy:
+ miss-islington messages: + msg385145 |
2021-01-11 21:00:15 | William.Schwartz | set | messages: + msg384864 |
2021-01-09 23:21:01 | jaraco | set | messages: + msg384745 |
2021-01-09 22:52:14 | jaraco | set | messages: + msg384744 |
2021-01-05 17:30:19 | William.Schwartz | set | messages: + msg384418 |
2020-12-03 16:07:46 | William.Schwartz | set | messages: + msg382423 |
2020-12-02 22:25:09 | jaraco | set | messages: + msg382350 |
2020-12-02 17:53:19 | brett.cannon | set | nosy:
+ barry, jaraco, - brett.cannon |
2020-12-02 06:50:34 | William.Schwartz | set | keywords:
+ patch stage: patch review pull_requests: + pull_request22477 |
2020-12-02 06:31:42 | William.Schwartz | create |