Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

importlib.resources.path() raises TypeError for packages without __file__ #86697

Closed
wkschwartz mannequin opened this issue Dec 2, 2020 · 8 comments
Closed

importlib.resources.path() raises TypeError for packages without __file__ #86697

wkschwartz mannequin opened this issue Dec 2, 2020 · 8 comments
Labels
3.8 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@wkschwartz
Copy link
Mannequin

wkschwartz mannequin commented Dec 2, 2020

BPO 42531
Nosy @warsaw, @jaraco, @wkschwartz, @miss-islington
PRs
  • [3.8] bpo-42531: Teach importlib.resources.path to handle packages without __file__ #23611
  • [3.7] [3.8] bpo-42531: Teach importlib.resources.path to handle packages without __file__ (GH-23611) #24229
  • Files
  • path-test.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2021-01-16.17:24:48.243>
    created_at = <Date 2020-12-02.06:31:42.364>
    labels = ['3.8', 'type-bug', 'library']
    title = 'importlib.resources.path() raises TypeError for packages without __file__'
    updated_at = <Date 2021-01-16.17:54:28.784>
    user = 'https://github.com/wkschwartz'

    bugs.python.org fields:

    activity = <Date 2021-01-16.17:54:28.784>
    actor = 'jaraco'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-01-16.17:24:48.243>
    closer = 'jaraco'
    components = ['Library (Lib)']
    creation = <Date 2020-12-02.06:31:42.364>
    creator = 'William.Schwartz'
    dependencies = []
    files = ['49643']
    hgrepos = []
    issue_num = 42531
    keywords = ['patch']
    message_count = 8.0
    messages = ['382297', '382350', '382423', '384418', '384744', '384745', '384864', '385145']
    nosy_count = 4.0
    nosy_names = ['barry', 'jaraco', 'William.Schwartz', 'miss-islington']
    pr_nums = ['23611', '24229']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue42531'
    versions = ['Python 3.8']

    @wkschwartz
    Copy link
    Mannequin Author

    wkschwartz mannequin commented Dec 2, 2020

    Suppose pkg is a package, it contains a resource r, pkg.__spec__.origin is None, and p = importlib.resources.path(pkg, r). Then p.__enter__() raises a TypeError in Python 3.7 and 3.8. (The problem has been fixed in 3.9). The error can be demonstrated by running the attached path-test.py. The tracebacks in 3.7 and 3.8 are nearly identical, so I'll just show the 3.8 traceback.

    3.8.6 (default, Nov 20 2020, 18:29:40) 
    [Clang 12.0.0 (clang-1200.0.32.27)]
    Traceback (most recent call last):
      File "path-test.py", line 19, in <module>
        p.__enter__()  # Kaboom
      File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/contextlib.py", line 113, in __enter__
        return next(self.gen)
      File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/importlib/resources.py", line 196, in path
        package_directory = Path(package.__spec__.origin).parent
      File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/pathlib.py", line 1041, in __new__
        self = cls._from_parts(args, init=False)
      File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/pathlib.py", line 682, in _from_parts
        drv, root, parts = self._parse_args(args)
      File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/pathlib.py", line 666, in _parse_args
        a = os.fspath(a)
    TypeError: expected str, bytes or os.PathLike object, not NoneType

    The fix is super simple, as shown below. I'll submit this as a PR as well.

    diff --git a/Lib/importlib/resources.py b/Lib/importlib/resources.py
    index fc3a1c9cab..8d37d52cb8 100644
    --- a/Lib/importlib/resources.py
    +++ b/Lib/importlib/resources.py
    @@ -193,9 +193,11 @@ def path(package: Package, resource: Resource) -> Iterator[Path]:
             _check_location(package)
         # Fall-through for both the lack of resource_path() *and* if
         # resource_path() raises FileNotFoundError.
    -    package_directory = Path(package.__spec__.origin).parent
    -    file_path = package_directory / resource
    -    if file_path.exists():
    +    file_path = None
    +    if package.__spec__.origin is not None:
    +        package_directory = Path(package.__spec__.origin).parent
    +        file_path = package_directory / resource
    +    if file_path is not None and file_path.exists():
             yield file_path
         else:
             with open_binary(package, resource) as fp:

    @wkschwartz wkschwartz mannequin added 3.7 (EOL) end of life 3.8 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Dec 2, 2020
    @jaraco
    Copy link
    Member

    jaraco commented Dec 2, 2020

    Thanks William for the detailed problem description.

    If the issue has been fixed on Python 3.9 but not on 3.8, then it was likely a redesign that enabled the improved behavior, a redesign that won't be ported back to Python 3.8 and earlier. In these situations, the best recommendation is often to just rely on importlib_resources (the backport) for those older Python versions.

    In this case, you've identified a fairly surgical fix that may be suitable to apply to these older affected Pythons, so I'll consider that, but before I do, have you considered using importlib_resources for Python 3.8 and earlier? That would likely also address the issue and you could adopt it sooner.

    To some extent, the behavior you've described could be considered a bug or could be considered a feature request (add support for path on packages that have no __spec__.origin). I am not aware whether this limitation was by design or incidental.

    @wkschwartz
    Copy link
    Mannequin Author

    wkschwartz mannequin commented Dec 3, 2020

    If the issue has been fixed on Python 3.9 but not on 3.8, then it was likely a redesign that enabled the improved behavior

    That appears to be the case: path() shares code with files().

    a redesign that won't be ported back to Python 3.8 and earlier.

    Nor should it.

    In these situations, the best recommendation is often to just rely on importlib_resources (the backport) for those older Python versions.

    I do not need the files() API or anything else added in 3.9 at this time. I just need the existing 3.7/3.8 functionality to work as documented.

    have you considered using importlib_resources for Python 3.8 and earlier? That would likely also address the issue and you could adopt it sooner.

    My application is sensitive to the size of the installed site-packages both in bytes and in just the number of packages. Yes, importlib_resources would very likely solve the problem reported in the OP. However I don't need the files() API, so adding an extra requirement feels like a heavy solution.

    To some extent, the behavior you've described could be considered a bug or could be considered a feature request (add support for path on packages that have no __spec__.origin). I am not aware whether this limitation was by design or incidental.

    I agree there should be a high bar for patching old versions, but I posit that the behavior is an unintentional bug. In particular, I believe the behavior contradicts the documentation. Below I link and quote relevant documentation.

    The function in question:

    importlib.resources.path(package, resource)¶
    ...package is either a name or a module object which conforms to the
    Package requirements.
    https://docs.python.org/3.8/library/importlib.html#importlib.resources.path

    So we jump to Package:

    importlib.resources.Package
    The Package type is defined as Union[str, ModuleType]. This means that
    where the function describes accepting a Package, you can pass in either a
    string or a module. Module objects must have a resolvable
    __spec__.submodule_search_locations that is not None.
    https://docs.python.org/3.8/library/importlib.html#importlib.resources.Package

    The Package type restricts the types of modules based on __spec__.submodule_search_locations. This suggests to me that the original author thought about which __spec__s to accept and which to reject but chose not to say anything about __spec__.origin, which is documented as possibly being None:

    class importlib.machinery.ModuleSpec(...)
    ...module.__spec__.origin == module.__file__.... Normally “origin” should
    be set, but it may be None (the default) which indicates it is unspecified
    (e.g. for namespace packages).
    https://docs.python.org/3.8/library/importlib.html#importlib.machinery.ModuleSpec.origin

    In particular, __spec__.origin *should* be None in certain situations:

    __file__
    __file__ is optional.... The import system may opt to leave __file__ unset
    if it has no semantic meaning (e.g. a module loaded from a database).
    https://docs.python.org/3.8/reference/import.html#__file__

    Taken together, the foregoing passages describe an import API in which path() works for all modules that are packages (i.e., spec.submodule_search_locations is not None), and in which some packages' spec.origin is None. That path() fails in practice to support some packages is therefore a bug not a the absence of a feature.

    Regardless of whether PR 23611 is accepted, the test that it adds should be added to Python master to guard against regressions. I can submit that as a separate PR. Before I do that, do I need to create a new bpo ticket, or can I just reference this one?

    @wkschwartz
    Copy link
    Mannequin Author

    wkschwartz mannequin commented Jan 5, 2021

    @jaraco Did you have any other questions after my comments in msg382423? Do you think you or someone else could review PR 23611? Thanks!

    @jaraco
    Copy link
    Member

    jaraco commented Jan 9, 2021

    Regardless of whether PR 23611 is accepted, the test that it adds should be added to Python master to guard against regressions. I can submit that as a separate PR. Before I do that, do I need to create a new bpo ticket, or can I just reference this one?

    For that, please submit a PR to importlib_resources and it will get synced to CPython later.

    @jaraco
    Copy link
    Member

    jaraco commented Jan 9, 2021

    Can you tell me more about the use-case that exhibited this undesirable behavior? That is, what loader is it that supports open_binary but not is_resource and doesn't have a __origin__?

    @wkschwartz
    Copy link
    Mannequin Author

    wkschwartz mannequin commented Jan 11, 2021

    For that, please submit a PR to importlib_resources and it will get synced to CPython later.

    Will do once PR 23611 gets in shape.

    Can you tell me more about the use-case that exhibited this undesirable behavior?

    Using the PyOxidizer "freezer". It compiles Python together with frozen Python code or byte code.

    That is, what loader is it that supports open_binary but not is_resource and doesn't have a __origin__?

    PyOxidizer's OxidizedImporter importer. It does not set __file__ (i.e., __spec__.origin__ is None). Its maintainer has resolved some ambiguities in the resources API contract (bpo-36128) [differently from CPython], but I don't think that's related to the issue I ran into. The resource-related functionality of the importer is implemented here (extension module written in Rust): https://github.com/indygreg/PyOxidizer/blob/e86b2f46ed6b449bdb912900b0ac83576ad5ebe9/pyembed/src/importer.rs#L1078-L1269

    @miss-islington
    Copy link
    Contributor

    New changeset f08c664 by William Schwartz in branch '3.8':
    [3.8] bpo-42531: Teach importlib.resources.path to handle packages without __file__ (GH-23611)
    f08c664

    @jaraco jaraco closed this as completed Jan 16, 2021
    @jaraco jaraco closed this as completed Jan 16, 2021
    @jaraco jaraco removed 3.7 (EOL) end of life labels Jan 16, 2021
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants