classification
Title: importlib.metadata documentation deficiencies
Type: enhancement
Stage: resolved
Components: Documentation
Versions: Python 3.9, Python 3.8
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: jaraco
Nosy List: barry, docs@python, indygreg, jaraco
Priority: normal
Keywords: patch

Created on 2019-10-26 03:24 by indygreg, last changed 2019-12-11 02:45 by jaraco. This issue is now closed.

Pull Requests
| URL | Status | Linked |
| --- | --- | --- |
| PR 17568 | merged | jaraco, 2019-12-10 23:33 |
| PR 17569 | merged | miss-islington, 2019-12-11 01:05 |
Messages (7)
msg355402 - (view) Author: Gregory Szorc (indygreg) * Date: 2019-10-26 03:24
As I was attempting to implement the find_distributions() interface for PyOxidizer, I got confused by importlib.metadata's documentation.

The documentation for this module states:

```
What this means in practice is that to support finding distribution package
metadata in locations other than the file system, you should derive from
``Distribution`` and implement the ``load_metadata()`` method. Then from
your finder, return instances of this derived ``Distribution`` in the
``find_distributions()`` method.
```

The reference to `load_metadata()` is the only occurrence of the string `load_metadata` in the CPython and importlib_metadata code bases. I therefore believe the documentation in both CPython and the importlib_metadata standalone package is wrong, because it refers to a method that is neither implemented nor called.

Looking at the documentation and source code for importlib.metadata, I'm also a bit confused about how exactly I'm supposed to implement a custom Distribution which isn't based on filesystems. For example, I see that certain APIs return Path-like objects (which I will need to implement). But it isn't clear exactly which attributes are mandated to exist! Am I expected to implement the full pathlib.Path interface or just a subset?

Regarding how find_distributions() is called, I also don't understand why the Context is optional and how Context could be used in some situations. For example, the implementation of discover() can construct Context instances with no arguments, which are then fed into find_distributions(). So I guess context=None or context.name=None implies "return Distribution objects for every known package"? If so, this behavior is undocumented.
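To make the question concrete, here is a minimal sketch of what a `Context` looks like when built with and without arguments. It relies only on `DistributionFinder.Context` as shipped in `importlib.metadata`; the name `example-pkg` is made up for illustration. The observed defaults are consistent with reading an unnamed context as "match everything", though that reading is an interpretation rather than something the documentation stated at the time.

```python
import sys
from importlib.metadata import DistributionFinder

# A Context built with no arguments carries no name filter,
# and its path falls back to the current sys.path.
everything = DistributionFinder.Context()
print(everything.name)
print(everything.path == sys.path)

# Keyword arguments simply become attributes on the Context.
# "example-pkg" is a hypothetical distribution name.
named = DistributionFinder.Context(name="example-pkg", path=[])
print(named.name, named.path)
```

A finder that does not read from the file system can presumably ignore `path` and look only at `name`.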

I'm also not sure what Context.path is for. I /think/ it is only used for the path-based finder/distribution. But the way it is documented implies it should always exist, which doesn't seem appropriate for cases like PyOxidizer, which will retrieve metadata from memory without filesystem I/O.

I think what I'm trying to say is that the existing documentation for importlib.metadata is not sufficient to robustly implement a custom find_distributions() + Distribution type. I would kindly request that a domain expert revise the documentation such that a 3rd party can implement a custom solution. My preferred solution would be for there to be formal interfaces in importlib.abc like there are for everything else in the importlib realm. (The interfaces for finders and loaders are super useful when implementing a finder/loader from scratch.)

FWIW I think I like the new metadata API and I think it is flexible enough to allow tools like PyOxidizer to do crazy things like divorce resources from the filesystem! But it is hard to say for sure since the interfaces aren't clearly defined at present.
msg358038 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2019-12-08 22:49
Good suggestions. Thanks for taking the time to articulate in such a friendly way the shortcomings you encountered. I'm happy to help.

I've mirrored this issue in [this ticket](https://gitlab.com/python-devs/importlib_metadata/issues/105) in the backport project, where I can iterate much faster.

I'll provide brief answers to some of your questions/concerns here and then work out the wording for the documentation (and code changes) necessary to communicate that effectively.

> The reference to `load_metadata()` is the only occurrence of the string `load_metadata` in the CPython and importlib_metadata code bases. I therefore believe the documentation in both CPython and the importlib_metadata standalone package is wrong, because it refers to a method that is neither implemented nor called.

That's right. The documentation is wrong. It should say to implement a `Distribution` subclass (especially its abstract methods). Nothing more should be necessary.
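As a rough illustration, a subclass that implements only the two abstract methods, `read_text()` and `locate_file()`, might look like the sketch below. The class name `InMemoryDistribution`, the dict-based storage, and the `example-pkg` metadata are all assumptions made up for this example, not part of `importlib.metadata`.

```python
from importlib.metadata import Distribution


class InMemoryDistribution(Distribution):
    """Hypothetical distribution whose metadata files live in a dict."""

    def __init__(self, files):
        # Maps file names such as "METADATA" to their text content.
        self._files = dict(files)

    def read_text(self, filename):
        # Return None when the requested metadata file does not exist.
        return self._files.get(filename)

    def locate_file(self, path):
        # Nothing is stored on disk in this sketch, so there is nothing to locate.
        raise NotImplementedError("this distribution has no files on disk")


dist = InMemoryDistribution(
    {"METADATA": "Metadata-Version: 2.1\nName: example-pkg\nVersion: 1.0\n"}
)
print(dist.metadata["Name"], dist.version)
```

The inherited `metadata` and `version` properties are built on `read_text()`, so they work here without any file system access.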

> I see that certain APIs return Path-like objects (which I will need to implement)

Are you sure about that? The only code I see in the `Distribution` class that references a `Path` object is `.at()`. That static method would be more appropriate in the `PathDistribution` class, but we decided to add it to `Distribution` to make it readily available to projects that wish to construct Path-based distribution objects from a file system (or zipfile) path. I'm pretty sure everything else in the `Distribution` class relies on the two abstract methods. If you disregard `at` (and I recommend you do) and focus on implementing the abstract methods, I think things will work. Let me know if you find otherwise.
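For contrast, this is roughly what `at()` is for: building a Path-based distribution from a metadata directory that already exists on disk. The directory name and METADATA contents below are made up for the example; a finder that keeps everything in memory would have no use for this.

```python
import pathlib
import tempfile
from importlib.metadata import Distribution

with tempfile.TemporaryDirectory() as tmp:
    # Create a throwaway metadata directory with a hypothetical name.
    info_dir = pathlib.Path(tmp) / "example_pkg-1.0.dist-info"
    info_dir.mkdir()
    (info_dir / "METADATA").write_text(
        "Metadata-Version: 2.1\nName: example-pkg\nVersion: 1.0\n",
        encoding="utf-8",
    )

    # at() returns a Path-based distribution rooted at that directory.
    on_disk = Distribution.at(info_dir)
    name_on_disk = on_disk.metadata["Name"]
    version_on_disk = on_disk.version

print(name_on_disk, version_on_disk)
```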

> why the Context is optional and how Context could be used?

The interface is intentionally vague in order not to be too prescriptive, because as you point out, name or path may not be relevant in some contexts. It's meant to narrow the scope of any search.

So if a path is present, that means the query is looking in a specific `sys.path` entry. And if the name is present, that means it's looking for a distribution having a specific name. But basically, you can solicit any properties you like. You could expect a `size='max(100)'` parameter and only return distributions smaller than 100 (for whatever interpretation of `size` you wish to implement). Your DistributionFinder should do its best to honor whatever context might be relevant to the Distributions you provide.
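A minimal sketch of a finder that narrows its results this way follows. Everything named here is hypothetical: the `InMemoryFinder` and `InMemoryDistribution` classes, the `_store` dict, and the `max_size` keyword, which stands in for the `size` idea above and is interpreted arbitrarily as the length of the metadata text. Only `Distribution`, `DistributionFinder`, and `DistributionFinder.Context` come from `importlib.metadata`.

```python
from importlib.metadata import Distribution, DistributionFinder


class InMemoryDistribution(Distribution):
    """Hypothetical distribution backed by a METADATA string held in memory."""

    def __init__(self, metadata_text):
        self._metadata_text = metadata_text

    def read_text(self, filename):
        return self._metadata_text if filename == "METADATA" else None

    def locate_file(self, path):
        raise NotImplementedError("this distribution has no files on disk")


class InMemoryFinder(DistributionFinder):
    """Hypothetical finder; _store stands in for an application's embedded data."""

    _store = {
        "small-pkg": "Metadata-Version: 2.1\nName: small-pkg\nVersion: 1.0\n",
        # Padded with a dummy header so the text exceeds 100 characters.
        "large-pkg": "Metadata-Version: 2.1\nName: large-pkg\nVersion: 2.0\n"
        + "X-Pad: y\n" * 50,
    }

    def find_spec(self, fullname, path=None, target=None):
        # This finder serves metadata only; it never imports modules.
        return None

    def find_distributions(self, context=DistributionFinder.Context()):
        # max_size is a made-up context property, not a standard one.
        max_size = getattr(context, "max_size", None)
        for name, text in self._store.items():
            if context.name is not None and context.name != name:
                continue  # a name was requested and this is not it
            if max_size is not None and len(text) > max_size:
                continue  # honor the hypothetical size limit
            yield InMemoryDistribution(text)


finder = InMemoryFinder()
Context = DistributionFinder.Context
all_names = sorted(d.metadata["Name"] for d in finder.find_distributions(Context()))
one_name = [d.metadata["Name"] for d in finder.find_distributions(Context(name="small-pkg"))]
small_names = [d.metadata["Name"] for d in finder.find_distributions(Context(max_size=100))]
print(all_names, one_name, small_names)
```

The sketch ignores `context.path` entirely, which seems reasonable for a finder that never consults `sys.path`. To take part in normal discovery, an instance would also need to be added to `sys.meta_path`; the example calls `find_distributions()` directly to avoid that side effect.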

Does PyOxidizer interact with `sys.path` at all? If not, it can disregard `Context.path`.
msg358041 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2019-12-08 23:06
Please have a look at https://gitlab.com/python-devs/importlib_metadata/merge_requests/104/diffs, which attempts to clarify the documentation to indicate how one would implement a custom finder. If you have a prototype implementation, I'd be happy to have a look.

The use-case you present is exactly the type of use-case this project wishes to enable, so I'm grateful that you're working on it and I'd like to do what I can to support the effort.
msg358234 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2019-12-10 23:28
I've merged the recommended changes into importlib_metadata 1.3 and I'm including those changes in issue39022.
msg358237 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2019-12-11 01:05
New changeset b7a0109cd2bafaa21a4d50aad307e901c68f9156 by Jason R. Coombs in branch 'master':
bpo-39022, bpo-38594: Sync with importlib_metadata 1.3 (GH-17568)
https://github.com/python/cpython/commit/b7a0109cd2bafaa21a4d50aad307e901c68f9156
msg358241 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2019-12-11 01:47
New changeset b738237d6792acba85b1f6e6c8993a812c7fd815 by Jason R. Coombs (Miss Islington (bot)) in branch '3.8':
bpo-39022, bpo-38594: Sync with importlib_metadata 1.3 (GH-17568) (GH-17569)
https://github.com/python/cpython/commit/b738237d6792acba85b1f6e6c8993a812c7fd815
msg358243 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2019-12-11 02:45
I'm hoping those documentation edits address the deficiencies, but if not, we can take another stab at it. Feel free to re-open as needed.
History
| Date | User | Action | Args |
| --- | --- | --- | --- |
| 2019-12-11 02:45:25 | jaraco | set | status: open -> closed; resolution: fixed; messages: + msg358243; stage: patch review -> resolved |
| 2019-12-11 01:47:13 | jaraco | set | messages: + msg358241 |
| 2019-12-11 01:05:25 | miss-islington | set | pull_requests: + pull_request17043 |
| 2019-12-11 01:05:19 | jaraco | set | messages: + msg358237 |
| 2019-12-10 23:33:01 | jaraco | set | keywords: + patch; stage: patch review; pull_requests: + pull_request17041 |
| 2019-12-10 23:28:29 | jaraco | set | messages: + msg358234 |
| 2019-12-08 23:06:16 | jaraco | set | messages: + msg358041 |
| 2019-12-08 22:49:10 | jaraco | set | messages: + msg358038 |
| 2019-12-06 23:10:04 | jaraco | set | assignee: docs@python -> jaraco |
| 2019-10-26 03:24:54 | indygreg | create | |