3.10 beta 1: breaking change in importlib.metadata entry points #88412

asottile · 2021-05-27T15:30:11Z

BPO	44246
Nosy	@warsaw, @jaraco, @zooba, @asottile, @hroncok, @Zac-HD, @miss-islington, @domdfcoding, @Fidget-Spinner
PRs	bpo-44246: Update What's New for importlib.metadata. #26408 [3.10] bpo-44246: Update What's New for importlib.metadata. (GH-26408) #26415 bpo-44246: Entry points performance improvements. #26467 bpo-44246: Restore compatibility in entry_points #26468 [3.10] bpo-44246: Entry points performance improvements. (GH-26467) #26469 [3.10] bpo-44246: Restore compatibility in entry_points (GH-26468) #26471 bpo-44246: Remove note about access by index now that a compatibility shim is offered. #26472 [3.10] bpo-44246: Remove note about access by index now that a compatibility shim is offered. (GH-26472) #26473

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2021-06-27.21:42:26.609>
created_at = <Date 2021-05-27.15:30:11.114>
labels = ['type-security', 'build', '3.10', '3.11']
title = '3.10 beta 1: breaking change in importlib.metadata entry points'
updated_at = <Date 2021-06-28.18:05:20.022>
user = 'https://github.com/asottile'

bugs.python.org fields:

activity = <Date 2021-06-28.18:05:20.022>
actor = 'barry'
assignee = 'none'
closed = True
closed_date = <Date 2021-06-27.21:42:26.609>
closer = 'jaraco'
components = ['Build']
creation = <Date 2021-05-27.15:30:11.114>
creator = 'Anthony Sottile'
dependencies = []
files = []
hgrepos = []
issue_num = 44246
keywords = ['patch']
message_count = 45.0
messages = ['394548', '394550', '394552', '394553', '394557', '394559', '394560', '394588', '394616', '394620', '394625', '394726', '394727', '394728', '394749', '394767', '394769', '394770', '394773', '394774', '394775', '394777', '394778', '394806', '394807', '394811', '394812', '394813', '394814', '394815', '394816', '394830', '395202', '395207', '395282', '395301', '395333', '395360', '395361', '395495', '395500', '395531', '396056', '396081', '396605']
nosy_count = 10.0
nosy_names = ['barry', 'jaraco', 'steve.dower', 'Anthony Sottile', 'hroncok', 'Zac Hatfield-Dodds', 'miss-islington', 'domdfcoding', 'kj', 'hectorizdaone']
pr_nums = ['26408', '26415', '26467', '26468', '26469', '26471', '26472', '26473']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'security'
url = 'https://bugs.python.org/issue44246'
versions = ['Python 3.10', 'Python 3.11']

asottile · 2021-05-27T15:30:11Z

this is breaking code that's unfortunately out of my control (vendor) -- also it looks really wrong

import importlib.metadata
print('looks like a list:')
print(importlib.metadata.distribution('pip').entry_points)
print('first item:')
print(importlib.metadata.distribution('pip').entry_points[0])

output in 3.9:

$ ./venv39/bin/python t.py
looks like a list:
[EntryPoint(name='pip', value='pip._internal.cli.main:main', group='console_scripts'), EntryPoint(name='pip3', value='pip._internal.cli.main:main', group='console_scripts'), EntryPoint(name='pip3.8', value='pip._internal.cli.main:main', group='console_scripts')]
first item:
EntryPoint(name='pip', value='pip._internal.cli.main:main', group='console_scripts')

$ venv310/bin/python t.py
looks like a list:
(EntryPoint(name='pip', value='pip._internal.cli.main:main', group='console_scripts'), EntryPoint(name='pip3', value='pip._internal.cli.main:main', group='console_scripts'), EntryPoint(name='pip3.8', value='pip._internal.cli.main:main', group='console_scripts'))
first item:
Traceback (most recent call last):
  File "/usr/lib/python3.10/importlib/metadata/__init__.py", line 217, in __getitem__
    return next(iter(self.select(name=name)))
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/y/t.py", line 5, in <module>
    print(importlib.metadata.distribution('pip').entry_points[0])
  File "/usr/lib/python3.10/importlib/metadata/__init__.py", line 219, in __getitem__
    raise KeyError(name)
KeyError: 0

tirkarthi · 2021-05-27T15:37:29Z

Seems this was reported in python/importlib_metadata#300 . Closed in python/importlib_metadata@5ca9bc7.

jaraco · 2021-05-27T15:48:14Z

This backward incompatibility was unintentionally introduced in importlib_metadata 3.6 (https://importlib-metadata.readthedocs.io/en/latest/history.html#v3-6-0, released Feb 23) and was previously reported in python/importlib_metadata#300. While technically it's a breaking change, here's my analysis from that issue:

Basically what it boils down to is that access by index of a Distribution.entry_points was dropped, but that appears not to be a problem. At least, this is the first report of such a problem. The assumption has been, and the tests bear this out implicitly, that the consumer of Distribution.entry_points will be iterated over and not accessed by index.

My assessment is that the issue is theoretically incompatible but in practice, this usage is not found in the wild. If there was a substantial need to maintain compatibility for this use-case, I'd definitely consider adding such compatibility. As it stands, however, I'm unaware of even a single use case that demands this interface.

Given the non-issue this has been for importlib_metadata, I expect it to be a non-issue for Python 3.10 as well.

asottile · 2021-05-27T15:49:16Z

the "what's new" mentions nothing of this break, nor the aggressive deprecation warnings, nor the various shifted interfaces (such as this one): https://docs.python.org/3.10/whatsnew/3.10.html#importlib-metadata

jaraco · 2021-05-27T15:54:57Z

Yes, perhaps the What's New could be refreshed.

There is a Compatibility Note in the docs for entry_points about the deprecated usage (https://docs.python.org/3.10/library/importlib.metadata.html?highlight=importlib%20metadata#entry-points).

I want the What's New to be only those aspects that are particularly salient to the users, so I probably would not even mention the compatibility change reported above unless I expected it to affect users, but I wouldn't be opposed to more thorough messaging if you believe that would address the concern.

asottile · 2021-05-27T16:03:15Z

personally I think they should be reverted -- they were not committed in spirit with the backwards compatibility policy: https://www.python.org/dev/peps/pep-0387/#making-incompatible-changes

I don't think they were discussed thoroughly, and when opposition was presented it was not listened to thoroughly: Feature/entry points by group and name importlib_metadata#278
the change significantly complicates importlib.metadata with lots of sneaky types (they all look like builtin types but do not act like them)
it simultaneously introduces new apis and old apis which will both be around for extended periods of time but immediately moves to DeprecationWarning
the new apis aren't remarkably better than the old apis -- the motivation was written as "typing issues" but without citing actual issues. in fact, the new items are significantly more complicated to type properly
the change breaks many significantly important projects, from perusing related issues it's at the very least flake8, pandas, virtualenv, astropy, pytest, hypothesis -- and hundreds more from a quick github code search

jaraco · 2021-05-27T16:06:23Z

In the What's New, I used :func:`importlib.metadata.entry_points`, but that doesn't seem to resolve to the docs for the function. I need to figure out how to link to the entry_points anchor that's there.

jaraco · 2021-05-27T19:00:33Z

Although it feels the topic has shifted from the original concern (an unintentional incompatibility) to the broader topic of the API change to entry_points generally, I'm happy to address your comments:

I don't think they were discussed thoroughly, and when opposition was presented it was not listened to thoroughly: Feature/entry points by group and name importlib_metadata#278

Anyone reading that thread will see that I was responsive to your concerns, adapted the solution based on your concerns, and spent extra energy documenting the motivations for the change and exploring solutions until I came up with something that I believed would address the concerns and has since largely borne out that goal in the release.

the change significantly complicates importlib.metadata with lots of sneaky types (they all look like builtin types but do not act like them)

The solution that I ultimately settled on in python/importlib_metadata#278 does not use "sneaky" or "cutsey" types, but in fact reduces the number of magic types. EntryPoints is a tuple subclass, EntryPoint is a namedtuple subclass (with __iter__ magic deprecated), and SelectableGroups is a compatibility shim to be removed. Which types are sneaky and look like built-in types but do not act like them?

it simultaneously introduces new apis and old apis which will both be around for extended periods of time but immediately moves to DeprecationWarning

Given that backports are available, I saw no strong reason to delay the DeprecationWarning. I was open to the possibility that the transition could prove too taxing and the deprecation would have to be delayed. This early exposure means that most projects will already have addressed the deprecation concerns prior to the release of Python 3.10. In python/importlib_metadata#298, the SQLAlchemy team helped me understand a nuanced use-case I hadn't considered, where libraries don't necessarily have the luxury of requiring newer backport releases, but in that case, I developed a solution that would provide future compatibility even with older stdlib and backport releases (through backports.entry_points_selectable). As far as I understand, there are no known use-cases that aren't satisfied by this design.

the new apis aren't remarkably better than the old apis -- the motivation was written as "typing issues" but without citing actual issues. in fact, the new items are significantly more complicated to type properly

The typing issues were discussed here (pypa/twine#728 (review)) in the PR referenced by python/importlib_metadata#278. The magic casting of a two tuple to an item of a dict was found to be incompatible and unsupported by mypy (python/mypy#9938). I further expanded on the motivations that led to this approach in python/importlib_metadata#282 and python/importlib_metadata#284.

So while Twine is able to declare types properly using this new design, it was infeasible to do so in the old design. After the compatibility layers are removed, entry_points will simply return an EntryPoints object, which presents an iterable of EntryPoint objects but with some facilities for selection. It's straightforward and clean. Please demonstrate the complication you see with the current approach.

The new APIs are not only easier to describe with types, but they are easier to describe in documentation.

the change breaks many significantly important projects, from perusing related issues it's at the very least flake8, pandas, virtualenv, astropy, pytest, hypothesis -- and hundreds more from a quick github code search

I'm not aware of a single breakage. The code is compatible for all known use-cases, but does present a DeprecationWarning in some usages. In the case of flake8, I've proposed a solution to avoid the DeprecationWarning and move to the preferred design, even without requiring an updated importlib_metadata backport.

I'm invested in providing as seamless a transition as possible to all projects, and I believe this change accomplishes that. Please let me know if there's a project or application where that's not the case.

personally I think [all API changes] should be reverted

Given the amount of adoption already, reverting these changes is likely to cause more disruption than moving forward with them.

If you would like to see the changes reverted or adapted further, please provide an example of a use-case that's broken by the current approach. So far, the only example presented above appears contrived and not an actual concern. It's common for Python minor releases to introduce undocumented changes that are technically incompatible but in ways that affect few or no users.

jaraco · 2021-05-27T23:36:21Z

New changeset 28f12c9 by Jason R. Coombs in branch 'main':
bpo-44246: Update What's New for importlib.metadata. (bpo-26408)
28f12c9

miss-islington · 2021-05-27T23:57:12Z

New changeset 59f9594 by Miss Islington (bot) in branch '3.10':
[3.10] bpo-44246: Update What's New for importlib.metadata. (GH-26408) (GH-26415)
59f9594

asottile · 2021-05-28T01:27:35Z

Which types are sneaky and look like built-in types but do not act like them?

well for starters, there's the tuple subclass which pretends to be a dict. but it violates substitutability for both tuple and Mapping so it's not useful in either contexts. mypy complains about incorrect types in overrides for both. the worst part of this is that the __getitem__ moves from O(1) to O(N) (in some private code this makes importlib.metadata on 3.10 10x slower than on 3.9). next there's the EntryPoints tuple subclass which looks like a tuple but doesn't at all act like one (getitem fails substitutability for example) -- this is an api break with 3.9 which returned a list (can't .sort() .extend(...), etc. any more)

Given that backports are available

I don't think this is appropriate. re-introducing a backport brings in a tree of dependencies that have been shaky at best with backward compatibility. in other words, using standard library importlib.metadata provides significantly improved compatibility and stability over the moving target backport (and its tree of dependencies, zipp being the one that breaks the most often from experience). you'll notice I closed the flake8 PRs specifically because I didn't want to reintroduce the backport. This backport also globally monkeypatches the import machinery breaking any other consumer as well -- in a tool as popular as flake8 I can't really make that global mutation decision for all of the other consumers.

The typing issues

these were trivially solved by a dictionary comprehension using entrypoint.name -- it really did not need a full rework and break of the api to solve (could have just deprecated the __iter__ which I actually suggested on the original implementation way back in 3.8)

The new APIs are not only easier to describe with types

the types describing the new apis require significant # type: ignores to pass mypy because they violate basic substitutability. they also cannot be used in any of the contexts they were appropriate for in <3.10 (Dict[str, List[EntryPoint]] or List[Entrypoint] depending on the api).

I'm not aware of a single breakage.

I'm sorry but you have to have realized from the many issues on importlib-metadata or the many issues linking to your deprecation issue that there is significant toil caused directly by your change. cpython is meant to be a stable substrate to build upon, please do not force the community to shoulder the burden of your poor api decisions.

even without requiring an updated importlib_metadata backport.

your proposed change introduced a different, unrelated package. not without its own maintenance problems (an additional dependency that has to ~work indefinitely, a hack at best to support this breaking change)

Given the amount of adoption already, reverting these changes is likely to cause more disruption than moving forward with them.

I disagree, bigger things have been reverted (see: __future__.annotations)

So far, the only example presented above appears contrived and not an actual concern

I promise you this is not a contrived case, if you look at your issue tracker it has been reported before and by others. For every issue reported there's likely tens or hundreds of others which are not reported.

It's common for Python minor releases to introduce undocumented changes that are technically incompatible

I've seen this as a rationalization for intentional surprise breaking changes but I don't buy it.

Additionally your comments about (paraphrased) "the testsuite didn't demonstrate this usecase so I'm free to change it" are frankly a cop out -- your api returned a dict and a list, that's part of the public api -- changing that is a breaking change.

gaborbernat · 2021-05-29T09:17:42Z

After reading through the points here, I must say I agree with Anthony here. The standard library has clear rules regarding how previously working interfaces should be deprecated, and this changeset is violating those. At no point was documented that relying on the list/dict trait of the existing interface is not part of the interface. I don't think the importlib libraries are special enough to warrant exclusion from this rule (as opposed let's say the zoneinfo).

jaraco · 2021-05-29T09:29:12Z

there's the tuple subclass which pretends to be a dict.

There's no tuple subclass that's pretending to be a dict. It overrides __getitem__ for convenience. It never claims to support Mapping.

mypy complains about incorrect types in overrides for both.

I'm unsure what the concern is. If there's an issue, it hasn't been reported to the project. importlib_metadata runs mypy tests with the test suite (all passing) and twine uses the API with strict mypy checking enabled without any exclusions.

the worst part of this is that the __getitem__ moves from O(1) to O(N) (in some private code this makes importlib.metadata on 3.10 10x slower than on 3.9)

This issue was revealed during the review and I acknowledged the concern and agreed to address the issue if it mattered. This project has demonstrated its concern for performance issues as are apparent through a number of optimizations in the changelog. In every use case I've seen, the performance is improved by the current approach (a group/sort operation is avoided). If the performance is a concern, I once again welcome a bug report describing the use-case and the impact, though I suspect it's an isolated case and likely would best be addressed outside the official codebase.

this is an api break with 3.9 which returned a list

I acknowledge this break, though I believe the concerns are overblown. The API specifically sought to reduce dependence on receiving a list and instead to provide a more abstract collection.

I don't think [introducing behavior in backports] is appropriate.

It's true, the "backport" monkier is a false one here. From the very beginning, these modules first introduced their behavior outside the stdlib and were then ported into CPython. Moreover, the past couple of years have seen substantial refinement and innovation and was able to move much faster and reach stability much faster and with wider adoption than if the library had followed the stardard Python development cadence.

It's quite likely that this project will eventually stabilize to the point that most users do not need the backport, but while it exists, it's providing massive value. Consider the most recent example (https://importlib-metadata.readthedocs.io/en/latest/history.html#v4-3-1) where a performance improvement caused a regression. The regression was detected and fixed within a day. Now when CPython adopts that behavior, we can all have higher confidence in the viability of the implementation.

It would be a pretty big shift to block this approach, but it's not out of the realm of consideration. Still, it's out of scope for this discussion. Feel free to raise it separately.

[the typing issues] were *trivially solved* by a dictionary comprehension

No such solution was proposed by anybody, but more importantly, I don't believe the solution would have been so trivial and still met the objectives.

the types describing the new apis require significant # type: ignores to pass mypy because they violate basic substitutability.

I'm unaware of this issue and it's not been reported, but I also don't believe it's an issue. Both twine and keyring have adopted the latest API and pass mypy tests.

they also cannot be used in any of the contexts they were appropriate for in <3.10 (Dict[str, List[EntryPoint]] or List[Entrypoint] depending on the api).

That's right. The API has changed.

many issues on importlib-metadata

Where "many" ~= 1 (https://github.com/python/importlib_metadata/issues?q=is%3Aissue+DeprecationWarning).

many issues linking to your deprecation issue

Do you mean python/importlib_metadata#289 or something else? I see ~4 projects (astropy, pytest-randomly, keyring, virtualenv) making mention there. I'd expected the number of projects to be affected to be more than that.

there is significant toil

I care about toil. A lot. I don't make incompatible changes lightly, and I spent a good deal of time documenting the motivations and providing guidance on how to transition. I've actively worked with each project that's requested help to minimize their toil and provide a one-shot transition to the new API.

if you look at your issue tracker it has been reported before and by others

I looked and didn't find it. Help me see what I'm missing.

"the testsuite didn't demonstrate this usecase so I'm free to change it"

That's not the spirit of my words. The API had an intended usage that was borne out by the documentation and tests. If users relied on other interfaces that were incidentally present, the user bears some risk in relying on those behaviors. Still, I accept responsibility to provide a transitional support even for those cases.

jaraco · 2021-05-29T09:51:22Z

The standard library has clear rules regarding how previously working interfaces should be deprecated, and this changeset is violating those. At no point was documented that relying on the list/dict trait of the existing interface is not part of the interface.

Thanks Gábor for chiming in.

A minor correction, the "dict" trait was documented, and compatibility is retained for that trait.

It's only the "list" trait of the less-commonly-used "Distribution.entry_points" that's a concern here, and I've yet to see an example of it being an actual concern. I've already offered to add compatibility if a compelling use case is presented.

I don't think the importlib libraries are special enough to warrant exclusion from this rule (as opposed let's say the zoneinfo).

The docs do explicitly call out that the module is provisional. https://docs.python.org/3/library/importlib.metadata.html

Still, I believe it's best for this module to honor the stdlib practices as best as possible, and I believe the indicated change does that.

Zac-HD · 2021-05-30T01:43:05Z

Just chiming in with a plea to slow down the rate of changes to importlib.metadata - I understand that you want to tidy up the API, but even deprecations cause substantial work downstream. Would it really be so bad to support the older APIs until they go end-of-life in CPython?

For example, in Hypothesis we care a great deal about compatibility with all supported Python versions (including treating warnings as errors) and try to minimize the hard dependencies. As a result, our entry-points handling looks like this:
https://github.com/HypothesisWorks/hypothesis/blob/0a90ed6edf56319149956c7321d4110078a5c228/hypothesis-python/src/hypothesis/entry_points.py

Change can be worth the cost, but there *is* a cost and the packaging ecosystem is already painfully complex and fragmented. Compatibility should be a foundational principle, not an optional extra _if someone presents a compelling use case!_

I'm also seriously concerned that you take GitHub links as an indication of who is affected. Python is very very widely used, including in environments that don't feed much back into the GitHub-open-source space, and I think it's important to avoid breaking things for low-visibility users too.

jaraco · 2021-05-30T16:11:08Z

Thanks Zac for your input.

Just chiming in with a plea to slow down the rate of changes to importlib.metadata - I understand that you want to tidy up the API, but even deprecations cause substantial work downstream.

It would be difficult to go much slower. Are you suggesting delaying the deprecation warning? My rationale for not delaying the deprecation warning is because it's possible using the backport to support the newer APIs all the way back to some end-of-life Pythons. If the deprecation warning is delayed, that seems to only delay the inevitable - that most projects will ultimately have to endure the toil of transitioning the code and relying on backports to support older Pythons.

Still, projects have the option to use the older APIs indefinitely by pinning the backport, or delay their adoption of the newer APIs by suppressing the warning. There's a lot of flexibility to limit the toil.

What approach would you recommend?

Would it really be so bad to support the older APIs until they go end-of-life in CPython?

At this point, I believe the code is compatible with all known use cases and it's conceivable the compatibility layer could be supported in Python releases until Python 3.9 is EOL. Is that what you're proposing? Would that help the Hypothesis case (or others)? My instinct is the value proposition there is small.

For example, in Hypothesis we care a great deal about compatibility with all supported Python versions (including treating warnings as errors) and try to minimize the hard dependencies. As a result, our entry-points handling looks like this...

Project maintainers are allowed of course to treat warnings like errors, but since deprecation warnings are the primary mechanism for an upstream API to signal to the downstream that something is changing, projects should expect an increased amount of toil by transforming the default behavior of warnings.

I suspect that the hypothesis project could achieve forward compatibility within its constraints by vendoring backports.entry_points_selectable, and thus re-using a shared implementation of the conditional import dance. I've added a note to the project's README indicating that option. The implementation you have seems suitable, though.

Change can be worth the cost, but there *is* a cost and the packaging ecosystem is already painfully complex and fragmented. Compatibility should be a foundational principle, not an optional extra _if someone presents a compelling use case!_

Agreed: compatibility is a foundational principle. Compatibility was built into the first release of this new behavior (importlib_metadata 3.6). Thanks to Anthony's feedback in the PR and extensive exploration and rewrites, the solution presented there has a fairly straightforward transition and clean separation of concerns. The case reported above, where compatibility was not achieved is an acknowledged missed concern, and I'm happy to invest the time to restore that compatibility if it's worth the trouble. The reason I'd thought it's not worth the trouble is because contrary to Anthony's claim, no user has reported an issue with index-based access on Distribution.entry_points results for the three months that this functionality has been in place, which is why a note about the incompatibility seemed sufficient (after the fact).

I'll proceed with adding compatibility for this reported case, although I worry that won't satisfy the concerns. Is there a satisfactory solution that doesn't involve reverting the changes? Is there an approach that meets the goals of the change with less disruption?

I'm also seriously concerned that you take GitHub links as an indication of who is affected. Python is very very widely used, including in environments that don't feed much back into the GitHub-open-source space, and I think it's important to avoid breaking things for low-visibility users too.

I surely recognize that Github links and reports are only one indicator of one segment of the user base, but it's the sole indicator these projects have to solicit user concerns. That's why I pinned the issue reported about the Deprecation warning and used that forum to express concern and support for the users' issues and to present a variety of approaches for any number of users to avail themselves. I wanted to increase the visibility of the issue and limit the difficulty of addressing the intentional deprecation.

I mainly rely on Github reports and +1s on those reports as an indication of the impact of an issue. I use Github links as a means of networking. It was Anthony who suggested the links were an indication of a widespread issue. I only meant to contrast that concern to other breakages (in my experience) that showed dozens of links to affected issues. Linked issues are a secondary indicator at best.

I do expect that if users have an issue that they would report it through python/importlib_metadata or bpo, but I also expect that absence of a report demonstrates stability. At least, that's been my experience in the hundreds of projects I've developed on Sourceforge, Bitbucket, GitLab, and Github.

After employing defensive, empathetic programming, developing docs, and writing comprehensive tests, what approaches should I be taking to solicit user concerns other than to have an open forum for soliciting issues and responding to those issues promptly and with proactive solutions?

asottile · 2021-05-30T16:17:31Z

I also need .sort(key=...) for what it's worth, the error in this issue was just the first encountered

I think my only satisfactory outcome would be:

the original api returns actual dicts
the sub-api returns actual lists
the new select is implemented as a separate *new* api without changing the existing api

asottile · 2021-05-30T16:22:42Z

also miurahr/aqtinstall#221

(this links to importlib-metadata tracker, not sure how you missed it)

asottile · 2021-05-30T17:04:11Z

the .select(...) api is at least twice as slow as indexing as well:

setup:

virtualenv venv39 -p python3.9
venv39/bin/pip install flake8 pytest pytest-randomly
virtualenv venv39 -p python3.10
venv310/bin/pip install flake8 pytest pytest-randomly

import importlib.metadata
import sys
import time


def f():
    eps = importlib.metadata.entry_points()
    if sys.version_info >= (3, 10):
        eps.select(name='console_scripts')
    else:
        eps['console_scripts']


t0 = time.time()
for _ in range(100):
    f()
t1 = time.time()
print(f'{t1-t0}')

$ ./venv39/bin/python t.py
0.687570333480835
$ ./venv310/bin/python t.py
1.3486714363098145

it is *way* worse when involving multiple entry points:

import importlib.metadata
import sys
import time


# moved outside of the loop, already showed this component is slower
eps = importlib.metadata.entry_points()
def f():
    # common for plugin systems to look up multiple entry points
    for ep in ('console_scripts', 'flake8.extension', 'pytest11'):
        if sys.version_info >= (3, 10):
            eps.select(name=ep)
        else:
            eps[ep]


t0 = time.time()
for _ in range(10000):
    f()
t1 = time.time()
print(f'{t1-t0}')

$ ./venv39/bin/python t.py
0.01629471778869629
$ ./venv310/bin/python t.py
8.569908380508423

asottile · 2021-05-30T17:12:51Z

oops, tiny typo in those code examples, they should say group= instead of name= -- though the performance is unchanged:

(first example)

$ ./venv39/bin/python t.py
0.6641988754272461
$ ./venv310/bin/python t.py
1.3172023296356201

(second example)

$ ./venv39/bin/python t.py
0.014233589172363281
$ ./venv310/bin/python t.py
8.910593271255493

jaraco · 2021-05-30T17:36:56Z

There are known performance concerns. I recommend to set those aside for now or move them to a separate issue because

(a) The performance is theoretically better in the nominal case because it avoids a sort/group operation.
(b) There are known performance degradations introduced by importlib_metadata 3.5 to de-duplicate distributions, degradations mitigated somewhat by importlib_metadata 4.3.
(c) Compatibility layers may be confounding performance concerns.

May I suggest addressing performance concerns in the importlib_metadata project as that project provides much better granularity on the different changes?

I think my only satisfactory outcome would be:

the original api returns actual dicts

The original API returns an actual dict subclass (SelectableGroups).

the sub-api returns actual lists

With python/importlib_metadata#323, this expectation is also met as EntryPoints is a list.

the new select is implemented as a separate *new* api without changing the existing api

The new API is invoked only through opt-in calls not previously available in the old API. I believe this achieves your goal without requiring a new name for entry_points or Distribution.entry_points (and thus creating less toil for consumers).

asottile · 2021-05-30T19:11:05Z

the toil is still present, the existing, good apis are deprecated and the new, bad apis are slow -- and the odd subclasses are still present

asottile · 2021-05-30T19:11:44Z

I have also shown that the performance is indeed not better in the nominal case, as demonstrated in the first case

jaraco · 2021-05-31T14:59:35Z

importlib_metadata 4.4 restores compatibility for the reported concerns. I'll merge those into CPython later.

asottile · 2021-05-31T15:03:55Z

it does not, it restores apis but in a way which requires a huge performance hit to avoid deprecation warnings

it also still has the 2-500x performance regression I've stated above

jaraco · 2021-05-31T15:52:37Z

New changeset 410b70d by Jason R. Coombs in branch 'main':
bpo-44246: Entry points performance improvements. (GH-26467)
410b70d

miss-islington · 2021-05-31T16:12:37Z

New changeset d1480ad by Miss Islington (bot) in branch '3.10':
bpo-44246: Entry points performance improvements. (GH-26467)
d1480ad

jaraco · 2021-05-31T16:19:50Z

New changeset c34ed08 by Jason R. Coombs in branch 'main':
bpo-44246: Restore compatibility in entry_points (GH-26468)
c34ed08

jaraco · 2021-05-31T16:29:35Z

New changeset 78d9a9b by Jason R. Coombs in branch 'main':
bpo-44246: Remove note about access by index now that a compatibility shim is offered. (GH-26472)
78d9a9b

miss-islington · 2021-05-31T16:42:02Z

New changeset 7207203 by Miss Islington (bot) in branch '3.10':
[3.10] bpo-44246: Restore compatibility in entry_points (GH-26468) (GH-26471)
7207203

jaraco · 2021-05-31T22:29:27Z

New changeset d0991e2 by Miss Islington (bot) in branch '3.10':
bpo-44246: Remove note about access by index now that a compatibility shim is offered. (GH-26472) (bpo-26473)
d0991e2

Fidget-Spinner · 2021-06-06T10:38:52Z

The new test test_entry_points_by_index (added in c34ed08 ) seems to fail on some windows buildbots:

https://dev.azure.com/Python/cpython/_build/results?buildId=81807&view=logs&j=c8a71634-e5ec-54a0-3958-760f4148b765&t=599737bc-ad72-560d-1530-0f89b05729e4

A copy of the error output for everyone's convenience:
======================================================================
ERROR: test_entry_points_by_index (test.test_importlib.test_metadata_api.APITests)
Prior versions of Distribution.entry_points would return a
----------------------------------------------------------------------

Traceback (most recent call last):
  File "D:\a\1\b\layout-appx-amd64\lib\test\test_importlib\test_metadata_api.py", line 145, in test_entry_points_by_index
    expected = next(iter(caught))
StopIteration

BTW, the same buildbot is currently failing on main with a different error which masks that error above. I'll do more digging if no one takes this up by next week. Currently I'm not able to reproduce that locally on my windows machine. Thanks all!

jaraco · 2021-06-06T15:04:57Z

The line where the failure occurs is the point where it's checking that the warning was issued. The fact that a StopIteration is raised indicates that no warnings were caught. I can think of a couple of scenarios where that could happen:

That warning is somehow disabled.
The Distribution object returned by distribution('distinfo-pkg') is somehow an older implementation (perhaps an older importlib_metadata is present).

Given that the DeprecationWarnings aren't missed on other tests, the latter seems to be a more likely candidate.

I notice that the regular tests are passing. It's only in the 'appx' environment where the test fails. I'm not familiar with appx, but it seems likely that something from the appx environment creation is a factor in the divergent behavior.

Steve, can you advise on how appx environments are created and how one could replicate a test failure that only occurs in that environment?

zooba · 2021-06-07T20:17:54Z

The appx layout is also the only one in CI that actually uses an installed layout - all the rest run tests from the source tree. So it could be related to that.

If it's a warning, it could also be that the warning is being triggered somewhere else first. Since tests run in a random order, you'll want to look for how reliable the failure is.

jaraco · 2021-06-08T00:58:58Z

Thanks Steve for the feedback.

If it's a warning, it could also be that the warning is being triggered somewhere else first. Since tests run in a random order, you'll want to look for how reliable the failure is.

It is a warning, but it seems unlikely that any other code is calling it, given that the supporting codepath was not present until the same PR.

The appx layout is also the only one in CI that actually uses an installed layout - all the rest run tests from the source tree. So it could be related to that.

If someone could help by producing a docker image that can build the appx layout and run the tests, that would help me as I don't have a lot of proficiency with installing build tools on Windows through the CLI... and the experience I do have has been fraught with challenges.

asottile · 2021-06-08T14:17:34Z

here's the performance regressions, they affect any callers of distributions() and are even worse on callers of the new apis.

a call to distributions() is about 3x slower than in 3.9

here is the setup I am using:

virtualenv venv39 -ppython3.9
venv39/bin/pip install flake8 pytest twine pre-commit
virtualenv venv310 -ppython3.10
venv310/bin/pip install flake8 pytest twine pre-commit

to test just the distributions() call I'm using the following:

$ venv39/bin/python -m timeit -n 20 -r 20 -s 'from importlib.metadata import entry_points' 'entry_points()'
20 loops, best of 20: 12.5 msec per loop
$ venv310/bin/python -m timeit -n 20 -r 20 -s 'from importlib.metadata import entry_points' 'entry_points()'
20 loops, best of 20: 36.7 msec per loop

this is a less-extreme example, many applications have more dependencies installed -- but even in this case this is adding ~24ms startup to any application using entry_points() -- and it gets worse

the return value of entry_points() alone isn't all that useful, next an application needs to retrieve its entry points. let's start for the somewhat normal case of retrieving a single category of entry points:

$ venv39/bin/python -m timeit -n 20 -r 20 -s 'from importlib.metadata import entry_points' 'entry_points()["flake8.extension"]'
20 loops, best of 20: 12.7 msec per loop
$ venv310/bin/python -m timeit -n 20 -r 20 -s 'from importlib.metadata import entry_points' 'entry_points(name="flake8.extension")'
20 loops, best of 20: 37.1 msec per loop
$ venv310/bin/python -m timeit -n 20 -r 20 -s 'from importlib.metadata import entry_points' 'entry_points().select(group="flake8.extension")'
20 loops, best of 20: 37.1 msec per loop

again, 3x slower and very real time to the end user (~24-25ms)

now let's show an example usage that something like flake8 uses where multiple groups are requested (this is common for apps and plugin systems which provide multiple distinct functionalities)

$ venv39/bin/python -m timeit -n 20 -r 20 -s 'from importlib.metadata import entry_points' 'eps = entry_points(); eps["flake8.extension"]; eps["flake8.report"]'
 
20 loops, best of 20: 12.6 msec per loop
$ venv310/bin/python -m timeit -n 20 -r 20 -s 'from importlib.metadata import entry_points' 'eps = entry_points(); eps.select(group="flake8.extension"); eps.select(group="flake8.report")'
20 loops, best of 20: 38.2 msec per loop

also slower, but an additional ms per call to .select(...)

and it only gets worse with more and more packages installed

here's the versions I'm using to ensure they are up to date:

$ venv39/bin/python --version --version
Python 3.9.5 (default, May 19 2021, 11:32:47) 
[GCC 9.3.0]
$ venv310/bin/python --version --version
Python 3.10.0b2 (default, Jun  2 2021, 00:22:18) [GCC 9.3.0]

jaraco · 2021-06-08T20:29:54Z

As mentioned in msg394775, I'd like to decouple the performance concerns from the original incompatibility. I recognize that performance regressions are in their own way a form of incompatibility, but there have been a lot of changes to entry points with respect to performance, both prior to beta 1 and in beta 2, including changes that intentionally traded performance for correctness (python/importlib_metadata#281).

To that end, I've filed python/importlib_metadata#324 to track the concerns.

asottile · 2021-06-08T20:33:35Z

they are directly coupled which is why I commented here

the api redesign forces O(N) lookups and O(N) constructions which directly impact performance causing the regression

jaraco · 2021-06-10T00:13:54Z

In this Dockerfile, I've attempted to install Visual Studio, but without success. Docker fails to build on line 6. The install fails with this error:

ERROR: Running ["C:\Users\ContainerAdministrator\AppData\Local\Temp\chocolatey\visualstudio2019buildtools\16.10.0.0\vs_BuildTools.exe" --quiet --add Microsoft.VisualStudio.Workload.ManagedDesktopBuildTools --add Microsoft.VisualStudio.Workload.NetCoreBuildTools --norestart --wait] was not successful. Exit code was '-2147024770'. See log for possible error messages.

Any tips on creating a Windows environment with build support would be appreciated.

jaraco · 2021-06-10T02:10:08Z

I managed to put together a Dockerfile that seemingly has the build tools installed (https://github.com/jaraco/jaraco.windows/blob/d2edad2e2af9d469189d7ac6a14a4ba6f6270348/Dockerfile).

When I attempt to build CPython, however, it fails with this error:

PS C:\code\public\cpython> PCBuild\build.bat
Using py -3.9 (found 3.9 with py.exe)
Fetching external libraries...
bzip2-1.0.6 already exists, skipping.
sqlite-3.35.5.0 already exists, skipping.
xz-5.2.2 already exists, skipping.
zlib-1.2.11 already exists, skipping.
Fetching external binaries...
libffi already exists, skipping.
openssl-bin-1.1.1k-1 already exists, skipping.
tcltk-8.6.11.0 already exists, skipping.
Finished.
Using "C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Current\Bin\MSBuild.exe"  (found in the PATH)
Using py -3.9 (found 3.9 with py.exe)

C:\code\public\cpython>"C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Current\Bin\MSBuild.exe"  "C:\code\public\cpython\PCbuild\pcbuild.proj" /t:Build /m /nologo /v:m /clp:summary /p:Configuration=Release /p:Platform=x64 /p:IncludeExternals=true /p:IncludeCTypes=true /p:IncludeSSL=true /p:IncludeTkinter=true /p:UseTestMarker= /p:GIT="C:\Program Files\Git\cmd\git.exe"
C:\code\public\cpython\PCbuild\python.props(111,31): error MSB4184: The expression "[System.Version]::Parse('')" cannot be evaluated. Version string portion was too short or too long. [C:\code\public\cpython\PCbuild\pythoncore.vcxproj]

Build FAILED.

C:\code\public\cpython\PCbuild\python.props(111,31): error MSB4184: The expression "[System.Version]::Parse('')" cannot be evaluated. Version string portion was too short or too long. [C:\code\public\cpython\PCbuild\pythoncore.vcxproj]
    0 Warning(s)
    1 Error(s)

Time Elapsed 00:00:00.15

jaraco · 2021-06-10T12:38:35Z

The build error is tracked in bpo-43298.

hroncok · 2021-06-18T12:23:09Z

The test_entry_points_by_index test also fails on Fedora. See bpo-44451.

hectorizdaone · 2021-06-18T19:19:57Z

Addes aggreement

jaraco · 2021-06-27T21:42:27Z

In python/importlib_metadata#324, I painstakingly created a library to perform robust performance monitoring of behaviors in the library and specifically added tests that capture the performance of entry_points against importlib_metadata 1.4 (the same code used in Python 3.9). I furthermore attempted to replicate the reported performance concerns reported using the exact commands, and even then, Python 3.10b3 indicates a 3x+ improvement in performance, dispelling the claim that the issue is order of complexity of the resolution of entry points by group.

I'll be open to new reports about performance concerns or other regressions w.r.t. the importlib.metadata changes, but please do not re-open this issue.

asottile mannequin added 3.10 only security fixes labels May 27, 2021

jaraco closed this as completed May 31, 2021

asottile mannequin reopened this Jun 8, 2021

jaraco closed this as completed Jun 8, 2021

asottile mannequin reopened this Jun 8, 2021

hectorizdaone mannequin added build The build process and cross-build type-security A security issue labels Jun 18, 2021

jaraco closed this as completed Jun 27, 2021

ezio-melotti transferred this issue from another repository Apr 10, 2022

3.10 beta 1: breaking change in importlib.metadata entry points #88412

3.10 beta 1: breaking change in importlib.metadata entry points #88412

Comments

asottile mannequin commented May 27, 2021

asottile mannequin commented May 27, 2021

tirkarthi commented May 27, 2021

jaraco commented May 27, 2021

asottile mannequin commented May 27, 2021

jaraco commented May 27, 2021

asottile mannequin commented May 27, 2021

jaraco commented May 27, 2021

jaraco commented May 27, 2021

jaraco commented May 27, 2021

miss-islington commented May 27, 2021

asottile mannequin commented May 28, 2021

gaborbernat mannequin commented May 29, 2021

jaraco commented May 29, 2021

jaraco commented May 29, 2021

Zac-HD mannequin commented May 30, 2021

jaraco commented May 30, 2021

asottile mannequin commented May 30, 2021

asottile mannequin commented May 30, 2021

asottile mannequin commented May 30, 2021

asottile mannequin commented May 30, 2021

jaraco commented May 30, 2021

asottile mannequin commented May 30, 2021

asottile mannequin commented May 30, 2021

jaraco commented May 31, 2021

asottile mannequin commented May 31, 2021

jaraco commented May 31, 2021

miss-islington commented May 31, 2021

jaraco commented May 31, 2021

jaraco commented May 31, 2021

miss-islington commented May 31, 2021

jaraco commented May 31, 2021

Fidget-Spinner commented Jun 6, 2021

jaraco commented Jun 6, 2021

zooba commented Jun 7, 2021

jaraco commented Jun 8, 2021

asottile mannequin commented Jun 8, 2021

jaraco commented Jun 8, 2021

asottile mannequin commented Jun 8, 2021

jaraco commented Jun 10, 2021

jaraco commented Jun 10, 2021

jaraco commented Jun 10, 2021

hroncok mannequin commented Jun 18, 2021

hectorizdaone mannequin commented Jun 18, 2021

jaraco commented Jun 27, 2021