
3.10 beta 1: breaking change in importlib.metadata entry points #88412

Closed
asottile mannequin opened this issue May 27, 2021 · 45 comments
Labels
3.10 (only security fixes), 3.11 (only security fixes), build (The build process and cross-build), type-security (A security issue)

Comments


asottile mannequin commented May 27, 2021

BPO 44246
Nosy @warsaw, @jaraco, @zooba, @asottile, @hroncok, @Zac-HD, @miss-islington, @domdfcoding, @Fidget-Spinner
PRs
  • bpo-44246: Update What's New for importlib.metadata. #26408
  • [3.10] bpo-44246: Update What's New for importlib.metadata. (GH-26408) #26415
  • bpo-44246: Entry points performance improvements. #26467
  • bpo-44246: Restore compatibility in entry_points #26468
  • [3.10] bpo-44246: Entry points performance improvements. (GH-26467) #26469
  • [3.10] bpo-44246: Restore compatibility in entry_points (GH-26468) #26471
  • bpo-44246: Remove note about access by index now that a compatibility shim is offered. #26472
  • [3.10] bpo-44246: Remove note about access by index now that a compatibility shim is offered. (GH-26472) #26473
  Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2021-06-27.21:42:26.609>
    created_at = <Date 2021-05-27.15:30:11.114>
    labels = ['type-security', 'build', '3.10', '3.11']
    title = '3.10 beta 1: breaking change in importlib.metadata entry points'
    updated_at = <Date 2021-06-28.18:05:20.022>
    user = 'https://github.com/asottile'

    bugs.python.org fields:

    activity = <Date 2021-06-28.18:05:20.022>
    actor = 'barry'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-06-27.21:42:26.609>
    closer = 'jaraco'
    components = ['Build']
    creation = <Date 2021-05-27.15:30:11.114>
    creator = 'Anthony Sottile'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 44246
    keywords = ['patch']
    message_count = 45.0
    messages = ['394548', '394550', '394552', '394553', '394557', '394559', '394560', '394588', '394616', '394620', '394625', '394726', '394727', '394728', '394749', '394767', '394769', '394770', '394773', '394774', '394775', '394777', '394778', '394806', '394807', '394811', '394812', '394813', '394814', '394815', '394816', '394830', '395202', '395207', '395282', '395301', '395333', '395360', '395361', '395495', '395500', '395531', '396056', '396081', '396605']
    nosy_count = 10.0
    nosy_names = ['barry', 'jaraco', 'steve.dower', 'Anthony Sottile', 'hroncok', 'Zac Hatfield-Dodds', 'miss-islington', 'domdfcoding', 'kj', 'hectorizdaone']
    pr_nums = ['26408', '26415', '26467', '26468', '26469', '26471', '26472', '26473']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'security'
    url = 'https://bugs.python.org/issue44246'
    versions = ['Python 3.10', 'Python 3.11']


    asottile mannequin commented May 27, 2021

    this is breaking code that's unfortunately out of my control (vendor) -- also it looks really wrong

    import importlib.metadata
    print('looks like a list:')
    print(importlib.metadata.distribution('pip').entry_points)
    print('first item:')
    print(importlib.metadata.distribution('pip').entry_points[0])

    output in 3.9:

    $ ./venv39/bin/python t.py
    looks like a list:
    [EntryPoint(name='pip', value='pip._internal.cli.main:main', group='console_scripts'), EntryPoint(name='pip3', value='pip._internal.cli.main:main', group='console_scripts'), EntryPoint(name='pip3.8', value='pip._internal.cli.main:main', group='console_scripts')]
    first item:
    EntryPoint(name='pip', value='pip._internal.cli.main:main', group='console_scripts')
    $ venv310/bin/python t.py
    looks like a list:
    (EntryPoint(name='pip', value='pip._internal.cli.main:main', group='console_scripts'), EntryPoint(name='pip3', value='pip._internal.cli.main:main', group='console_scripts'), EntryPoint(name='pip3.8', value='pip._internal.cli.main:main', group='console_scripts'))
    first item:
    Traceback (most recent call last):
      File "/usr/lib/python3.10/importlib/metadata/__init__.py", line 217, in __getitem__
        return next(iter(self.select(name=name)))
    StopIteration
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/tmp/y/t.py", line 5, in <module>
        print(importlib.metadata.distribution('pip').entry_points[0])
      File "/usr/lib/python3.10/importlib/metadata/__init__.py", line 219, in __getitem__
        raise KeyError(name)
    KeyError: 0

    @asottile asottile mannequin added 3.10 only security fixes labels May 27, 2021
    @tirkarthi (Member)

    Seems this was reported in python/importlib_metadata#300. Closed in python/importlib_metadata@5ca9bc7.


    jaraco commented May 27, 2021

    This backward incompatibility was unintentionally introduced in importlib_metadata 3.6 (https://importlib-metadata.readthedocs.io/en/latest/history.html#v3-6-0, released Feb 23) and was previously reported in python/importlib_metadata#300. While technically it's a breaking change, here's my analysis from that issue:

    Basically what it boils down to is that access by index of a Distribution.entry_points was dropped, but that appears not to be a problem. At least, this is the first report of such a problem. The assumption has been, and the tests bear this out implicitly, that the result of Distribution.entry_points will be iterated over by the consumer and not accessed by index.

    My assessment is that the change is theoretically incompatible, but in practice this usage is not found in the wild. If there were a substantial need to maintain compatibility for this use-case, I'd definitely consider adding such compatibility. As it stands, however, I'm unaware of even a single use case that demands this interface.

    Given the non-issue this has been for importlib_metadata, I expect it to be a non-issue for Python 3.10 as well.


    asottile mannequin commented May 27, 2021

    the "what's new" mentions nothing of this break, nor the aggressive deprecation warnings, nor the various shifted interfaces (such as this one): https://docs.python.org/3.10/whatsnew/3.10.html#importlib-metadata


    jaraco commented May 27, 2021

    Yes, perhaps the What's New could be refreshed.

    There is a Compatibility Note in the docs for entry_points about the deprecated usage (https://docs.python.org/3.10/library/importlib.metadata.html?highlight=importlib%20metadata#entry-points).

    I want the What's New to cover only those aspects that are particularly salient to users, so I probably would not even mention the compatibility change reported above unless I expected it to affect users, but I wouldn't be opposed to more thorough messaging if you believe that would address the concern.


    asottile mannequin commented May 27, 2021

    personally I think they should be reverted -- they were not committed in spirit with the backwards compatibility policy: https://www.python.org/dev/peps/pep-0387/#making-incompatible-changes

    • I don't think they were discussed thoroughly, and when opposition was presented it was not listened to thoroughly: Feature/entry points by group and name importlib_metadata#278
    • the change significantly complicates importlib.metadata with lots of sneaky types (they all look like builtin types but do not act like them)
    • it simultaneously introduces new apis and old apis which will both be around for extended periods of time but immediately moves to DeprecationWarning
    • the new apis aren't remarkably better than the old apis -- the motivation was written as "typing issues" but without citing actual issues. in fact, the new items are significantly more complicated to type properly
    • the change breaks many significantly important projects, from perusing related issues it's at the very least flake8, pandas, virtualenv, astropy, pytest, hypothesis -- and hundreds more from a quick github code search


    jaraco commented May 27, 2021

    In the What's New, I used :func:`importlib.metadata.entry_points`, but that doesn't seem to resolve to the docs for the function. I need to figure out how to link to the entry_points anchor that's there.


    jaraco commented May 27, 2021

    Although it feels the topic has shifted from the original concern (an unintentional incompatibility) to the broader topic of the API change to entry_points generally, I'm happy to address your comments:

    Anyone reading that thread will see that I was responsive to your concerns, adapted the solution based on your concerns, and spent extra energy documenting the motivations for the change and exploring solutions until I came up with something that I believed would address the concerns and has since largely borne out that goal in the release.

    • the change significantly complicates importlib.metadata with lots of sneaky types (they all look like builtin types but do not act like them)

    The solution that I ultimately settled on in python/importlib_metadata#278 does not use "sneaky" or "cutesy" types, but in fact reduces the number of magic types. EntryPoints is a tuple subclass, EntryPoint is a namedtuple subclass (with __iter__ magic deprecated), and SelectableGroups is a compatibility shim to be removed. Which types are sneaky and look like built-in types but do not act like them?

    • it simultaneously introduces new apis and old apis which will both be around for extended periods of time but immediately moves to DeprecationWarning

    Given that backports are available, I saw no strong reason to delay the DeprecationWarning. I was open to the possibility that the transition could prove too taxing and the deprecation would have to be delayed. This early exposure means that most projects will already have addressed the deprecation concerns prior to the release of Python 3.10. In python/importlib_metadata#298, the SQLAlchemy team helped me understand a nuanced use-case I hadn't considered, where libraries don't necessarily have the luxury of requiring newer backport releases, but in that case, I developed a solution that would provide future compatibility even with older stdlib and backport releases (through backports.entry_points_selectable). As far as I understand, there are no known use-cases that aren't satisfied by this design.

    • the new apis aren't remarkably better than the old apis -- the motivation was written as "typing issues" but without citing actual issues. in fact, the new items are significantly more complicated to type properly

    The typing issues were discussed here (pypa/twine#728 (review)) in the PR referenced by python/importlib_metadata#278. The magic casting of a two tuple to an item of a dict was found to be incompatible and unsupported by mypy (python/mypy#9938). I further expanded on the motivations that led to this approach in python/importlib_metadata#282 and python/importlib_metadata#284.

    So while Twine is able to declare types properly using this new design, it was infeasible to do so in the old design. After the compatibility layers are removed, entry_points will simply return an EntryPoints object, which presents an iterable of EntryPoint objects but with some facilities for selection. It's straightforward and clean. Please demonstrate the complication you see with the current approach.

    The new APIs are not only easier to describe with types, but they are easier to describe in documentation.
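    To make the contrast concrete for readers, here is a minimal sketch (not code from any project in this thread) comparing the deprecated dict-style access with the selection interface; "console_scripts" is just an illustrative group name:

    import importlib.metadata

    eps = importlib.metadata.entry_points()

    # Legacy (3.9-era) usage: treat the result as a mapping of group -> entry points.
    # On 3.10 this still works through a compatibility shim but emits a DeprecationWarning.
    legacy_scripts = eps["console_scripts"]

    # New selection interface: filter by keyword at the call site or on the result.
    new_scripts = importlib.metadata.entry_points(group="console_scripts")
    also_new = eps.select(group="console_scripts")

    for ep in new_scripts:
        print(ep.name, ep.value)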

    • the change breaks many significantly important projects, from perusing related issues it's at the very least flake8, pandas, virtualenv, astropy, pytest, hypothesis -- and hundreds more from a quick github code search

    I'm not aware of a single breakage. The code is compatible for all known use-cases, but does present a DeprecationWarning in some usages. In the case of flake8, I've proposed a solution to avoid the DeprecationWarning and move to the preferred design, even without requiring an updated importlib_metadata backport.

    I'm invested in providing as seamless a transition as possible to all projects, and I believe this change accomplishes that. Please let me know if there's a project or application where that's not the case.

    personally I think [all API changes] should be reverted

    Given the amount of adoption already, reverting these changes is likely to cause more disruption than moving forward with them.

    If you would like to see the changes reverted or adapted further, please provide an example of a use-case that's broken by the current approach. So far, the only example presented above appears contrived and not an actual concern. It's common for Python minor releases to introduce undocumented changes that are technically incompatible but in ways that affect few or no users.


    jaraco commented May 27, 2021

    New changeset 28f12c9 by Jason R. Coombs in branch 'main':
    bpo-44246: Update What's New for importlib.metadata. (GH-26408)
    28f12c9

    @miss-islington (Contributor)

    New changeset 59f9594 by Miss Islington (bot) in branch '3.10':
    [3.10] bpo-44246: Update What's New for importlib.metadata. (GH-26408) (GH-26415)
    59f9594


    asottile mannequin commented May 28, 2021

    Which types are sneaky and look like built-in types but do not act like them?

    well for starters, there's the tuple subclass which pretends to be a dict. but it violates substitutability for both tuple and Mapping so it's not useful in either context. mypy complains about incorrect types in overrides for both. the worst part of this is that the __getitem__ moves from O(1) to O(N) (in some private code this makes importlib.metadata on 3.10 10x slower than on 3.9). next there's the EntryPoints tuple subclass which looks like a tuple but doesn't at all act like one (getitem fails substitutability for example) -- this is an api break with 3.9 which returned a list (can't .sort(), .extend(...), etc. any more)
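    As a concrete illustration of the list-methods point (a sketch, not code from any affected project): on 3.9 the value was a plain list and supported in-place methods, while the 3.10 collection has to be copied out first.

    import importlib.metadata

    eps = importlib.metadata.distribution("pip").entry_points

    # 3.9: eps was a plain list, so e.g. eps.sort(key=...) worked in place.
    # 3.10: eps is an immutable collection, so sorting means materializing a copy.
    ordered = sorted(eps, key=lambda ep: ep.name)
    print([ep.name for ep in ordered])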

    Given that backports are available

    I don't think this is appropriate. re-introducing a backport brings in a tree of dependencies that have been shaky at best with backward compatibility. in other words, using standard library importlib.metadata provides significantly improved compatibility and stability over the moving target backport (and its tree of dependencies, zipp being the one that breaks the most often from experience). you'll notice I closed the flake8 PRs specifically because I didn't want to reintroduce the backport. This backport also globally monkeypatches the import machinery breaking any other consumer as well -- in a tool as popular as flake8 I can't really make that global mutation decision for all of the other consumers.

    The typing issues

    these were trivially solved by a dictionary comprehension using entrypoint.name -- it really did not need a full rework and break of the api to solve (could have just deprecated the __iter__ which I actually suggested on the original implementation way back in 3.8)
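    The dictionary comprehension being referred to is presumably along these lines (a sketch against the 3.9 API; the group name is illustrative), producing a plain, easily typed mapping keyed by entry point name:

    import importlib.metadata

    # 3.9: entry_points() returns a dict of group -> list of EntryPoint.
    console_scripts = importlib.metadata.entry_points()["console_scripts"]

    # A plain dict keyed by name, with no custom collection types involved.
    by_name = {ep.name: ep for ep in console_scripts}
    print(by_name["pip"].value)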

    The new APIs are not only easier to describe with types

    the types describing the new apis require significant # type: ignores to pass mypy because they violate basic substitutability. they also cannot be used in any of the contexts they were appropriate for in <3.10 (Dict[str, List[EntryPoint]] or List[Entrypoint] depending on the api).

    I'm not aware of a single breakage.

    I'm sorry but you have to have realized from the many issues on importlib-metadata or the many issues linking to your deprecation issue that there is significant toil caused directly by your change. cpython is meant to be a stable substrate to build upon, please do not force the community to shoulder the burden of your poor api decisions.

    even without requiring an updated importlib_metadata backport.

    your proposed change introduced a different, unrelated package. not without its own maintenance problems (an additional dependency that has to ~work indefinitely, a hack at best to support this breaking change)

    Given the amount of adoption already, reverting these changes is likely to cause more disruption than moving forward with them.

    I disagree, bigger things have been reverted (see: __future__.annotations)

    So far, the only example presented above appears contrived and not an actual concern

    I promise you this is not a contrived case, if you look at your issue tracker it has been reported before and by others. For every issue reported there's likely tens or hundreds of others which are not reported.

    It's common for Python minor releases to introduce undocumented changes that are technically incompatible

    I've seen this as a rationalization for intentional surprise breaking changes but I don't buy it.

    Additionally your comments about (paraphrased) "the testsuite didn't demonstrate this usecase so I'm free to change it" are frankly a cop out -- your api returned a dict and a list, that's part of the public api -- changing that is a breaking change.


    gaborbernat mannequin commented May 29, 2021

    After reading through the points here, I must say I agree with Anthony. The standard library has clear rules regarding how previously working interfaces should be deprecated, and this changeset is violating those. At no point was it documented that relying on the list/dict trait of the existing interface is not part of the interface. I don't think the importlib libraries are special enough to warrant exclusion from this rule (as opposed to, let's say, zoneinfo).


    jaraco commented May 29, 2021

    there's the tuple subclass which pretends to be a dict.

    There's no tuple subclass that's pretending to be a dict. It overrides __getitem__ for convenience. It never claims to support Mapping.
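    In other words, indexing is by entry point name rather than by position, which is also what produced the KeyError: 0 in the original report. A minimal sketch of the behavior being described, assuming pip is installed:

    import importlib.metadata

    eps = importlib.metadata.distribution("pip").entry_points

    # __getitem__ on the new collection looks entry points up by *name*:
    print(eps["pip"])  # the EntryPoint named "pip"

    # Integer indexing is what broke in 3.10b1: eps[0] was treated as a name
    # lookup and raised KeyError: 0; a later compatibility shim restored it.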

    mypy complains about incorrect types in overrides for both.

    I'm unsure what the concern is. If there's an issue, it hasn't been reported to the project. importlib_metadata runs mypy tests with the test suite (all passing) and twine uses the API with strict mypy checking enabled without any exclusions.

    the worst part of this is that the __getitem__ moves from O(1) to O(N) (in some private code this makes importlib.metadata on 3.10 10x slower than on 3.9)

    This issue was revealed during the review and I acknowledged the concern and agreed to address the issue if it mattered. This project has demonstrated its concern for performance issues, as is apparent from a number of optimizations in the changelog. In every use case I've seen, the performance is improved by the current approach (a group/sort operation is avoided). If the performance is a concern, I once again welcome a bug report describing the use-case and the impact, though I suspect it's an isolated case and likely would best be addressed outside the official codebase.

    this is an api break with 3.9 which returned a list

    I acknowledge this break, though I believe the concerns are overblown. The API specifically sought to reduce dependence on receiving a list and instead to provide a more abstract collection.

    I don't think [introducing behavior in backports] is appropriate.

    It's true, the "backport" moniker is a false one here. From the very beginning, these modules first introduced their behavior outside the stdlib and were then ported into CPython. Moreover, the past couple of years have seen substantial refinement and innovation, and the library was able to move much faster, reach stability sooner, and gain wider adoption than if it had followed the standard Python development cadence.

    It's quite likely that this project will eventually stabilize to the point that most users do not need the backport, but while it exists, it's providing massive value. Consider the most recent example (https://importlib-metadata.readthedocs.io/en/latest/history.html#v4-3-1) where a performance improvement caused a regression. The regression was detected and fixed within a day. Now when CPython adopts that behavior, we can all have higher confidence in the viability of the implementation.

    It would be a pretty big shift to block this approach, but it's not out of the realm of consideration. Still, it's out of scope for this discussion. Feel free to raise it separately.

    [the typing issues] were *trivially solved* by a dictionary comprehension

    No such solution was proposed by anybody, but more importantly, I don't believe the solution would have been so trivial and still met the objectives.

    the types describing the new apis require significant # type: ignores to pass mypy because they violate basic substitutability.

    I'm unaware of this issue and it's not been reported, but I also don't believe it's an issue. Both twine and keyring have adopted the latest API and pass mypy tests.

    they also cannot be used in any of the contexts they were appropriate for in <3.10 (Dict[str, List[EntryPoint]] or List[Entrypoint] depending on the api).

    That's right. The API has changed.

    many issues on importlib-metadata

    Where "many" ~= 1 (https://github.com/python/importlib_metadata/issues?q=is%3Aissue+DeprecationWarning).

    many issues linking to your deprecation issue

    Do you mean python/importlib_metadata#289 or something else? I see ~4 projects (astropy, pytest-randomly, keyring, virtualenv) making mention there. I'd have expected the number of affected projects to be more than that.

    there is significant toil

    I care about toil. A lot. I don't make incompatible changes lightly, and I spent a good deal of time documenting the motivations and providing guidance on how to transition. I've actively worked with each project that's requested help to minimize their toil and provide a one-shot transition to the new API.

    if you look at your issue tracker it has been reported before and by others

    I looked and didn't find it. Help me see what I'm missing.

    "the testsuite didn't demonstrate this usecase so I'm free to change it"

    That's not the spirit of my words. The API had an intended usage that was borne out by the documentation and tests. If users relied on other interfaces that were incidentally present, the user bears some risk in relying on those behaviors. Still, I accept responsibility to provide transitional support even for those cases.


    jaraco commented May 29, 2021

    The standard library has clear rules regarding how previously working interfaces should be deprecated, and this changeset is violating those. At no point was it documented that relying on the list/dict trait of the existing interface is not part of the interface.

    Thanks Gábor for chiming in.

    A minor correction, the "dict" trait was documented, and compatibility is retained for that trait.

    It's only the "list" trait of the less-commonly-used "Distribution.entry_points" that's a concern here, and I've yet to see an example of it being an actual concern. I've already offered to add compatibility if a compelling use case is presented.

    I don't think the importlib libraries are special enough to warrant exclusion from this rule (as opposed let's say the zoneinfo).

    The docs do explicitly call out that the module is provisional. https://docs.python.org/3/library/importlib.metadata.html

    Still, I believe it's best for this module to honor the stdlib practices as best as possible, and I believe the indicated change does that.


    Zac-HD mannequin commented May 30, 2021

    Just chiming in with a plea to slow down the rate of changes to importlib.metadata - I understand that you want to tidy up the API, but even deprecations cause substantial work downstream. Would it really be so bad to support the older APIs until they go end-of-life in CPython?

    For example, in Hypothesis we care a great deal about compatibility with all supported Python versions (including treating warnings as errors) and try to minimize the hard dependencies. As a result, our entry-points handling looks like this:
    https://github.com/HypothesisWorks/hypothesis/blob/0a90ed6edf56319149956c7321d4110078a5c228/hypothesis-python/src/hypothesis/entry_points.py
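    Not Hypothesis's actual code (that is at the link above), but a minimal sketch of the kind of version-conditional, warning-free lookup being described:

    import importlib.metadata
    import sys


    def iter_entry_points(group):
        """Yield entry points in *group* without triggering DeprecationWarning."""
        if sys.version_info >= (3, 10):
            # Selectable API.
            yield from importlib.metadata.entry_points(group=group)
        else:
            # 3.8/3.9: plain dict of group -> list of EntryPoint.
            yield from importlib.metadata.entry_points().get(group, [])


    for ep in iter_entry_points("console_scripts"):
        print(ep.name)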

    Change can be worth the cost, but there *is* a cost and the packaging ecosystem is already painfully complex and fragmented. Compatibility should be a foundational principle, not an optional extra _if someone presents a compelling use case!_

    I'm also seriously concerned that you take GitHub links as an indication of who is affected. Python is very very widely used, including in environments that don't feed much back into the GitHub-open-source space, and I think it's important to avoid breaking things for low-visibility users too.


    jaraco commented May 30, 2021

    Thanks Zac for your input.

    Just chiming in with a plea to slow down the rate of changes to importlib.metadata - I understand that you want to tidy up the API, but even deprecations cause substantial work downstream.

    It would be difficult to go much slower. Are you suggesting delaying the deprecation warning? My rationale for not delaying the deprecation warning is because it's possible using the backport to support the newer APIs all the way back to some end-of-life Pythons. If the deprecation warning is delayed, that seems to only delay the inevitable - that most projects will ultimately have to endure the toil of transitioning the code and relying on backports to support older Pythons.

    Still, projects have the option to use the older APIs indefinitely by pinning the backport, or delay their adoption of the newer APIs by suppressing the warning. There's a lot of flexibility to limit the toil.
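    For the "suppress the warning" option, a sketch of what that can look like with the standard warnings machinery (the group name is illustrative):

    import importlib.metadata
    import warnings

    with warnings.catch_warnings():
        # Silence the deprecation emitted by legacy dict-style access on 3.10.
        warnings.simplefilter("ignore", DeprecationWarning)
        scripts = importlib.metadata.entry_points()["console_scripts"]

    print(len(scripts))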

    What approach would you recommend?

    Would it really be so bad to support the older APIs until they go end-of-life in CPython?

    At this point, I believe the code is compatible with all known use cases and it's conceivable the compatibility layer could be supported in Python releases until Python 3.9 is EOL. Is that what you're proposing? Would that help the Hypothesis case (or others)? My instinct is the value proposition there is small.

    For example, in Hypothesis we care a great deal about compatibility with all supported Python versions (including treating warnings as errors) and try to minimize the hard dependencies. As a result, our entry-points handling looks like this...

    Project maintainers are of course allowed to treat warnings as errors, but since deprecation warnings are the primary mechanism for an upstream API to signal to downstream consumers that something is changing, projects that transform the default behavior of warnings should expect an increased amount of toil.

    I suspect that the hypothesis project could achieve forward compatibility within its constraints by vendoring backports.entry_points_selectable, and thus re-using a shared implementation of the conditional import dance. I've added a note to the project's README indicating that option. The implementation you have seems suitable, though.
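    A sketch of that conditional import dance; the backport's import path below matches its documented usage but should be treated as an assumption rather than a guarantee:

    import sys

    if sys.version_info >= (3, 10):
        from importlib.metadata import entry_points
    else:
        # Third-party shim exposing the selectable entry_points API on older Pythons.
        from backports.entry_points_selectable import entry_points

    for ep in entry_points(group="console_scripts"):
        print(ep.name, ep.value)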

    Change can be worth the cost, but there *is* a cost and the packaging ecosystem is already painfully complex and fragmented. Compatibility should be a foundational principle, not an optional extra _if someone presents a compelling use case!_

    Agreed: compatibility is a foundational principle. Compatibility was built into the first release of this new behavior (importlib_metadata 3.6). Thanks to Anthony's feedback in the PR and extensive exploration and rewrites, the solution presented there has a fairly straightforward transition and clean separation of concerns. The case reported above, where compatibility was not achieved, is an acknowledged missed concern, and I'm happy to invest the time to restore that compatibility if it's worth the trouble. The reason I'd thought it's not worth the trouble is because, contrary to Anthony's claim, no user has reported an issue with index-based access on Distribution.entry_points results for the three months that this functionality has been in place, which is why a note about the incompatibility seemed sufficient (after the fact).

    I'll proceed with adding compatibility for this reported case, although I worry that won't satisfy the concerns. Is there a satisfactory solution that doesn't involve reverting the changes? Is there an approach that meets the goals of the change with less disruption?

    I'm also seriously concerned that you take GitHub links as an indication of who is affected. Python is very very widely used, including in environments that don't feed much back into the GitHub-open-source space, and I think it's important to avoid breaking things for low-visibility users too.

    I surely recognize that Github links and reports are only one indicator of one segment of the user base, but it's the sole indicator these projects have to solicit user concerns. That's why I pinned the issue reported about the Deprecation warning and used that forum to express concern and support for the users' issues and to present a variety of approaches for any number of users to avail themselves. I wanted to increase the visibility of the issue and limit the difficulty of addressing the intentional deprecation.

    I mainly rely on Github reports and +1s on those reports as an indication of the impact of an issue. I use Github links as a means of networking. It was Anthony who suggested the links were an indication of a widespread issue. I only meant to contrast that concern to other breakages (in my experience) that showed dozens of links to affected issues. Linked issues are a secondary indicator at best.

    I do expect that if users have an issue that they would report it through python/importlib_metadata or bpo, but I also expect that absence of a report demonstrates stability. At least, that's been my experience in the hundreds of projects I've developed on Sourceforge, Bitbucket, GitLab, and Github.

    After employing defensive, empathetic programming, developing docs, and writing comprehensive tests, what approaches should I be taking to solicit user concerns other than to have an open forum for soliciting issues and responding to those issues promptly and with proactive solutions?


    asottile mannequin commented May 30, 2021

    I also need .sort(key=...), for what it's worth -- the error in this issue was just the first one encountered

    I think my only satisfactory outcome would be:

    • the original api returns actual dicts
    • the sub-api returns actual lists
    • the new select is implemented as a separate *new* api without changing the existing api


    asottile mannequin commented May 30, 2021

    also miurahr/aqtinstall#221

    (this links to importlib-metadata tracker, not sure how you missed it)


    asottile mannequin commented May 30, 2021

    the .select(...) api is at least twice as slow as indexing as well:

    setup:

    virtualenv venv39 -p python3.9
    venv39/bin/pip install flake8 pytest pytest-randomly
    virtualenv venv310 -p python3.10
    venv310/bin/pip install flake8 pytest pytest-randomly
    
    import importlib.metadata
    import sys
    import time
    
    
    def f():
        eps = importlib.metadata.entry_points()
        if sys.version_info >= (3, 10):
            eps.select(name='console_scripts')
        else:
            eps['console_scripts']
    
    
    t0 = time.time()
    for _ in range(100):
        f()
    t1 = time.time()
    print(f'{t1-t0}')
    $ ./venv39/bin/python t.py
    0.687570333480835
    $ ./venv310/bin/python t.py
    1.3486714363098145
    

    it is *way* worse when involving multiple entry points:

    import importlib.metadata
    import sys
    import time
    
    
    # moved outside of the loop, already showed this component is slower
    eps = importlib.metadata.entry_points()
    def f():
        # common for plugin systems to look up multiple entry points
        for ep in ('console_scripts', 'flake8.extension', 'pytest11'):
            if sys.version_info >= (3, 10):
                eps.select(name=ep)
            else:
                eps[ep]
    
    
    t0 = time.time()
    for _ in range(10000):
        f()
    t1 = time.time()
    print(f'{t1-t0}')
    $ ./venv39/bin/python t.py
    0.01629471778869629
    $ ./venv310/bin/python t.py
    8.569908380508423


    asottile mannequin commented May 30, 2021

    oops, tiny typo in those code examples, they should say group= instead of name= -- though the performance is unchanged:

    (first example)

    $ ./venv39/bin/python t.py
    0.6641988754272461
    $ ./venv310/bin/python t.py
    1.3172023296356201

    (second example)

    $ ./venv39/bin/python t.py
    0.014233589172363281
    $ ./venv310/bin/python t.py
    8.910593271255493


    jaraco commented May 30, 2021

    There are known performance concerns. I recommend setting those aside for now or moving them to a separate issue because:

    (a) The performance is theoretically better in the nominal case because it avoids a sort/group operation.
    (b) There are known performance degradations introduced by importlib_metadata 3.5 to de-duplicate distributions, degradations mitigated somewhat by importlib_metadata 4.3.
    (c) Compatibility layers may be confounding performance concerns.

    May I suggest addressing performance concerns in the importlib_metadata project as that project provides much better granularity on the different changes?

    I think my only satisfactory outcome would be:

    • the original api returns actual dicts

    The original API returns an actual dict subclass (SelectableGroups).

    • the sub-api returns actual lists

    With python/importlib_metadata#323, this expectation is also met as EntryPoints is a list.

    • the new select is implemented as a separate *new* api without changing the existing api

    The new API is invoked only through opt-in calls not previously available in the old API. I believe this achieves your goal without requiring a new name for entry_points or Distribution.entry_points (and thus creating less toil for consumers).


    asottile mannequin commented May 30, 2021

    the toil is still present, the existing, good apis are deprecated and the new, bad apis are slow -- and the odd subclasses are still present


    asottile mannequin commented May 30, 2021

    I have also shown that the performance is indeed not better in the nominal case, as demonstrated in the first case


    jaraco commented May 31, 2021

    importlib_metadata 4.4 restores compatibility for the reported concerns. I'll merge those into CPython later.


    asottile mannequin commented May 31, 2021

    it does not, it restores apis but in a way which requires a huge performance hit to avoid deprecation warnings

    it also still has the 2-500x performance regression I've stated above


    jaraco commented May 31, 2021

    New changeset 410b70d by Jason R. Coombs in branch 'main':
    bpo-44246: Entry points performance improvements. (GH-26467)
    410b70d

    @miss-islington (Contributor)

    New changeset d1480ad by Miss Islington (bot) in branch '3.10':
    bpo-44246: Entry points performance improvements. (GH-26467)
    d1480ad


    jaraco commented May 31, 2021

    New changeset c34ed08 by Jason R. Coombs in branch 'main':
    bpo-44246: Restore compatibility in entry_points (GH-26468)
    c34ed08

    @jaraco jaraco closed this as completed May 31, 2021

    jaraco commented May 31, 2021

    New changeset 78d9a9b by Jason R. Coombs in branch 'main':
    bpo-44246: Remove note about access by index now that a compatibility shim is offered. (GH-26472)
    78d9a9b

    @miss-islington (Contributor)

    New changeset 7207203 by Miss Islington (bot) in branch '3.10':
    [3.10] bpo-44246: Restore compatibility in entry_points (GH-26468) (GH-26471)
    7207203


    jaraco commented May 31, 2021

    New changeset d0991e2 by Miss Islington (bot) in branch '3.10':
    bpo-44246: Remove note about access by index now that a compatibility shim is offered. (GH-26472) (GH-26473)
    d0991e2

    @Fidget-Spinner (Member)

    The new test test_entry_points_by_index (added in c34ed08) seems to fail on some Windows buildbots:

    https://dev.azure.com/Python/cpython/_build/results?buildId=81807&view=logs&j=c8a71634-e5ec-54a0-3958-760f4148b765&t=599737bc-ad72-560d-1530-0f89b05729e4

    A copy of the error output for everyone's convenience:
    ======================================================================
    ERROR: test_entry_points_by_index (test.test_importlib.test_metadata_api.APITests)
    Prior versions of Distribution.entry_points would return a
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "D:\a\1\b\layout-appx-amd64\lib\test\test_importlib\test_metadata_api.py", line 145, in test_entry_points_by_index
        expected = next(iter(caught))
    StopIteration

    BTW, the same buildbot is currently failing on main with a different error which masks that error above. I'll do more digging if no one takes this up by next week. Currently I'm not able to reproduce that locally on my windows machine. Thanks all!


    jaraco commented Jun 6, 2021

    The line where the failure occurs is the point where it's checking that the warning was issued. The fact that a StopIteration is raised indicates that no warnings were caught. I can think of a couple of scenarios where that could happen:

    • That warning is somehow disabled.
    • The Distribution object returned by distribution('distinfo-pkg') is somehow an older implementation (perhaps an older importlib_metadata is present).

    Given that the DeprecationWarnings aren't missed on other tests, the latter seems to be a more likely candidate.
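    For context on why a StopIteration implies that no warning was caught: the failing line pulls the first recorded warning out of the captured list, roughly as in this sketch (not the actual test code; assumes pip is installed):

    import warnings
    import importlib.metadata

    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        # Index access is the deprecated usage under test.
        importlib.metadata.distribution("pip").entry_points[0]

    # With no recorded warnings, next() on the empty iterator raises StopIteration,
    # which is exactly the failure seen on the appx buildbot.
    expected = next(iter(caught))
    print(expected.category)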

    I notice that the regular tests are passing. It's only in the 'appx' environment where the test fails. I'm not familiar with appx, but it seems likely that something from the appx environment creation is a factor in the divergent behavior.

    Steve, can you advise on how appx environments are created and how one could replicate a test failure that only occurs in that environment?


    zooba commented Jun 7, 2021

    The appx layout is also the only one in CI that actually uses an installed layout - all the rest run tests from the source tree. So it could be related to that.

    If it's a warning, it could also be that the warning is being triggered somewhere else first. Since tests run in a random order, you'll want to look for how reliable the failure is.


    jaraco commented Jun 8, 2021

    Thanks Steve for the feedback.

    If it's a warning, it could also be that the warning is being triggered somewhere else first. Since tests run in a random order, you'll want to look for how reliable the failure is.

    It is a warning, but it seems unlikely that any other code is calling it, given that the supporting codepath was not present until the same PR.

    The appx layout is also the only one in CI that actually uses an installed layout - all the rest run tests from the source tree. So it could be related to that.

    If someone could help by producing a docker image that can build the appx layout and run the tests, that would help me as I don't have a lot of proficiency with installing build tools on Windows through the CLI... and the experience I do have has been fraught with challenges.


    asottile mannequin commented Jun 8, 2021

    here are the performance regressions: they affect any callers of distributions() and are even worse for callers of the new apis.

    a call to distributions() is about 3x slower than in 3.9

    here is the setup I am using:

    virtualenv venv39 -ppython3.9
    venv39/bin/pip install flake8 pytest twine pre-commit
    virtualenv venv310 -ppython3.10
    venv310/bin/pip install flake8 pytest twine pre-commit

    to test just the distributions() call I'm using the following:

    $ venv39/bin/python -m timeit -n 20 -r 20 -s 'from importlib.metadata import entry_points' 'entry_points()'
    20 loops, best of 20: 12.5 msec per loop
    $ venv310/bin/python -m timeit -n 20 -r 20 -s 'from importlib.metadata import entry_points' 'entry_points()'
    20 loops, best of 20: 36.7 msec per loop

    this is a less-extreme example, many applications have more dependencies installed -- but even in this case this is adding ~24ms startup to any application using entry_points() -- and it gets worse

    the return value of entry_points() alone isn't all that useful, next an application needs to retrieve its entry points. let's start with the somewhat normal case of retrieving a single category of entry points:

    $ venv39/bin/python -m timeit -n 20 -r 20 -s 'from importlib.metadata import entry_points' 'entry_points()["flake8.extension"]'
    20 loops, best of 20: 12.7 msec per loop
    $ venv310/bin/python -m timeit -n 20 -r 20 -s 'from importlib.metadata import entry_points' 'entry_points(name="flake8.extension")'
    20 loops, best of 20: 37.1 msec per loop
    $ venv310/bin/python -m timeit -n 20 -r 20 -s 'from importlib.metadata import entry_points' 'entry_points().select(group="flake8.extension")'
    20 loops, best of 20: 37.1 msec per loop

    again, 3x slower and very real time to the end user (~24-25ms)

    now let's show an example usage that something like flake8 uses where multiple groups are requested (this is common for apps and plugin systems which provide multiple distinct functionalities)

    $ venv39/bin/python -m timeit -n 20 -r 20 -s 'from importlib.metadata import entry_points' 'eps = entry_points(); eps["flake8.extension"]; eps["flake8.report"]'
    20 loops, best of 20: 12.6 msec per loop
    $ venv310/bin/python -m timeit -n 20 -r 20 -s 'from importlib.metadata import entry_points' 'eps = entry_points(); eps.select(group="flake8.extension"); eps.select(group="flake8.report")'
    20 loops, best of 20: 38.2 msec per loop

    also slower, plus roughly an additional millisecond per call to .select(...)

    and it only gets worse with more and more packages installed

    here's the versions I'm using to ensure they are up to date:

    $ venv39/bin/python --version --version
    Python 3.9.5 (default, May 19 2021, 11:32:47) 
    [GCC 9.3.0]
    $ venv310/bin/python --version --version
    Python 3.10.0b2 (default, Jun  2 2021, 00:22:18) [GCC 9.3.0]

    @asottile asottile mannequin reopened this Jun 8, 2021

    jaraco commented Jun 8, 2021

    As mentioned in msg394775, I'd like to decouple the performance concerns from the original incompatibility. I recognize that performance regressions are in their own way a form of incompatibility, but there have been a lot of changes to entry points with respect to performance, both prior to beta 1 and in beta 2, including changes that intentionally traded performance for correctness (python/importlib_metadata#281).

    To that end, I've filed python/importlib_metadata#324 to track the concerns.

    @jaraco jaraco closed this as completed Jun 8, 2021

    asottile mannequin commented Jun 8, 2021

    they are directly coupled which is why I commented here

    the api redesign forces O(N) lookups and O(N) constructions which directly impact performance causing the regression

    @asottile asottile mannequin reopened this Jun 8, 2021

    jaraco commented Jun 10, 2021

    In this Dockerfile, I've attempted to install Visual Studio, but without success. Docker fails to build on line 6. The install fails with this error:

    ERROR: Running ["C:\Users\ContainerAdministrator\AppData\Local\Temp\chocolatey\visualstudio2019buildtools\16.10.0.0\vs_BuildTools.exe" --quiet --add Microsoft.VisualStudio.Workload.ManagedDesktopBuildTools --add Microsoft.VisualStudio.Workload.NetCoreBuildTools --norestart --wait] was not successful. Exit code was '-2147024770'. See log for possible error messages.
    

    Any tips on creating a Windows environment with build support would be appreciated.


    jaraco commented Jun 10, 2021

    I managed to put together a Dockerfile that seemingly has the build tools installed (https://github.com/jaraco/jaraco.windows/blob/d2edad2e2af9d469189d7ac6a14a4ba6f6270348/Dockerfile).

    When I attempt to build CPython, however, it fails with this error:

    PS C:\code\public\cpython> PCBuild\build.bat
    Using py -3.9 (found 3.9 with py.exe)
    Fetching external libraries...
    bzip2-1.0.6 already exists, skipping.
    sqlite-3.35.5.0 already exists, skipping.
    xz-5.2.2 already exists, skipping.
    zlib-1.2.11 already exists, skipping.
    Fetching external binaries...
    libffi already exists, skipping.
    openssl-bin-1.1.1k-1 already exists, skipping.
    tcltk-8.6.11.0 already exists, skipping.
    Finished.
    Using "C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Current\Bin\MSBuild.exe"  (found in the PATH)
    Using py -3.9 (found 3.9 with py.exe)
    
    C:\code\public\cpython>"C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Current\Bin\MSBuild.exe"  "C:\code\public\cpython\PCbuild\pcbuild.proj" /t:Build /m /nologo /v:m /clp:summary /p:Configuration=Release /p:Platform=x64 /p:IncludeExternals=true /p:IncludeCTypes=true /p:IncludeSSL=true /p:IncludeTkinter=true /p:UseTestMarker= /p:GIT="C:\Program Files\Git\cmd\git.exe"
    C:\code\public\cpython\PCbuild\python.props(111,31): error MSB4184: The expression "[System.Version]::Parse('')" cannot be evaluated. Version string portion was too short or too long. [C:\code\public\cpython\PCbuild\pythoncore.vcxproj]
    
    Build FAILED.
    
    C:\code\public\cpython\PCbuild\python.props(111,31): error MSB4184: The expression "[System.Version]::Parse('')" cannot be evaluated. Version string portion was too short or too long. [C:\code\public\cpython\PCbuild\pythoncore.vcxproj]
        0 Warning(s)
        1 Error(s)
    
    Time Elapsed 00:00:00.15
    


    jaraco commented Jun 10, 2021

    The build error is tracked in bpo-43298.


    hroncok mannequin commented Jun 18, 2021

    The test_entry_points_by_index test also fails on Fedora. See bpo-44451.


    hectorizdaone mannequin commented Jun 18, 2021

    Adds agreement.

    @hectorizdaone hectorizdaone mannequin added build The build process and cross-build type-security A security issue labels Jun 18, 2021

    jaraco commented Jun 27, 2021

    In python/importlib_metadata#324, I painstakingly created a library to perform robust performance monitoring of behaviors in the library and specifically added tests that capture the performance of entry_points against importlib_metadata 1.4 (the same code used in Python 3.9). I furthermore attempted to replicate the performance concerns reported above using the exact commands, and even then, Python 3.10b3 indicates a 3x+ improvement in performance, dispelling the claim that the issue is the order of complexity of resolving entry points by group.

    I'll be open to new reports about performance concerns or other regressions w.r.t. the importlib.metadata changes, but please do not re-open this issue.

    @jaraco jaraco closed this as completed Jun 27, 2021
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022