Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port importlib_metadata to Python 3.8 #78813

Closed
warsaw opened this issue Sep 11, 2018 · 20 comments
Closed

Port importlib_metadata to Python 3.8 #78813

warsaw opened this issue Sep 11, 2018 · 20 comments
Assignees
Labels
3.8 only security fixes stdlib Python modules in the Lib dir

Comments

@warsaw
Copy link
Member

warsaw commented Sep 11, 2018

BPO 34632
Nosy @warsaw, @brettcannon, @rhettinger, @jaraco, @vstinner, @yan12125, @a-recknagel
PRs
  • bpo-34632: Add the importlib.metadata module #9327
  • bpo-34632: Add importlib.metadata #12547
  • bpo-34632: fix installation of importlib.metadata #13563
  • bpo-34632: fix test_importlib with installed CPython #13565
  • bpo-34632 fix buildbots and remove artifact #13566
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/warsaw'
    closed_at = <Date 2019-09-12.11:00:18.434>
    created_at = <Date 2018-09-11.18:16:43.409>
    labels = ['3.8', 'library']
    title = 'Port importlib_metadata to Python 3.8'
    updated_at = <Date 2019-10-11.17:07:27.248>
    user = 'https://github.com/warsaw'

    bugs.python.org fields:

    activity = <Date 2019-10-11.17:07:27.248>
    actor = 'arne'
    assignee = 'barry'
    closed = True
    closed_date = <Date 2019-09-12.11:00:18.434>
    closer = 'jaraco'
    components = ['Library (Lib)']
    creation = <Date 2018-09-11.18:16:43.409>
    creator = 'barry'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 34632
    keywords = ['patch']
    message_count = 20.0
    messages = ['325043', '343440', '343441', '343445', '343457', '343459', '343460', '343461', '343462', '343465', '343480', '343481', '343483', '349415', '349416', '349421', '349423', '352105', '354466', '354469']
    nosy_count = 7.0
    nosy_names = ['barry', 'brett.cannon', 'rhettinger', 'jaraco', 'vstinner', 'yan12125', 'arne']
    pr_nums = ['9327', '12547', '13563', '13565', '13566']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue34632'
    versions = ['Python 3.8']

    @warsaw
    Copy link
    Member Author

    warsaw commented Sep 11, 2018

    https://importlib_metadata.rtfd.org

    We're fleshing out the API and implementation in the standalone library, but once we're confident of the API and semantics, we will want to port this into Python 3.8.

    @warsaw warsaw added the 3.8 only security fixes label Sep 11, 2018
    @warsaw warsaw self-assigned this Sep 11, 2018
    @warsaw warsaw added the stdlib Python modules in the Lib dir label Sep 11, 2018
    @warsaw
    Copy link
    Member Author

    warsaw commented May 24, 2019

    New changeset 1bbf7b6 by Barry Warsaw (Jason R. Coombs) in branch 'master':
    bpo-34632: Add importlib.metadata (GH-12547)
    1bbf7b6

    @warsaw
    Copy link
    Member Author

    warsaw commented May 24, 2019

    Thanks @jaraco! This is now merged into 3.8.

    @warsaw warsaw closed this as completed May 24, 2019
    @vstinner
    Copy link
    Member

    Unhappy buildbot: AMD64 Fedora Rawhide Clang Installed 3.x

    https://buildbot.python.org/all/#/builders/188/builds/302

    Example:

    0:00:28 load avg: 4.02 [182/422/1] test_importlib failed
    Failed to import test module: test.test_importlib.test_main
    Traceback (most recent call last):
      File "/home/buildbot/buildarea/3.x.cstratak-fedora.installed/build/target/lib/python3.8/unittest/loader.py", line 436, in _find_test_path
        module = self._get_module_from_name(name)
      File "/home/buildbot/buildarea/3.x.cstratak-fedora.installed/build/target/lib/python3.8/unittest/loader.py", line 377, in _get_module_from_name
        __import__(name)
      File "/home/buildbot/buildarea/3.x.cstratak-fedora.installed/build/target/lib/python3.8/test/test_importlib/test_main.py", line 6, in <module>
        import importlib.metadata
    ModuleNotFoundError: No module named 'importlib.metadata'

    @vstinner vstinner reopened this May 25, 2019
    @yan12125
    Copy link
    Mannequin

    yan12125 mannequin commented May 25, 2019

    I got the same ModuleNotFoundError on Arch Linux and #13563 fixes it. I believe it can fix the issue on Fedora buildbots, too.

    @jaraco
    Copy link
    Member

    jaraco commented May 25, 2019

    I started trying to replicate the failure. I got as far as this Dockerfile:

    FROM fedora:rawhide
    
    RUN yum install -y clang make git
    
    RUN git clone https://github.com/python/cpython
    WORKDIR cpython
    RUN ./configure
    RUN make
    

    And then running ./python [Tools/scripts/run_tests.py](https://github.com/python/cpython/blob/main/Tools/scripts/run_tests.py) test_importlib, but the tests fail due to zlib not being installed.

    Sounds like yan12125 has the fix, so I'm shelving my investigation.

    @jaraco
    Copy link
    Member

    jaraco commented May 25, 2019

    New changeset c3738cf by Jason R. Coombs (Chih-Hsuan Yen) in branch 'master':
    bpo-34632: fix installation of importlib.metadata (bpo-13563)
    c3738cf

    @yan12125
    Copy link
    Mannequin

    yan12125 mannequin commented May 25, 2019

    Oops apparently my fix is incomplete. From the builder "AMD64 Fedora Rawhide Clang Installed 3.x" [1]:

    ModuleNotFoundError: No module named 'test.test_importlib.data'

    [1] https://buildbot.python.org/all/api/v2/logs/824407/raw

    @yan12125
    Copy link
    Mannequin

    yan12125 mannequin commented May 25, 2019

    By the way, I think Python.framework is not needed? 1bbf7b6#diff-206dc381e448d5121da9a6040a2b13c1

    @yan12125
    Copy link
    Mannequin

    yan12125 mannequin commented May 25, 2019

    I managed to create a setup similar to the buildbot builder "AMD64 Fedora Rawhide Clang Installed 3.x" [1] on Arch Linux. Running test_importlib on an installed CPython copy is fine now:

    $ /usr/bin/python3.8 -m test.regrtest test_importlib
    Run tests sequentially
    0:00:00 load avg: 0.14 [1/1] test_importlib

    == Tests result: SUCCESS ==

    1 test OK.

    Total duration: 1 sec 288 ms
    Tests result: SUCCESS

    I apologize for not checking things carefully and misunderstanding the issue on "AMD64 Fedora Rawhide Clang Installed 3.x".

    [1] https://github.com/python/buildmaster-config/blob/master/master/custom/factories.py

    @jaraco
    Copy link
    Member

    jaraco commented May 25, 2019

    By the way, I think Python.framework is not needed?

    Correct. That was an artifact that I unintentionally added.

    I've submitted #13566 to address the two concerns.

    I've also opened bpo-37043 and bpo-37044 to address the causes of these emergent failures.

    @jaraco
    Copy link
    Member

    jaraco commented May 25, 2019

    New changeset f7fba6c by Jason R. Coombs in branch 'master':
    bpo-34632 fix buildbots and remove artifact (GH-13566)
    f7fba6c

    @jaraco
    Copy link
    Member

    jaraco commented May 25, 2019

    I believe buildbots are fixed. Please re-open if you find otherwise.

    @jaraco jaraco closed this as completed May 25, 2019
    @rhettinger
    Copy link
    Contributor

    Quick question: Is there a reason that requires() and files() return iterators instead of lists? ISTM that a list-based solution would be more usable than returning a starmap() object or somesuch. I suspect almost every user would have to call list(files(package)) rather than files(package). An iterator return type would only make sense if we need the values are produces lazily or if a known consumer required an iterator input.

    Also consider changing the parameter from files(package) to files(package_name). When I first tried-out this API, I typed: "import requests; files(requests)" instead of "files('requests')".

    Sorry to bring this up at a late stage, but the purpose of a beta release is to let other users try-out the API while there is still a chance to make adjustments.

    @rhettinger rhettinger reopened this Aug 11, 2019
    @warsaw
    Copy link
    Member Author

    warsaw commented Aug 11, 2019

    @jaraco will be able to answer that better than me. I actually thought those did return concrete lists.

    I also thought that the APIs accepted either a module or a package name, but maybe I'm thinking about importlib.resources. Again, @jaraco can clarify, but I think the problem is that there's no unambiguous mapping between packages and package names for metadata the way there is for resources.

    @jaraco
    Copy link
    Member

    jaraco commented Aug 12, 2019

    Is there a reason that requires() and files() return iterators instead of lists?

    I'm a huge fan of itertools and Python 3's move to prefer iterables over materialized lists, and I feel that forcing materialized results gives the caller less control over the results.

    Following the same pattern that many standard Python objects return (open, map, filter), the approach is less constrained in that it can support arbitrarily large results. I wished to leave it to the caller to materialize a list if that was needed and my assumption was that 90% of the use-cases would be iterating over the results once.

    I also thought that the APIs accepted either a module or a package name,

    Early on, I had hoped to have the API accept either the distribution package name or a Python package... and I even started creating a protocol for package vendors to provide a reference from their module or package back to the distribution package name. But I decided that approach was to invasive and unlikely to get widespread support, but also that it added little value.

    What importlib really works with is distribution packages (also known as Projects in PyPI) and not Python packages... and it works at an earlier abstraction (often you want to know metadata about a package without importing it).

    Also consider changing the parameter from files(package) to files(package_name).

    I think at one point, the parameter name was distribution_name_or_package. We removed the acceptance of packages, but then renamed the parameter to 'package' for brevity. This parameter is used in many functions (files, requires, version, metadata, distribution). We'd want to change it in all of those. Once it becomes a parameter of the Distribution class (such as in Distribution.from_name), the 'distribution' is implied, so 'name' is clear enough. I do try to avoid long and multi-word parameters when possible. Perhaps more appropriate would be 'distribution_name' or 'dist_name'.

    I'm leaning toward 'dist_name' right now. What do you think?

    @rhettinger
    Copy link
    Contributor

    Following the same pattern that many standard Python objects
    return (open, map, filter), the approach is less
    constrained in that it can support arbitrarily large results.
    I wished to leave it to the caller to materialize a list if
    that was needed and my assumption was that 90% of the use-cases
    would be iterating over the results once.

    My recommendation is to return a concrete list. Shifting the responsibility to the user makes the API less convenient, particularly if someone is running this at the interactive prompt and just wants to see the results.

    We replaced the list version of map() with itertools.imap() for memory efficiency with potentially enormous or infinite inputs. However, this came at a cost for usability. In every single course where I present map() there is an immediate stumble over the need to wrap it in list() just to see the output. In general, we should save the iterators for cases where there is real value in lazy execution. Otherwise, usability, inspectability, sliceability, and re-iterability wins. (Just think of how awkward it would be if dir() or os.listdir() returned an iterator instead of a list.)

    FWIW, the doc string for requires() is:

    Return a list of requirements for the indicated distribution

    Perhaps more appropriate would be 'distribution_name' or 'dist_name'.

    I recommend 'distribution_name'. It will normally be used as a positional parameter but the full name will show-up in the tool tips, providing guidance on how to use it.

    When you get a chance, please look at #15204
    where I'm presenting your creation to the world.

    Overall, I think this was a nice addition to the standard library. Thanks for your work :-)

    @jaraco
    Copy link
    Member

    jaraco commented Sep 12, 2019

    I've addressed the requests made by rhettinger in bpo-38086 and bpo-38121.

    @jaraco jaraco closed this as completed Sep 12, 2019
    @a-recknagel
    Copy link
    Mannequin

    a-recknagel mannequin commented Oct 11, 2019

    Is there a reason the object returned by importlib.metadata.metadata is an EmailMessage and not a dict? If it quacks like a duck it should be a duck, no?

    @a-recknagel
    Copy link
    Mannequin

    a-recknagel mannequin commented Oct 11, 2019

    I just learned that metadata is stored as an email, and changing the format was rejected in PEP-426. Be that as it may, if it isn't too much of an issue it might still be something that should be hidden from users of the module. Noone wants to know that this particular duck is actually powered by fins under the surface, right?

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants