Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite zipimport from scratch #69897

Closed
brettcannon opened this issue Nov 23, 2015 · 33 comments
Closed

Rewrite zipimport from scratch #69897

brettcannon opened this issue Nov 23, 2015 · 33 comments
Labels
3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@brettcannon
Copy link
Member

BPO 25711
Nosy @Yhg1s, @warsaw, @brettcannon, @gpshead, @ncoghlan, @pmp-p, @ericsnowcurrently, @serhiy-storchaka, @dianaclarke, @AraHaan, @jarondl
PRs
  • bpo-25711: zipimport rewritten in Python #4023
  • bpo-25711: Rewrite zipimport in pure Python. #6809
  • bpo-25711: Remove outdated zipimport tests. #9404
  • bpo-25711: Move _ZipImportResourceReader from importlib to zipimport. #9406
  • bpo-32075: Expose ZipImporter Type Object in the include header files. #4470
  • Files
  • zipimport.patch
  • zipimport-2.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2018-11-05.13:48:52.968>
    created_at = <Date 2015-11-23.16:28:40.454>
    labels = ['interpreter-core', 'type-feature', '3.8']
    title = 'Rewrite zipimport from scratch'
    updated_at = <Date 2018-11-05.13:48:52.967>
    user = 'https://github.com/brettcannon'

    bugs.python.org fields:

    activity = <Date 2018-11-05.13:48:52.967>
    actor = 'serhiy.storchaka'
    assignee = 'none'
    closed = True
    closed_date = <Date 2018-11-05.13:48:52.968>
    closer = 'serhiy.storchaka'
    components = ['Interpreter Core']
    creation = <Date 2015-11-23.16:28:40.454>
    creator = 'brett.cannon'
    dependencies = []
    files = ['45800', '45814']
    hgrepos = []
    issue_num = 25711
    keywords = ['patch']
    message_count = 33.0
    messages = ['255183', '255198', '255202', '255224', '256161', '256171', '256207', '257360', '257609', '282731', '282776', '282777', '282871', '282876', '282938', '283130', '299705', '304484', '304531', '306556', '306610', '307524', '316536', '321267', '321289', '323750', '324445', '324453', '324456', '325672', '325706', '325771', '329294']
    nosy_count = 13.0
    nosy_names = ['twouters', 'barry', 'brett.cannon', 'gregory.p.smith', 'ncoghlan', 'pmpp', 'eric.snow', 'serhiy.storchaka', 'diana', 'superluser', 'byrnes', 'Decorater', 'jarondl']
    pr_nums = ['4023', '6809', '9404', '9406', '4470']
    priority = 'high'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue25711'
    versions = ['Python 3.8']

    @brettcannon
    Copy link
    Member Author

    No one wants to work on zipimport, and yet it's full of bugs. It needs a rewrite so that it's more maintainable. An idea floated at PyCon 2015 was to writing the zip-reading code in C and to keep it as simple as possible -- e.g., don't worry about supporting comments, etc. -- and then write the rest of the code in importlib, making maintenance much easier.

    All of the various zipimport bugs should be made dependent on this issue as unless they are critical flaws I doubt they will get fixed without the rewrite.

    @brettcannon brettcannon added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Nov 23, 2015
    @serhiy-storchaka
    Copy link
    Member

    FYI I'm at the early stage of rewriting zipimport in Python.

    @brettcannon
    Copy link
    Member Author

    Are you writing it in such a way that it can be bootstspped in with importlib so the stslib can be loaded from it?

    @serhiy-storchaka
    Copy link
    Member

    It was my intention.

    @superluser
    Copy link
    Mannequin

    superluser mannequin commented Dec 9, 2015

    Serhiy, how far along are you on this? I have a wip from this summer that I could finish over the holidays.

    @serhiy-storchaka
    Copy link
    Member

    I were on very early stage, and stopped this work few weeks ago in favor of other issues. I would be glad to make a review of your work when you have finished it Rose.

    @vstinner
    Copy link
    Member

    Can you both publish your WIP work?

    @brettcannon
    Copy link
    Member Author

    What are people's statuses on their various attempts? Since this is going to block my importlib.resources work I will do the work myself or work directly with someone in order to make sure this gets done.

    @superluser
    Copy link
    Mannequin

    superluser mannequin commented Jan 6, 2016

    Sorry for the late response. I didn't have much time over the holidays. I think I better let someone else take this one.

    @serhiy-storchaka
    Copy link
    Member

    Here is preliminary translation of zipimport to Python. It is not frozen and imports other modules. I tried to keep the implementation close to C implementation. As a consequence, some raised exceptions look arbitrary.

    @serhiy-storchaka serhiy-storchaka added 3.7 (EOL) end of life type-feature A feature request or enhancement labels Dec 8, 2016
    @serhiy-storchaka
    Copy link
    Member

    Got rid of dependencies from os, stat and encodings.

    @vstinner
    Copy link
    Member

    vstinner commented Dec 9, 2016

    Here is preliminary translation of zipimport to Python. It is not frozen and imports other modules.

    Technically, will it be possible to freeze it? It seems useful to keep the ability to put the whole stdlib into a single ZIP. Using a ZIP is sometimes suggested to avoid fstat() on disk for example, to speedup Python startup.

    But I also understand that the C code is painful to maintain and update.

    Anyway, nice job Serhiy :-)

    @gpshead
    Copy link
    Member

    gpshead commented Dec 10, 2016

    So long as this code block that imports os is avoided, I believe that this can be properly frozen:

    + if not isinstance(path, str):
    + import os
    + path = os.fsdecode(path)

    But it should be easy to avoid that code path when the standard library is a zip file.

    Otherwise it uses importlib (frozen), marshal (builtin), sys (builtin), time (builtin), and zlib [if present] (extension module).

    @serhiy-storchaka
    Copy link
    Member

    Technically, will it be possible to freeze it?

    I think yes. But I don't know how to do this. I hope on Brett's (or other import machinery expert) help.

    Since zipimporter constructor is called only with string path by import machinery, the os module is not imported at initialization stage.

    @brettcannon
    Copy link
    Member Author

    And having a private copy of os.fsdecode() isn't difficult as os.fspath() is in posix and after that it's four lines that only need access to the sys module.

    @brettcannon
    Copy link
    Member Author

    Just FYI, if this lands I will probably try to build off of it to make a pathlib-like zip module to eventually replace zipfile. So if there's any API design decisions that need to be made, it would be great if we try to keep the zip-specific bits separate and generic enough to work with in other future libraries.

    @jarondl
    Copy link
    Mannequin

    jarondl mannequin commented Aug 3, 2017

    What is the status of this work? Is there anything I can do to help make this happen?

    @warsaw
    Copy link
    Member

    warsaw commented Oct 16, 2017

    I've landed here after chatting with @brett.cannon. I have a use case for this (making pex startup faster by bypassing pkg_resources) but I need to hack around the limitation of dlopen'ing .so's from zips. Our idea was to have a zipimport subclass which doesn't return None from importlib.util.find_spec() when it finds a .so, but instead dumps that into some safe directory, and then arranges for a loader that knows how to load that. It sure would be handy for this to be a zipimporter subclass. :)

    I think Serhiy's patch predates the move to GitHub, so it's not a branch/PR. I guess the next step would be to branchify the patch and then continue discussion over there. Depending on my availability, I might do that.

    @warsaw
    Copy link
    Member

    warsaw commented Oct 17, 2017

    I've ported the patch to a branch on GH. See PR 4023. I started from zipimport-2.patch. It doesn't work, but it will be easier to work on as a PR rather than a patch.

    Contributions welcome! Let's see if we can make this work.

    @AraHaan
    Copy link
    Mannequin

    AraHaan mannequin commented Nov 20, 2017

    So, after reviewing this it started to make me rethink about the zipimport.py file.

    So my question is how would that file work if it is an pyc file in python37.zip or something just to zipimport other modules? There is got to be some sort of low level api that can zip import the zip importer then on the rewrite. Am I right?

    Maybe the best bet is to wait for bug reports on the C Code and fixup the C Code if possible so that way there is no conflicts like the ones I just questioned.

    @brettcannon
    Copy link
    Member Author

    zipimport.py would be frozen just like importlib, so there's no bootstrapping issue if that's what you're asking.

    @AraHaan
    Copy link
    Mannequin

    AraHaan mannequin commented Dec 3, 2017

    Alright, just wanted to make sure because I did not want to have something break when loading up the entire standard library from an zip file with it.

    @serhiy-storchaka
    Copy link
    Member

    PR 6809 freezes zipimport.py, adds changes made in zipimport.c since writing the initial Python implementation, and fixes few bugs exposed in the frozen module. Seems Python now works with the zipped stdlib.

    @serhiy-storchaka
    Copy link
    Member

    Could anybody please make a review? This is just a first step, we need to do it before making other steps: implementing more modern import API, supporting large ZIP files and ZIP files with comments, implementing loading binary extensions from ZIP files, etc.

    @brettcannon
    Copy link
    Member Author

    I'm planning to review the PR at some point.

    On Sun, Jul 8, 2018, 00:42 Serhiy Storchaka, <report@bugs.python.org> wrote:

    Serhiy Storchaka <storchaka+cpython@gmail.com> added the comment:

    Could anybody please make a review? This is just a first step, we need to
    do it before making other steps: implementing more modern import API,
    supporting large ZIP files and ZIP files with comments, implementing
    loading binary extensions from ZIP files, etc.

    ----------


    Python tracker <report@bugs.python.org>
    <https://bugs.python.org/issue25711\>


    @serhiy-storchaka
    Copy link
    Member

    Fixed issues on Windows.

    @serhiy-storchaka serhiy-storchaka added the 3.8 only security fixes label Aug 19, 2018
    @serhiy-storchaka serhiy-storchaka removed the 3.7 (EOL) end of life label Aug 19, 2018
    @brettcannon
    Copy link
    Member Author

    I think Serhiy's PR is basically done, so now the question is do we want to merge it in and drop the C code? ;)

    I obviously say yes because this is I/O-bound code and so the switch shouldn't be enough of a performance hit to warrant blocking the gain in maintainability long-term (especially if we try to clean the module up slowly).

    Barry, since I know you work with zip files a lot at work did you want to check to make sure perf won't be an issue for your use-case?

    @warsaw
    Copy link
    Member

    warsaw commented Sep 1, 2018

    Not sure if I'll have time before the core sprints to check the implementation with shiv, but I can give it a try then. Do you mind waiting until then?

    @brettcannon
    Copy link
    Member Author

    I'm personally in no rush and I assume Serhiy isn't either with 3.8 cut-off quite a ways out, so I see no rush.

    @serhiy-storchaka
    Copy link
    Member

    New changeset 79d1c2e by Serhiy Storchaka in branch 'master':
    bpo-25711: Rewrite zipimport in pure Python. (GH-6809)
    79d1c2e

    @serhiy-storchaka
    Copy link
    Member

    New changeset 9da3961 by Serhiy Storchaka in branch 'master':
    bpo-25711: Move _ZipImportResourceReader from importlib to zipimport. (GH-9406)
    9da3961

    @serhiy-storchaka
    Copy link
    Member

    New changeset b2984ab by Serhiy Storchaka in branch 'master':
    bpo-25711: Remove outdated zipimport tests. (GH-9404)
    b2984ab

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Nov 5, 2018

    Noticed this was still open when reviewing Elvis's zipimport patch for bpo-34022.

    Given the Python implementation has been merged, should we close this as resolved, and open new issues for any further changes (performance or otherwise)?

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants