Title: Rewrite zipimport from scratch
Type: enhancement
Stage: patch review
Components: Interpreter Core
Versions: Python 3.7
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Decorater, barry, brett.cannon, byrnes, diana, eric.snow, gregory.p.smith, jarondl, serhiy.storchaka, superluser, twouters, vstinner
Priority: high
Keywords: patch

Created on 2015-11-23 16:28 by brett.cannon, last changed 2017-12-03 20:57 by Decorater.

Files
File name, Uploaded by, Date
zipimport.patch, serhiy.storchaka, 2016-12-08 19:46
zipimport-2.patch, serhiy.storchaka, 2016-12-09 12:34
Pull Requests
URL, Status, Linked by, Date
PR 4023, closed, barry, 2017-10-17 20:43
Messages (22)
msg255183 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2015-11-23 16:28
No one wants to work on zipimport, and yet it's full of bugs. It needs a rewrite so that it's more maintainable. An idea floated at PyCon 2015 was to write the zip-reading code in C and to keep it as simple as possible -- e.g., don't worry about supporting comments, etc. -- and then write the rest of the code in importlib, making maintenance much easier.

All of the various zipimport bugs should be made dependent on this issue because, unless they are critical flaws, I doubt they will get fixed without the rewrite.
msg255198 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-23 17:15
FYI I'm at the early stage of rewriting zipimport in Python.
msg255202 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2015-11-23 17:23
Are you writing it in such a way that it can be bootstrapped in with importlib so the stdlib can be loaded from it?
msg255224 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-23 20:41
It was my intention.
msg256161 - (view) Author: Rose Ames (superluser) * Date: 2015-12-09 21:12
Serhiy, how far along are you on this?  I have a wip from this summer that I could finish over the holidays.
msg256171 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-12-10 08:40
I was at a very early stage, and stopped this work a few weeks ago in favor of other issues. I would be glad to review your work when you have finished it, Rose.
msg256207 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-12-11 08:25
Can you both publish your WIP work?
msg257360 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2016-01-02 20:29
What are people's statuses on their various attempts? Since this is going to block my importlib.resources work I will do the work myself or work directly with someone in order to make sure this gets done.
msg257609 - (view) Author: Rose Ames (superluser) * Date: 2016-01-06 14:41
Sorry for the late response. I didn't have much time over the holidays. I think I'd better let someone else take this one.
msg282731 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-12-08 19:46
Here is a preliminary translation of zipimport to Python. It is not frozen and imports other modules. I tried to keep the implementation close to the C implementation. As a consequence, some raised exceptions look arbitrary.
msg282776 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-12-09 12:34
Got rid of the dependencies on os, stat and encodings.
msg282777 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-12-09 13:16
> Here is preliminary translation of zipimport to Python. It is not frozen and imports other modules.

Technically, will it be possible to freeze it? It seems useful to keep the ability to put the whole stdlib into a single ZIP. Using a ZIP is sometimes suggested to avoid fstat() calls on disk, for example, to speed up Python startup.

But I also understand that the C code is painful to maintain and update.

Anyway, nice job Serhiy :-)
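
For readers unfamiliar with the mechanism under discussion, here is a minimal sketch of importing a module from a ZIP archive placed on sys.path. The archive name `bundle.zip` and the module name `zipped_demo_mod` are made up for illustration; putting the whole stdlib in a ZIP works on the same principle but is not shown here.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Build a throwaway archive containing one tiny module (names are hypothetical).
tmp_dir = tempfile.mkdtemp()
archive = os.path.join(tmp_dir, "bundle.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("zipped_demo_mod.py", "GREETING = 'hi from a zip'\n")

# A ZIP file on sys.path is handled by the zipimport path hook.
sys.path.insert(0, archive)
importlib.invalidate_caches()

import zipped_demo_mod

greeting = zipped_demo_mod.GREETING
loader_name = type(zipped_demo_mod.__loader__).__name__
print(greeting, loader_name)
```

The loader reported for the module is the `zipimporter` class, which is the piece this issue proposes to rewrite.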
msg282871 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2016-12-10 20:14
So long as this code block that imports os is avoided, I believe that this can be properly frozen:

+        if not isinstance(path, str):
+            import os
+            path = os.fsdecode(path)

But it should be easy to avoid that code path when the standard library is a zip file.

Otherwise it uses importlib (frozen), marshal (builtin), sys (builtin), time (builtin), and zlib [if present] (extension module).
msg282876 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-12-10 21:16
> Technically, will it be possible to freeze it?

I think yes. But I don't know how to do this. I am hoping for help from Brett (or another import machinery expert).

Since the zipimporter constructor is called only with a string path by the import machinery, the os module is not imported at the initialization stage.
msg282938 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2016-12-11 19:06
And having a private copy of os.fsdecode() isn't difficult as os.fspath() is in posix and after that it's four lines that only need access to the sys module.
msg283130 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2016-12-13 17:51
Just FYI, if this lands I will probably try to build off of it to make a pathlib-like zip module to eventually replace zipfile. So if there are any API design decisions that need to be made, it would be great if we try to keep the zip-specific bits separate and generic enough to work with in other future libraries.
msg299705 - (view) Author: Yaron de Leeuw (jarondl) * Date: 2017-08-03 13:04
What is the status of this work? Is there anything I can do to help make this happen?
msg304484 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2017-10-16 21:42
I've landed here after chatting with @brett.cannon.  I have a use case for this (making pex startup faster by bypassing pkg_resources) but I need to hack around the limitation of dlopen'ing .so's from zips.  Our idea was to have a zipimport subclass which doesn't return None from `importlib.util.find_spec()` when it finds a .so, but instead dumps that into some safe directory, and then arranges for a loader that knows how to load that.  It sure would be handy for this to be a zipimporter subclass. :)

I think Serhiy's patch predates the move to GitHub, so it's not a branch/PR.  I guess the next step would be to branchify the patch and then continue discussion over there.  Depending on my availability, I might do that.
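
The extract-then-load idea described above could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses zipfile rather than a zipimporter subclass, the function name `spec_for_zipped_extension` and its parameters are hypothetical, and it says nothing about what makes a directory "safe".

```python
import importlib.machinery
import importlib.util
import tempfile
import zipfile


def spec_for_zipped_extension(archive, member, fullname, cache_dir=None):
    """Extract an extension module from a zip and return a spec for it.

    Hypothetical helper: a real implementation would live in a finder
    and would need to pick and secure the extraction directory.
    """
    if cache_dir is None:
        cache_dir = tempfile.mkdtemp(prefix="zipext-")
    # dlopen() needs a real file, so copy the member out of the archive.
    with zipfile.ZipFile(archive) as zf:
        extracted = zf.extract(member, cache_dir)
    # Hand the extracted file to the regular extension-module loader.
    loader = importlib.machinery.ExtensionFileLoader(fullname, extracted)
    return importlib.util.spec_from_file_location(
        fullname, extracted, loader=loader)
```

Creating the spec does not load the shared object; that would happen later via `importlib.util.module_from_spec()` and the loader's `exec_module()`.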
msg304531 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2017-10-17 20:44
I've ported the patch to a branch on GH.  See PR 4023.  I started from zipimport-2.patch.  It doesn't work, but it will be easier to work on as a PR rather than a patch.

Contributions welcome!  Let's see if we can make this work.
msg306556 - (view) Author: Decorater (Decorater) * Date: 2017-11-20 15:37
So, after reviewing this, it made me rethink the file.

So my question is: how would that file work if it is itself a pyc file inside a zip or something, used just to zipimport other modules? There has got to be some sort of low-level API that can zip-import the zip importer itself after the rewrite. Am I right?

Maybe the best bet is to wait for bug reports on the C code and fix up the C code if possible, so that there are no conflicts like the ones I just questioned.
msg306610 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2017-11-21 01:45
It would be frozen just like importlib, so there's no bootstrapping issue if that's what you're asking.
msg307524 - (view) Author: Decorater (Decorater) * Date: 2017-12-03 20:57
Alright, just wanted to make sure, because I did not want to have something break when loading up the entire standard library from a zip file with it.
History
Date User Action Args
2017-12-03 20:57:06 Decorater set messages: + msg307524
2017-11-21 01:45:15 brett.cannon set messages: + msg306610
2017-11-20 15:37:35 Decorater set nosy: + Decorater
    messages: + msg306556
2017-10-17 20:44:56 barry set messages: + msg304531
2017-10-17 20:43:40 barry set pull_requests: + pull_request3998
2017-10-16 21:42:31 barry set messages: + msg304484
2017-10-16 21:38:31 barry set nosy: + barry
2017-08-03 13:04:19 jarondl set nosy: + jarondl
    messages: + msg299705
2016-12-13 17:51:29 brett.cannon set messages: + msg283130
2016-12-11 19:06:14 brett.cannon set messages: + msg282938
2016-12-10 21:16:17 serhiy.storchaka set messages: + msg282876
2016-12-10 20:14:24 gregory.p.smith set messages: + msg282871
2016-12-09 13:16:50 vstinner set messages: + msg282777
2016-12-09 12:34:56 serhiy.storchaka set files: + zipimport-2.patch
    messages: + msg282776
2016-12-08 19:46:09 serhiy.storchaka set files: + zipimport.patch
    versions: + Python 3.7, - Python 3.6
    messages: + msg282731
    keywords: + patch
    type: enhancement
    stage: patch review
2016-01-06 14:41:28 superluser set messages: + msg257609
2016-01-05 05:48:21 diana set nosy: + diana
2016-01-02 20:29:36 brett.cannon set messages: + msg257360
2015-12-11 08:25:08 vstinner set nosy: + vstinner
    messages: + msg256207
2015-12-10 08:40:21 serhiy.storchaka set messages: + msg256171
2015-12-09 21:12:19 superluser set messages: + msg256161
2015-11-23 20:41:02 serhiy.storchaka set messages: + msg255224
2015-11-23 20:40:42 serhiy.storchaka set messages: - msg255223
2015-11-23 20:40:34 serhiy.storchaka set messages: + msg255223
2015-11-23 20:40:09 serhiy.storchaka set messages: - msg255222
2015-11-23 20:39:50 serhiy.storchaka set messages: + msg255222
2015-11-23 20:33:14 eric.snow set nosy: + eric.snow, superluser
2015-11-23 17:23:42 brett.cannon set messages: + msg255202
2015-11-23 17:15:22 serhiy.storchaka set nosy: + serhiy.storchaka
    messages: + msg255198
2015-11-23 16:36:10 byrnes set nosy: + byrnes
2015-11-23 16:29:06 brett.cannon link issue25710 dependencies
2015-11-23 16:28:40 brett.cannon create