classification
Title: Create a pure Python zipfile/tarfile importer
Type: enhancement Stage: test needed
Components: Library (Lib) Versions: Python 3.5
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: amaury.forgeotdarc, barry, brett.cannon, dmi.baranov, eric.smith, gregory.p.smith, jcea, pconnell, rhettinger
Priority: low Keywords: patch

Created on 2013-04-03 22:05 by brett.cannon, last changed 2015-04-13 20:40 by gregory.p.smith. This issue is now closed.

Files
File name Uploaded Description
zip_importlib.diff brett.cannon, 2013-06-21 01:09
zip_importlib.diff brett.cannon, 2013-06-21 02:11 Added support for bytecode loading
Messages (12)
msg185967 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2013-04-03 22:05
I'm going to write an importer using zipfile and importlib in pure Python. I'm doing this (a) so that there is an importer that relies on zipfile and thus picks up whatever features it adds over time, (b) to have good example code to point people to when they need to implement their own importers, and (c) to see where there are API holes in importlib and the ABCs so I can keep trying to make things easier for people.

Do realize that the last point means I will most likely be writing this myself no matter what people contribute (sorry). If people still want to tackle it to provide feedback on what importlib is lacking that's totally fine and welcome feedback, but since this is partially an exercise for myself I feel like I do need to type it all out from scratch to discover sticking points, even from an experienced perspective. But I'm happy to discuss design choices, etc. here with folks as this progresses.
msg185968 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2013-04-03 22:06
I should mention I have an old implementation at https://code.google.com/p/importers/.
msg186002 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-04-04 05:48
Thanks Brett.  I look forward to this.
msg191546 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2013-06-21 01:09
Here is an initial stab at a zip file importer using importlib. Probably the biggest shortcoming is that it doesn't support bytecode files, but that's because I just have not bothered to add support yet (it's just one method to implement). There is a note in zipimport that the resolution does not match up between zip file modification times and what bytecode files store, and so there needs to be a one second fuzzing factor. I'm not seeing why, though: bytecode files use os.stat().st_mtime, which already has a one second resolution like zip files. The other shortcoming is that bytecode-only files are not supported (on purpose, as importlib.abc.SourceLoader doesn't support bytecode-only files).
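For readers without access to the attached patch, the general shape of such an importer can be sketched as follows. This is not the code from zip_importlib.diff: the class names ZipSourceLoader and ZipFinder are hypothetical, only top-level modules at the archive root are handled, and bytecode caching is skipped because path_stats() is not implemented.

```python
import importlib.abc
import importlib.util
import os
import sys
import zipfile


class ZipSourceLoader(importlib.abc.SourceLoader):
    """Hypothetical loader: reads module source out of a zip archive."""

    def __init__(self, archive, member):
        self.archive = archive  # filesystem path of the .zip file
        self.member = member    # name of the source file inside the archive

    def get_filename(self, fullname):
        # Synthetic path of the form <archive>/<member>.
        return os.path.join(self.archive, self.member)

    def get_data(self, path):
        # Only members of this archive are readable; anything else is
        # reported as OSError, which is what the ABC expects.
        prefix = self.archive + os.sep
        if not path.startswith(prefix):
            raise OSError(path)
        member = path[len(prefix):].replace(os.sep, "/")
        try:
            with zipfile.ZipFile(self.archive) as zf:
                return zf.read(member)
        except KeyError:
            raise OSError(path) from None


class ZipFinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder for top-level modules stored at the archive root."""

    def __init__(self, archive):
        self.archive = archive
        with zipfile.ZipFile(archive) as zf:
            self.names = set(zf.namelist())

    def find_spec(self, fullname, path=None, target=None):
        member = fullname + ".py"
        if path is not None or member not in self.names:
            return None  # packages and submodules are out of scope here
        loader = ZipSourceLoader(self.archive, member)
        return importlib.util.spec_from_loader(fullname, loader)
```

Appending a ZipFinder instance to sys.meta_path makes the archive's top-level modules importable. SourceLoader supplies get_code() and exec_module(), so the loader only has to provide get_filename() and get_data().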
msg191554 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2013-06-21 02:11
I went ahead and added the needed method for bytecode loading; see the new patch.
msg191558 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2013-06-21 06:48
Times in a ZIP file have a two-second resolution (in the old DOS FAT format: 5 bits for the hours, 6 bits for the minutes, and only 5 bits left for the seconds).
Some fuzziness logic is needed when comparing against a time_t.
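The bit layout described above can be sketched in a few lines. The helper names below are hypothetical, and mtimes_match() is only one possible way to express the fuzziness, not the logic of any particular importer.

```python
def pack_dos_time(hour, minute, second):
    """Pack a time of day into the 16-bit MS-DOS time field."""
    # 5 bits for hours, 6 bits for minutes, 5 bits for seconds // 2.
    return (hour << 11) | (minute << 5) | (second // 2)


def unpack_dos_time(value):
    """Inverse of pack_dos_time(); seconds come back rounded down to even."""
    return (value >> 11) & 0x1F, (value >> 5) & 0x3F, (value & 0x1F) * 2


def mtimes_match(zip_mtime, source_mtime, fuzz=2):
    """Treat two timestamps closer than `fuzz` seconds as equal (assumed policy)."""
    return abs(int(zip_mtime) - int(source_mtime)) < fuzz
```

Because only 5 bits remain for the seconds, an odd second such as 59 is stored as 58, so an exact comparison against a one-second time_t can fail for a file that has not changed.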
msg191580 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2013-06-21 13:48
A two second granularity? Ugh. That will require a new (possibly private) API to abstract that out in order to handle that case. What a pain. While I look into that, I would appreciate it if people could look at the code and let me know if they see anything wrong with it, anything hard to understand, or a way to improve it.
msg192140 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2013-07-01 16:58
Paul's review did find one non-optimal thing the code is doing. I'll fix it the next time there's a need to upload a new patch.
msg202446 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2013-11-08 20:35
This is going to need a touch-up for PEP 451 to make the whole thing work more smoothly, so repositioning for Python 3.5. When that happens I can look into the 2 second issue that Amaury pointed out and Paul's review.

Although I do wonder about the utility of this work. Zipimport still gets around the bootstrapping problem that any pure Python zip implementation never will. Probably the only hope of being useful is to make this abstract enough to include a pluggable (de)compression setup so that lzma or bz2 can be used on top of a single file storage back-end like tar or zip.
msg208829 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2014-01-22 17:06
Re-positioning to work with both tarfile and zipfile since tarfile's 'r' will transparently decompress as necessary. Might need to scale back some functionality to make it easily work with both formats. But since they are both alternative storage solutions, some generic base classes should be possible (which might require some refactoring anyway to make it all abstract with respect to the storage mechanism).
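The transparent decompression mentioned above can be seen in a small sketch. The helpers make_tar() and read_member() are hypothetical names used only for this illustration; the archives are built in memory so nothing touches disk.

```python
import io
import tarfile


def make_tar(mode, name, payload):
    """Build an in-memory tar archive with one member, using the given write mode."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode=mode) as tf:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def read_member(archive_bytes, name):
    """Read one member; mode 'r' lets tarfile detect the compression itself."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r") as tf:
        return tf.extractfile(name).read()
```

Calling read_member() on the result of make_tar("w", ...) and on the result of make_tar("w:gz", ...) goes through the same code path, which is what would let a single storage back-end sit underneath differently compressed archives.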
msg212329 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2014-02-26 23:33
The more I think about this, the less useful it seems to be. We need zipimport written in C for bootstrapping issues. If that's the case then time should be put into making that work and not duplicating the functionality. I'm going to leave this open but unassign from myself as I would rather focus on my lazy loader issue than this. If someone can manage to create a zipfile importer whose dependencies are small and thus can be frozen along with importlib then this can be moved forward.
msg240730 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2015-04-13 20:21
Based on our hallway pow-wow at PyCon 2015 sprints day #1...  I audited the zipfile module to confirm our suspicions about it being "large".

In current Python 3.5 head's zipfile.py module here are the things it depends directly upon from other modules:

import io  # [Py(C: _io)] io.BufferedIOBase{,readline}, io.BytesIO, io.open
import os  # [Py(C: various)] os.path.*, os.getcwd, os.stat, os.listdir, curdir, pardir
import re  # [Py(C: sre_*)] Only used for universal newlines in ZipExtFile.readline().  Shouldn't the io module do this part for us?!?
import importlib.util  # [Py] importlib.util.cache_from_source() from PyZipFile to create importable .zip files.
import sys  # [C] sys.platform for creation metadata, sys.stderr for a strange print on a DeprecationWarning.
import time  # [C] time.localtime, time.time
import stat  # [Py] stat.filemode, stat.S_ISDIR
import shutil  # [Py] shutil.copyfileobj() from _extract_member().
import struct  # [Py(C: _struct)] struct.pack(), struct.unpack(), struct.calcsize()
import binascii  # [C] binascii.crc32() only if zlib is unavailable. (store only zip support?)
import threading  # [Py] threading.RLock() in ZipFile.

import zlib, bz2, lzma are all conditional.  # [C]

import warnings  # [Py] conditional import to highlight legacy uses.
import py_compile  # [Py] conditional import in PyZipFile for importable .zip file creation.

Some of these are obviously shims around C extensions which could conditionally be used directly if desired.  But many others are largely implemented in Python.  Freezing all of these just to use the bloated zipfile.py within a pure Python zipimport implementation seems like extreme overkill.  Not worth the effort.
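A rough version of the audit above can be automated with the ast module. This is only a sketch: imported_modules() and zipfile_imports() are hypothetical helpers, they report direct imports only (not what those modules import in turn), and the result depends on the Python version being run, so it will not necessarily match the 3.5 listing.

```python
import ast
import inspect
import zipfile


def imported_modules(source):
    """Return the sorted top-level module names imported anywhere in `source`."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Relative imports (level > 0) are skipped; they stay inside the package.
            names.add(node.module.split(".")[0])
    return sorted(names)


def zipfile_imports():
    """Direct imports of the zipfile module shipped with the running interpreter."""
    return imported_modules(inspect.getsource(zipfile))
```

Conditional imports inside try blocks or function bodies are included because ast.walk() visits every node, not just module-level statements.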

Efforts for improved zipimport are now likely to focus on a simple C zip file reading-only library being used by a new clean implementation of a zip importer on top of that.
History
Date User Action Args
2015-04-13 20:40:57 gregory.p.smith set status: open -> closed
resolution: rejected
2015-04-13 20:21:12 gregory.p.smith set nosy: + gregory.p.smith
messages: + msg240730
2014-02-26 23:33:04 brett.cannon set assignee: brett.cannon ->
messages: + msg212329
2014-01-22 17:06:39 brett.cannon set messages: + msg208829
title: Create a pure Python zipfile importer -> Create a pure Python zipfile/tarfile importer
2013-11-08 20:35:56 brett.cannon set messages: + msg202446
versions: + Python 3.5, - Python 3.4
2013-07-01 16:58:49 brett.cannon set messages: + msg192140
2013-06-21 13:48:38 brett.cannon set messages: + msg191580
2013-06-21 06:48:15 amaury.forgeotdarc set nosy: + amaury.forgeotdarc
messages: + msg191558
2013-06-21 02:11:23 brett.cannon set files: + zip_importlib.diff
messages: + msg191554
2013-06-21 01:09:32 brett.cannon set files: + zip_importlib.diff
keywords: + patch
messages: + msg191546
2013-04-30 21:43:24 dmi.baranov set nosy: + dmi.baranov
2013-04-04 05:48:58 rhettinger set nosy: + rhettinger
messages: + msg186002
2013-04-04 03:15:44 jcea set nosy: + jcea
2013-04-04 01:34:04 eric.smith set nosy: + eric.smith
2013-04-03 22:26:16 barry set nosy: + barry
2013-04-03 22:13:22 pconnell set nosy: + pconnell
2013-04-03 22:06:43 brett.cannon set messages: + msg185968
2013-04-03 22:05:56 brett.cannon create