
Improve the ZipFile Interface #35279

Closed
anonymous mannequin opened this issue Oct 4, 2001 · 12 comments
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@anonymous

anonymous mannequin commented Oct 4, 2001

BPO 467924
Nosy @birkenfeld
Files
  • zipfile_extract.diff: Add extract/extractall methods to ZipFile
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2008-01-07.18:48:55.687>
    created_at = <Date 2001-10-04.15:54:14.000>
    labels = ['type-feature', 'library']
    title = 'Improve the ZipFile Interface'
    updated_at = <Date 2008-01-07.18:48:55.685>
    user = 'https://bugs.python.org/anonymous'

    bugs.python.org fields:

    activity = <Date 2008-01-07.18:48:55.685>
    actor = 'georg.brandl'
    assignee = 'none'
    closed = True
    closed_date = <Date 2008-01-07.18:48:55.687>
    closer = 'georg.brandl'
    components = ['Library (Lib)']
    creation = <Date 2001-10-04.15:54:14.000>
    creator = 'anonymous'
    dependencies = []
    files = ['9066']
    hgrepos = []
    issue_num = 467924
    keywords = []
    message_count = 12.0
    messages = ['53289', '53290', '53291', '53292', '53293', '53294', '53295', '57981', '57990', '59269', '59272', '59476']
    nosy_count = 7.0
    nosy_names = ['georg.brandl', 'jvr', 'mzimmerman', 'scott_daniels', 'myers_carpenter', 'alanmcintyre', 'crhode']
    pr_nums = []
    priority = 'normal'
    resolution = 'accepted'
    stage = None
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue467924'
    versions = []

    @nobody

    nobody mannequin commented Oct 4, 2001

    There are two methods for writing to a ZipFile:

         write(self, filename, arcname=None, compress_type=None)  
         writestr(self, zinfo, bytes)

    but only one for reading from it:

         read(self, name)

    Additionally, the two 'write's behave differently with respect to compression.

    ---
    (a) 'read' does not match 'write': 'write' takes a file and adds it to a ZipFile,
    but 'read' is not the reverse operation. 'read' should be called 'readstr', since it
    matches 'writestr' much better.

    (b) It is confusing what 'write' and 'read' actually mean. Does 'write' write a file,
    or write into the ZipFile? It would be clearer if ZipFile had four methods that
    fit together pairwise:

     writestr (self, zinfo, bytes)
          # same as now
     readstr (self, name)
          # returns bytes (as string), currently called 'read'
          # 'read' could still live but should be deprecated
     add (self, filename, arcname=None, compress_type=None)
          # currently 'write'
          # 'write' could still live but should be deprecated
     extract (self, name, filename, arcname=None)
          # new, desired functionality
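As a rough illustration of the pairing proposed in (b), here is a minimal Python 3 sketch that layers those names over the existing ZipFile methods. PairedZipFile, readstr(), add() and extract_to() are hypothetical names and not part of the zipfile module. extract_to() stands in for the proposed extract() and is renamed so that it does not shadow the ZipFile.extract() the standard library later gained.

```python
import os
import shutil
import tempfile
import zipfile


class PairedZipFile(zipfile.ZipFile):
    """Hypothetical subclass exposing pairwise names (illustration only)."""

    def readstr(self, name):
        # Counterpart of writestr(): return a member's contents as bytes
        return self.read(name)

    def add(self, filename, arcname=None, compress_type=None):
        # Counterpart of extract_to(): copy a file from disk into the archive
        self.write(filename, arcname=arcname, compress_type=compress_type)

    def extract_to(self, name, filename):
        # Copy member `name` out to the path `filename`, streaming in chunks
        with self.open(name) as src, open(filename, "wb") as dst:
            shutil.copyfileobj(src, dst)


# Usage: add() a file from disk, read it back with readstr(), extract it again
with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "notes.txt")
    with open(source, "wb") as f:
        f.write(b"hello zip")
    archive = os.path.join(workdir, "demo.zip")
    with PairedZipFile(archive, "w") as zf:
        zf.add(source, arcname="notes.txt")
    with PairedZipFile(archive) as zf:
        contents = zf.readstr("notes.txt")
        copy = os.path.join(workdir, "copy.txt")
        zf.extract_to("notes.txt", copy)
        with open(copy, "rb") as f:
            copied = f.read()
```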
    

    (c) Both 'writestr' and 'add' should by default use the 'compress_type' that was
    passed to the constructor of 'ZipFile'. Currently 'write' does, but 'writestr' with
    a zinfo does not: 'ZipInfo' strictly sets the compression to 'ZIP_STORED' :-(
    It should not do that! Instead it should:
    - allow more parameters in the constructor signature,
      so the compression type (and some other attributes) can be passed too
    - default to 'None', so that 'writestr' can detect this and take
      the default from the 'ZipFile' instance.
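The ZipInfo default described in (c) still applies in current Python 3. A ZipInfo created by hand starts out as ZIP_STORED, whatever compression the ZipFile was constructed with. The sketch below assumes zlib is available for ZIP_DEFLATED. It shows that default and two ways to override it: setting ZipInfo.compress_type, or passing compress_type to writestr().

```python
import io
import zipfile

data = b"spam " * 200
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    # A bare ZipInfo stays ZIP_STORED despite the ZipFile's ZIP_DEFLATED
    zf.writestr(zipfile.ZipInfo("stored.txt"), data)

    # Option 1: set the attribute on the ZipInfo before writing
    info = zipfile.ZipInfo("attr.txt")
    info.compress_type = zipfile.ZIP_DEFLATED
    zf.writestr(info, data)

    # Option 2: pass compress_type to writestr(), which overrides the ZipInfo
    zf.writestr(zipfile.ZipInfo("arg.txt"), data,
                compress_type=zipfile.ZIP_DEFLATED)

with zipfile.ZipFile(buf) as zf:
    # Map each member name to the compression method actually recorded
    methods = {i.filename: i.compress_type for i in zf.infolist()}
```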

    @anonymous anonymous mannequin added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Oct 4, 2001
    @jvr

    jvr mannequin commented Jan 5, 2003

    user_id=92689

    In Python 2.3, writestr() has an enhanced signature: the
    first arg may now also be an archive name, in which case the
    correct default settings are used (i.e. the compression value
    is taken from the ZipFile object). See patch bpo-651621.
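Here is a minimal Python 3 sketch of the arcname form described above, using the current stdlib API rather than the 2.3 code from the patch. When the first argument to writestr() is a string instead of a ZipInfo, the member inherits the compression the ZipFile was constructed with.

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    # First argument is a plain archive name, so the member inherits the
    # ZipFile's compression (ZIP_DEFLATED here) and a current timestamp
    zf.writestr("by_name.txt", b"eggs " * 200)

with zipfile.ZipFile(buf) as zf:
    method = zf.getinfo("by_name.txt").compress_type
```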

    extract() could be moderately useful (although I don't
    understand the 'arcname' arg: how is that different from
    'name'?), but it would have to deal with file modes (binary/text).
    The file mode isn't stored in the archive, so it would have to
    be (optionally) supplied by the caller.
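On the file-mode concern: in current Python 3, ZipFile.extract() writes member data in binary mode. Bytes, including line endings, therefore come out exactly as stored, with no text-mode translation. Here is a small sketch using a made-up member name, docs/notes.txt:

```python
import io
import os
import tempfile
import zipfile

data = b"first line\r\nsecond line\n"  # mixed line endings on purpose

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("docs/notes.txt", data)

with tempfile.TemporaryDirectory() as dest, zipfile.ZipFile(buf) as zf:
    # extract() recreates docs/ under dest and returns the path it wrote
    out_path = zf.extract("docs/notes.txt", path=dest)
    with open(out_path, "rb") as f:
        on_disk = f.read()
    # Relative path of the extracted file, normalized to forward slashes
    rel = os.path.relpath(out_path, dest).replace(os.sep, "/")
```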

    @mzimmerman

    mzimmerman mannequin commented Jul 31, 2003

    user_id=196786

    It would also be very useful to be able to have ZipFile
    read/write the uncompressed file data from/to a file-like
    object, instead of just strings and files (respectively).

    I would like to use this module to work with zip files
    containing large files, but this is unworkable because the
    current implementation would use excessive amounts of memory.

    Currently, read() reads all of the compressed data into
    memory, then uncompresses it into memory. For files which
    may be hundreds of megabytes compressed, this is undesirable.

    Likewise for write(), I would like to be able to stream data
    into a zip file, passing in a ZipInfo to specify the
    metadata as is done with writestr().

    The implementation of this functionality is quite
    straightforward, but I am not sure whether (or how) the
    interface should change. Some other parts of the library
    allow for a file object to be passed to the same interface
    which accepts a filename. The object is examined to see if
    it has the necessary read/write methods and if not, it is
    assumed to be a filename. Would this be the correct way to
    do it?

    I, too, am a bit irked by the lack of symmetry exhibited by
    read vs. write/writestr, and think that the interface
    suggested above would be a significant improvement.
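Since Python 3.6, ZipFile.open() accepts mode "w" as well as "r", which covers both halves of this request. In write mode it returns a file-like object that can be filled incrementally from a ZipInfo. In read mode it decompresses on demand instead of loading the whole member into memory. A minimal sketch follows, in which the 1 MiB in-memory payload and the member name big.bin stand in for a real large file:

```python
import io
import shutil
import zipfile

original = b"x" * (1 << 20)     # stands in for a large source file
payload = io.BytesIO(original)

archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    info = zipfile.ZipInfo("big.bin")
    info.compress_type = zipfile.ZIP_DEFLATED  # ZipInfo defaults to STORED
    # Stream into the member in 64 KiB chunks instead of one big string
    with zf.open(info, "w") as dst:
        shutil.copyfileobj(payload, dst, length=64 * 1024)

out = io.BytesIO()
with zipfile.ZipFile(archive) as zf:
    compressed_size = zf.getinfo("big.bin").compress_size
    # Read mode returns a file-like object that decompresses as it is read
    with zf.open("big.bin") as src:
        shutil.copyfileobj(src, out, length=64 * 1024)
```

For members that may exceed 2 GiB, open() also accepts force_zip64=True, because the final size is not known when the member header is written.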

    @myerscarpenter

    myerscarpenter mannequin commented May 9, 2004

    user_id=335935

    The zipfile interface should match the tarfile interface.
    At the minimum, it should work for this example:

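    import zipfile

    # Proposed tarfile-style API: zipfile has no module-level open(),
    # and ZipFile objects are not iterable, so this does not run as-is.
    zf = zipfile.open("sample.zip", "r")
    for zipinfo in zf:
        print(zipinfo.name, "is", zipinfo.size, "bytes in size")
        zf.extract(zipinfo)
    zf.close()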

    This closely matches the 'tarfile' module.
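For comparison, the closest runnable equivalent with today's API (Python 3) uses ZipFile directly and iterates over infolist(). ZipInfo exposes filename and file_size rather than tarfile's name and size. The sketch builds a throwaway sample.zip in a temporary directory so that it runs on its own:

```python
import os
import tempfile
import zipfile

sizes = {}
extracted = []
with tempfile.TemporaryDirectory() as workdir:
    sample = os.path.join(workdir, "sample.zip")
    # Build a small sample.zip so the sketch is self-contained
    with zipfile.ZipFile(sample, "w") as zf:
        zf.writestr("a.txt", b"alpha")
        zf.writestr("sub/b.txt", b"bravo!")

    with zipfile.ZipFile(sample, "r") as zf:
        for zipinfo in zf.infolist():
            print(zipinfo.filename, "is", zipinfo.file_size, "bytes in size")
            sizes[zipinfo.filename] = zipinfo.file_size
            # extract() accepts a ZipInfo as well as a member name
            extracted.append(zf.extract(zipinfo, path=os.path.join(workdir, "out")))
```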

    @crhode

    crhode mannequin commented Sep 22, 2005

    user_id=988879

    I've been trying to read map files put out by the Census
    Bureau. These ZIP archives are downloaded from government
    contractors' sites by county. Within each county archive
    are several ZIP files for each map layer (roads, streams,
    waterbodies, etc.). Each contains the elements of an ESRI
    shapefile database (.shp, .shx, and .dbf files). This
    doesn't make a lot of sense to me, either, because there's
    no compression advantage to making an archive of an archive.
    The technique is used purely for organizational purposes
    because ZIP does not compress subdirectories.

    Note: I've never seen a TAR of TAR files because TAR *does*
    compress subdirectories.

    What I've been struggling with is a way to leave these
    archives in their compressed form and still do *python* I/O
    on them. There is a tree organization to them, after all,
    just as with traditional os.path directories. I've designed
    some objects that let me retrieve the most recent file, ZIP
    member, or TAR member by name from a given path to a
    repository of such archives. What I get is a StreamIO
    object that I can subsequently put back where it came from.

    What would be nice is if there already were objects
    available to manipulate normal os.path directories commingled
    with ZIP and TAR archives. What would be nicer is if I/O
    could be opened at the character/line level transparently
    without regard to whether the source/destination was a file
    or an archive member within such a structure. In the days
    of hardware compression and on-the-fly encryption/decryption
    of I/O, is this too much to ask? -ccr-
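Part of this can be done with the standard library alone. A nested archive can be opened from the outer archive's member bytes, and io.TextIOWrapper gives line-level text access to a member. The following is a minimal Python 3 sketch. read_nested_lines() is a hypothetical helper, and the names county/roads.zip and layers.txt are invented for illustration.

```python
import io
import zipfile


def read_nested_lines(outer_file, inner_name, member, encoding="utf-8"):
    """Return the text lines of `member` inside the zip `inner_name`,
    which is itself a member of the zip archive `outer_file`."""
    with zipfile.ZipFile(outer_file) as outer:
        # Load the inner archive into memory; nothing is written to disk
        inner_bytes = outer.read(inner_name)
    with zipfile.ZipFile(io.BytesIO(inner_bytes)) as inner:
        # TextIOWrapper turns the binary member stream into line-oriented text
        with io.TextIOWrapper(inner.open(member), encoding=encoding) as text:
            return [line.rstrip("\n") for line in text]


# Build a hypothetical county archive containing a per-layer zip
inner_buf = io.BytesIO()
with zipfile.ZipFile(inner_buf, "w") as inner:
    inner.writestr("layers.txt", "roads\nstreams\nwaterbodies\n")
outer_buf = io.BytesIO()
with zipfile.ZipFile(outer_buf, "w") as outer:
    outer.writestr("county/roads.zip", inner_buf.getvalue())

lines = read_nested_lines(outer_buf, "county/roads.zip", "layers.txt")
```

Reading the inner archive into a BytesIO keeps the example simple at the cost of holding it in memory. On Python 3.7+ the object returned by ZipFile.open() should be seekable, so it could be passed to ZipFile directly instead.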

    @scottdaniels

    scottdaniels mannequin commented Sep 25, 2005

    user_id=493818

    I am currently working on an expanded zipfile module that:
    (a) Has a more easily extensible class
    (b) Allows BZIP2 compression (my original need)
    (c) Allows file-like (read) access to the elements of ZipFile
    (d) Provides for a single "writer" which can be used to
    generate file contents "incrementally" while possibly
    reading from other "files" in the zipfile
    (e) Allows the opening of embedded zips "in-place"
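Point (b) is now covered by the standard library. Since Python 3.3, zipfile supports ZIP_BZIP2 (and ZIP_LZMA) as compression methods, selectable per member. A minimal sketch, assuming the bz2 module is available in the interpreter build:

```python
import io
import zipfile

data = b"shapefile " * 1000
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    # Per-member compression method; ZIP_BZIP2 needs Python 3.3+ and bz2
    zf.writestr("layer.bin", data, compress_type=zipfile.ZIP_BZIP2)

with zipfile.ZipFile(buf) as zf:
    info = zf.getinfo("layer.bin")
    method = info.compress_type
    roundtrip = zf.read("layer.bin")  # decompressed transparently
```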

    What I don't have at the moment is a good set of tests
    or good documents of how to use it. Anyone interested
    in collaborating, let me know.

    --Scott David Daniels

    @alanmcintyre

    alanmcintyre mannequin commented Sep 25, 2005

    user_id=1115903

    Scott,

    I had put together some enhancements to ZipFile read/write,
    including test cases, but haven't had time to advocate
    getting it into 2.5. You can find it here:

    https://sourceforge.net/tracker/?func=detail&aid=1121142&group_id=5470&atid=305470

    If it seems like it would be helpful, I can go round up the
    most recent version (that I've been using in a production
    environment) and send it to you.

    @birkenfeld
    Member

    Alan's patch has since been committed. Is there any more work on this item?

    @alanmcintyre

    alanmcintyre mannequin commented Nov 30, 2007

    There was another issue that also asked for an extract feature, and if I
    recall correctly I said I'd try to work on it (I think I have some code
    somewhere for it but I'll have to look). Tonight or tomorrow I will see
    if I can find that other issue and let you know about it, and maybe take
    a look around at the various zipfile improvement/change requests to see
    if they've been completely addressed.

    @alanmcintyre

    alanmcintyre mannequin commented Jan 5, 2008

    I attached a patch with the following changes (as zipfile_extract.diff):

    (1) Add a note to the docs (under writestr) about how the compression is
    selected if a ZipInfo is passed as the zinfo_or_arcname parameter. If
    anybody thinks it's a good idea to add a compression argument to the
    ZipInfo constructor, I can modify the patch/docs accordingly.

    (2) Add an extract method to ZipFile and associated test/documentation
    changes.
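The attached zipfile_extract.diff is described as adding both extract and extractall methods. In current Python 3, extractall() unpacks either every member or a chosen list of members under a target directory. A minimal sketch with invented member names:

```python
import io
import os
import tempfile
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", b"alpha")
    zf.writestr("pkg/b.txt", b"bravo")
    zf.writestr("pkg/c.txt", b"charlie")

with tempfile.TemporaryDirectory() as dest, zipfile.ZipFile(buf) as zf:
    # extractall() takes an optional subset of member names
    zf.extractall(path=dest, members=["pkg/b.txt", "pkg/c.txt"])
    # Collect what landed on disk, as forward-slash relative paths
    found = sorted(
        os.path.relpath(os.path.join(root, name), dest).replace(os.sep, "/")
        for root, _dirs, files in os.walk(dest)
        for name in files
    )
```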

    @alanmcintyre

    alanmcintyre mannequin commented Jan 5, 2008

    Are the method renames/additions suggested in the original issue worth
    doing? When I first started using this module, I found the
    documentation easy and thorough enough to understand how to use it, so I
    would vote for just leaving the ZipFile interface the way it is.

    @birkenfeld
    Member

    I committed your patch (after reviewing the docs) as r59834. I think
    there is no more to do here.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022