Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZipFile.open - read-only file-like obj for files in archive #41565

Closed
alanmcintyre mannequin opened this issue Feb 12, 2005 · 16 comments
Closed

ZipFile.open - read-only file-like obj for files in archive #41565

alanmcintyre mannequin opened this issue Feb 12, 2005 · 16 comments
Labels
stdlib Python modules in the Lib dir

Comments

@alanmcintyre
Copy link
Mannequin

alanmcintyre mannequin commented Feb 12, 2005

BPO 1121142
Nosy @loewis
Files
  • zipfile_patch2.tgz: Upated patch files
  • zipfile_patch3.tgz: Revision 3 of patch
  • zipfile_patch4.tgz: Revision 4 of patch
  • zipfile_patch5.tgz: Revision 5 of patch
  • zipfile_patch6.tgz: Revision 6 of patch for 2.6 trunk
  • zipfile_patch7.tgz: Revision 7 (hopefully final)
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2007-03-06.14:29:21.000>
    created_at = <Date 2005-02-12.00:08:30.000>
    labels = ['library']
    title = 'ZipFile.open - read-only file-like obj for files in archive'
    updated_at = <Date 2007-03-06.14:29:21.000>
    user = 'https://bugs.python.org/alanmcintyre'

    bugs.python.org fields:

    activity = <Date 2007-03-06.14:29:21.000>
    actor = 'alanmcintyre'
    assignee = 'none'
    closed = True
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2005-02-12.00:08:30.000>
    creator = 'alanmcintyre'
    dependencies = []
    files = ['6492', '6493', '6494', '6495', '6496', '6497']
    hgrepos = []
    issue_num = 1121142
    keywords = ['patch']
    message_count = 16.0
    messages = ['47752', '47753', '47754', '47755', '47756', '47757', '47758', '47759', '47760', '47761', '47762', '47763', '47764', '47765', '47766', '47767']
    nosy_count = 3.0
    nosy_names = ['loewis', 'glyf', 'alanmcintyre']
    pr_nums = []
    priority = 'normal'
    resolution = 'accepted'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue1121142'
    versions = ['Python 2.5']

    @alanmcintyre
    Copy link
    Mannequin Author

    alanmcintyre mannequin commented Feb 12, 2005

    I originally started working on updating patch 992750,
    but decided after a short while to just start from
    scratch, so I'm posting it as a new patch. Sorry if
    this isn't appropriate.

    This patch provides a new open() method on ZipFile;
    this method returns a file-like object for the
    requested item in the archive. This file-like object
    only provides a read() method.

    ZipFile.read was modified to use the new open method
    (this was suggested by loewis in reference to patch
    992750).

    The patched zipfile.py passed the existing tests in the
    test_zipfile.py from CVS. New tests were added to
    verify the operation of the object returned by open().
    These tests were modeled after existing tests for
    ZipFile.read(); two read fixed-size chunks from the
    file-like object, and two others read random-sized chunks.

    I have only run the tests on Windows XP, using
    Python2.4 from the official Windows installer. I will
    test the patch out on Linux over the weekend.

    If the patch is accepted I'll also generate and submit
    patches for the appropriate documentation as well.

    @alanmcintyre alanmcintyre mannequin closed this as completed Feb 12, 2005
    @alanmcintyre alanmcintyre mannequin added the stdlib Python modules in the Lib dir label Feb 12, 2005
    @alanmcintyre alanmcintyre mannequin closed this as completed Feb 12, 2005
    @alanmcintyre alanmcintyre mannequin added the stdlib Python modules in the Lib dir label Feb 12, 2005
    @alanmcintyre
    Copy link
    Mannequin Author

    alanmcintyre mannequin commented Feb 27, 2005

    Logged In: YES
    user_id=1115903

    zipfile_patch2.tgz: I updated the file-like object to
    support readline, readlines, __iter__ and next(). Added
    tests for those new methods. Also added a patch for the
    documentation.

    Passes regression tests on 2.5a0 built from CVS HEAD with
    MSVC .NET on Windows XP.

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Mar 1, 2005

    Logged In: YES
    user_id=21627

    The patch is reversed: usually, diff is invoked as "-c old new".

    I think it is almost right, but I have a few remarks:

    • by tradition, open() should have a mode argument,
      defaulting to 'r'; it would be ok to raise exceptions if it
      is anything else. However, do consider implementing
      universal newlines; allowing 'b' as a no-op might also be
      reasonable.
    • I wonder what happens if the compression rate is < 1. It
      would appear that the code might use too few rawbytes. I
      suggest to recursively invoke read in this case.
    • I wonder whether it could ever happen that there is still
      data to uncompress in the zlib object, ie. whether it might
      be necessary to invoke .flush() after exhausting the
      rawbytes (and discard the zlib object afterwards)
    • it appears that atleast the builtin file object implements
      .read(0) as returning an empty string; the manual says that
      the entire file is meant only if size is omitted or negative.

    @alanmcintyre
    Copy link
    Mannequin Author

    alanmcintyre mannequin commented Mar 14, 2005

    Logged In: YES
    user_id=1115903

    Hmm...I could have sworn I did the diff in the correct
    order. I'll make sure next time. :)

    Here's my comments on your remarks (in order):

    • I'm adding support for universal newlines, and will reject
      all modes that aren't legal combinations of r, U, and b.
    • I'll see if I can make a Zip file store something with
      compression < 1, and if I can I'll add a test case for it.
    • I'll try work a .flush() in there on the compression
      object and come up with a test case if possible
    • .read(0) and .readline(0) will both return an empty string
      with no side-effects, since this seems to be what builtin
      files do.

    Right now ZipExtFile uses the ZipFile's file object, so you
    pretty much have to do whatever you're going to do with the
    ZipExtFile instance you get back from .open() before you use
    that ZipFile instance in a way that moves the file pointer
    around.

    I'm sure that somebody will eventually try to use the
    ZipFile in this way, so it appears my options are either to
    (1) give the ZipExtFile its own file object to use (when
    possible), or (2) make sure this limitation is documented.
    #1 feels (to me) to be the "right thing" to do, so that's
    what I'll try unless there's a good reason I shouldn't.

    @alanmcintyre
    Copy link
    Mannequin Author

    alanmcintyre mannequin commented Apr 13, 2005

    Logged In: YES
    user_id=1115903

    Uploaded the third revision of this patch; passes all
    regression tests against current CVS on WinXP. I think all
    the issues Martin brought up have been addressed except
    perhaps for the case of compression rate <1. I will have a
    look at that when I have time; just wanted to get an update
    here before the patch started to look abandoned. :)

    @alanmcintyre
    Copy link
    Mannequin Author

    alanmcintyre mannequin commented Apr 13, 2005

    Logged In: YES
    user_id=1115903

    I found a problem with my universal newline handling code in
    readline(); if the first byte of an '\r\n' pair was read
    from file but the second byte didn't come in on that same
    read, it resulted in the next line having an inappropriate
    '\n' prepended to it. zipfile_patch4.tgz has a fix for this
    included (with everything else, of course).

    I'm going to test the open() capability in a real
    application with some reasonably large zip files (containing
    files up to just short of 2GB) over the next few days, so if
    any bugs or performance problems show up I may have some
    more changes.

    @alanmcintyre
    Copy link
    Mannequin Author

    alanmcintyre mannequin commented Apr 27, 2005

    Logged In: YES
    user_id=1115903

    After testing on my large batch of large Zip files, I made
    one fix (version 4 of the patch didn't read all the content
    of large compressed archive items). The current set of
    changes is attached as zipfile_patch5.tgz.

    @alanmcintyre
    Copy link
    Mannequin Author

    alanmcintyre mannequin commented May 31, 2005

    Logged In: YES
    user_id=1115903

    Revision 5 of this patch has been in constant use with
    Python 2.4.1 in an application at my job for about a month;
    it seems to be stable and useful in that regard. If anybody
    has time to review and accept (or reject) it I would
    appreciate it.

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Feb 13, 2007

    Can you please update the patch to the current subversion trunk? I'd like to apply it, but the code of zipfile has changed so that the patch is currently out-of-date. When redoing it, notice that the read implementation has changed (I couldn't figure out how you moved code around).

    Please do use the trunk (not Python 2.5), and please submit the patch as a single 'svn diff' output (rather than a tar file containing multiple individual diff files).

    @alanmcintyre
    Copy link
    Mannequin Author

    alanmcintyre mannequin commented Feb 14, 2007

    I will see if I can make some progress on this over the weekend.

    @glyf
    Copy link
    Mannequin

    glyf mannequin commented Feb 17, 2007

    Twisted also contains an implementation of this functionality, available from

    http://twistedmatrix.com/trac/browser/trunk/twisted/python/zipstream.py
    

    As far as I can tell it doesn't have anything to recommend it over the attached patch, however (in fact, the test coverage of the attached patch looks better), but perhaps it has some behavior which might be desirable to steal.

    @alanmcintyre
    Copy link
    Mannequin Author

    alanmcintyre mannequin commented Feb 18, 2007

    Thanks, glyph; I'll definitely have a look and see what I can steal. :-)

    @alanmcintyre
    Copy link
    Mannequin Author

    alanmcintyre mannequin commented Feb 18, 2007

    Here is a version of the patch against the current trunk. It passes a "make test" on my Gentoo laptop at the moment, but I think I still need to add some more tests (like reading from encrypted compressed files), and I need to try out the test_zipfile64 stuff. I don't have time today to look through the module Glyph suggested, so it may be that after I do that I'll have some more tweaks to make. I'm pretty sure I'll be able to spend some more time on Wednesday this week.

    I also included the full contents of zipfile.py and test_zipfile.py just in case that's useful to somebody.
    File Added: zipfile_patch6.tgz

    @alanmcintyre
    Copy link
    Mannequin Author

    alanmcintyre mannequin commented Feb 28, 2007

    This version of the patch includes the updates to the documentation. It passes regression tests as well as the test_zipfile64 tests on my Gentoo laptop.
    File Added: zipfile_patch7.tgz

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Mar 6, 2007

    Thanks for the patch. Committed as r54152. If you want to produce further changes, please submit them as new patches against the subversion trunk.

    Two remarks:

    • I recall discussion of adding an extract method. Are you still interested in this?
    • The previous version required that the mode is 'r' or 'a' to invoke .read(), and raised a RuntimeError in this case. Your version doesn't (but instead raises KeyError if the archive member to be read wasn't written yet). Was this change deliberate?

    @alanmcintyre
    Copy link
    Mannequin Author

    alanmcintyre mannequin commented Mar 6, 2007

    Thanks for committing. :) I will probably try adding the extract method and submit a patch for that soon. I had noticed the mode problem just a day or two ago while I was writing some more tests to improve coverage of the zipfile module, and I will include a fix for that (and the new tests) in a separate patch.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    0 participants