gzip.GzipFile to accept stream as fileobj. #40026

Closed
belyi mannequin opened this issue Mar 11, 2004 · 8 comments
Labels
extension-modules C modules in the Modules dir

Comments

@belyi
Mannequin

belyi mannequin commented Mar 11, 2004

BPO 914340
Nosy @loewis, @birkenfeld
Files
  • gzip-stream.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2007-03-15.15:16:47.000>
    created_at = <Date 2004-03-11.18:45:17.000>
    labels = ['extension-modules']
    title = 'gzip.GzipFile to accept stream as fileobj.'
    updated_at = <Date 2007-03-15.15:16:47.000>
    user = 'https://bugs.python.org/belyi'

    bugs.python.org fields:

    activity = <Date 2007-03-15.15:16:47.000>
    actor = 'lucas_malor'
    assignee = 'none'
    closed = True
    closed_date = None
    closer = None
    components = ['Extension Modules']
    creation = <Date 2004-03-11.18:45:17.000>
    creator = 'belyi'
    dependencies = []
    files = ['5848']
    hgrepos = []
    issue_num = 914340
    keywords = ['patch']
    message_count = 8.0
    messages = ['45495', '45496', '45497', '45498', '45499', '45500', '45501', '45502']
    nosy_count = 5.0
    nosy_names = ['loewis', 'georg.brandl', 'belyi', 'lucas_malor', 'antialize']
    pr_nums = []
    priority = 'normal'
    resolution = 'out of date'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue914340'
    versions = ['Python 2.4']

    @belyi
    Mannequin Author

    belyi mannequin commented Mar 11, 2004

    When gzip.GzipFile is initialized with a fileobj that does not
    have tell() and seek() methods (a non-rewindable stream), it
    raises an exception. The interesting thing is that it doesn't
    have to. The following patch updates gzip.py to allow any stream
    with just a read() method to be used. This is helpful if you want
    to be able to do something like:
    gzip.GzipFile(fileobj=urllib.urlopen("file:///README.gz")).readlines()
    or use GzipFile with the sys.stdin stream.

    But keep in mind that the seek() and rewind() methods of
    GzipFile() won't work for such a stream, even with the patch.

    Igor
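
For readers on current Python 3, the use case described above can be sketched as follows. This is an illustration only, not the attached patch: `UnseekableBytesIO` and `read_lines` are hypothetical names, the class is a stand-in for a pipe or HTTP response, and the sketch assumes a Python 3 `gzip` module (3.2 or later, where the documentation lists support for unseekable files).

```python
import gzip
import io


class UnseekableBytesIO(io.BytesIO):
    """Hypothetical stand-in for a pipe or HTTP response: read() works, seeking does not."""

    def seekable(self):
        return False

    def seek(self, *args):
        raise io.UnsupportedOperation("seek")

    def tell(self):
        raise io.UnsupportedOperation("tell")


def read_lines(stream):
    """Decompress a gzip stream that only supports forward reads."""
    with gzip.GzipFile(fileobj=stream, mode="rb") as gz:
        return gz.readlines()


compressed = gzip.compress(b"hello\nworld\n")
lines = read_lines(UnseekableBytesIO(compressed))
```

As the comment above notes, only forward reading is expected to work here; rewinding the GzipFile still needs a seekable fileobj.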

    @belyi belyi mannequin closed this as completed Mar 11, 2004
    @belyi belyi mannequin added the extension-modules C modules in the Modules dir label Mar 11, 2004
    @belyi
    Mannequin Author

    belyi mannequin commented Mar 11, 2004

    Logged In: YES
    user_id=995711

    The previous revision of the patch does not work correctly with
    multiple compressed members in one stream. I've updated the patch file.

    @belyi
    Mannequin Author

    belyi mannequin commented Mar 19, 2004

    Logged In: YES
    user_id=995711

    I thought I should add a more detailed explanation of
    the changes...

    The current implementation of GzipFile() uses tell() and seek()
    to reposition within the stream in the following 2 cases:

    1. When EOF is reached and the last 8 bytes of the file
      contain the checksum and the uncompressed data size.
    2. When some 'unused_data' is left over after decompression,
      meaning that the stream may contain more than one compressed
      item.

    My change introduces 2 helper buffers:
    'inputbuf', which keeps data read from the stream but not yet used, and
    'last8', which keeps the last 8 'used' bytes.

    My change also introduces a helper method, _read_internal(),
    which is used instead of calling self.fileobj.read() directly.
    This method reads data from the stream as needed by calling
    self.fileobj.read() and keeps 'inputbuf' and 'last8' up to date.

    When case 1 above happens, we use the 'last8' buffer to read
    the checksum and size.
    When case 2 above happens, we add the value of 'unused_data'
    to inputbuf.

    There's one more self.fileobj.seek() call left, in the rewind()
    method, but it is used only when the rewind() or seek() methods
    of the GzipFile class are called. And it wouldn't be
    logical to expect those methods to work if the underlying
    fileobj does not support them.

    Igor
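
The buffering idea in this comment can be sketched in a few lines of modern Python. This is not the attached patch: `iter_gunzip` and `ReadOnlyStream` are hypothetical names, and the sketch assumes Python 3 with `zlib.decompressobj` in gzip mode (`MAX_WBITS | 16`). In that mode zlib parses the trailer itself, so only an `inputbuf`-style buffer is needed and no `last8` buffer appears; leftover `unused_data` is carried into the next member instead of seeking backwards.

```python
import gzip
import zlib


class ReadOnlyStream:
    """Hypothetical stream that exposes nothing but read()."""

    def __init__(self, data):
        self._data = data
        self._pos = 0

    def read(self, size):
        chunk = self._data[self._pos:self._pos + size]
        self._pos += len(chunk)
        return chunk


def iter_gunzip(stream, chunk_size=8192):
    """Yield decompressed chunks from a read()-only stream of gzip members."""
    inputbuf = b""   # bytes read from the stream but not yet consumed
    decomp = None    # decompressor for the member currently being read
    while True:
        if not inputbuf:
            inputbuf = stream.read(chunk_size)
            if not inputbuf:
                break  # underlying stream is exhausted
        if decomp is None:
            # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
            decomp = zlib.decompressobj(zlib.MAX_WBITS | 16)
        out = decomp.decompress(inputbuf)
        inputbuf = b""
        if out:
            yield out
        if decomp.eof:
            # Case 2 from the comment: keep the bytes that follow this
            # member so the next member can start from them.
            inputbuf = decomp.unused_data
            decomp = None
    if decomp is not None:
        raise EOFError("stream ended inside a gzip member")


two_members = gzip.compress(b"first ") + gzip.compress(b"second")
joined = b"".join(iter_gunzip(ReadOnlyStream(two_members), chunk_size=7))
```

The small `chunk_size` in the usage line is only there to force member boundaries to fall in the middle of a read. Zero padding after a member is not handled in this sketch.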

    @antialize
    Mannequin

    antialize mannequin commented Jun 19, 2006

    Logged In: YES
    user_id=379876

    Is there any reason this patch has not been accepted? If it
    is accepted, I have a patch to urllib2 to (automatically)
    accept gzipped content as described here:
    http://www.http-compression.com/#client_request. If there is
    some reason this patch is not acceptable, please give details so it
    can be fixed. I'm tired of using popen and gunzip :)

    @loewis
    Mannequin

    loewis mannequin commented Mar 6, 2007

    The patch in this form is incomplete: it lacks test suite changes. Can somebody please provide patches to Lib/test/test_gzip.py that exercise this new functionality?

    @birkenfeld
    Member

    It looks like Patch bpo-1675951 provides the same feature, plus speedups.

    @lucasmalor
    Mannequin

    lucasmalor mannequin commented Mar 15, 2007

    There's a problem with this patch. If my code has already read some bytes from the fileobj passed to GzipFile, _read_gzip_header raises IOError, 'Not a gzipped file', because it starts reading at the current position, not at the start. Unfortunately, seek() cannot be used on urllib objects. I don't see any possible workaround.
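
The failure described here can be reproduced on current Python 3, and one possible way around it is to replay the consumed bytes, assuming the caller still has them. The sketch below is an assumption-laden illustration, not part of either patch: `PrependedStream` is a hypothetical wrapper, and the code assumes a Python 3 `gzip` module that reads the header from the stream's current position.

```python
import gzip
import io


class PrependedStream:
    """Hypothetical wrapper: replay already-consumed bytes, then continue with the stream."""

    def __init__(self, head, stream):
        self._head = head      # bytes the caller already read from the stream
        self._stream = stream  # underlying stream; only read() is required

    def read(self, size=-1):
        if size is None or size < 0:
            data, self._head = self._head, b""
            return data + self._stream.read()
        data, self._head = self._head[:size], self._head[size:]
        if len(data) < size:
            data += self._stream.read(size - len(data))
        return data


blob = gzip.compress(b"payload")

# Failure mode: two bytes (the gzip magic number) were consumed earlier.
consumed_stream = io.BytesIO(blob)
head = consumed_stream.read(2)
try:
    gzip.GzipFile(fileobj=consumed_stream, mode="rb").read()
    header_error = None
except OSError as exc:
    header_error = exc

# Possible workaround: hand the consumed bytes back before decompressing.
retry_stream = io.BytesIO(blob)
head = retry_stream.read(2)
recovered = gzip.GzipFile(fileobj=PrependedStream(head, retry_stream), mode="rb").read()
```

This only helps when the bytes read earlier were kept; if they were discarded, the header cannot be reconstructed from a non-seekable stream.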

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022