Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate the zlib decompressor’s flush() method #67389

Open
vadmium opened this issue Jan 9, 2015 · 9 comments
Open

Deprecate the zlib decompressor’s flush() method #67389

vadmium opened this issue Jan 9, 2015 · 9 comments
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@vadmium
Copy link
Member

vadmium commented Jan 9, 2015

BPO 23200
Nosy @vadmium, @serhiy-storchaka
Files
  • max_length.patch
  • deprecate-flush.patch
  • deprecate-flush.v2.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2015-01-09.01:26:21.340>
    labels = ['type-feature', 'library']
    title = 'Deprecate the zlib decompressor\xe2\x80\x99s flush() method'
    updated_at = <Date 2016-02-02.11:33:17.625>
    user = 'https://github.com/vadmium'

    bugs.python.org fields:

    activity = <Date 2016-02-02.11:33:17.625>
    actor = 'martin.panter'
    assignee = 'nadeem.vawda'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2015-01-09.01:26:21.340>
    creator = 'martin.panter'
    dependencies = []
    files = ['37650', '41067', '41780']
    hgrepos = []
    issue_num = 23200
    keywords = ['patch']
    message_count = 9.0
    messages = ['233709', '233710', '254828', '254836', '254837', '254841', '255564', '256142', '259380']
    nosy_count = 5.0
    nosy_names = ['nadeem.vawda', 'docs@python', 'python-dev', 'martin.panter', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue23200'
    versions = ['Python 3.6']

    @vadmium
    Copy link
    Member Author

    vadmium commented Jan 9, 2015

    This simple patch documents that max_length has to be non-zero. The implementation actually uses zero as a special value to indicate max_length was not specified.

    Also, I wonder what the point of the Decompressor.flush() method is. Reading the module code and zlib manual, it looks like it would return the same data as decompressor.decompress(decompressor.unconsumed_tail), except that it uses the Z_FINISH flag instead of Z_SYNC_FLUSH, so that zlib can optimize by assuming no more data is to be processed. Since unconsumed_tail is read-only and only relevant when using max_length, and flush does not support max_length, I wonder if the flush() method should be deprecated.

    To further test this theory, I modified the test_zlib.py file so that all appropriate flush() calls are equivalent to this:

    self.assertFalse(dco.flush())

    and the tests all still pass.

    BTW, it looks to me like zlib_Decompress_flush_impl() is setting avail_out too high (blessing unallocated memory) when the total length is more than half of UNIT_MAX, but I am not in a position to test this.

    @vadmium vadmium added the docs Documentation in the Doc dir label Jan 9, 2015
    @vadmium
    Copy link
    Member Author

    vadmium commented Jan 9, 2015

    The processing of unconsumed_tail in flush() was introduced via bpo-16411. Before that I suspect flush() was assumed to only be called when max_length was not used.

    The decompress() method changed from Z_NO_FLUSH to Z_SYNC_FLUSH in Feb 2001; see revision 01c2470eeb2e. I guess previously flush() was necessary to get all your data. The max_length parameter was added later in October that year; see bpo-403753 and revision 60e12f83d2d2.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Nov 18, 2015

    New changeset 106c49edbb12 by Martin Panter in branch '2.7':
    Issue bpo-23200: Document that max_length=0 is not supported
    https://hg.python.org/cpython/rev/106c49edbb12

    New changeset 660bdfaada64 by Martin Panter in branch '3.4':
    Issue bpo-23200: Document that max_length=0 is not supported
    https://hg.python.org/cpython/rev/660bdfaada64

    New changeset d64bb25cf797 by Martin Panter in branch '3.5':
    Issue bpo-23200: Merge zlib doc from 3.4 into 3.5
    https://hg.python.org/cpython/rev/d64bb25cf797

    New changeset cdd403dd82cb by Martin Panter in branch 'default':
    Issue bpo-23200: Merge zlib doc from 3.5
    https://hg.python.org/cpython/rev/cdd403dd82cb

    @vadmium
    Copy link
    Member Author

    vadmium commented Nov 18, 2015

    See bpo-224981 (bug) and bpo-403373 (patch) about the Z_SYNC_FLUSH change (the numbers in the commit message are different and do not work).

    I committed my doc change to 2.7 and 3.4+, which leaves the main problem of what to do about flush(). I propose deprecate-flush.patch for 3.6, which deprecates this method.

    I ran the test suite with -Werror, so I hopefully got all the instances of flush() in the library.

    @vadmium vadmium added stdlib Python modules in the Lib dir and removed docs Documentation in the Doc dir labels Nov 18, 2015
    @vadmium vadmium changed the title Clarify max_length and flush() for zlib decompression Deprecate the zlib decompressor’s flush() method Nov 18, 2015
    @serhiy-storchaka
    Copy link
    Member

    Besides unconsumed_tail there is an internal buffer. Even if flush() is no longer needed in the zlib decompresser (I don't know), I doubt about bz2 and lzma.

    @vadmium
    Copy link
    Member Author

    vadmium commented Nov 18, 2015

    For the bz2 and lzma modules, neither decompressor classes have a flush() method, only the compressors.

    @vadmium
    Copy link
    Member Author

    vadmium commented Nov 29, 2015

    And regarding other internal buffers and state, calling flush() or decompress() guarantees you get as much data as possible, but there is no error checking. You have to check the eof attribute to ensure everything is done, otherwise I don’t think there is a way to differentiate a truncated stream from a properly ended stream.

    @serhiy-storchaka
    Copy link
    Member

    After reading zlib manual it looks to me that your are right, flush() has no other effect except a little less memory consumption. This can be not important in the context of Python. Hope Nadeem will confirm this.

    @serhiy-storchaka serhiy-storchaka added the type-feature A feature request or enhancement label Dec 9, 2015
    @vadmium
    Copy link
    Member Author

    vadmium commented Feb 2, 2016

    New version of the patch merged with current code and cleaning up the zipfile EOF detection.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    Status: No status
    Development

    No branches or pull requests

    2 participants