Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zlib can't decompress DEFLATE using shared dictionary #71351

Closed
VladimirMihailenco mannequin opened this issue May 31, 2016 · 9 comments
Closed

zlib can't decompress DEFLATE using shared dictionary #71351

VladimirMihailenco mannequin opened this issue May 31, 2016 · 9 comments
Labels
extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error

Comments

@VladimirMihailenco
Copy link
Mannequin

VladimirMihailenco mannequin commented May 31, 2016

BPO 27164
Nosy @Yhg1s, @gpshead, @vadmium, @zhangyangyu
Files
  • issue27164.patch
  • issue27164_with_test.patch
  • raw-zdict.v3.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2016-06-06.01:48:25.514>
    created_at = <Date 2016-05-31.09:41:03.726>
    labels = ['extension-modules', 'type-bug']
    title = "zlib can't decompress DEFLATE using shared dictionary"
    updated_at = <Date 2016-06-06.01:48:25.513>
    user = 'https://bugs.python.org/VladimirMihailenco'

    bugs.python.org fields:

    activity = <Date 2016-06-06.01:48:25.513>
    actor = 'martin.panter'
    assignee = 'none'
    closed = True
    closed_date = <Date 2016-06-06.01:48:25.514>
    closer = 'martin.panter'
    components = ['Extension Modules']
    creation = <Date 2016-05-31.09:41:03.726>
    creator = 'Vladimir Mihailenco'
    dependencies = []
    files = ['43082', '43085', '43230']
    hgrepos = []
    issue_num = 27164
    keywords = ['patch']
    message_count = 9.0
    messages = ['266747', '266804', '266814', '266819', '266871', '266876', '267391', '267405', '267416']
    nosy_count = 7.0
    nosy_names = ['twouters', 'gregory.p.smith', 'nadeem.vawda', 'python-dev', 'martin.panter', 'xiang.zhang', 'Vladimir Mihailenco']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue27164'
    versions = ['Python 3.5', 'Python 3.6']

    @VladimirMihailenco
    Copy link
    Mannequin Author

    VladimirMihailenco mannequin commented May 31, 2016

    Consider following code:

    import zlib
    
    
    hello = b'hello'
    
    # Compress data using deflate and shared/predefined dict.
    co = zlib.compressobj(wbits=-zlib.MAX_WBITS, zdict=hello)
    data = co.compress(hello) + co.flush()
    
    # Decompress deflate by providing same dict.
    do = zlib.decompressobj(wbits=-zlib.MAX_WBITS, zdict=hello)
    data = do.decompress(data)
    
    print(data)
    

    Decompression fails with "zlib.error: Error -3 while decompressing data: invalid distance too far back".

    My original task was to decompress data I get from Golang library - https://golang.org/pkg/compress/flate/#NewWriterDict

    @VladimirMihailenco VladimirMihailenco mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels May 31, 2016
    @zhangyangyu
    Copy link
    Member

    This seems related with the difference between zlib format and raw format. When you do raw inflate, you have to inflateSetDictionary before any inflate. You cannot rely on the first inflate to return Z_NEED_DICT and then do inflateSetDictionary.

    Although I have the solution with experiment, but there is nothing can be found in the official zlib documentation. It is rather vague. I'll make more research and submit a patch later..

    Right now, you can use zlib format (positive wbits) if it can achieve your goal. It's behaviour is right.

    @zhangyangyu
    Copy link
    Member

    I think my conclusion in last message is right. A similar bug in CPAN can be found in https://rt.cpan.org/Public/Bug/Display.html?id=36046. And I can find in zlib's changelog inflateSetDictionary starts to support raw inflate from 1.2.2.1.

    Patch attached now allows Python to decompress raw deflate format with a dictionary. Hope some core devs are willing to review.

    @zhangyangyu
    Copy link
    Member

    Forget to add test. Upload a new one with test included.

    @gpshead gpshead self-assigned this Jun 1, 2016
    @vadmium
    Copy link
    Member

    vadmium commented Jun 2, 2016

    Thanks for the patches. Maybe Gregory knows more about this than I do (I never used zdict). But I think I have figured it out today, so I left some review comments :)

    I’m not sure whether this really is a bug fix, although it would have little if any impact on existing code. It seems zdict was implemented with just the non-raw zlib format in mind, so supporting raw deflate mode is a new feature. If we were to add full support of inflateSetDictionary() for raw deflate mode, I imagine it would be a new decompressobj.set_dictionary() method.

    I did some quick tests with specifying an unnecessary zdict buffer to the decompressor (when zdict not used for compression), and it doesn’t _seem_ to break anything, but I am not 100% sure.

    Notes from my learning what zdict means:

    Zdict seems to be a bunch of user data to setup or train the (de)compressor with. It was added by bpo-14684. (Not a Python dictionary, nor any special zlib dictionary format. The Python documentation is confusing, and the Go documentation made more sense, assuming it is equivalent.)

    See the documentation of in/deflateSetDictionary() in <http://www.zlib.net/manual.html\> or zlib.h. Looking through the history at <https://github.com/madler/zlib\> can also be useful.

    @zhangyangyu
    Copy link
    Member

    Thanks for your review, Martin. :) I have replied with after thinking and try.

    Yes, this is more like a feature request. But for the end user, this behaviour is quite like a bug.

    If we were to add full support of inflateSetDictionary() for raw deflate mode, I imagine it would be a new decompressobj.set_dictionary() method.

    I don't support adding a new set_dictionary() method. Right now decompressobj handles raw deflate format, zlib format and gzip format. Adding a set_dictionary() method for raw deflate format seems weird. If we are going to do so, I suggest reorganize the zlib like PHP and nodejs do, separating different formats into different objects.

    I did some quick tests with specifying an unnecessary zdict buffer to the decompressor (when zdict not used for compression), and it doesn’t _seem_ to break anything, but I am not 100% sure.

    This patch only affects raw inflate. You have to set wbits negative. If you are decompressing with an unnecessary zdict, I think it ought to emit an error.

    With Python background, zlib dictionary is really hard to understand. I think manual doesn't tell much. I understand why zdict is needed until reading https://blog.cloudflare.com/improving-compression-with-preset-deflate-dictionary/.

    @gpshead gpshead removed their assignment Jun 2, 2016
    @vadmium
    Copy link
    Member

    vadmium commented Jun 5, 2016

    Here is a patch which I propose to use that factors out a set_inflate_zdict() function. I think this will help with maintainence, e.g. if someone changes the UINT_MAX checking, but forgets that there are two places than need changing. Let me know if that is okay with you :)

    @vadmium vadmium added extension-modules C modules in the Modules dir and removed stdlib Python modules in the Lib dir labels Jun 5, 2016
    @zhangyangyu
    Copy link
    Member

    It looks good. I don't think that is a matter since usually I will do a search before change. But of course extracting them to a subroutine helps.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jun 5, 2016

    New changeset d91b951e676f by Martin Panter in branch '3.5':
    Issue bpo-27164: Allow decompressing raw Deflate streams with predefined zdict
    https://hg.python.org/cpython/rev/d91b951e676f

    New changeset 470954641f3b by Martin Panter in branch 'default':
    Issue bpo-27164: Merge raw Deflate zdict support from 3.5
    https://hg.python.org/cpython/rev/470954641f3b

    @vadmium vadmium closed this as completed Jun 6, 2016
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants