Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zlib set dictionary support inflateSetDictionary #58889

Closed
SamRushing mannequin opened this issue Apr 27, 2012 · 23 comments
Closed

zlib set dictionary support inflateSetDictionary #58889

SamRushing mannequin opened this issue Apr 27, 2012 · 23 comments
Labels
extension-modules C modules in the Modules dir type-feature A feature request or enhancement

Comments

@SamRushing
Copy link
Mannequin

SamRushing mannequin commented Apr 27, 2012

BPO 14684
Nosy @jcea, @merwok, @JimJJewett
Files
  • zlib_set_dictionary.patch: adds inflateSetDictionary(), deflateSetDictionary()
  • zlib_set_dictionary_2.patch: set dictionary in constructors
  • tz2.py: verify correct stream behavior with dictionary
  • zlib_set_dictionary_3.patch
  • zlib_set_dictionary_4.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2012-06-21.00:27:01.720>
    created_at = <Date 2012-04-27.20:49:09.366>
    labels = ['extension-modules', 'type-feature']
    title = 'zlib set dictionary support inflateSetDictionary'
    updated_at = <Date 2012-06-21.23:51:41.840>
    user = 'https://bugs.python.org/SamRushing'

    bugs.python.org fields:

    activity = <Date 2012-06-21.23:51:41.840>
    actor = 'python-dev'
    assignee = 'none'
    closed = True
    closed_date = <Date 2012-06-21.00:27:01.720>
    closer = 'nadeem.vawda'
    components = ['Extension Modules']
    creation = <Date 2012-04-27.20:49:09.366>
    creator = 'Sam.Rushing'
    dependencies = []
    files = ['25387', '25446', '25447', '25449', '25472']
    hgrepos = []
    issue_num = 14684
    keywords = ['patch', 'needs review']
    message_count = 23.0
    messages = ['159492', '159870', '159886', '159887', '159888', '159889', '159890', '159893', '159897', '160004', '160028', '161082', '161093', '163118', '163130', '163147', '163199', '163228', '163298', '163300', '163339', '163341', '163380']
    nosy_count = 6.0
    nosy_names = ['jcea', 'nadeem.vawda', 'eric.araujo', 'python-dev', 'Jim.Jewett', 'Sam.Rushing']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue14684'
    versions = ['Python 3.3']

    @SamRushing
    Copy link
    Mannequin Author

    SamRushing mannequin commented Apr 27, 2012

    Google's SPDY protocol requires the use of a pre-defined compression dictionary. The current zlib module doesn't expose the two functions for setting the dictionary.

    This patch is minimal in the sense that it only exposes the two functions, but unfortunately the sequence of zlib calls required is clumsy: a call to inflate() must fail first (with an error of Z_NEED_DICT):

    import zlib
    zdict = b"thequickbrownfoxjumped\x00"
    c = zlib.compressobj()
    c.set_dictionary (zdict)
    cd = c.compress (b"the quick brown fox jumped over the candlestick")
    cd += c.flush()
    d = zlib.decompressobj()
    try:
        print (d.decompress (cd))
    except zlib.error as what:
        if what.args[0].startswith ('Error 2 '):
            d.set_dictionary (zdict)
            print (d.flush())

    Obviously a better way to catch/match Z_NEED_DICT would be nice.
    A much nicer solution would allow you to associate the dictionary at creation time, rather than having to wait for an error condition. I can only guess that the zlib authors designed it that way for a reason?

    @SamRushing SamRushing mannequin added extension-modules C modules in the Modules dir type-feature A feature request or enhancement labels Apr 27, 2012
    @jcea
    Copy link
    Member

    jcea commented May 3, 2012

    A dictionary could be provided an init time. Then, the Z_NEED_DICT could be intercepted in the binding and automatically inject the dictionary provided in the init.

    Anyway, for a patch to be approved, we need a test too.

    PS: Why is this NOT targeted to 3.3?. We have time, yet.

    @SamRushing
    Copy link
    Mannequin Author

    SamRushing mannequin commented May 3, 2012

    I'm currently reworking this so that the dictionaries are provided in the constructor, and inflateSetDictionary() is called automatically. I've gone over the zlib RFC's and zlibmodule.c, and I'm fairly certain that whatever usage mode might involve multiple calls to SetDictionary() couldn't be supported by the zlib object anyway.

    r.e. 3.3/3.4 - I merely chose the highest version number I saw, since this diff is against HEAD (as recommended by the docs). It's been quite a few years since I've submitted a patch to CPython!

    @jcea
    Copy link
    Member

    jcea commented May 3, 2012

    Retargetting to python 3.3.

    If you hurry a bit and I find your patch acceptable (remember the tests!), I will try to integrate it.

    @jcea jcea assigned serwy and unassigned serwy May 3, 2012
    @SamRushing
    Copy link
    Mannequin Author

    SamRushing mannequin commented May 3, 2012

    Ok, here's the patch. It has a single short test. For use with SPDY, it's necessary to test that the following stream data also correctly decompresses, I'll attach that to the next comment.

    @SamRushing
    Copy link
    Mannequin Author

    SamRushing mannequin commented May 3, 2012

    This test is rather large, since it includes the predefined SPDY draft 2 dictionary, and some real-world data. Not sure what the policy is on including so much data in a test. If there's enough time I could make a smaller test that also verifies the correct behavior on a stream...

    @SamRushing
    Copy link
    Mannequin Author

    SamRushing mannequin commented May 4, 2012

    Argh, probably need to add the 'dict' field to the copy() method.

    @SamRushing
    Copy link
    Mannequin Author

    SamRushing mannequin commented May 4, 2012

    Updated version of the patch: extends the test, including a test of the streaming behavior needed for SPDY (both compression and decompression).

    Also wik: copy()/uncopy() are aware of the 'dict' attribute.

    @merwok
    Copy link
    Member

    merwok commented May 4, 2012

    Added a few comments on Rietveld.

    @nadeemvawda
    Copy link
    Mannequin

    nadeemvawda mannequin commented May 5, 2012

    I've posted a review on Rietveld.

    @SamRushing
    Copy link
    Mannequin Author

    SamRushing mannequin commented May 5, 2012

    renames dict->zdict, splits the test, adds BEGIN/END around inflate call.

    @jcea
    Copy link
    Member

    jcea commented May 18, 2012

    Status of this feature?. Ready to integrate?

    @nadeemvawda
    Copy link
    Mannequin

    nadeemvawda mannequin commented May 19, 2012

    The code should be changed to use the buffer API (instead of accepting
    only bytes objects). Other than that, I think it's ready for integration.

    @jcea
    Copy link
    Member

    jcea commented Jun 18, 2012

    Sam, the window for Python 3.3 integration is almost close. Could you possibly update your patch with Nadeem's feedback?.

    @SamRushing
    Copy link
    Mannequin Author

    SamRushing mannequin commented Jun 19, 2012

    I think other than the disagreement about whether the dictionary constructor arg should be a buffer object, it's good to go.
    To restate my position: the need is for an immutable string of bytes, and that's exactly what PyBytes_Type is for. I see no advantage to allowing a buffer object, which will require extra code to check that it is both a buffer object and set to be readonly. I believe the argument for aesthetics does not apply, as the constant dictionary constructor argument is a morally different kind of parameter, comparable to (say) the compression level.

    You folks are of course welcome to change it, though. 8^)

    @nadeemvawda
    Copy link
    Mannequin

    nadeemvawda mannequin commented Jun 19, 2012

    To restate my position: the need is for an immutable string of bytes, [...]

    I disagree that we should require the dictionary to be immutable - if the
    caller wishes to use a mutable buffer here, it is their responsibility to
    ensure that it is not modified until the compressor is finished with it
    ("consenting adults" and all that). The documentation can inform users of
    this requirement.

    I believe the argument for aesthetics does not apply, as the constant
    dictionary constructor argument is a morally different kind of
    parameter, comparable to (say) the compression level.

    Even so, the surrounding code sets a precedent for how it accepts binary
    data buffers, and deviating from this existing convention should not be
    taken lightly.

    Nitpicking about the API aside, thanks for the patch :-)

    @jcea
    Copy link
    Member

    jcea commented Jun 19, 2012

    So my question is easy: could we apply this patch as is and defer any "improvement" to 3.4?. The risk of not doing so would be to miss 3.3 completely.

    @nadeemvawda
    Copy link
    Mannequin

    nadeemvawda mannequin commented Jun 19, 2012

    I plan to commit it (along with the buffer API changes) tomorrow.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jun 21, 2012

    New changeset dd4f7d5c51c7 by Nadeem Vawda in branch 'default':
    Issue bpo-14684: Add support for predefined compression dictionaries to the zlib module.
    http://hg.python.org/cpython/rev/dd4f7d5c51c7

    @nadeemvawda
    Copy link
    Mannequin

    nadeemvawda mannequin commented Jun 21, 2012

    Committed. Once again, thanks for the patch!

    @nadeemvawda nadeemvawda mannequin closed this as completed Jun 21, 2012
    @jimjjewett
    Copy link
    Mannequin

    jimjjewett mannequin commented Jun 21, 2012

    Just saw this on the checkins list; where are the other options documented?

    """
      PyDoc_STRVAR(compressobj__doc__,
    -"compressobj([level]) -- Return a compressor object.\n"
    +"compressobj([level[, method[, wbits[, memlevel[, strategy[, zdict]]]]]])\n"
    +" -- Return a compressor object.\n"
     "\n"
    -"Optional arg level is the compression level, in 1-9.");
    +"Optional arg level is the compression level, in 1-9.\n"
    +"\n"
    +"Optional arg zdict is the predefined compression dictionary - a sequence of\n"
    +"bytes containing subsequences that are likely to occur in the input data.");
    """

    I'm honestly not certain what they should be, but the following is my best guess:

    """
     PyDoc_STRVAR(compressobj__doc__,
     "compressobj([level[, method[, wbits[, memlevel[, strategy[, zdict]]]]]])\n"
     " -- Return a compressor object.\n"
     "\n"
    -"Optional arg level is the compression level, in 1-9.\n"
    +"Optional arg level (1-9) is the compression level.\n"
    +"Larger numbers take longer, but produce smaller results.\n"
     "\n"
    +"Optional arg method is the compression method.\n"
    +"The only currently supported method is zlib.DEFLATED.\n"
    +"\n"
    +"Optional arg wbits determines the window buffer size.\n"
    +"Normal values are 8 (least memory) to 15 (best compression).\n"
    +"\n"
    +"Optional arg memlevel (1-9) controls working memory size.\n"
    +"Larger numbers use more memory, but produce smaller results more quickly.\n"
    +"\n"
    +"Optional arg strategy tunes the compression algorithm.\n"
    +"Supported options include zlib.Z_DEFAULT_STRATEGY, zlib.Z_FILTERED, and zlib.Z_HUFFMAN_ONLY.\n"
    +"\n"
    +"Optional arg zdict is the predefined compression dictionary - a sequence of\n"
    +"bytes containing subsequences that are likely to occur in the input data.");
    """

    @nadeemvawda
    Copy link
    Mannequin

    nadeemvawda mannequin commented Jun 21, 2012

    Just saw this on the checkins list; where are the other options documented?

    They aren't, AFAIK. I've been planning on adding them when I've got time
    (based on the zlib manual at http://zlib.net/manual.html), but with the
    upcoming feature freeze for 3.3, this issue was higher priority.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jun 21, 2012

    New changeset 1cfa44cb5af0 by Nadeem Vawda in branch 'default':
    Document the rest of zlib.compressobj()'s arguments.
    http://hg.python.org/cpython/rev/1cfa44cb5af0

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    extension-modules C modules in the Modules dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants