zlib set dictionary support inflateSetDictionary #58889
Google's SPDY protocol requires the use of a pre-defined compression dictionary. The current zlib module doesn't expose the two functions for setting the dictionary. This patch is minimal in the sense that it only exposes the two functions, but unfortunately the sequence of zlib calls required is clumsy: a call to inflate() must fail first (with an error of Z_NEED_DICT):

import zlib
zdict = b"thequickbrownfoxjumped\x00"
c = zlib.compressobj()
c.set_dictionary (zdict)
cd = c.compress (b"the quick brown fox jumped over the candlestick")
cd += c.flush()
d = zlib.decompressobj()
try:
    print (d.decompress (cd))
except zlib.error as what:
    if what.args[0].startswith ('Error 2 '):
        d.set_dictionary (zdict)
        print (d.flush())

Obviously a better way to catch/match Z_NEED_DICT would be nice. |
A dictionary could be provided at init time. Then the Z_NEED_DICT could be intercepted in the binding, which would automatically inject the dictionary provided at init. Anyway, for a patch to be approved, we need a test too. PS: Why is this NOT targeted to 3.3? We still have time. |
I'm currently reworking this so that the dictionaries are provided in the constructor, and inflateSetDictionary() is called automatically. I've gone over the zlib RFCs and zlibmodule.c, and I'm fairly certain that whatever usage mode might involve multiple calls to SetDictionary() couldn't be supported by the zlib object anyway. Re: 3.3/3.4 - I merely chose the highest version number I saw, since this diff is against HEAD (as recommended by the docs). It's been quite a few years since I've submitted a patch to CPython! |
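For reference, a minimal sketch of the constructor-based approach described here, written against the `zdict` argument that `zlib.compressobj()` and `zlib.decompressobj()` accept from Python 3.3 onward. The dictionary and payload are taken from the first comment; the helper names `roundtrip` and `fails_without_dict` are hypothetical and exist only for illustration.

```python
import zlib

ZDICT = b"thequickbrownfoxjumped\x00"
DATA = b"the quick brown fox jumped over the candlestick"


def roundtrip(data, zdict):
    """Compress and decompress `data` using the same preset dictionary."""
    c = zlib.compressobj(zdict=zdict)
    compressed = c.compress(data) + c.flush()
    # No try/except dance: the dictionary is applied inside decompress().
    d = zlib.decompressobj(zdict=zdict)
    return d.decompress(compressed) + d.flush()


def fails_without_dict(compressed):
    """Return True if decompression raises when no dictionary is supplied."""
    d = zlib.decompressobj()
    try:
        d.decompress(compressed)
    except zlib.error:
        return True
    return False
```

With this shape the caller never sees Z_NEED_DICT unless the dictionary was omitted, in which case a `zlib.error` is raised.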
Retargeting to Python 3.3. If you hurry a bit and I find your patch acceptable (remember the tests!), I will try to integrate it. |
Ok, here's the patch. It has a single short test. For use with SPDY, it's necessary to test that the following stream data also decompresses correctly; I'll attach that to the next comment. |
This test is rather large, since it includes the predefined SPDY draft 2 dictionary, and some real-world data. Not sure what the policy is on including so much data in a test. If there's enough time I could make a smaller test that also verifies the correct behavior on a stream... |
Argh, probably need to add the 'dict' field to the copy() method. |
Updated version of the patch: extends the test, including a test of the streaming behavior needed for SPDY (both compression and decompression). Also wik: copy()/uncopy() are aware of the 'dict' attribute. |
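A rough sketch of the streaming usage mentioned above, assuming the `zdict` constructor argument as it exists in Python 3.3 and later. One compressor is shared across several frames with a `Z_SYNC_FLUSH` after each, which is the SPDY-style pattern; the frame contents and the helper names are made up for illustration and are not taken from the patch's test suite.

```python
import zlib

ZDICT = b"thequickbrownfoxjumped\x00"
FRAMES = [b"the quick brown fox", b"jumped over", b"the candlestick"]


def compress_frames(frames, zdict):
    """Compress every frame on one shared stream, sync-flushing after each."""
    c = zlib.compressobj(zdict=zdict)
    return [c.compress(f) + c.flush(zlib.Z_SYNC_FLUSH) for f in frames]


def decompress_frames(chunks, zdict):
    """Decompress the chunks in order on one shared stream."""
    d = zlib.decompressobj(zdict=zdict)
    return [d.decompress(chunk) for chunk in chunks]


def copies_keep_dictionary(data, zdict):
    """Check that copy() on both object types carries the dictionary along."""
    c = zlib.compressobj(zdict=zdict).copy()
    compressed = c.compress(data) + c.flush()
    d = zlib.decompressobj(zdict=zdict).copy()
    return d.decompress(compressed) + d.flush() == data
```

Each chunk decodes to exactly one frame because a sync flush forces all pending output to be emitted before the next frame is compressed.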
Added a few comments on Rietveld. |
I've posted a review on Rietveld. |
renames dict->zdict, splits the test, adds BEGIN/END around inflate call. |
Status of this feature? Ready to integrate? |
The code should be changed to use the buffer API (instead of accepting |
Sam, the window for Python 3.3 integration is almost closed. Could you possibly update your patch with Nadeem's feedback? |
I think other than the disagreement about whether the dictionary constructor arg should be a buffer object, it's good to go. You folks are of course welcome to change it, though. 8^) |
I disagree that we should require the dictionary to be immutable - if the
Even so, the surrounding code sets a precedent for how it accepts binary
Nitpicking about the API aside, thanks for the patch :-) |
So my question is simple: could we apply this patch as is and defer any "improvement" to 3.4? The risk of not doing so would be to miss 3.3 completely. |
I plan to commit it (along with the buffer API changes) tomorrow. |
New changeset dd4f7d5c51c7 by Nadeem Vawda in branch 'default': |
Committed. Once again, thanks for the patch! |
Just saw this on the checkins list; where are the other options documented? """
PyDoc_STRVAR(compressobj__doc__,
-"compressobj([level]) -- Return a compressor object.\n"
+"compressobj([level[, method[, wbits[, memlevel[, strategy[, zdict]]]]]])\n"
+" -- Return a compressor object.\n"
"\n"
-"Optional arg level is the compression level, in 1-9.");
+"Optional arg level is the compression level, in 1-9.\n"
+"\n"
+"Optional arg zdict is the predefined compression dictionary - a sequence of\n"
+"bytes containing subsequences that are likely to occur in the input data.");
""" I'm honestly not certain what they should be, but the following is my best guess: """
PyDoc_STRVAR(compressobj__doc__,
"compressobj([level[, method[, wbits[, memlevel[, strategy[, zdict]]]]]])\n"
" -- Return a compressor object.\n"
"\n"
-"Optional arg level is the compression level, in 1-9.\n"
+"Optional arg level (1-9) is the compression level.\n"
+"Larger numbers take longer, but produce smaller results.\n"
"\n"
+"Optional arg method is the compression method.\n"
+"The only currently supported method is zlib.DEFLATED.\n"
+"\n"
+"Optional arg wbits determines the window buffer size.\n"
+"Normal values are 8 (least memory) to 15 (best compression).\n"
+"\n"
+"Optional arg memlevel (1-9) controls working memory size.\n"
+"Larger numbers use more memory, but produce smaller results more quickly.\n"
+"\n"
+"Optional arg strategy tunes the compression algorithm.\n"
+"Supported options include zlib.Z_DEFAULT_STRATEGY, zlib.Z_FILTERED, and zlib.Z_HUFFMAN_ONLY.\n"
+"\n"
+"Optional arg zdict is the predefined compression dictionary - a sequence of\n"
+"bytes containing subsequences that are likely to occur in the input data.");
""" |
They aren't, AFAIK. I've been planning on adding them when I've got time |
New changeset 1cfa44cb5af0 by Nadeem Vawda in branch 'default': |
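To make the parameter list from the docstring discussion concrete, here is a small sketch that passes every `compressobj()` argument positionally, in the order given in the docstring above. The specific values (level 9, window bits 15, memory level 9) are arbitrary choices for illustration, and `compress_with_options` is a hypothetical helper name.

```python
import zlib

ZDICT = b"thequickbrownfoxjumped\x00"
DATA = b"the quick brown fox jumped over the candlestick"


def compress_with_options(data, zdict):
    """Compress `data` with every compressobj() option spelled out."""
    c = zlib.compressobj(
        9,                        # level: 1-9, higher is slower but smaller
        zlib.DEFLATED,            # method: the only supported value
        15,                       # wbits: window size, as a power of two
        9,                        # memlevel: 1-9, working memory size
        zlib.Z_DEFAULT_STRATEGY,  # strategy: tunes the algorithm
        zdict,                    # zdict: predefined compression dictionary
    )
    return c.compress(data) + c.flush()


def decompress_with_options(compressed, zdict):
    """Decompress using matching window bits and the same dictionary."""
    d = zlib.decompressobj(15, zdict)
    return d.decompress(compressed) + d.flush()
```

Passing the arguments positionally sidesteps any question about keyword spelling across Python versions.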