|
msg86227 - (view) |
Author: Kristján Valur Jónsson (krisvale) * |
Date: 2009-04-21 09:51 |
It can be useful to get whatever remaining data there exists in the
decompression stream for further processing. This is currently possible
by creating a zlib.decompressobj and accessing its unused_data member,
but that is quite cumbersome.
The attached patch adds a 'tail' keyword argument to zlib.decompress
which causes the result to be a 2-tuple, with the first part containing
the decompressed data, and the second containing any remaining data
|
|
msg86240 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2009-04-21 15:43 |
As usual the patch should come with some unit tests.
Also, it seems you don't test the return value of
PyString_FromStringAndSize before building your tuple.
|
|
msg86241 - (view) |
Author: Kristján Valur Jónsson (krisvale) * |
Date: 2009-04-21 15:46 |
Unittests coming up.
It is not necessary to test the return value, it is done by
Py_BuildValue(). I think it is common to do it this way, but if You
think explicit testing is better, then that's no problem.
|
|
msg86244 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2009-04-21 15:57 |
> It is not necessary to test the return value, it is done by
> Py_BuildValue().
Sorry, it's ok then.
|
|
msg86247 - (view) |
Author: Kristján Valur Jónsson (krisvale) * |
Date: 2009-04-21 16:13 |
New patch, with explicit error test and tests in test_zlib.py, both for
the "tail" parameter and the hitherto untested "unused_data" attribute.
I'm not completely happy with the name of the argument, "tail", any
suggestions? Also, since I made compress a "keywords" method, should I
apply that to the rest of the functions?
|
|
msg86284 - (view) |
Author: Kristján Valur Jónsson (krisvale) * |
Date: 2009-04-22 10:26 |
Note, I thought of the "tail" argument to reflect the "unused_data"
member of the decompressobj(). But a more useful way is actually this:
result, offset = zlib.decompress(buffer, offset=0)
if offset<len(buffer):
result2, offset = zlib.decompress(buffer, offset=offset)
This avoids data copying as much as possible. The presence of a non-
default offset argument, would trigger the "tuple" return value.
We could add the same argument to decompressobj.decompress, and
add "unused_offset" member there for symmetry.
Any thoughts?
|
|
msg86373 - (view) |
Author: Kristján Valur Jónsson (krisvale) * |
Date: 2009-04-23 19:18 |
I've added a new patch, which instead uses an "offset" parameter. This
is more natural, perhaps, and also avoids data copying. Complete with
tests and documentation update.
|
|
msg87550 - (view) |
Author: Gregory P. Smith (gregory.p.smith) *  |
Date: 2009-05-10 21:05 |
Overall, this looks good. Some mostly minor comments in this review.
http://codereview.appspot.com/63060/diff/1/2
File Doc/library/zlib.rst (right):
http://codereview.appspot.com/63060/diff/1/2#newcode136
Line 136: When specified, it will cause the function's return value to
be a (uncompressed, offset)
how about this wording:
..."to be a tuple of (uncompressed, offset), with the"...
http://codereview.appspot.com/63060/diff/1/2#newcode142
Line 142: Added the *offset* argument
missing . at the end of the sentence. Also, mention that this function
now accepts keyword arguments.
http://codereview.appspot.com/63060/diff/1/3
File Lib/test/test_zlib.py (right):
http://codereview.appspot.com/63060/diff/1/3#newcode110
Line 110: c = zlib.compress(a)+zlib.compress(b)
please surround the + with spaces for readability and consistency with
other code in the file.\
http://codereview.appspot.com/63060/diff/1/4
File Modules/zlibmodule.c (right):
http://codereview.appspot.com/63060/diff/1/4#newcode193
Line 193: "the start position in the string to start decompresion and,
if spceified,\n"
typo: specified.
http://codereview.appspot.com/63060/diff/1/4#newcode296
Line 296: offset = length-zst.avail_in+offset;
leave spaces around your - and + operators and this becomes easier to
read.
http://codereview.appspot.com/63060
|
|
msg87551 - (view) |
Author: Robert Collins (rbcollins) |
Date: 2009-05-10 21:14 |
Maybe I'm missing something, but isn't the offset parameter just another
way of spelling
buffer(input, offset)?
I like the avoiding of copying, just wondering if having a magic
parameter to get a tuple is really better than (say)
result, used = zlib.decompress2(source)
if used<len(source):
result2, used = zlib.decompress2(buffer(source, used))
This changes two things. Rather than a magic parameter it adds a new
function which always returns tuples (meaning that you don't need magic
to make sure users don't accidentally pass offset=None and get a single
result when they wanted a tuple), and uses the built in buffer facility
to avoid copying rather than relying on delayed slicing.
|
|
msg87567 - (view) |
Author: Kristján Valur Jónsson (krisvale) * |
Date: 2009-05-11 08:56 |
Well, yes and no.
I agree that it is annoying to have to add another parameter to get a
returning tuple. But you have to weigh that against adding another
conveinence API.
On the other hand, I think the offset argument is quite convenient. If
we were to add a "decompress2" function, I would like to keep the
offset because it removes the extra overhead of having to create a
temporary buffer object, at no extra cost.
Also, note that your example code becomes more complex, for a loop.
You would need to:
results = []
used =0
while used<len(source):
r, u = zlib.decompress2(buffer(source, used))
used += u
results.append(r)
as opposed to:
r, used = zlib.decompress2(source, used)
results.append(r)
You have to make sure you add the buffer's hidden offset to the output
offset.
There is a third option. The "offset" could be a list containg the
offset in its 0th position, a sort of "byref" passing, but that is not
in the general Python spirit, is it?
This would give us the loop:
o = [0]
while len(source)>o[0]
results.append(zlib.decompress(source, offset=offsetlist))
|
|
msg87568 - (view) |
Author: Robert Collins (rbcollins) |
Date: 2009-05-11 09:02 |
Well, I think its relatively uncommon to be doing such a loop with a
static buffer anyway - often you'll instead be reading from disk or a
network stream; if we could make those cases simpler and avoid copying
that would be great.
Anyhow, no strong opinions here, just saw it going by before added my
2c.
-Rob
|
|
msg110993 - (view) |
Author: Mark Lawrence (BreamoreBoy) |
Date: 2010-07-21 02:47 |
This has been reviewed see msg87550, can we get this into 3.2?
|
|
msg111017 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *  |
Date: 2010-07-21 07:25 |
The function returns different kind of data depending on the value of the last parameter. I don't like it at all.
Why is decompressobj not enough?
|
|
msg113415 - (view) |
Author: Kristján Valur Jónsson (krisvale) * |
Date: 2010-08-09 13:53 |
decompressobj is indeed "enough". But if you are doing a lot of this (decompressing chunks), then using the unused_data, as it is, involves a lot of copying.
If there were a "unused_data_pos" or some equivalent, then it would be possible to continue decommpressing with buffer(olddata, unused_data_pos), without copying the source data.
The point of the "offset" keyword argument was to have a way to get this end position also without having an explicit decompressobj.
I agree that having the result type change to a tuple is not good.
So, a suggestion:
1) Add the unused_data_pos to the decompressobj.
2) (opotional) Add the automatic retrieval of this with a decompress_ex method, for convenience, making it usable without creating a decompressobj
3) (optional) Add the "offset" kw argument to decompress and (decompress_ex) making the creation of a temporary buffer() object unnecessary.
|
|
| Date |
User |
Action |
Args |
| 2012-01-26 13:07:45 | nadeem.vawda | set | nosy:
+ nadeem.vawda
|
| 2010-08-09 13:53:55 | krisvale | set | messages:
+ msg113415 |
| 2010-07-21 07:25:33 | amaury.forgeotdarc | set | nosy:
+ amaury.forgeotdarc messages:
+ msg111017
|
| 2010-07-21 02:47:37 | BreamoreBoy | set | versions:
+ Python 3.2, - Python 3.1, Python 2.7 nosy:
+ BreamoreBoy
messages:
+ msg110993
stage: patch review -> commit review |
| 2009-05-11 09:02:04 | rbcollins | set | messages:
+ msg87568 |
| 2009-05-11 08:56:24 | krisvale | set | messages:
+ msg87567 |
| 2009-05-10 21:14:40 | rbcollins | set | nosy:
+ rbcollins messages:
+ msg87551
|
| 2009-05-10 21:05:29 | gregory.p.smith | set | messages:
+ msg87550 |
| 2009-05-10 20:49:45 | gregory.p.smith | set | nosy:
+ gregory.p.smith
|
| 2009-05-08 09:45:40 | krisvale | set | title: Add a "tail" argument to zlib.decompress -> Add an 'offset' argument to zlib.decompress |
| 2009-04-23 19:18:40 | krisvale | set | files:
+ zlibpatch2.patch
messages:
+ msg86373 |
| 2009-04-22 10:26:02 | krisvale | set | messages:
+ msg86284 |
| 2009-04-21 16:13:14 | krisvale | set | files:
+ zlibpatch.patch
messages:
+ msg86247 |
| 2009-04-21 15:57:20 | pitrou | set | messages:
+ msg86244 |
| 2009-04-21 15:46:42 | krisvale | set | messages:
+ msg86241 |
| 2009-04-21 15:43:15 | pitrou | set | priority: normal versions:
+ Python 3.1 nosy:
+ pitrou
messages:
+ msg86240
stage: patch review |
| 2009-04-21 09:51:41 | krisvale | create | |