msg86227 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) * |
Date: 2009-04-21 09:51 |
It can be useful to get whatever remaining data there exists in the
decompression stream for further processing. This is currently possible
by creating a zlib.decompressobj and accessing its unused_data member,
but that is quite cumbersome.
The attached patch adds a 'tail' keyword argument to zlib.decompress
which causes the result to be a 2-tuple, with the first part containing
the decompressed data, and the second containing any remaining data
|
msg86240 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2009-04-21 15:43 |
As usual the patch should come with some unit tests.
Also, it seems you don't test the return value of
PyString_FromStringAndSize before building your tuple.
|
msg86241 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) * |
Date: 2009-04-21 15:46 |
Unittests coming up.
It is not necessary to test the return value, it is done by
Py_BuildValue(). I think it is common to do it this way, but if You
think explicit testing is better, then that's no problem.
|
msg86244 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2009-04-21 15:57 |
> It is not necessary to test the return value, it is done by
> Py_BuildValue().
Sorry, it's ok then.
|
msg86247 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) * |
Date: 2009-04-21 16:13 |
New patch, with explicit error test and tests in test_zlib.py, both for
the "tail" parameter and the hitherto untested "unused_data" attribute.
I'm not completely happy with the name of the argument, "tail", any
suggestions? Also, since I made compress a "keywords" method, should I
apply that to the rest of the functions?
|
msg86284 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) * |
Date: 2009-04-22 10:26 |
Note, I thought of the "tail" argument to reflect the "unused_data"
member of the decompressobj(). But a more useful way is actually this:
result, offset = zlib.decompress(buffer, offset=0)
if offset<len(buffer):
result2, offset = zlib.decompress(buffer, offset=offset)
This avoids data copying as much as possible. The presence of a non-
default offset argument, would trigger the "tuple" return value.
We could add the same argument to decompressobj.decompress, and
add "unused_offset" member there for symmetry.
Any thoughts?
|
msg86373 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) * |
Date: 2009-04-23 19:18 |
I've added a new patch, which instead uses an "offset" parameter. This
is more natural, perhaps, and also avoids data copying. Complete with
tests and documentation update.
|
msg87550 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2009-05-10 21:05 |
Overall, this looks good. Some mostly minor comments in this review.
http://codereview.appspot.com/63060/diff/1/2
File Doc/library/zlib.rst (right):
http://codereview.appspot.com/63060/diff/1/2#newcode136
Line 136: When specified, it will cause the function's return value to
be a (uncompressed, offset)
how about this wording:
..."to be a tuple of (uncompressed, offset), with the"...
http://codereview.appspot.com/63060/diff/1/2#newcode142
Line 142: Added the *offset* argument
missing . at the end of the sentence. Also, mention that this function
now accepts keyword arguments.
http://codereview.appspot.com/63060/diff/1/3
File Lib/test/test_zlib.py (right):
http://codereview.appspot.com/63060/diff/1/3#newcode110
Line 110: c = zlib.compress(a)+zlib.compress(b)
please surround the + with spaces for readability and consistency with
other code in the file.\
http://codereview.appspot.com/63060/diff/1/4
File Modules/zlibmodule.c (right):
http://codereview.appspot.com/63060/diff/1/4#newcode193
Line 193: "the start position in the string to start decompresion and,
if spceified,\n"
typo: specified.
http://codereview.appspot.com/63060/diff/1/4#newcode296
Line 296: offset = length-zst.avail_in+offset;
leave spaces around your - and + operators and this becomes easier to
read.
http://codereview.appspot.com/63060
|
msg87551 - (view) |
Author: Robert Collins (rbcollins) * |
Date: 2009-05-10 21:14 |
Maybe I'm missing something, but isn't the offset parameter just another
way of spelling
buffer(input, offset)?
I like the avoiding of copying, just wondering if having a magic
parameter to get a tuple is really better than (say)
result, used = zlib.decompress2(source)
if used<len(source):
result2, used = zlib.decompress2(buffer(source, used))
This changes two things. Rather than a magic parameter it adds a new
function which always returns tuples (meaning that you don't need magic
to make sure users don't accidentally pass offset=None and get a single
result when they wanted a tuple), and uses the built in buffer facility
to avoid copying rather than relying on delayed slicing.
|
msg87567 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) * |
Date: 2009-05-11 08:56 |
Well, yes and no.
I agree that it is annoying to have to add another parameter to get a
returning tuple. But you have to weigh that against adding another
conveinence API.
On the other hand, I think the offset argument is quite convenient. If
we were to add a "decompress2" function, I would like to keep the
offset because it removes the extra overhead of having to create a
temporary buffer object, at no extra cost.
Also, note that your example code becomes more complex, for a loop.
You would need to:
results = []
used =0
while used<len(source):
r, u = zlib.decompress2(buffer(source, used))
used += u
results.append(r)
as opposed to:
r, used = zlib.decompress2(source, used)
results.append(r)
You have to make sure you add the buffer's hidden offset to the output
offset.
There is a third option. The "offset" could be a list containg the
offset in its 0th position, a sort of "byref" passing, but that is not
in the general Python spirit, is it?
This would give us the loop:
o = [0]
while len(source)>o[0]
results.append(zlib.decompress(source, offset=offsetlist))
|
msg87568 - (view) |
Author: Robert Collins (rbcollins) * |
Date: 2009-05-11 09:02 |
Well, I think its relatively uncommon to be doing such a loop with a
static buffer anyway - often you'll instead be reading from disk or a
network stream; if we could make those cases simpler and avoid copying
that would be great.
Anyhow, no strong opinions here, just saw it going by before added my
2c.
-Rob
|
msg110993 - (view) |
Author: Mark Lawrence (BreamoreBoy) * |
Date: 2010-07-21 02:47 |
This has been reviewed see msg87550, can we get this into 3.2?
|
msg111017 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * |
Date: 2010-07-21 07:25 |
The function returns different kind of data depending on the value of the last parameter. I don't like it at all.
Why is decompressobj not enough?
|
msg113415 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) * |
Date: 2010-08-09 13:53 |
decompressobj is indeed "enough". But if you are doing a lot of this (decompressing chunks), then using the unused_data, as it is, involves a lot of copying.
If there were a "unused_data_pos" or some equivalent, then it would be possible to continue decommpressing with buffer(olddata, unused_data_pos), without copying the source data.
The point of the "offset" keyword argument was to have a way to get this end position also without having an explicit decompressobj.
I agree that having the result type change to a tuple is not good.
So, a suggestion:
1) Add the unused_data_pos to the decompressobj.
2) (opotional) Add the automatic retrieval of this with a decompress_ex method, for convenience, making it usable without creating a decompressobj
3) (optional) Add the "offset" kw argument to decompress and (decompress_ex) making the creation of a temporary buffer() object unnecessary.
|
msg229045 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2014-10-10 23:24 |
Moving this from commit review back to no selection, since there doesn't yet seem to be an agreement on an API.
|
msg234040 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2015-01-14 22:27 |
A different test case for “unused_data” attribute was added in 2012 for Issue 16350, so that part is no longer needed.
If this feature goes ahead, it might be nice to also update the bzip and LZMA modules for consistency.
In Python 3, the equivalent of the buffer() option would look like this, assuming the memory view is in byte format:
zlib.decompress(memoryview(source)[unused_offset:])
Another option would be to copy some paint from the struct.unpack_from() bikeshed:
[data, offset] = zlib.decompress_from(buffer, offset)
|
|
Date |
User |
Action |
Args |
2022-04-11 14:56:48 | admin | set | github: 50054 |
2015-01-14 22:27:13 | martin.panter | set | nosy:
+ martin.panter messages:
+ msg234040
|
2014-10-10 23:24:51 | r.david.murray | set | versions:
+ Python 3.5, - Python 3.2 nosy:
+ r.david.murray
messages:
+ msg229045
stage: commit review -> |
2014-02-03 19:14:31 | BreamoreBoy | set | nosy:
- BreamoreBoy
|
2012-01-26 13:07:45 | nadeem.vawda | set | nosy:
+ nadeem.vawda
|
2010-08-09 13:53:55 | kristjan.jonsson | set | messages:
+ msg113415 |
2010-07-21 07:25:33 | amaury.forgeotdarc | set | nosy:
+ amaury.forgeotdarc messages:
+ msg111017
|
2010-07-21 02:47:37 | BreamoreBoy | set | versions:
+ Python 3.2, - Python 3.1, Python 2.7 nosy:
+ BreamoreBoy
messages:
+ msg110993
stage: patch review -> commit review |
2009-05-11 09:02:04 | rbcollins | set | messages:
+ msg87568 |
2009-05-11 08:56:24 | kristjan.jonsson | set | messages:
+ msg87567 |
2009-05-10 21:14:40 | rbcollins | set | nosy:
+ rbcollins messages:
+ msg87551
|
2009-05-10 21:05:29 | gregory.p.smith | set | messages:
+ msg87550 |
2009-05-10 20:49:45 | gregory.p.smith | set | nosy:
+ gregory.p.smith
|
2009-05-08 09:45:40 | kristjan.jonsson | set | title: Add a "tail" argument to zlib.decompress -> Add an 'offset' argument to zlib.decompress |
2009-04-23 19:18:40 | kristjan.jonsson | set | files:
+ zlibpatch2.patch
messages:
+ msg86373 |
2009-04-22 10:26:02 | kristjan.jonsson | set | messages:
+ msg86284 |
2009-04-21 16:13:14 | kristjan.jonsson | set | files:
+ zlibpatch.patch
messages:
+ msg86247 |
2009-04-21 15:57:20 | pitrou | set | messages:
+ msg86244 |
2009-04-21 15:46:42 | kristjan.jonsson | set | messages:
+ msg86241 |
2009-04-21 15:43:15 | pitrou | set | priority: normal versions:
+ Python 3.1 nosy:
+ pitrou
messages:
+ msg86240
stage: patch review |
2009-04-21 09:51:41 | kristjan.jonsson | create | |