This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: zlib decompress; uncontrollable memory usage
Type: Stage:
Components: Extension Modules Versions:
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: jhylton Nosy List: akuchling, gregory.p.smith, htrd, jhylton, loewis, titus
Priority: normal Keywords: patch

Created on 2001-02-12 16:10 by htrd, last changed 2022-04-10 16:03 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
None htrd, 2001-02-12 16:10 None
zlib-patch-update.gz loewis, 2001-09-25 09:18 File sent from titus
Messages (18)
msg35701 - (view) Author: Toby Dickenson (htrd) Date: 2001-02-12 16:10
zlib's decompress method will allocate as much memory as is
needed to hold the decompressed output. The length of the output
buffer may be very much larger than the length of the input buffer,
and the python code calling the decompress method has no other way
to control how much memory is allocated.

In experimentation, I seen decompress generate output that is
1000 times larger than its input

These characteristics may make the decompress method unsuitable for
handling data obtained from untrusted sources (for example,
in a http proxy which implements gzip encoding) since it may be
vulnerable to a denial of service attack. A malicious user could
construct a moderately sized input which forces 'decompress' to
try to allocate too much memory.

This patch adds a new method, decompress_incremental, which allows
the caller to specify the maximum size of the output. This method
returns the excess input, in addition to the decompressed output.

It is possible to solve this problem without a patch:
If input is fed to the decompressor a few tens of bytes
at a time, memory usage will surge by (at most)
a few tens of kilobytes. Such a process is a kludge, and much
less efficient that the approach used in this patch.

(Ive not been able to test the documentation patch; I hope its ok)

(This patch also includes the change from Patch #103748)
msg35702 - (view) Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2001-02-21 02:48
Rather than introducing a new method, why not just add an optional maxlen argument to .decompress().  I think the changes would be:

* add 'int maxlen=-1;'
* add "...|i" ... ,&maxlen to the argument parsing
* if maxlen != -1, length = maxlen else length = DEFAULTALLOC;
* Add '&& maxlen==-1' to the while loop.  (Use the current CVS; I just checked in a patch rearranging the zlib module a bit.)

Do you want to make those changes and resubmit the patch?
msg35703 - (view) Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2001-02-21 19:32
Doesn't .unused_data serve much the same purpose, though?
So that even with a maximum size, .decompress() always returns a string, and .unused_data would contain the unprocessed data.
msg35704 - (view) Author: Toby Dickenson (htrd) Date: 2001-02-21 14:00
I did consider that....

An extra change that you didnt mention is the need for a different return value. Currently .decompress() always returns a string. The new method in my patch returns a tuple containing the same string, and an integer specifying how many bytes were consumed from the input.

Overloading return values based on an optional parameter seems a little hairy to me, but I would be happy to change the patch if that is your preferred option.

I also considered (and rejected) the possibility of adding an optional max-size argument to .decompress() as you suggest, but raising an exception if this limit is exceeded. This avoids the need for an extra return value, but looses out on flexibility.
msg35705 - (view) Author: Toby Dickenson (htrd) Date: 2001-02-22 11:40
We can reuse .unused_data without introducing an ambiguity. I will prepare a patch that uses a new attribute .unconsumed_tail

msg35706 - (view) Author: Toby Dickenson (htrd) Date: 2001-02-22 11:42
Waaah - that last comment should be 'cant' not 'can'
msg35707 - (view) Author: Toby Dickenson (htrd) Date: 2001-02-22 12:50
New patch implementing a new optional parameter to .decompress, and a new attribute .unconsumed_tail

msg35708 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2001-04-07 07:20
Logged In: YES 
user_id=413

as a side note.  I believe I implemented a python workaround
for this problem by just decompressing data in small chunks
(4k at a time) using a decompressor object.

see the mojonation project on sourceforge if you're
curious.  (specifically, in the mojonation evil module, look
at common/mojoutil.py for function named
safe_zlib_decompress).

Regardless, I like thie idea of this patch.  It would be
good to have that in the main API and documentation for
simplicity. (and because there are too many programmers out
there who don't realize potential denial of service issues
on their own...)
msg35709 - (view) Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2001-04-10 17:52
Logged In: YES 
user_id=11375

I've let this patch gather dust long enough.  Unassigning so
that someone else can review it.
msg35710 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2001-06-04 20:55
Logged In: YES 
user_id=21627

The patch looks good to me; I recommend to approve it.

msg35711 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2001-09-07 17:03
Logged In: YES 
user_id=21627

Since this patch has been sitting around for such a long
time, it eventually got outdated, due to the conflicting MT
patch. Any volunteers to update it?
msg35712 - (view) Author: Toby Dickenson (htrd) Date: 2001-09-10 14:18
Logged In: YES 
user_id=46460

I volunteer to update it
msg35713 - (view) Author: Titus Brown (titus) Date: 2001-09-25 06:42
Logged In: YES 
user_id=23486

I have integrated this patch into the latest Python CVS tree,
with a few additional modifications.

* added spaces into the documentation patch & verified that
  doc building works;

* fixed problem where the 'unconsumed_tail' attribute returned
  was *actually* the unused_data attribute; this was caused by
  indiscriminate cutting & pasting ;).

* put in a few additional comments and fixed the code for 80  
  cols.

It doesn't look like I can submit an attachment with this
comment (?); I will e-mail it to both loewis & htrd.
msg35714 - (view) Author: Toby Dickenson (htrd) Date: 2001-09-25 09:41
Logged In: YES 
user_id=46460

The patch from titus (added by loewis) looks good
msg35715 - (view) Author: Toby Dickenson (htrd) Date: 2001-09-25 09:43
Logged In: YES 
user_id=46460

The patch from titus (added by loewis) looks good
msg35716 - (view) Author: Toby Dickenson (htrd) Date: 2001-09-25 09:44
Logged In: YES 
user_id=46460

The patch from titus (added by loewis) looks good
msg35717 - (view) Author: Jeremy Hylton (jhylton) (Python triager) Date: 2001-10-16 20:15
Logged In: YES 
user_id=31392

Indeed, looks like a good feature.  Patch is fine with a few
little changes.  I'll check it in.
msg35718 - (view) Author: Jeremy Hylton (jhylton) (Python triager) Date: 2001-10-16 20:41
Logged In: YES 
user_id=31392

Committed with mods as
libzlib.tex 1.27
test_zlib.py 1.15
test_zlib 1.5
zlibmodule.c 2.44

History
Date User Action Args
2022-04-10 16:03:44adminsetgithub: 33910
2001-02-12 16:10:51htrdcreate