Issue 403753: zlib decompress; uncontrollable memory usage

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/33910

classification

Title:	zlib decompress; uncontrollable memory usage
Type:		Stage:
Components:	Extension Modules	Versions:

process

Status:	closed	Resolution:	accepted
Dependencies:		Superseder:
Assigned To:	jhylton	Nosy List:	akuchling, gregory.p.smith, htrd, jhylton, loewis, titus
Priority:	normal	Keywords:	patch

Created on 2001-02-12 16:10 by htrd, last changed 2022-04-10 16:03 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
None	htrd, 2001-02-12 16:10	None
zlib-patch-update.gz	loewis, 2001-09-25 09:18	File sent from titus

Messages (18)
msg35701 - (view)	Author: Toby Dickenson (htrd)	Date: 2001-02-12 16:10
zlib's decompress method will allocate as much memory as is needed to hold the decompressed output. The length of the output buffer may be very much larger than the length of the input buffer, and the python code calling the decompress method has no other way to control how much memory is allocated. In experimentation, I seen decompress generate output that is 1000 times larger than its input These characteristics may make the decompress method unsuitable for handling data obtained from untrusted sources (for example, in a http proxy which implements gzip encoding) since it may be vulnerable to a denial of service attack. A malicious user could construct a moderately sized input which forces 'decompress' to try to allocate too much memory. This patch adds a new method, decompress_incremental, which allows the caller to specify the maximum size of the output. This method returns the excess input, in addition to the decompressed output. It is possible to solve this problem without a patch: If input is fed to the decompressor a few tens of bytes at a time, memory usage will surge by (at most) a few tens of kilobytes. Such a process is a kludge, and much less efficient that the approach used in this patch. (Ive not been able to test the documentation patch; I hope its ok) (This patch also includes the change from Patch #103748)
msg35702 - (view)	Author: A.M. Kuchling (akuchling) *	Date: 2001-02-21 02:48
Rather than introducing a new method, why not just add an optional maxlen argument to .decompress(). I think the changes would be: * add 'int maxlen=-1;' * add "...\|i" ... ,&maxlen to the argument parsing * if maxlen != -1, length = maxlen else length = DEFAULTALLOC; * Add '&& maxlen==-1' to the while loop. (Use the current CVS; I just checked in a patch rearranging the zlib module a bit.) Do you want to make those changes and resubmit the patch?
msg35703 - (view)	Author: A.M. Kuchling (akuchling) *	Date: 2001-02-21 19:32
Doesn't .unused_data serve much the same purpose, though? So that even with a maximum size, .decompress() always returns a string, and .unused_data would contain the unprocessed data.
msg35704 - (view)	Author: Toby Dickenson (htrd)	Date: 2001-02-21 14:00
I did consider that.... An extra change that you didnt mention is the need for a different return value. Currently .decompress() always returns a string. The new method in my patch returns a tuple containing the same string, and an integer specifying how many bytes were consumed from the input. Overloading return values based on an optional parameter seems a little hairy to me, but I would be happy to change the patch if that is your preferred option. I also considered (and rejected) the possibility of adding an optional max-size argument to .decompress() as you suggest, but raising an exception if this limit is exceeded. This avoids the need for an extra return value, but looses out on flexibility.
msg35705 - (view)	Author: Toby Dickenson (htrd)	Date: 2001-02-22 11:40
We can reuse .unused_data without introducing an ambiguity. I will prepare a patch that uses a new attribute .unconsumed_tail
msg35706 - (view)	Author: Toby Dickenson (htrd)	Date: 2001-02-22 11:42
Waaah - that last comment should be 'cant' not 'can'
msg35707 - (view)	Author: Toby Dickenson (htrd)	Date: 2001-02-22 12:50
New patch implementing a new optional parameter to .decompress, and a new attribute .unconsumed_tail
msg35708 - (view)	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2001-04-07 07:20
Logged In: YES user_id=413 as a side note. I believe I implemented a python workaround for this problem by just decompressing data in small chunks (4k at a time) using a decompressor object. see the mojonation project on sourceforge if you're curious. (specifically, in the mojonation evil module, look at common/mojoutil.py for function named safe_zlib_decompress). Regardless, I like thie idea of this patch. It would be good to have that in the main API and documentation for simplicity. (and because there are too many programmers out there who don't realize potential denial of service issues on their own...)
msg35709 - (view)	Author: A.M. Kuchling (akuchling) *	Date: 2001-04-10 17:52
Logged In: YES user_id=11375 I've let this patch gather dust long enough. Unassigning so that someone else can review it.
msg35710 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2001-06-04 20:55
Logged In: YES user_id=21627 The patch looks good to me; I recommend to approve it.
msg35711 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2001-09-07 17:03
Logged In: YES user_id=21627 Since this patch has been sitting around for such a long time, it eventually got outdated, due to the conflicting MT patch. Any volunteers to update it?
msg35712 - (view)	Author: Toby Dickenson (htrd)	Date: 2001-09-10 14:18
Logged In: YES user_id=46460 I volunteer to update it
msg35713 - (view)	Author: Titus Brown (titus)	Date: 2001-09-25 06:42
Logged In: YES user_id=23486 I have integrated this patch into the latest Python CVS tree, with a few additional modifications. * added spaces into the documentation patch & verified that doc building works; * fixed problem where the 'unconsumed_tail' attribute returned was actually the unused_data attribute; this was caused by indiscriminate cutting & pasting ;). * put in a few additional comments and fixed the code for 80 cols. It doesn't look like I can submit an attachment with this comment (?); I will e-mail it to both loewis & htrd.
msg35714 - (view)	Author: Toby Dickenson (htrd)	Date: 2001-09-25 09:41
Logged In: YES user_id=46460 The patch from titus (added by loewis) looks good
msg35715 - (view)	Author: Toby Dickenson (htrd)	Date: 2001-09-25 09:43
Logged In: YES user_id=46460 The patch from titus (added by loewis) looks good
msg35716 - (view)	Author: Toby Dickenson (htrd)	Date: 2001-09-25 09:44
Logged In: YES user_id=46460 The patch from titus (added by loewis) looks good
msg35717 - (view)	Author: Jeremy Hylton (jhylton)	Date: 2001-10-16 20:15
Logged In: YES user_id=31392 Indeed, looks like a good feature. Patch is fine with a few little changes. I'll check it in.
msg35718 - (view)	Author: Jeremy Hylton (jhylton)	Date: 2001-10-16 20:41
Logged In: YES user_id=31392 Committed with mods as libzlib.tex 1.27 test_zlib.py 1.15 test_zlib 1.5 zlibmodule.c 2.44

History
Date	User	Action	Args
2022-04-10 16:03:44	admin	set	github: 33910
2001-02-12 16:10:51	htrd	create