Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zlib decompress; uncontrollable memory usage #33910

Closed
htrd mannequin opened this issue Feb 12, 2001 · 18 comments
Closed

zlib decompress; uncontrollable memory usage #33910

htrd mannequin opened this issue Feb 12, 2001 · 18 comments
Labels
extension-modules C modules in the Modules dir

Comments

@htrd
Copy link
Mannequin

htrd mannequin commented Feb 12, 2001

BPO 403753
Nosy @loewis, @akuchling, @gpshead
Files
  • zlib-patch-update.gz: File sent from titus
  • None: None
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2001-10-16.20:41:08.000>
    created_at = <Date 2001-02-12.16:10:51.000>
    labels = ['extension-modules']
    title = 'zlib decompress; uncontrollable memory usage'
    updated_at = <Date 2001-10-16.20:41:08.000>
    user = 'https://bugs.python.org/htrd'

    bugs.python.org fields:

    activity = <Date 2001-10-16.20:41:08.000>
    actor = 'jhylton'
    assignee = 'jhylton'
    closed = True
    closed_date = None
    closer = None
    components = ['Extension Modules']
    creation = <Date 2001-02-12.16:10:51.000>
    creator = 'htrd'
    dependencies = []
    files = ['3124', '3125']
    hgrepos = []
    issue_num = 403753
    keywords = ['patch']
    message_count = 18.0
    messages = ['35701', '35702', '35703', '35704', '35705', '35706', '35707', '35708', '35709', '35710', '35711', '35712', '35713', '35714', '35715', '35716', '35717', '35718']
    nosy_count = 6.0
    nosy_names = ['loewis', 'jhylton', 'akuchling', 'htrd', 'gregory.p.smith', 'titus']
    pr_nums = []
    priority = 'normal'
    resolution = 'accepted'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue403753'
    versions = []

    @htrd
    Copy link
    Mannequin Author

    htrd mannequin commented Feb 12, 2001

    zlib's decompress method will allocate as much memory as is
    needed to hold the decompressed output. The length of the output
    buffer may be very much larger than the length of the input buffer,
    and the python code calling the decompress method has no other way
    to control how much memory is allocated.

    In experimentation, I seen decompress generate output that is
    1000 times larger than its input

    These characteristics may make the decompress method unsuitable for
    handling data obtained from untrusted sources (for example,
    in a http proxy which implements gzip encoding) since it may be
    vulnerable to a denial of service attack. A malicious user could
    construct a moderately sized input which forces 'decompress' to
    try to allocate too much memory.

    This patch adds a new method, decompress_incremental, which allows
    the caller to specify the maximum size of the output. This method
    returns the excess input, in addition to the decompressed output.

    It is possible to solve this problem without a patch:
    If input is fed to the decompressor a few tens of bytes
    at a time, memory usage will surge by (at most)
    a few tens of kilobytes. Such a process is a kludge, and much
    less efficient that the approach used in this patch.

    (Ive not been able to test the documentation patch; I hope its ok)

    (This patch also includes the change from Patch bpo-103748)

    @htrd htrd mannequin closed this as completed Feb 12, 2001
    @htrd htrd mannequin assigned jhylton Feb 12, 2001
    @htrd htrd mannequin added the extension-modules C modules in the Modules dir label Feb 12, 2001
    @htrd htrd mannequin closed this as completed Feb 12, 2001
    @htrd htrd mannequin assigned jhylton Feb 12, 2001
    @htrd htrd mannequin added the extension-modules C modules in the Modules dir label Feb 12, 2001
    @akuchling
    Copy link
    Member

    Rather than introducing a new method, why not just add an optional maxlen argument to .decompress(). I think the changes would be:

    • add 'int maxlen=-1;'
    • add "...|i" ... ,&maxlen to the argument parsing
    • if maxlen != -1, length = maxlen else length = DEFAULTALLOC;
    • Add '&& maxlen==-1' to the while loop. (Use the current CVS; I just checked in a patch rearranging the zlib module a bit.)

    Do you want to make those changes and resubmit the patch?

    @htrd
    Copy link
    Mannequin Author

    htrd mannequin commented Feb 21, 2001

    I did consider that....

    An extra change that you didnt mention is the need for a different return value. Currently .decompress() always returns a string. The new method in my patch returns a tuple containing the same string, and an integer specifying how many bytes were consumed from the input.

    Overloading return values based on an optional parameter seems a little hairy to me, but I would be happy to change the patch if that is your preferred option.

    I also considered (and rejected) the possibility of adding an optional max-size argument to .decompress() as you suggest, but raising an exception if this limit is exceeded. This avoids the need for an extra return value, but looses out on flexibility.

    @akuchling
    Copy link
    Member

    Doesn't .unused_data serve much the same purpose, though?
    So that even with a maximum size, .decompress() always returns a string, and .unused_data would contain the unprocessed data.

    @htrd
    Copy link
    Mannequin Author

    htrd mannequin commented Feb 22, 2001

    We can reuse .unused_data without introducing an ambiguity. I will prepare a patch that uses a new attribute .unconsumed_tail

    @htrd
    Copy link
    Mannequin Author

    htrd mannequin commented Feb 22, 2001

    Waaah - that last comment should be 'cant' not 'can'

    @htrd
    Copy link
    Mannequin Author

    htrd mannequin commented Feb 22, 2001

    New patch implementing a new optional parameter to .decompress, and a new attribute .unconsumed_tail

    @gpshead
    Copy link
    Member

    gpshead commented Apr 7, 2001

    Logged In: YES
    user_id=413

    as a side note. I believe I implemented a python workaround
    for this problem by just decompressing data in small chunks
    (4k at a time) using a decompressor object.

    see the mojonation project on sourceforge if you're
    curious. (specifically, in the mojonation evil module, look
    at common/mojoutil.py for function named
    safe_zlib_decompress).

    Regardless, I like thie idea of this patch. It would be
    good to have that in the main API and documentation for
    simplicity. (and because there are too many programmers out
    there who don't realize potential denial of service issues
    on their own...)

    @akuchling
    Copy link
    Member

    Logged In: YES
    user_id=11375

    I've let this patch gather dust long enough. Unassigning so
    that someone else can review it.

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Jun 4, 2001

    Logged In: YES
    user_id=21627

    The patch looks good to me; I recommend to approve it.

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Sep 7, 2001

    Logged In: YES
    user_id=21627

    Since this patch has been sitting around for such a long
    time, it eventually got outdated, due to the conflicting MT
    patch. Any volunteers to update it?

    @htrd
    Copy link
    Mannequin Author

    htrd mannequin commented Sep 10, 2001

    Logged In: YES
    user_id=46460

    I volunteer to update it

    @titus
    Copy link
    Mannequin

    titus mannequin commented Sep 25, 2001

    Logged In: YES
    user_id=23486

    I have integrated this patch into the latest Python CVS tree,
    with a few additional modifications.

    • added spaces into the documentation patch & verified that
      doc building works;

    • fixed problem where the 'unconsumed_tail' attribute returned
      was *actually* the unused_data attribute; this was caused by
      indiscriminate cutting & pasting ;).

    • put in a few additional comments and fixed the code for 80
      cols.

    It doesn't look like I can submit an attachment with this
    comment (?); I will e-mail it to both loewis & htrd.

    @htrd
    Copy link
    Mannequin Author

    htrd mannequin commented Sep 25, 2001

    Logged In: YES
    user_id=46460

    The patch from titus (added by loewis) looks good

    2 similar comments
    @htrd
    Copy link
    Mannequin Author

    htrd mannequin commented Sep 25, 2001

    Logged In: YES
    user_id=46460

    The patch from titus (added by loewis) looks good

    @htrd
    Copy link
    Mannequin Author

    htrd mannequin commented Sep 25, 2001

    Logged In: YES
    user_id=46460

    The patch from titus (added by loewis) looks good

    @jhylton
    Copy link
    Mannequin

    jhylton mannequin commented Oct 16, 2001

    Logged In: YES
    user_id=31392

    Indeed, looks like a good feature. Patch is fine with a few
    little changes. I'll check it in.

    @jhylton
    Copy link
    Mannequin

    jhylton mannequin commented Oct 16, 2001

    Logged In: YES
    user_id=31392

    Committed with mods as
    libzlib.tex 1.27
    test_zlib.py 1.15
    test_zlib 1.5
    zlibmodule.c 2.44

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    extension-modules C modules in the Modules dir
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants