Limit decompressed data when reading from GzipFile #67716

Closed
vadmium opened this issue Feb 26, 2015 · 3 comments
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@vadmium
Member

vadmium commented Feb 26, 2015

BPO 23528
Nosy @pitrou, @vadmium, @serhiy-storchaka
Superseder
  • bpo-23529: Limit decompressed data when reading from LZMAFile and BZ2File
Files
  • gzip-bomb.patch
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2015-04-10.22:31:32.725>
    created_at = <Date 2015-02-26.10:22:06.943>
    labels = ['library', 'performance']
    title = 'Limit decompressed data when reading from GzipFile'
    updated_at = <Date 2015-04-10.22:31:32.724>
    user = 'https://github.com/vadmium'

    bugs.python.org fields:

    activity = <Date 2015-04-10.22:31:32.724>
    actor = 'pitrou'
    assignee = 'none'
    closed = True
    closed_date = <Date 2015-04-10.22:31:32.725>
    closer = 'pitrou'
    components = ['Library (Lib)']
    creation = <Date 2015-02-26.10:22:06.943>
    creator = 'martin.panter'
    dependencies = []
    files = ['38243']
    hgrepos = []
    issue_num = 23528
    keywords = ['patch']
    message_count = 3.0
    messages = ['236659', '236706', '236861']
    nosy_count = 5.0
    nosy_names = ['pitrou', 'Arfrever', 'nikratio', 'martin.panter', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'duplicate'
    stage = 'resolved'
    status = 'closed'
    superseder = '23529'
    type = 'resource usage'
    url = 'https://bugs.python.org/issue23528'
    versions = ['Python 3.5']

    @vadmium
    Member Author

    vadmium commented Feb 26, 2015

    This is a patch I originally posted at bpo-15955, but I am moving it to a separate issue so there is less confusion. GzipFile.read(<size>) and similar calls are susceptible to decompression bombs. My patch adds a test for that and fixes it, making use of the existing “max_length” parameter in the “zlib” module.
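The “max_length” mechanism mentioned above can be illustrated with a minimal sketch. It is not the code from gzip-bomb.patch; the helper name `decompress_capped` and the sizes are assumptions chosen for the example. It only uses `zlib.decompressobj()`, the `max_length` argument of `decompress()`, and the `unconsumed_tail` attribute.

```python
import zlib


def decompress_capped(data, limit):
    """Hypothetical helper: decompress zlib data, refusing more than `limit` bytes.

    Asking for limit + 1 bytes lets us tell "exactly at the limit" apart
    from "over the limit" without ever producing much more than `limit`.
    """
    d = zlib.decompressobj()
    out = d.decompress(data, limit + 1)  # second argument is max_length
    if len(out) > limit:
        raise ValueError("decompressed data exceeds limit")
    return out


# A small "bomb": 10 MiB of zeros compresses to a few kilobytes.
bomb = zlib.compress(b"\0" * (10 * 1024 * 1024))

d = zlib.decompressobj()
chunk = d.decompress(bomb, 1024)  # produce at most 1024 bytes of output
# Input that was not needed yet is kept in d.unconsumed_tail, so the
# caller decides whether to continue decompressing.
print(len(chunk), len(d.unconsumed_tail) > 0)
```

The point of the sketch is that the output size is decided by the caller, not by the compressed input.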

    The rest of bpo-15955 is about enhancing the bzip and LZMA modules to support limited decompression, but since the zlib module can already limit the decompressed data, I think this gzip patch should be considered a bug fix rather than an enhancement. For example, the fix for bpo-16043 (gzip decoding for the XML-RPC module) assumed that GzipFile.read(<size>) is limited.
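The calling pattern that relies on GzipFile.read(<size>) being limited might look like the sketch below. The function name `gunzip_capped` and its error message are hypothetical and this is not the actual bpo-16043 code. It caps the size of the returned data; whether peak memory use is also capped depends on how GzipFile.read() is implemented, which is what this issue is about.

```python
import gzip
import io


def gunzip_capped(data, max_decode):
    """Hypothetical helper: gunzip `data`, rejecting payloads over `max_decode` bytes."""
    with gzip.GzipFile(fileobj=io.BytesIO(data), mode="rb") as gzf:
        # Read one byte more than allowed so an oversized payload is detectable.
        decoded = gzf.read(max_decode + 1)
    if len(decoded) > max_decode:
        raise ValueError("decompressed payload exceeds limit")
    return decoded


small = gzip.compress(b"hello world")
print(gunzip_capped(small, 1024))
```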

    @vadmium added the type-bug (An unexpected behavior, bug, or error), stdlib (Python modules in the Lib dir), and performance (Performance or resource usage) labels, and removed the type-bug label, on Feb 26, 2015
    @nikratio
    Mannequin

    nikratio mannequin commented Feb 26, 2015

    Especially now that this is only going to go into 3.5, I think it makes more sense to handle GzipFile, LZMAFile and BZ2File all in one go. Looking at the code, otherwise there's going to be a lot of duplication.

    How about introducing a base class 'CompressedFile' that defines most of the logic that's currently in LZMAFile (including the max_size patch from bpo-23529), and having {LZMA,BZ2,Gzip}File all inherit from that base?

    BZ2File and LZMAFile would probably only need to define their own constructor to instantiate the proper compressor/decompressor object.

    GzipFile would additionally need to override read() and write() in order to handle the CRC and gzip header. But I think both methods could still be written to call super().read()/super().write().

    Did I miss something?
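The base-class idea proposed above could be sketched roughly as follows. The class names `CompressedReader`, `BZ2Reader` and `LZMAReader` are hypothetical and are not taken from any patch. The sketch covers reading only and assumes the decompressor objects accept `max_length` and expose `needs_input` and `eof`, as `bz2.BZ2Decompressor` and `lzma.LZMADecompressor` do in Python 3.5 and later.

```python
import bz2
import io
import lzma


class CompressedReader:
    """Hypothetical base class: bounded reads on top of a decompressor object."""

    def __init__(self, fileobj, decompressor):
        self._fp = fileobj
        self._dec = decompressor

    def read(self, size):
        """Return up to `size` decompressed bytes, or b"" at end of stream."""
        if size <= 0:
            raise ValueError("size must be positive")
        while not self._dec.eof:
            if self._dec.needs_input:
                raw = self._fp.read(8192)
                if not raw:
                    raise EOFError("compressed stream ended prematurely")
            else:
                raw = b""  # decompressor still holds buffered data
            out = self._dec.decompress(raw, size)  # max_length=size
            if out:
                return out
        return b""


class BZ2Reader(CompressedReader):
    # Subclasses only choose the decompressor, as suggested in the comment.
    def __init__(self, fileobj):
        super().__init__(fileobj, bz2.BZ2Decompressor())


class LZMAReader(CompressedReader):
    def __init__(self, fileobj):
        super().__init__(fileobj, lzma.LZMADecompressor())


def read_all(reader, size):
    """Collect the chunks returned by repeated bounded reads."""
    chunks = []
    while True:
        chunk = reader.read(size)
        if not chunk:
            return chunks
        chunks.append(chunk)
```

A gzip variant would need extra logic for the header and CRC trailer, which is why the comment singles out GzipFile; that part is left out of the sketch.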

    @vadmium
    Member Author

    vadmium commented Feb 28, 2015

    Perhaps we should move the discussion of a common base class to bpo-23529. I only opened this as a separate issue because I thought it might be appropriate as a bug fix for 3.4.

    @pitrou pitrou closed this as completed Apr 10, 2015
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022