classification
Title: Limit decompressed data when reading from GzipFile
Type: resource usage Stage: resolved
Components: Library (Lib) Versions: Python 3.5
process
Status: closed Resolution: duplicate
Dependencies: Superseder: Limit decompressed data when reading from LZMAFile and BZ2File (Issue 23529)
Assigned To: (none)
Nosy List: Arfrever, martin.panter, nikratio, pitrou, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2015-02-26 10:22 by martin.panter, last changed 2015-04-10 22:31 by pitrou. This issue is now closed.

Files
File name: gzip-bomb.patch
Uploaded: martin.panter, 2015-02-26 10:22
Messages (3)
msg236659 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-02-26 10:22
This is a patch I originally posted at Issue 15955, but am moving it to a separate issue so there is less confusion. GzipFile.read(<size>) etc. is susceptible to decompression bombing. My patch adds a test for this and fixes it, making use of the existing “max_length” parameter in the “zlib” module.
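The “max_length” mechanism the patch relies on can be illustrated with a minimal sketch. iter_decompressed is a hypothetical helper, not part of the gzip or zlib modules; it only shows how zlib.decompressobj().decompress(data, max_length) caps the output of each call and keeps input it has not yet processed in unconsumed_tail, so a small compressed bomb is never expanded into one huge buffer.

```python
import zlib


def iter_decompressed(compressed: bytes, chunk_size: int):
    """Yield the decompressed form of a zlib stream in pieces of at most
    chunk_size bytes (hypothetical helper, for illustration only)."""
    d = zlib.decompressobj()
    data = compressed
    while not d.eof:
        # The second argument (max_length) caps how much output this call may produce.
        piece = d.decompress(data, chunk_size)
        # Input that zlib did not get to because of the cap is kept here.
        data = d.unconsumed_tail
        if not piece and not data:
            break  # no output and no input left: truncated stream
        yield piece


# A megabyte of zero bytes compresses to a tiny fraction of its size.
bomb = zlib.compress(b"\0" * 1_000_000)
largest = max(len(p) for p in iter_decompressed(bomb, 64 * 1024))
```

Every piece is at most 64 KiB, so peak memory is governed by chunk_size rather than by the compression ratio of the input.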

The rest of Issue 15955 is about enhancing the bz2 and LZMA modules to support limited decompression. Since the zlib module can already limit the decompressed data, I think this gzip patch should be considered a bug fix rather than an enhancement. For example, the fix for Issue 16043 (gzip decoding in the XML-RPC module) assumed that GzipFile.read(<size>) is limited.
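The assumption behind the Issue 16043 fix can be sketched as follows: ask GzipFile.read() for one byte more than the allowed size, and reject the payload if that many bytes come back. limited_gzip_decode and its 20 MiB default are illustrative assumptions, loosely modelled on xmlrpc.client.gzip_decode rather than copied from it. The check only protects memory if GzipFile.read(size) itself stops after size bytes, which is exactly the bug reported here.

```python
import gzip
import io

DEFAULT_MAX_DECODE = 20 * 1024 * 1024  # assumed cap: 20 MiB


def limited_gzip_decode(data: bytes, max_decode: int = DEFAULT_MAX_DECODE) -> bytes:
    """Decompress gzip data, refusing results larger than max_decode bytes.

    Hypothetical helper: it requests at most max_decode + 1 bytes, so memory
    use stays bounded only if GzipFile.read(size) is itself bounded.
    """
    with gzip.GzipFile(fileobj=io.BytesIO(data)) as gzf:
        decoded = gzf.read(max_decode + 1)
    if len(decoded) > max_decode:
        raise ValueError("decompressed payload exceeds max_decode")
    return decoded
```

Reading max_decode + 1 bytes is what lets the function tell "exactly at the limit" apart from "over the limit" without decompressing the rest of the stream.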
msg236706 - Author: Nikolaus Rath (nikratio) * Date: 2015-02-26 21:10
Especially now that this is only going to go into 3.5, I think it makes more sense to handle GzipFile, LZMAFile and BZ2File all in one go. Looking at the code, otherwise there's going to be a lot of duplication.

How about introducing a base class 'CompressedFile' that defines most of the logic that's currently in LZMAFile (including the max_size patch from issue 23529), and having {LZMA,BZ2,Gzip}File all inherit from that base?

BZ2File and LZMAFile would probably only need to define their own constructor to instantiate the proper compressor/decompressor object.

GzipFile would additionally need to override read() and write() in order to handle the CRC and gzip header. But I think both methods could still be written to call super().read() and super().write().

Did I miss something?
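The shared base class proposed here might look roughly like the sketch below. CompressedReader, BZ2Reader and LZMAReader are hypothetical names, not standard-library classes. The sketch assumes the bz2/lzma decompressor interface with decompress(data, max_length), needs_input and eof (added in Python 3.5), and a single compressed stream per file. Each subclass only chooses its decompressor, as suggested for BZ2File and LZMAFile; a gzip subclass is left out because it would also need the header and CRC handling described above.

```python
import bz2
import io
import lzma


class CompressedReader(io.RawIOBase):
    """Hypothetical shared base class in the spirit of 'CompressedFile'.

    All bounded-read logic lives here; subclasses only pick a decompressor.
    """

    def __init__(self, fileobj, chunk_size=8192):
        super().__init__()
        self._fp = fileobj
        self._chunk_size = chunk_size
        self._decomp = self._new_decompressor()

    def _new_decompressor(self):
        raise NotImplementedError

    def readable(self):
        return True

    def readinto(self, b):
        size = len(b)
        if size == 0:
            return 0
        while not self._decomp.eof:
            if self._decomp.needs_input:
                raw = self._fp.read(self._chunk_size)
                if not raw:
                    raise EOFError("compressed stream ended before end-of-stream marker")
            else:
                raw = b""  # decompressor still holds output from earlier input
            # max_length (second argument) caps this call's output at len(b) bytes.
            out = self._decomp.decompress(raw, size)
            if out:
                b[:len(out)] = out
                return len(out)
        return 0  # end of the compressed stream


class BZ2Reader(CompressedReader):
    def _new_decompressor(self):
        return bz2.BZ2Decompressor()


class LZMAReader(CompressedReader):
    def _new_decompressor(self):
        return lzma.LZMADecompressor()
```

Because readinto() never asks the decompressor for more than len(b) bytes, read(size) stays bounded however well the data compresses, and io.RawIOBase supplies read() and readall() on top of it. Wrapping a reader in io.BufferedReader would add buffering and readline().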
msg236861 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-02-28 00:54
Perhaps we should move the discussion of a common base class to Issue 23529. I only opened this as a separate issue because I thought it might be appropriate as a bug fix for 3.4.
History
Date | User | Action | Args
2015-04-10 22:31:32 | pitrou | set | status: open -> closed; superseder: Limit decompressed data when reading from LZMAFile and BZ2File; resolution: duplicate; stage: patch review -> resolved
2015-02-28 09:42:45 | Arfrever | set | nosy: + Arfrever
2015-02-28 00:54:29 | martin.panter | set | messages: + msg236861
2015-02-26 21:10:02 | nikratio | set | messages: + msg236706
2015-02-26 18:05:21 | pitrou | set | nosy: + pitrou
2015-02-26 12:12:20 | pitrou | set | stage: patch review; versions: - Python 3.4
2015-02-26 11:11:39 | serhiy.storchaka | set | nosy: + serhiy.storchaka
2015-02-26 10:23:24 | martin.panter | set | type: behavior -> resource usage
2015-02-26 10:22:06 | martin.panter | create |