classification
Title: zlib.error with tarfile.open
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: christian.heimes, ethan.furman, jack__d, jvoisin, lukasz.langa
Priority: normal
Keywords: patch

Created on 2019-12-13 16:00 by jvoisin, last changed 2021-09-29 10:58 by lukasz.langa. This issue is now closed.

Files
| File name | Uploaded | Description |
|---|---|---|
| crash-c10c9839d987fa0df6912cb4084f43f3ce08ca82 | jvoisin, 2019-12-13 16:00 | |
Pull Requests
| URL | Status | Linked |
|---|---|---|
| PR 27766 | merged | jack__d, 2021-08-14 03:00 |
| PR 28613 | merged | lukasz.langa, 2021-09-29 09:35 |
| PR 28614 | merged | lukasz.langa, 2021-09-29 09:42 |
Messages (9)
msg358337 - Author: jvoisin (jvoisin) Date: 2019-12-13 16:00
The attached file produces the following stack trace when opened via `tarfile.open` on Python 3.7.5rc1:

```
$ cat test.py 
import sys
import tarfile

tarfile.open(sys.argv[1])
$ python3 test.py ./crash-c10c9839d987fa0df6912cb4084f43f3ce08ca82
Traceback (most recent call last):
  File "test.py", line 4, in <module>
    tarfile.open(sys.argv[1])
  File "/usr/lib/python3.7/tarfile.py", line 1573, in open
    return func(name, "r", fileobj, **kwargs)
  File "/usr/lib/python3.7/tarfile.py", line 1645, in gzopen
    t = cls.taropen(name, mode, fileobj, **kwargs)
  File "/usr/lib/python3.7/tarfile.py", line 1621, in taropen
    return cls(name, mode, fileobj, **kwargs)
  File "/usr/lib/python3.7/tarfile.py", line 1484, in __init__
    self.firstmember = self.next()
  File "/usr/lib/python3.7/tarfile.py", line 2289, in next
    tarinfo = self.tarinfo.fromtarfile(self)
  File "/usr/lib/python3.7/tarfile.py", line 1094, in fromtarfile
    buf = tarfile.fileobj.read(BLOCKSIZE)
  File "/usr/lib/python3.7/gzip.py", line 276, in read
    return self._buffer.read(size)
  File "/usr/lib/python3.7/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/usr/lib/python3.7/gzip.py", line 471, in read
    uncompress = self._decompressor.decompress(buf, size)
zlib.error: Error -3 while decompressing data: invalid distances se
```
msg358340 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2019-12-13 16:34
This file is also an invalid tar file:

```
$ tar xf crash-c10c9839d987fa0df6912cb4084f43f3ce08ca82

gzip: stdin: invalid compressed data--format violated
tar: Child returned status 1
tar: Error is not recoverable: exiting now
```
msg358341 - Author: jvoisin (jvoisin) Date: 2019-12-13 16:38
Sure, but as a user, I would expect a better exception, like ValueError or ReadError, along with an error message, instead of an unexpected zlib exception.
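A minimal user-side sketch of that expectation, assuming the archive is already held in memory: the hypothetical helper `open_tar_safely` catches `zlib.error` from `tarfile.open` and re-raises it as `tarfile.ReadError`, so callers handle one exception type even on interpreters that let the zlib exception escape. It only wraps the `open` call; reading member data afterwards may need the same treatment.

```python
import io
import tarfile
import zlib


def open_tar_safely(data: bytes) -> tarfile.TarFile:
    """Open an in-memory archive, reporting corrupt compressed data as ReadError.

    Hypothetical helper: on interpreters that let zlib.error escape from
    tarfile.open, the zlib exception is converted so callers only need to
    handle tarfile.ReadError.
    """
    try:
        # Default mode "r" tries the compressed formats before plain tar.
        return tarfile.open(fileobj=io.BytesIO(data))
    except zlib.error as exc:
        # Chain the original exception so the zlib message stays visible.
        raise tarfile.ReadError(f"zlib error: {exc}") from exc
```

A caller can then wrap `open_tar_safely(payload)` in a single `except tarfile.ReadError:` clause.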
msg399811 - Author: Jack DeVries (jack__d) * Date: 2021-08-18 00:42
@jvoisin I can reproduce the problem with the file you attached, but I am having a hard time reproducing it by passing other corrupt archives to `tarfile.open`. How exactly was this file corrupted? I am trying to find out whether other scenarios leak implementation details or produce poor error messages in the same way, so I can do my best to patch them all.

You can see the reproduction scripts I am using here to get a better idea of what I have been trying. Be forewarned, they are pretty gnarly!

https://gist.github.com/jdevries3133/acbb5ba2a19093d3bcc214733ef85e5a
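One way to hit the same code path without a fuzzer, offered as a sketch: write a valid 10-byte gzip member header (RFC 1952) followed by a deflate byte whose block type is the reserved value `0b11` (RFC 1951), which zlib should reject as soon as `tarfile` reads the first header block. `make_corrupt_tgz` and `classify_failure` are hypothetical helpers; on an interpreter with the behavior reported here the script should print `zlib.error`, and with a fix in place `tarfile.ReadError`.

```python
import io
import tarfile
import zlib

# 10-byte gzip member header (RFC 1952): magic, CM=8 (deflate), no flags,
# zero mtime, XFL=0, OS=255 (unknown).
GZIP_HEADER = b"\x1f\x8b\x08\x00" + b"\x00" * 4 + b"\x00\xff"

# First deflate byte 0x07 sets BFINAL=1 and BTYPE=0b11, a block type that
# RFC 1951 reserves as an error, so zlib fails on the very first block.
BAD_DEFLATE = b"\x07"


def make_corrupt_tgz(padding: int = 600) -> bytes:
    """Return bytes that start like a .tar.gz but fail inside zlib."""
    return GZIP_HEADER + BAD_DEFLATE + b"\x00" * padding


def classify_failure(data: bytes) -> str:
    """Report which exception, if any, tarfile.open raises for the bytes."""
    try:
        tarfile.open(fileobj=io.BytesIO(data)).close()
    except zlib.error:
        return "zlib.error"          # behavior reported in this issue
    except tarfile.ReadError:
        return "tarfile.ReadError"   # behavior once zlib.error is translated
    return "opened"


print(classify_failure(make_corrupt_tgz()))
```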
msg399989 - Author: jvoisin (jvoisin) Date: 2021-08-20 18:44
The file was created with a fuzzer, like the one described in https://dustri.org/b/fuzzing-python-in-python-and-doing-it-fast.html
msg402834 - Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2021-09-29 09:25
New changeset b6fe8572509b77d2002eaddf99d718e9b4835684 by Jack DeVries in branch 'main':
bpo-39039: tarfile raises descriptive exception from zlib.error (GH-27766)
https://github.com/python/cpython/commit/b6fe8572509b77d2002eaddf99d718e9b4835684
msg402845 - Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2021-09-29 10:19
New changeset d6b69f21d8ec4af47a9c79f3f50d20be3d0875fc by Łukasz Langa in branch '3.10':
[3.10] bpo-39039: tarfile raises descriptive exception from zlib.error (GH-27766) (GH-28613)
https://github.com/python/cpython/commit/d6b69f21d8ec4af47a9c79f3f50d20be3d0875fc
msg402847 - Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2021-09-29 10:56
New changeset 7bff4d396f20451f20977be3ce23a879c6bc3e46 by Łukasz Langa in branch '3.9':
[3.9] bpo-39039: tarfile raises descriptive exception from zlib.error (GH-27766) (GH-28614)
https://github.com/python/cpython/commit/7bff4d396f20451f20977be3ce23a879c6bc3e46
msg402848 - Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2021-09-29 10:58
Thanks for the fix, Jack! ✨ 🍰 ✨  

Since the change translates `zlib.error` to `tarfile.ReadError`, which user code already has to handle, it strictly decreases the surface of necessary exception handling. So, treating this as a bug fix, I backported it to 3.9 and 3.10 as well.
History
| Date | User | Action | Args |
|---|---|---|---|
| 2021-09-29 10:58:55 | lukasz.langa | set | status: open -> closed; versions: + Python 3.9, Python 3.10, Python 3.11, - Python 3.7; messages: + msg402848; resolution: fixed; stage: patch review -> resolved |
| 2021-09-29 10:56:18 | lukasz.langa | set | messages: + msg402847 |
| 2021-09-29 10:19:45 | lukasz.langa | set | messages: + msg402845 |
| 2021-09-29 09:42:37 | lukasz.langa | set | pull_requests: + pull_request26984 |
| 2021-09-29 09:35:13 | lukasz.langa | set | pull_requests: + pull_request26983 |
| 2021-09-29 09:25:52 | lukasz.langa | set | nosy: + lukasz.langa; messages: + msg402834 |
| 2021-08-20 18:44:22 | jvoisin | set | messages: + msg399989 |
| 2021-08-18 00:42:06 | jack__d | set | messages: + msg399811 |
| 2021-08-14 03:00:26 | jack__d | set | keywords: + patch; nosy: + jack__d; pull_requests: + pull_request26242; stage: patch review |
| 2019-12-13 16:38:47 | jvoisin | set | messages: + msg358341 |
| 2019-12-13 16:34:20 | christian.heimes | set | nosy: + christian.heimes; messages: + msg358340 |
| 2019-12-13 16:02:19 | xtreak | set | nosy: + ethan.furman |
| 2019-12-13 16:00:48 | jvoisin | create | |