classification
Title: tarfile cannot extract from stdin
Type: crash
Stage: patch review
Components: Library (Lib)
Versions: Python 3.8
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Jonathan Hsu, Manjusaka, dtamuc, python-dev
Priority: normal
Keywords: patch

Created on 2020-03-23 15:54 by dtamuc, last changed 2020-03-27 21:04 by dtamuc.

Files
File name Uploaded Description
test.tar dtamuc, 2020-03-26 17:46
Pull Requests
URL Status Linked
PR 19187 open python-dev, 2020-03-27 01:35
Messages (5)
msg364860 - Author: Danijel (dtamuc) Date: 2020-03-23 15:54
Hi,

I have the following code:

```
import tarfile
import sys

tar = tarfile.open(fileobj=sys.stdin.buffer, mode='r|*')
tar.extractall("tarout")
tar.close()
```

Then I run the following on a Debian 10 system:

```
$ python -m tarfile -c git.tar /usr/share/doc/git
$ python -V
Python 3.8.1
$ cat git.tar | python foo.py
$ cat git.tar | python foo.py
Traceback (most recent call last):
  File "foo.py", line 5, in <module>
    tar.extractall("tarout")
  File "/home/danielt/miniconda3/lib/python3.8/tarfile.py", line 2026, in extractall
    self.extract(tarinfo, path, set_attrs=not tarinfo.isdir(),
  File "/home/danielt/miniconda3/lib/python3.8/tarfile.py", line 2067, in extract
    self._extract_member(tarinfo, os.path.join(path, tarinfo.name),
  File "/home/danielt/miniconda3/lib/python3.8/tarfile.py", line 2139, in _extract_member
    self.makefile(tarinfo, targetpath)
  File "/home/danielt/miniconda3/lib/python3.8/tarfile.py", line 2178, in makefile
    source.seek(tarinfo.offset_data)
  File "/home/danielt/miniconda3/lib/python3.8/tarfile.py", line 513, in seek
    raise StreamError("seeking backwards is not allowed")
tarfile.StreamError: seeking backwards is not allowed
```

The second extraction tries to seek backwards, even though the mode is 'r|*'.


For reference, if I remove ".buffer" from the code above, I can run
it repeatedly with python2 without problems:

```
$ cat foo2.py
import tarfile
import sys

tar = tarfile.open(fileobj=sys.stdin, mode='r|*')
tar.extractall("tarout")
tar.close()

$ cat git.tar | python2 foo2.py
$ cat git.tar | python2 foo2.py
$ cat git.tar | python2 foo2.py
$ cat git.tar | python2 foo2.py
$ cat git.tar | python2 foo2.py
```
msg365093 - Author: Manjusaka (Manjusaka) * Date: 2020-03-26 16:38
Hello,

I can't reproduce this issue on my laptop with any version from 3.8.1 to 3.9.0a4.

I think it may depend on the file you use.

Would you mind uploading the file that triggers the problem?
msg365102 - Author: Danijel (dtamuc) Date: 2020-03-26 17:46
Hi,

Well, the upload fails with "entity too large". I've attached a smaller file that throws a similar but slightly different error. (Note: only on the _second_ extraction; it looks like a problem with symlinks.)

You can find larger ones here:

https://data.rbfh.de/issue40049/

The typescript*.txt files show a shell session with two different Python versions (3.4.2 and 3.8.2).
msg365128 - Author: Jonathan Hsu (Jonathan Hsu) * Date: 2020-03-27 01:49
This happens when tarfile tries to write a symlink that already exists. Any exception from os.symlink() is handled as if the platform doesn't support symlinks, so tarfile scans the entire tar to try to find the linked files. When it resumes extraction, it needs to seek backwards to pick up where it left off, which causes the exception.
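The first step of that chain can be seen in isolation. Below is a minimal sketch, assuming a platform where the current user may create symlinks; `symlink_twice` is a hypothetical helper name and is not part of tarfile. It creates the same symlink twice, the way two consecutive extractions into the same directory would, and returns whatever the second call raises. On POSIX systems that should be a FileExistsError, which is a subclass of OSError.

```python
import os
import tempfile


def symlink_twice(directory):
    """Create the same symlink twice; return the exception from the second call."""
    link = os.path.join(directory, "link")
    os.symlink("data.txt", link)  # first "extraction": the path is free
    try:
        os.symlink("data.txt", link)  # second "extraction": the path already exists
    except OSError as exc:
        return exc
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(type(symlink_twice(tmp)).__name__)
```

Because the error is an ordinary OSError, a handler that treats every os.symlink() failure as "symlinks unsupported" cannot tell this case apart from a genuine platform limitation.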

I've reproduced the error on both Windows 10 and Ubuntu running on WSL. Python 2.7 handled this situation by checking if the symlink exists, but it looks like the entire tarfile library was replaced with an alternate implementation that doesn't check if the symlink exists. I've created a pull request to address this issue.
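Until a fix lands, a caller-side workaround along the same lines seems possible. The following is only a sketch under stated assumptions and is not the code from the pull request: `make_demo_tar` and `extract_stream` are hypothetical helper names, the archive is a tiny in-memory one rather than the attached test.tar, and the `filter="data"` argument is an addition that is only passed when the running Python provides it. The idea is to iterate over the stream member by member and remove a symlink left behind by an earlier run before tarfile tries to recreate it, so os.symlink() does not fail and no backward seek is needed.

```python
import io
import os
import tarfile


def make_demo_tar():
    """Build a tiny in-memory tar holding one file and one symlink to it."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        data = b"hello\n"
        info = tarfile.TarInfo("data.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

        link = tarfile.TarInfo("link")
        link.type = tarfile.SYMTYPE
        link.linkname = "data.txt"
        tar.addfile(link)
    return buf.getvalue()


def extract_stream(fileobj, dest):
    """Extract a tar stream into dest, replacing symlinks left by earlier runs."""
    # Assumption: use the "data" extraction filter where this Python has it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(fileobj=fileobj, mode="r|*") as tar:
        for member in tar:
            # Note: member.name is not sanitised here; only use trusted archives.
            target = os.path.join(dest, member.name)
            if member.issym() and os.path.islink(target):
                os.unlink(target)  # remove the stale link so os.symlink() succeeds
            tar.extract(member, dest, **kwargs)
```

With the script from the original report, this would be called as `extract_stream(sys.stdin.buffer, "tarout")`. Only existing symlinks are removed; regular files and directories at the target path are left to tarfile's normal handling.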
msg365192 - Author: Danijel (dtamuc) Date: 2020-03-27 21:04
This patch solves the problem for me. Thank you.
History
Date | User | Action | Args
2020-03-27 21:04:33 | dtamuc | set | messages: + msg365192
2020-03-27 01:49:28 | Jonathan Hsu | set | nosy: + Jonathan Hsu; messages: + msg365128
2020-03-27 01:35:20 | python-dev | set | keywords: + patch; nosy: + python-dev; pull_requests: + pull_request18546; stage: patch review
2020-03-26 17:46:41 | dtamuc | set | files: + test.tar; messages: + msg365102
2020-03-26 16:38:52 | Manjusaka | set | nosy: + Manjusaka; messages: + msg365093
2020-03-24 13:44:32 | dtamuc | set | type: crash
2020-03-23 15:54:18 | dtamuc | create |