classification
Title: 'tarfile.StreamError: seeking backwards is not allowed' when extract symlink
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.10, Python 3.9, Python 3.8, Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: lars.gustaebel Nosy List: Jeffrey.Kintscher, adunand, andrew.garner, catlee, lars.gustaebel, mdk, miss-islington, serhiy.storchaka, taleinat
Priority: normal Keywords: patch

Created on 2011-08-20 22:57 by adunand, last changed 2020-12-18 18:53 by mdk. This issue is now closed.

Files
File name Uploaded Description
test.py catlee, 2020-05-17 13:38
Pull Requests
URL Status Linked
PR 13217 closed Jeffrey.Kintscher, 2019-05-09 09:55
PR 20972 closed catlee, 2020-06-18 22:39
PR 21409 merged mdk, 2020-07-09 09:28
PR 23508 merged miss-islington, 2020-11-25 09:23
PR 23509 merged miss-islington, 2020-11-25 09:23
Messages (13)
msg142580 - Author: Aurélien Dunand (adunand) Date: 2011-08-20 22:57
Calling extractall() on a tarball that contains a symlink, opened in stream mode ('r|'), raises an exception:

Traceback (most recent call last):
  File "./test_extractall_stream_symlink.py", line 26, in <module>
    tar.extractall(path=destdir)
  File "/usr/lib/python3.2/tarfile.py", line 2134, in extractall
    self.extract(tarinfo, path, set_attrs=not tarinfo.isdir())
  File "/usr/lib/python3.2/tarfile.py", line 2173, in extract
    set_attrs=set_attrs)
  File "/usr/lib/python3.2/tarfile.py", line 2249, in _extract_member
    self.makefile(tarinfo, targetpath)
  File "/usr/lib/python3.2/tarfile.py", line 2289, in makefile
    source.seek(tarinfo.offset_data)
  File "/usr/lib/python3.2/tarfile.py", line 553, in seek
    raise StreamError("seeking backwards is not allowed")
tarfile.StreamError: seeking backwards is not allowed

You can reproduce the bug with this snippet of code:

import os
import shutil
import tarfile

TEMPDIR = '/tmp/pyton_test'
os.mkdir(TEMPDIR)
tempdir = os.path.join(TEMPDIR, "testsymlinks")
temparchive = os.path.join(TEMPDIR, "testsymlinks.tar")
destdir = os.path.join(TEMPDIR, "extract")
os.mkdir(tempdir)
try:
    source_file = os.path.join(tempdir, 'source')
    target_file = os.path.join(tempdir, 'symlink')
    with open(source_file, 'w') as f:
        f.write('something\n')
    os.symlink('source', target_file)
    # Add the symlink to the archive before the file it points to.
    tar = tarfile.open(temparchive, 'w')
    tar.add(target_file, arcname=os.path.basename(target_file))
    tar.add(source_file, arcname=os.path.basename(source_file))
    tar.close()
    # Re-open the archive in stream mode and extract everything.
    with open(temparchive, 'rb') as fo:
        tar = tarfile.open(fileobj=fo, mode='r|')
        try:
            tar.extractall(path=destdir)
        finally:
            tar.close()
finally:
    os.unlink(temparchive)
    shutil.rmtree(TEMPDIR)



If source_file is added before target_file, no exception is raised. However, the exception is still raised when the same tarball is created with GNU tar.
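
For illustration, the StreamError in the traceback follows from how stream mode works: an archive opened with 'r|' is read strictly front to back, so a member's data can no longer be read once the reader has moved past it. The following is a minimal sketch of that restriction, independent of symlinks. The helper names `build_archive` and `read_first_member_late` are made up for this example and are not part of tarfile.

```python
import io
import tarfile


def build_archive():
    """Build a small two-member tar archive in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode='w') as tar:
        for name, data in (('a.txt', b'first'), ('b.txt', b'second')):
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_first_member_late(raw):
    """Walk the whole stream first, then try to read the first member."""
    with tarfile.open(fileobj=io.BytesIO(raw), mode='r|') as tar:
        members = list(tar)  # advances the stream past every member
        try:
            tar.extractfile(members[0]).read()
        except tarfile.StreamError as exc:
            # Reading a.txt now would require seeking backwards.
            return str(exc)
    return None


if __name__ == '__main__':
    print(read_first_member_late(build_archive()))
```

Opening the same bytes with mode 'r' instead of 'r|' allows seeking, so the late read would succeed there.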
msg221577 - Author: Mark Lawrence (BreamoreBoy) * Date: 2014-06-25 21:55
ping.
msg221595 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-06-26 07:31
Everything works for me without an exception on 2.7, 3.3 and 3.4.
msg223171 - Author: Andrew Garner (andrew.garner) Date: 2014-07-16 04:53
This seems to be similar to issue10761, where symlinks are not overwritten by TarFile.extract, but it is only an issue in streaming mode and only in Python 3. To reproduce, attempt to extract a symlink from a tarfile opened with 'r|' so that it overwrites an existing file.

Here's a simple script, adapted from Aurélien's, that demonstrates this behavior.

#!/usr/bin/python

import os
import shutil
import sys
import tempfile
import tarfile


def main():
    tmpdir = tempfile.mkdtemp()
    try:
        os.chdir(tmpdir)
        source = 'source'
        link = 'link'
        temparchive = 'issue12800'
        # create source
        with open(source, 'wb'):
            pass
        os.symlink(source, link)
        with tarfile.open(temparchive, 'w') as tar:
            tar.add(source, arcname=os.path.basename(source))
            tar.add(link, arcname=os.path.basename(link))

        with open(temparchive, 'rb') as fileobj:
            with tarfile.open(fileobj=fileobj, mode='r|') as tar:
                tar.extractall(path=tmpdir)
    finally:
        shutil.rmtree(tmpdir)

if __name__ == '__main__':
    sys.exit(main())


On python 3.3.2 I get the following results:

$ python3.3 issue12800.py
Traceback (most recent call last):
  File "issue12800.py", line 32, in <module>
    sys.exit(main())
  File "issue12800.py", line 27, in main
    tar.extractall(path=tmpdir)
  File "/usr/lib64/python3.3/tarfile.py", line 1984, in extractall
    self.extract(tarinfo, path, set_attrs=not tarinfo.isdir())
  File "/usr/lib64/python3.3/tarfile.py", line 2023, in extract
    set_attrs=set_attrs)
  File "/usr/lib64/python3.3/tarfile.py", line 2100, in _extract_member
    self.makelink(tarinfo, targetpath)
  File "/usr/lib64/python3.3/tarfile.py", line 2181, in makelink
    os.symlink(tarinfo.linkname, targetpath)
FileExistsError: [Errno 17] File exists: '/tmp/tmpt0u1pn/link'

On python 3.4.1 I get the following results:

$ python3.4 issue12800.py
Traceback (most recent call last):
  File "/usr/lib64/python3.4/tarfile.py", line 2176, in makelink
    os.symlink(tarinfo.linkname, targetpath)
FileExistsError: [Errno 17] File exists: 'source' -> '/tmp/tmp3b96k5f0/link'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "issue12800.py", line 32, in <module>
    sys.exit(main())
  File "issue12800.py", line 27, in main
    tar.extractall(path=tmpdir)
  File "/usr/lib64/python3.4/tarfile.py", line 1979, in extractall
    self.extract(tarinfo, path, set_attrs=not tarinfo.isdir())
  File "/usr/lib64/python3.4/tarfile.py", line 2018, in extract
    set_attrs=set_attrs)
  File "/usr/lib64/python3.4/tarfile.py", line 2095, in _extract_member
    self.makelink(tarinfo, targetpath)
  File "/usr/lib64/python3.4/tarfile.py", line 2187, in makelink
    targetpath)
  File "/usr/lib64/python3.4/tarfile.py", line 2087, in _extract_member
    self.makefile(tarinfo, targetpath)
  File "/usr/lib64/python3.4/tarfile.py", line 2126, in makefile
    source.seek(tarinfo.offset_data)
  File "/usr/lib64/python3.4/tarfile.py", line 518, in seek
    raise StreamError("seeking backwards is not allowed")
tarfile.StreamError: seeking backwards is not allowed
msg341969 - Author: Jeffrey Kintscher (Jeffrey.Kintscher) * Date: 2019-05-09 06:00
The problem is in TarFile.makelink() in Lib/tarfile.py. It calls os.symlink() to create the link, which fails because the link already exists and triggers the exception handler. The exception handler then tries to create the linked file under the assumption (per source code comments) that the link creation failed because the system doesn't support symbolic links. The file creation then fails because it requires seeking backwards in the archive.
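
For illustration, the control flow described above can be sketched outside of tarfile. This is a simplified stand-in, not the library's code: `make_link` and `demo_fallback` are hypothetical names, and catching plain `OSError` is an assumption about the handler. The point is only that `FileExistsError` is a subclass of `OSError`, so a handler written for "symlinks are not supported" also fires when the link already exists.

```python
import os
import tempfile


def make_link(linkname, targetpath, fallback):
    """Hypothetical stand-in for the try/except shape described above."""
    try:
        os.symlink(linkname, targetpath)
    except OSError:
        # Intended for platforms without symlink support, but an
        # already existing targetpath raises FileExistsError, which
        # is an OSError subclass and therefore ends up here too.
        fallback()


def demo_fallback():
    calls = []
    with tempfile.TemporaryDirectory() as tmp:
        link = os.path.join(tmp, 'link')
        make_link('source', link, lambda: calls.append('first'))
        make_link('source', link, lambda: calls.append('second'))
    return calls


if __name__ == '__main__':
    print(demo_fallback())
```

On a platform with symlink support, only the second call reaches the fallback. In tarfile that fallback extracts the link target's data, which in stream mode means seeking backwards.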
msg369061 - Author: Chris AtLee (catlee) * Date: 2020-05-16 18:39
Is there anything I can do to help get this landed? The PR on GitHub works for me.
msg369119 - Author: Julien Palard (mdk) * (Python committer) Date: 2020-05-17 13:23
Hi Chris, which exception did you get exactly? Was it caused by the r| mode or by a symlink (or file) already existing?
msg369120 - Author: Chris AtLee (catlee) * Date: 2020-05-17 13:38
It's caused by the combination of the symlink existing, and having the tarfile opened in r| mode.

If I run the attached test file in a fresh directory, I get the following exception:

Traceback (most recent call last):
  File "/home/catlee/.pyenv/versions/3.8.2/lib/python3.8/tarfile.py", line 2227, in makelink
    os.symlink(tarinfo.linkname, targetpath)
FileExistsError: [Errno 17] File exists: 'message.txt' -> './symlink.txt'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "../test.py", line 12, in <module>
    tf.extractall()
  File "/home/catlee/.pyenv/versions/3.8.2/lib/python3.8/tarfile.py", line 2024, in extractall
    self.extract(tarinfo, path, set_attrs=not tarinfo.isdir(),
  File "/home/catlee/.pyenv/versions/3.8.2/lib/python3.8/tarfile.py", line 2065, in extract
    self._extract_member(tarinfo, os.path.join(path, tarinfo.name),
  File "/home/catlee/.pyenv/versions/3.8.2/lib/python3.8/tarfile.py", line 2145, in _extract_member
    self.makelink(tarinfo, targetpath)
  File "/home/catlee/.pyenv/versions/3.8.2/lib/python3.8/tarfile.py", line 2237, in makelink
    self._extract_member(self._find_link_target(tarinfo),
  File "/home/catlee/.pyenv/versions/3.8.2/lib/python3.8/tarfile.py", line 2137, in _extract_member
    self.makefile(tarinfo, targetpath)
  File "/home/catlee/.pyenv/versions/3.8.2/lib/python3.8/tarfile.py", line 2176, in makefile
    source.seek(tarinfo.offset_data)
  File "/home/catlee/.pyenv/versions/3.8.2/lib/python3.8/tarfile.py", line 513, in seek
    raise StreamError("seeking backwards is not allowed")
tarfile.StreamError: seeking backwards is not allowed
msg373384 - Author: Julien Palard (mdk) * (Python committer) Date: 2020-07-09 09:29
Strange fact, this was already fixed in 011525ee92eb1c13ad1a62d28725a840e28f8160 (which closes issue10761, nice spot Andrew) but was lost during a merge in 0d28a61d23:

$ git show 0d28a61d23
commit 0d28a61d233c02c458c8b4a25613be2f4979331e
Merge: ed3a303548 d7c9d9cdcd

$ git show 0d28a61d23:Lib/tarfile.py | grep unlink  # The merge commit no longer contains the fix

$ git show ed3a303548:Lib/tarfile.py | grep unlink  # The "left" parent does not contain it either

$ git show d7c9d9cdcd:Lib/tarfile.py | grep unlink  # The "right" one does contain it.
                    os.unlink(targetpath)
                        os.unlink(targetpath)

Stranger fact, the test was not lost during the merge, and still lives today (test_extractall_symlinks).

It turns out that the current test passes because it is partly erroneous: instead of trying to create a symlink on top of an existing one, it creates a symlink far, far away:

(Pdb) p targetpath
'/home/mdk/clones/python/cpython/@test_648875_tmp-tardir/testsymlinks/home/mdk/clones/python/cpython/@test_648875_tmp-tardir/testsymlinks/symlink'
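
For illustration, the doubled path comes from calling tar.add() with an absolute path and no arcname: the member is stored under that full path (minus the leading slash), so extracting into the same directory recreates the whole tree underneath it. A minimal sketch, with `stored_names` as a made-up helper name:

```python
import os
import tarfile
import tempfile


def stored_names():
    """Return the member names stored with and without arcname."""
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, 'source')
        with open(source, 'w') as f:
            f.write('something\n')
        archive = os.path.join(tmp, 'demo.tar')
        with tarfile.open(archive, 'w') as tar:
            tar.add(source)                    # stored under its full path
            tar.add(source, arcname='source')  # stored as plain 'source'
        with tarfile.open(archive) as tar:
            return tar.getnames()


if __name__ == '__main__':
    print(stored_names())
```

Only the second member would land directly on top of an existing 'source' when extracted into the directory it came from.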

Additionally, it passes anyway because tar.errorlevel equals 1, which means the error is logged but not raised.
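
For illustration, errorlevel is an attribute of the TarFile object and can be set through tarfile.open(). The sketch below only inspects the values and does not reproduce the failure. `one_file_archive` and `errorlevels` are made-up helper names.

```python
import io
import tarfile


def one_file_archive():
    """Build a one-member tar archive in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode='w') as tar:
        data = b'something\n'
        info = tarfile.TarInfo('source')
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def errorlevels():
    """Return (default errorlevel, errorlevel when opened with 2)."""
    raw = one_file_archive()
    with tarfile.open(fileobj=io.BytesIO(raw)) as default_tar:
        default = default_tar.errorlevel
    with tarfile.open(fileobj=io.BytesIO(raw), errorlevel=2) as strict_tar:
        strict = strict_tar.errorlevel
    return default, strict


if __name__ == '__main__':
    print(errorlevels())
```

With errorlevel=2, errors that tarfile treats as non-fatal during extraction are raised instead of only being written to the debug output.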

With the following small patch:

--- a/Lib/test/test_tarfile.py
+++ b/Lib/test/test_tarfile.py
@@ -1339,10 +1339,10 @@ class WriteTest(WriteTestBase, unittest.TestCase):
                 f.write('something\n')
             os.symlink(source_file, target_file)
             with tarfile.open(temparchive, 'w') as tar:
-                tar.add(source_file)
-                tar.add(target_file)
+                tar.add(source_file, arcname="source")
+                tar.add(target_file, arcname="symlink")
             # Let's extract it to the location which contains the symlink
-            with tarfile.open(temparchive) as tar:
+            with tarfile.open(temparchive, errorlevel=2) as tar:
                 # this should not raise OSError: [Errno 17] File exists
                 try:
                     tar.extractall(path=tempdir)


the error is raised as expected: FileExistsError: [Errno 17] File exists: '/home/mdk/clones/python/cpython/@test_649794_tmpæ-tardir/testsymlinks/source' -> '/home/mdk/clones/python/cpython/@test_649794_tmpæ-tardir/testsymlinks/symlink'

I'm opening a PR to restore this as it was intended.
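
For illustration, the grep output above suggests that the lost fix removed whatever already sits at the target path before creating the link. A minimal sketch of that idea follows. `replace_symlink` and `demo_replace` are hypothetical helpers, not the code in Lib/tarfile.py.

```python
import os
import tempfile


def replace_symlink(linkname, targetpath):
    """Create a symlink, replacing an existing file or link first."""
    # lexists() is also true for dangling symlinks, unlike exists().
    if os.path.lexists(targetpath):
        os.unlink(targetpath)
    os.symlink(linkname, targetpath)


def demo_replace():
    with tempfile.TemporaryDirectory() as tmp:
        link = os.path.join(tmp, 'symlink')
        replace_symlink('old', link)
        replace_symlink('source', link)  # no FileExistsError this time
        return os.readlink(link)


if __name__ == '__main__':
    print(demo_replace())
```

Because the second os.symlink() call no longer fails, there is no fallback extraction and therefore no backwards seek in stream mode.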
msg377173 - Author: Tal Einat (taleinat) * (Python committer) Date: 2020-09-19 15:50
See also another duplicate of this issue, issue40049.
msg381803 - Author: Julien Palard (mdk) * (Python committer) Date: 2020-11-25 09:23
New changeset 4fedd7123eaf147edd55eabbbd72e0bcc8368e47 by Julien Palard in branch 'master':
bpo-12800: tarfile: Restore fix from 011525ee9 (GH-21409)
https://github.com/python/cpython/commit/4fedd7123eaf147edd55eabbbd72e0bcc8368e47
msg381807 - Author: miss-islington (miss-islington) Date: 2020-11-25 09:53
New changeset 9d2c2a8e3b8fe18ee1568bfa4a419847b3e78575 by Miss Islington (bot) in branch '3.9':
bpo-12800: tarfile: Restore fix from 011525ee9 (GH-21409)
https://github.com/python/cpython/commit/9d2c2a8e3b8fe18ee1568bfa4a419847b3e78575
msg381810 - Author: miss-islington (miss-islington) Date: 2020-11-25 10:01
New changeset bda2e68c8849e23899b3dad9e436c06303254943 by Miss Islington (bot) in branch '3.8':
bpo-12800: tarfile: Restore fix from 011525ee9 (GH-21409)
https://github.com/python/cpython/commit/bda2e68c8849e23899b3dad9e436c06303254943
History
Date User Action Args
2020-12-18 18:53:15 mdk set status: open -> closed; resolution: fixed; stage: patch review -> resolved
2020-11-25 10:01:12 miss-islington set messages: + msg381810
2020-11-25 09:53:00 miss-islington set messages: + msg381807
2020-11-25 09:23:57 miss-islington set pull_requests: + pull_request22396
2020-11-25 09:23:50 miss-islington set nosy: + miss-islington; pull_requests: + pull_request22395
2020-11-25 09:23:26 mdk set messages: + msg381803
2020-09-19 15:50:07 taleinat set nosy: + taleinat; messages: + msg377173; versions: + Python 3.9, Python 3.10
2020-07-09 09:29:15 mdk set messages: + msg373384
2020-07-09 09:28:29 mdk set pull_requests: + pull_request20558
2020-06-18 22:39:55 catlee set pull_requests: + pull_request20149
2020-05-17 13:38:02 catlee set files: + test.py; messages: + msg369120
2020-05-17 13:23:15 mdk set nosy: + mdk; messages: + msg369119
2020-05-16 18:39:10 catlee set nosy: + catlee; messages: + msg369061
2019-05-09 09:55:14 Jeffrey.Kintscher set keywords: + patch; stage: patch review; pull_requests: + pull_request13128
2019-05-09 06:00:24 Jeffrey.Kintscher set nosy: + Jeffrey.Kintscher; messages: + msg341969; versions: + Python 3.7, Python 3.8, - Python 3.2, Python 3.3, Python 3.4
2019-04-26 19:45:04 BreamoreBoy set nosy: - BreamoreBoy
2014-07-16 04:53:36 andrew.garner set nosy: + andrew.garner; messages: + msg223171; versions: + Python 3.3, Python 3.4
2014-06-26 07:31:32 serhiy.storchaka set nosy: + serhiy.storchaka; messages: + msg221595
2014-06-25 21:55:30 BreamoreBoy set nosy: + BreamoreBoy; messages: + msg221577
2011-09-12 07:13:36 lars.gustaebel set assignee: lars.gustaebel
2011-08-21 00:07:46 ned.deily set nosy: + lars.gustaebel
2011-08-20 22:57:39 adunand create