This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: tarfile.extractall fails to overwrite symlinks
Type: performance Stage: resolved
Components: Library (Lib), Unicode Versions: Python 3.8, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: orsenthil Nosy List: Scott.Leerssen, ezio.melotti, lars.gustaebel, orsenthil, pitrou, python-dev, santoso.wijaya, srid, vstinner
Priority: normal Keywords:

Created on 2010-12-23 00:17 by srid, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (14)
msg124523 - (view) Author: Sridhar Ratnakumar (srid) Date: 2010-12-23 00:17
tarfile.extractall overwrites normal files and directories, yet it fails to overwrite symlinks:

  [..]
    tf.extractall()
  File "/opt/ActivePython-2.7/lib/python2.7/tarfile.py", line 2046, in extractall
    self.extract(tarinfo, path)
  File "/opt/ActivePython-2.7/lib/python2.7/tarfile.py", line 2083, in extract
    self._extract_member(tarinfo, os.path.join(path, tarinfo.name))
  File "/opt/ActivePython-2.7/lib/python2.7/tarfile.py", line 2167, in _extract_member
    self.makelink(tarinfo, targetpath)
  File "/opt/ActivePython-2.7/lib/python2.7/tarfile.py", line 2243, in makelink
    os.symlink(tarinfo.linkname, targetpath)
OSError: [Errno 17] File exists

To reproduce, use a .tar.gz file containing relative (i.e., in the same directory) symlinks.

Perhaps it should delete `targetpath` before attempting to create a symlink.
msg134502 - (view) Author: Scott Leerssen (Scott.Leerssen) Date: 2011-04-26 21:20
I just hit the same issue.  This seems to work:


Modified:Lib/tarfile.py
===================================================================
---Lib/tarfile.py	2011-04-26 20:36:33 UTC (rev 49502)
+++Lib/tarfile.py	2011-04-26 21:01:24 UTC (rev 49503)
@@ -2239,6 +2239,8 @@
        if hasattr(os, "symlink") and hasattr(os, "link"):
            # For systems that support symbolic and hard links.
            if tarinfo.issym():
+                if os.path.exists(targetpath):
+                    os.unlink(targetpath)
                os.symlink(tarinfo.linkname, targetpath)
            else:
                # See extract().
msg134519 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2011-04-26 23:23
Scott- which platform did you observe this? I can't reproduce this on the 2.7 code on Linux.
msg134555 - (view) Author: Scott Leerssen (Scott.Leerssen) Date: 2011-04-27 12:24
It happens on RedHat and CentOS 5, but I suspect it would happen on any Unix variant.  Here's a test that exacerbates the issue:


#
# test_tarfile.py
#
# Description:
#       tests for python tarfile module
#
# $Id$
#

import os
import shutil
import tarfile
import tempfile

def test_tar_overwrites_symlink():
    d = tempfile.mkdtemp()
    try:
        new_dir_name = os.path.join(d, 'newdir')
        archive_name = os.path.join(d, 'newdir.tar')
        os.mkdir(new_dir_name)
        source_file_name = os.path.join(new_dir_name, 'source')
        target_link_name = os.path.join(new_dir_name, 'symlink')
        with open(source_file_name, 'w') as f:
            f.write('something\n')
        os.symlink(source_file_name, target_link_name)
        with tarfile.open(archive_name, 'w') as tar:
            for f in [source_file_name, target_link_name]:
                tar.add(f, arcname=os.path.basename(f))
        # this should not raise OSError: [Errno 17] File exists
        with tarfile.open(archive_name, 'r') as tar:
            tar.extractall(path=new_dir_name)
    finally:
        shutil.rmtree(d)
msg134657 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-04-28 07:32
New changeset 0c8bc3a0130a by Senthil Kumaran in branch '2.7':
Fix closes  issue10761: tarfile.extractall failure  when symlinked files are present.
http://hg.python.org/cpython/rev/0c8bc3a0130a
msg134658 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2011-04-28 07:35
I had tried/tested against 3.x branch and did not find the problem. Later realized that it was only again 2.7. Pushed in the changes and the tests. I shall the tests only in 3.x codeline.
msg134782 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-04-29 16:21
Senthil, Windows buildbots on 3.1, 3.2 and 3.x show test failures.
See e.g. http://www.python.org/dev/buildbot/all/builders/x86%20XP-4%203.1/builds/1780/steps/test/logs/stdio
msg134816 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-04-29 22:12
New changeset 2665a28643b8 by Senthil Kumaran in branch 'default':
Wrap the correct test with the skip decorator for the issue10761.
http://hg.python.org/cpython/rev/2665a28643b8
msg134817 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2011-04-29 22:14
I had wrapped skipUnless decorator for the wrong test (test_extractall
instead of test_extractall_symlinks) in the 3.x code.  Corrected it
and waiting for next bb reports.

Thank you.
msg134828 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2011-04-30 01:17
buildbots are green again.
msg135924 - (view) Author: Scott Leerssen (Scott.Leerssen) Date: 2011-05-13 16:34
It turns out that my fix was at least one byte short of complete.  If the target pathname is a broken symlink, os.path.exists() returns False, and the OSError is raised.  I should have used os.path.lexists().  Also, I believe the same problem exists for the hardlink case a few lines below.
msg135925 - (view) Author: Scott Leerssen (Scott.Leerssen) Date: 2011-05-13 16:57
here is a diff of a better fix based on the previous patch:


Index: tarfile.py
===================================================================
--- tarfile.py  (revision 49758)
+++ tarfile.py  (working copy)
@@ -2239,12 +2239,14 @@
         if hasattr(os, "symlink") and hasattr(os, "link"):
             # For systems that support symbolic and hard links.
             if tarinfo.issym():
-                if os.path.exists(targetpath):
+                if os.path.lexists(targetpath):
                     os.unlink(targetpath)
                 os.symlink(tarinfo.linkname, targetpath)
             else:
                 # See extract().
                 if os.path.exists(tarinfo._link_target):
+                    if os.path.lexists(targetpath):
+                        os.unlink(targetpath)
                     os.link(tarinfo._link_target, targetpath)
                 else:
                     self._extract_member(self._find_link_target(tarinfo), targetpath)
msg135926 - (view) Author: Scott Leerssen (Scott.Leerssen) Date: 2011-05-13 16:58
tests that verify the bug/fix:


    def test_extractall_broken_symlinks(self):
        # Test if extractall works properly when tarfile contains symlinks
        tempdir = os.path.join(TEMPDIR, "testsymlinks")
        temparchive = os.path.join(TEMPDIR, "testsymlinks.tar")
        os.mkdir(tempdir)
        try:
            source_file = os.path.join(tempdir,'source')
            target_file = os.path.join(tempdir,'symlink')
            with open(source_file,'w') as f:
                f.write('something\n')
            os.symlink(source_file, target_file)
            tar = tarfile.open(temparchive,'w')
            tar.add(target_file, arcname=os.path.basename(target_file))
            tar.close()
            # remove the real file
            os.unlink(source_file)
            # Let's extract it to the location which contains the symlink
            tar = tarfile.open(temparchive,'r')
            # this should not raise OSError: [Errno 17] File exists
            try:
                tar.extractall(path=tempdir)
            except OSError:
                self.fail("extractall failed with broken symlinked files")
            finally:
                tar.close()
        finally:
            os.unlink(temparchive)
            shutil.rmtree(TEMPDIR)

    def test_extractall_hardlinks(self):
        # Test if extractall works properly when tarfile contains symlinks
        TEMPDIR = tempfile.mkdtemp()
        tempdir = os.path.join(TEMPDIR, "testsymlinks")
        temparchive = os.path.join(TEMPDIR, "testsymlinks.tar")
        os.mkdir(tempdir)
        try:
            source_file = os.path.join(tempdir,'source')
            target_file = os.path.join(tempdir,'symlink')
            with open(source_file,'w') as f:
                f.write('something\n')
            os.link(source_file, target_file)
            tar = tarfile.open(temparchive,'w')
            tar.add(source_file, arcname=os.path.basename(source_file))
            tar.add(target_file, arcname=os.path.basename(target_file))
            tar.close()
            # Let's extract it to the location which contains the symlink
            tar = tarfile.open(temparchive,'r')
            # this should not raise OSError: [Errno 17] File exists
            try:
                tar.extractall(path=tempdir)
            except OSError:
                self.fail("extractall failed with linked files")
            finally:
                tar.close()
        finally:
            os.unlink(temparchive)
            shutil.rmtree(TEMPDIR)
msg135936 - (view) Author: Scott Leerssen (Scott.Leerssen) Date: 2011-05-13 18:08
oops... I left some of my local edits in those tests.  be sure to fix the TEMPDIR use if you add these into the tarfile tests.
History
Date User Action Args
2022-04-11 14:57:10adminsetgithub: 54970
2017-11-19 13:33:04Chris Albrightsetnosy: + ezio.melotti, vstinner

type: behavior -> performance
components: + Unicode
versions: + Python 3.8
2011-05-13 18:08:36Scott.Leerssensetmessages: + msg135936
2011-05-13 16:58:16Scott.Leerssensetmessages: + msg135926
2011-05-13 16:57:08Scott.Leerssensetmessages: + msg135925
2011-05-13 16:34:54Scott.Leerssensetmessages: + msg135924
2011-04-30 01:17:03orsenthilsetstatus: open -> closed

messages: + msg134828
2011-04-29 22:14:40orsenthilsetmessages: + msg134817
2011-04-29 22:12:39python-devsetmessages: + msg134816
2011-04-29 16:21:31pitrousetstatus: closed -> open

nosy: + pitrou
messages: + msg134782

assignee: lars.gustaebel -> orsenthil
2011-04-28 07:35:29orsenthilsetmessages: + msg134658
2011-04-28 07:32:17python-devsetstatus: open -> closed

nosy: + python-dev
messages: + msg134657

resolution: fixed
stage: resolved
2011-04-27 12:24:41Scott.Leerssensetmessages: + msg134555
2011-04-27 00:11:55santoso.wijayasetnosy: + santoso.wijaya
2011-04-26 23:23:56orsenthilsetnosy: + orsenthil
messages: + msg134519
2011-04-26 21:20:13Scott.Leerssensetnosy: + Scott.Leerssen
messages: + msg134502
2011-01-04 08:25:53lars.gustaebelsetassignee: lars.gustaebel

nosy: + lars.gustaebel
2010-12-23 00:17:11sridcreate