classification
Title: The tarfile module crashes when tarfile contains a symlink and unpack directory contain it too
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 3.5, Python 3.4, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: SilentGhost, barry, lars.gustaebel, martin.panter, mvo, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2015-01-13 08:55 by mvo, last changed 2018-12-16 01:34 by martin.panter.

Files
File name Uploaded Description Edit
tarfile-extract-crash.py mvo, 2015-01-13 08:55
possible-fix-37688.diff mvo, 2015-01-13 09:03 review
possible-fix-23228-with-test.diff mvo, 2015-01-13 09:10 review
possible-fix-23228-with-test2.diff mvo, 2015-01-19 09:27 review
windowserror.diff lars.gustaebel, 2016-05-08 15:16 review
Messages (9)
msg233907 - (view) Author: Michael Vogt (mvo) * Date: 2015-01-13 08:55
The tarfile.makelink() code crashes with a maximum recursion error when it unpacks a tarfile that contains a symlink into a directory that already contains this symlink.

Attached is a standalone testcase (that probably better explains whats going on :) and a possible fix.
msg233910 - (view) Author: Michael Vogt (mvo) * Date: 2015-01-13 09:03
A possible fix that works with the previous testcase for this bug. It does not break a tarfile tests.
msg233914 - (view) Author: Michael Vogt (mvo) * Date: 2015-01-13 09:10
This patch contains a regression test as well.
msg233918 - (view) Author: SilentGhost (SilentGhost) * (Python triager) Date: 2015-01-13 09:36
Perhaps I'm missing something here, but it doesn't seem to be a problem with valid links. Only invalid symlinks are causing this issue.
msg234309 - (view) Author: Michael Vogt (mvo) * Date: 2015-01-19 09:27
Thanks SilentGhost for your feedback and sorry for my slow reply.

I looked into this some more and attached a updated patch with a more complete test. It also covers a crash now that happens when there is a symlink cycle in the tar and on disk. 

My fix is to remove existing links before unpack and replace them with the link in the tar. This is what gnu tar is also doing (according to http://www.gnu.org/software/tar/manual/html_node/Dealing-with-Old-Files.html). However this seems to be a behavior change of what the tarfile module has traditionally been done so feel free to reject it, I'm open to alternative ideas of course (even though I feel like having the same behavior as gnu tar is desirable).

Thanks,
 Michael
msg264808 - (view) Author: Michael Vogt (mvo) * Date: 2016-05-04 12:39
Anything I can do to help moving this issue forward?
msg265146 - (view) Author: Lars Gustäbel (lars.gustaebel) * (Python committer) Date: 2016-05-08 15:16
TarFile.makelink() has a fallback mode in case the platform does not support links. Instead of a symlink or a hardlink it extracts the file it points to as long as it exists in the current archive.

More precisely, makelink() calls os.symlink() and if one of the exceptions in the symlink_exception tuple is raised, it goes into fallback mode. r80944 introduced a regression because it replaced the WindowsError in symlink_exception with an OSError which is much less specific than a WindowsError. Since that change, the fallback is used everytime an OSError occurs, in Michael's case it is a FileExistsError, because the symlink is already there.

The attached patch restores the old behavior. This might not be what you wanted, Michael, but at least, tarfile no longer crashes.
msg265147 - (view) Author: Lars Gustäbel (lars.gustaebel) * (Python committer) Date: 2016-05-08 15:27
I suck :-) It is hg revision bb94f6222fef.
msg331911 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2018-12-16 01:34
The problem with WindowsError should only exist in 3.4+. 2.7 doesn’t support creating symlinks on Windows.

Michael’s fix is the same as already done in 2.7 for Issue 10761 and (part of) Issue 12088. However I’m not sure that is the best approach for a bug fix. Also see Issue 19974 proposing to replace existing directory entries in all cases, including replacing empty subdirectories, and not just when extracting symlinks.

I suspect Michael has only fixed the recursive loop on Unix. What happens if an exception is raised because symlinks are not supported (e.g. Windows)? Possible test case:

data = BytesIO()
writer = tarfile.TarFile(fileobj=data, mode='w')
selflink = tarfile.TarInfo('self')
selflink.size = 0
selflink.type = tarfile.SYMTYPE
selflink.linkname = selflink.name
writer.addfile(selflink)
writer.close()
data.seek(0)
tarfile.TarFile(fileobj=data).extractall()
History
Date User Action Args
2018-12-16 01:45:59martin.panterlinkissue35483 superseder
2018-12-16 01:34:29martin.pantersetnosy: + martin.panter
messages: + msg331911
2018-07-11 07:43:11serhiy.storchakasettype: crash -> behavior
2016-05-08 15:27:16lars.gustaebelsetmessages: + msg265147
2016-05-08 15:16:30lars.gustaebelsetfiles: + windowserror.diff

messages: + msg265146
2016-05-04 12:54:59ppperrysettitle: Crashes when tarfile contains a symlink and unpack directory contain it too -> The tarfile module crashes when tarfile contains a symlink and unpack directory contain it too
2016-05-04 12:39:01mvosetmessages: + msg264808
2015-01-19 09:27:20mvosetfiles: + possible-fix-23228-with-test2.diff

messages: + msg234309
2015-01-13 15:07:44barrysetnosy: + barry
2015-01-13 10:04:57serhiy.storchakasetnosy: + lars.gustaebel, serhiy.storchaka
stage: patch review

versions: + Python 2.7, Python 3.4, Python 3.5
2015-01-13 09:36:36SilentGhostsetnosy: + SilentGhost
messages: + msg233918
2015-01-13 09:10:57mvosetfiles: + possible-fix-23228-with-test.diff

messages: + msg233914
2015-01-13 09:03:16mvosetfiles: + possible-fix-37688.diff
keywords: + patch
messages: + msg233910
2015-01-13 08:55:02mvocreate