This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: tarfile doesn't overwrite symlink by directory
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 3.3, Python 3.4, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: antonymayi, drpotato, jcea, lars.gustaebel, martin.panter, serhiy.storchaka, vajrasky, vstinner
Priority: normal Keywords: patch

Created on 2013-12-13 11:59 by antonymayi, last changed 2022-04-11 14:57 by admin.

Files
File name Uploaded Description Edit
fix_tarfile_overwrites_symlink.patch vajrasky, 2013-12-14 09:20 review
fix_tarfile_overwrites_symlink_v2.patch vajrasky, 2013-12-15 14:31 review
fix_tarfile_overwrites_symlink_v3.patch vajrasky, 2014-01-19 14:39 review
fix_tarfile_overwrites_symlink_v4.patch vajrasky, 2014-02-20 11:15 review
Pull Requests
URL Status Linked Edit
PR 11445 open vajrasky, 2019-01-06 08:54
PR 11445 open vajrasky, 2019-01-06 08:54
PR 11445 vajrasky, 2019-01-06 08:54
PR 11445 open vajrasky, 2019-01-06 08:54
Messages (20)
msg206066 - (view) Author: Antony Mayi (antonymayi) Date: 2013-12-13 11:59
tarfile.py compared to GNU tar doesn't overwrite existing symlink with directory of same name if such directory exists in the extracted tarball.
msg206173 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-12-14 08:58
Here is the preliminary path. It works and tested on Linux. I'll check the behaviour on Windows later.
msg206234 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-12-15 14:31
Here is the patch that works both on Windows and Linux.
msg208315 - (view) Author: Chris Morgan (drpotato) Date: 2014-01-17 00:14
Testing passed on OSX Mavericks.
msg208480 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2014-01-19 14:39
Ah, thanks for the review, Serhiy. My bad. There is no underlying bug of tar. I was confused by the behaviour of tar which is converting the absolute path to relative path.

So, adding '/home/user/dir/file' to tar when you are in '/home/user/dir' then extracting it in the same place, you'll get:
'/home/user/dir/home/user/dir/file'.

I thought it was a bug. But this is what it is supposed to be. I see that tarfile module is mimicking GNU tar behaviour.

This is the updated patch.
msg208482 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2014-01-19 15:53
Serhiy commented, "I think we should remove targetpath in all cases. Not only when softlink is
extracted."

I already did that in my latest patch but I am a little bit wary of this behaviour.
msg210803 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2014-02-10 07:02
Yeah, you are right, Serhiy. I check the behaviour of GNU tar command line. It always replaces the target no matter what kind of file source and target are.
msg211704 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-02-20 07:03
There are several issues with last patch.

* Fails to extract tarfiles containing the ./ directory (very common case).

* And even more common case -- fails to extract non-empty directory.
msg211719 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2014-02-20 11:15
Here is the preliminary patch to address Serhiy's concern. I added some regression tests as well. Give me a time to think how to refactor the code (especially the test).
msg331912 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2018-12-16 01:40
I’m not sure if this should be considered a bug fix, but if it goes into 2.7 it would overlap with Issue 10761 and Issue 12088. In 2.7 existing directory entries (including broken symlinks, but not including subdirectories) may be replaced by symbolic and hard links.
msg331952 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-12-17 08:45
Martin: you review a patch written 4 years ago at https://bugs.python.org/review/19974/diff/11106/Lib/tarfile.py

Oh wow, I didn't know that Rietveld was still working :-D It's maybe time to convert the old patch to a proper pull request on GitHub, no? :-)
msg332003 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2018-12-17 13:47
Martin Panter, thank you for reviewing my patch. Let me rework it. It has been a while (4 years!!!).
msg333093 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2019-01-06 08:57
Martin Panter,

I have modernized the patch.

About your suggestion:
1. "import errno" -> Yes, this is unnecessary. I have removed it.
2. Use os.path.lexists instead of os.path.islink for broken symlink -> The thing is os.path.islink returns True also for broken symlink. I could not find a case where these methods return different result.
msg333095 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2019-01-06 09:30
"I could not find a case where these methods return different result." -> This is wrong. My mistake. os.lexists returns True for non symbolic link file while os.islink returns False.
msg333096 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2019-01-06 09:36
Sorry, Martin. I just understood what you suggested. I thought you were saying lexists to replace islink, but it is to replace the complex if condition. Let me work on it.
msg333098 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2019-01-06 09:44
About “lexists”, I meant using it instead of “os.path.exits” (not “islink”). On Linux:

>>> targetpath = 'target'
>>> os.symlink('nonexistant', dst=targetpath)  # Make a broken symlink
>>> os.system('ls -l')
total 0
lrwxrwxrwx 1 vadmium vadmium 11 Jan  6 09:28 target -> nonexistant
0
>>> os.path.exists(targetpath)  # Doesn't register broken symlinks
False
>>> os.path.lexists(targetpath)  # Does register it
True

Did you try extracting a tar file over a broken link? (I haven’t tried your code; I’m just going on theory.)
msg333099 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2019-01-06 09:48
After experimenting with lexists, I don't think I can simplify it with lexists.
msg333102 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2019-01-06 11:16
I have added test case for broken symlinks. I am not sure whether this is necessary or not. But anyway I have added it.
msg333103 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2019-01-06 11:28
Martin, thank you for your advice. I have added additional two test cases for broken symlink case.
1. broken symlink overwrites ordinary file,
2. ordinary file overwrites broken symlink.

I hope that is enough. Let me know if you have another concern.
msg333104 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2019-01-06 11:30
Correction:
1. broken symlink overwrites valid symlink,
2. ordinary file overwrites broken symlink.
History
Date User Action Args
2022-04-11 14:57:55adminsetgithub: 64173
2019-01-06 11:30:53vajraskysetmessages: + msg333104
2019-01-06 11:28:20vajraskysetmessages: + msg333103
2019-01-06 11:16:33vajraskysetmessages: + msg333102
2019-01-06 09:48:37vajraskysetmessages: + msg333099
2019-01-06 09:44:05martin.pantersetmessages: + msg333098
2019-01-06 09:36:02vajraskysetmessages: + msg333096
2019-01-06 09:30:39vajraskysetmessages: + msg333095
2019-01-06 08:57:32vajraskysetmessages: + msg333093
2019-01-06 08:55:11vajraskysetstage: needs patch -> patch review
pull_requests: + pull_request10897
2019-01-06 08:54:58vajraskysetstage: needs patch -> needs patch
pull_requests: + pull_request10898
2019-01-06 08:54:48vajraskysetstage: needs patch -> needs patch
pull_requests: + pull_request10896
2019-01-06 08:54:37vajraskysetstage: needs patch -> needs patch
pull_requests: + pull_request10895
2018-12-17 13:47:02vajraskysetmessages: + msg332003
2018-12-17 08:45:13vstinnersetmessages: + msg331952
2018-12-16 01:40:42martin.pantersetmessages: + msg331912
2017-03-09 22:38:48serhiy.storchakasetnosy: + vstinner
2014-06-12 00:56:06jceasetnosy: + jcea
2014-02-20 11:15:14vajraskysetfiles: + fix_tarfile_overwrites_symlink_v4.patch

messages: + msg211719
2014-02-20 07:03:07serhiy.storchakasetmessages: + msg211704
2014-02-10 09:23:18martin.pantersetnosy: + martin.panter
2014-02-10 07:02:15vajraskysetmessages: + msg210803
2014-02-09 20:58:34serhiy.storchakasetassignee: serhiy.storchaka
2014-01-19 15:53:37vajraskysetmessages: + msg208482
2014-01-19 14:39:23vajraskysetfiles: + fix_tarfile_overwrites_symlink_v3.patch

messages: + msg208480
2014-01-17 00:14:40drpotatosetnosy: + drpotato
messages: + msg208315
2014-01-15 16:59:18serhiy.storchakasetstage: patch review -> needs patch
2014-01-15 16:57:52serhiy.storchakasetstage: patch review
2013-12-15 14:31:01vajraskysetfiles: + fix_tarfile_overwrites_symlink_v2.patch

messages: + msg206234
2013-12-14 09:20:02vajraskysetfiles: + fix_tarfile_overwrites_symlink.patch
2013-12-14 09:19:52vajraskysetfiles: - fix_tarfile_overwrites_symlink.patch
2013-12-14 08:58:32vajraskysetfiles: + fix_tarfile_overwrites_symlink.patch

nosy: + vajrasky
messages: + msg206173

keywords: + patch
2013-12-13 21:33:07serhiy.storchakasetnosy: + serhiy.storchaka

versions: + Python 3.4
2013-12-13 12:05:10vstinnersetnosy: + lars.gustaebel
2013-12-13 11:59:08antonymayicreate