Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tarfile doesn't overwrite symlink by directory #64173

Open
antonymayi mannequin opened this issue Dec 13, 2013 · 20 comments
Open

tarfile doesn't overwrite symlink by directory #64173

antonymayi mannequin opened this issue Dec 13, 2013 · 20 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@antonymayi
Copy link
Mannequin

antonymayi mannequin commented Dec 13, 2013

BPO 19974
Nosy @jcea, @gustaebel, @vstinner, @vadmium, @serhiy-storchaka, @vajrasky, @antonymayi
PRs
  • bpo-19974: Make extractall method of tarfile overwrites directory sym… #11445
  • bpo-19974: Make extractall method of tarfile overwrites directory sym… #11445
  • bpo-19974: Make extractall method of tarfile overwrites directory sym… #11445
  • bpo-19974: Make extractall method of tarfile overwrites directory sym… #11445
  • Files
  • fix_tarfile_overwrites_symlink.patch
  • fix_tarfile_overwrites_symlink_v2.patch
  • fix_tarfile_overwrites_symlink_v3.patch
  • fix_tarfile_overwrites_symlink_v4.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = None
    created_at = <Date 2013-12-13.11:59:08.676>
    labels = ['type-bug', 'library']
    title = "tarfile doesn't overwrite symlink by directory"
    updated_at = <Date 2019-01-06.11:30:53.374>
    user = 'https://github.com/antonymayi'

    bugs.python.org fields:

    activity = <Date 2019-01-06.11:30:53.374>
    actor = 'vajrasky'
    assignee = 'serhiy.storchaka'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2013-12-13.11:59:08.676>
    creator = 'antonymayi'
    dependencies = []
    files = ['33129', '33147', '33548', '34152']
    hgrepos = []
    issue_num = 19974
    keywords = ['patch']
    message_count = 20.0
    messages = ['206066', '206173', '206234', '208315', '208480', '208482', '210803', '211704', '211719', '331912', '331952', '332003', '333093', '333095', '333096', '333098', '333099', '333102', '333103', '333104']
    nosy_count = 8.0
    nosy_names = ['jcea', 'lars.gustaebel', 'vstinner', 'martin.panter', 'serhiy.storchaka', 'vajrasky', 'antonymayi', 'drpotato']
    pr_nums = ['11445', '11445', '11445', '11445']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue19974'
    versions = ['Python 2.7', 'Python 3.3', 'Python 3.4']

    @antonymayi
    Copy link
    Mannequin Author

    antonymayi mannequin commented Dec 13, 2013

    tarfile.py compared to GNU tar doesn't overwrite existing symlink with directory of same name if such directory exists in the extracted tarball.

    @antonymayi antonymayi mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Dec 13, 2013
    @vajrasky
    Copy link
    Mannequin

    vajrasky mannequin commented Dec 14, 2013

    Here is the preliminary path. It works and tested on Linux. I'll check the behaviour on Windows later.

    @vajrasky
    Copy link
    Mannequin

    vajrasky mannequin commented Dec 15, 2013

    Here is the patch that works both on Windows and Linux.

    @drpotato
    Copy link
    Mannequin

    drpotato mannequin commented Jan 17, 2014

    Testing passed on OSX Mavericks.

    @vajrasky
    Copy link
    Mannequin

    vajrasky mannequin commented Jan 19, 2014

    Ah, thanks for the review, Serhiy. My bad. There is no underlying bug of tar. I was confused by the behaviour of tar which is converting the absolute path to relative path.

    So, adding '/home/user/dir/file' to tar when you are in '/home/user/dir' then extracting it in the same place, you'll get:
    '/home/user/dir/home/user/dir/file'.

    I thought it was a bug. But this is what it is supposed to be. I see that tarfile module is mimicking GNU tar behaviour.

    This is the updated patch.

    @vajrasky
    Copy link
    Mannequin

    vajrasky mannequin commented Jan 19, 2014

    Serhiy commented, "I think we should remove targetpath in all cases. Not only when softlink is
    extracted."

    I already did that in my latest patch but I am a little bit wary of this behaviour.

    @serhiy-storchaka serhiy-storchaka self-assigned this Feb 9, 2014
    @vajrasky
    Copy link
    Mannequin

    vajrasky mannequin commented Feb 10, 2014

    Yeah, you are right, Serhiy. I check the behaviour of GNU tar command line. It always replaces the target no matter what kind of file source and target are.

    @serhiy-storchaka
    Copy link
    Member

    There are several issues with last patch.

    • Fails to extract tarfiles containing the ./ directory (very common case).

    • And even more common case -- fails to extract non-empty directory.

    @vajrasky
    Copy link
    Mannequin

    vajrasky mannequin commented Feb 20, 2014

    Here is the preliminary patch to address Serhiy's concern. I added some regression tests as well. Give me a time to think how to refactor the code (especially the test).

    @vadmium
    Copy link
    Member

    vadmium commented Dec 16, 2018

    I’m not sure if this should be considered a bug fix, but if it goes into 2.7 it would overlap with bpo-10761 and bpo-12088. In 2.7 existing directory entries (including broken symlinks, but not including subdirectories) may be replaced by symbolic and hard links.

    @vstinner
    Copy link
    Member

    Martin: you review a patch written 4 years ago at https://bugs.python.org/review/19974/diff/11106/Lib/tarfile.py

    Oh wow, I didn't know that Rietveld was still working :-D It's maybe time to convert the old patch to a proper pull request on GitHub, no? :-)

    @vajrasky
    Copy link
    Mannequin

    vajrasky mannequin commented Dec 17, 2018

    Martin Panter, thank you for reviewing my patch. Let me rework it. It has been a while (4 years!!!).

    @vajrasky
    Copy link
    Mannequin

    vajrasky mannequin commented Jan 6, 2019

    Martin Panter,

    I have modernized the patch.

    About your suggestion:

    1. "import errno" -> Yes, this is unnecessary. I have removed it.
    2. Use os.path.lexists instead of os.path.islink for broken symlink -> The thing is os.path.islink returns True also for broken symlink. I could not find a case where these methods return different result.

    @vajrasky
    Copy link
    Mannequin

    vajrasky mannequin commented Jan 6, 2019

    "I could not find a case where these methods return different result." -> This is wrong. My mistake. os.lexists returns True for non symbolic link file while os.islink returns False.

    @vajrasky
    Copy link
    Mannequin

    vajrasky mannequin commented Jan 6, 2019

    Sorry, Martin. I just understood what you suggested. I thought you were saying lexists to replace islink, but it is to replace the complex if condition. Let me work on it.

    @vadmium
    Copy link
    Member

    vadmium commented Jan 6, 2019

    About “lexists”, I meant using it instead of “os.path.exits” (not “islink”). On Linux:

    >>> targetpath = 'target'
    >>> os.symlink('nonexistant', dst=targetpath)  # Make a broken symlink
    >>> os.system('ls -l')
    total 0
    lrwxrwxrwx 1 vadmium vadmium 11 Jan  6 09:28 target -> nonexistant
    0
    >>> os.path.exists(targetpath)  # Doesn't register broken symlinks
    False
    >>> os.path.lexists(targetpath)  # Does register it
    True

    Did you try extracting a tar file over a broken link? (I haven’t tried your code; I’m just going on theory.)

    @vajrasky
    Copy link
    Mannequin

    vajrasky mannequin commented Jan 6, 2019

    After experimenting with lexists, I don't think I can simplify it with lexists.

    @vajrasky
    Copy link
    Mannequin

    vajrasky mannequin commented Jan 6, 2019

    I have added test case for broken symlinks. I am not sure whether this is necessary or not. But anyway I have added it.

    @vajrasky
    Copy link
    Mannequin

    vajrasky mannequin commented Jan 6, 2019

    Martin, thank you for your advice. I have added additional two test cases for broken symlink case.

    1. broken symlink overwrites ordinary file,
    2. ordinary file overwrites broken symlink.

    I hope that is enough. Let me know if you have another concern.

    @vajrasky
    Copy link
    Mannequin

    vajrasky mannequin commented Jan 6, 2019

    Correction:

    1. broken symlink overwrites valid symlink,
    2. ordinary file overwrites broken symlink.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    Status: No status
    Development

    No branches or pull requests

    3 participants