Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The tarfile module crashes when tarfile contains a symlink and unpack directory contain it too #67417

Open
mvo mannequin opened this issue Jan 13, 2015 · 9 comments
Open
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@mvo
Copy link
Mannequin

mvo mannequin commented Jan 13, 2015

BPO 23228
Nosy @warsaw, @gustaebel, @vadmium, @serhiy-storchaka
Files
  • tarfile-extract-crash.py
  • possible-fix-37688.diff
  • possible-fix-23228-with-test.diff
  • possible-fix-23228-with-test2.diff
  • windowserror.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2015-01-13.08:55:02.134>
    labels = ['type-bug', 'library']
    title = 'The tarfile module crashes when tarfile contains a symlink and unpack directory contain it too'
    updated_at = <Date 2018-12-16.01:34:29.727>
    user = 'https://bugs.python.org/mvo'

    bugs.python.org fields:

    activity = <Date 2018-12-16.01:34:29.727>
    actor = 'martin.panter'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2015-01-13.08:55:02.134>
    creator = 'mvo'
    dependencies = []
    files = ['37688', '37689', '37690', '37774', '42780']
    hgrepos = []
    issue_num = 23228
    keywords = ['patch']
    message_count = 9.0
    messages = ['233907', '233910', '233914', '233918', '234309', '264808', '265146', '265147', '331911']
    nosy_count = 6.0
    nosy_names = ['barry', 'lars.gustaebel', 'mvo', 'SilentGhost', 'martin.panter', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue23228'
    versions = ['Python 2.7', 'Python 3.4', 'Python 3.5']

    @mvo
    Copy link
    Mannequin Author

    mvo mannequin commented Jan 13, 2015

    The tarfile.makelink() code crashes with a maximum recursion error when it unpacks a tarfile that contains a symlink into a directory that already contains this symlink.

    Attached is a standalone testcase (that probably better explains whats going on :) and a possible fix.

    @mvo mvo mannequin added type-crash A hard crash of the interpreter, possibly with a core dump stdlib Python modules in the Lib dir labels Jan 13, 2015
    @mvo
    Copy link
    Mannequin Author

    mvo mannequin commented Jan 13, 2015

    A possible fix that works with the previous testcase for this bug. It does not break a tarfile tests.

    @mvo
    Copy link
    Mannequin Author

    mvo mannequin commented Jan 13, 2015

    This patch contains a regression test as well.

    @SilentGhost
    Copy link
    Mannequin

    SilentGhost mannequin commented Jan 13, 2015

    Perhaps I'm missing something here, but it doesn't seem to be a problem with valid links. Only invalid symlinks are causing this issue.

    @mvo
    Copy link
    Mannequin Author

    mvo mannequin commented Jan 19, 2015

    Thanks SilentGhost for your feedback and sorry for my slow reply.

    I looked into this some more and attached a updated patch with a more complete test. It also covers a crash now that happens when there is a symlink cycle in the tar and on disk.

    My fix is to remove existing links before unpack and replace them with the link in the tar. This is what gnu tar is also doing (according to http://www.gnu.org/software/tar/manual/html_node/Dealing-with-Old-Files.html). However this seems to be a behavior change of what the tarfile module has traditionally been done so feel free to reject it, I'm open to alternative ideas of course (even though I feel like having the same behavior as gnu tar is desirable).

    Thanks,
    Michael

    @mvo
    Copy link
    Mannequin Author

    mvo mannequin commented May 4, 2016

    Anything I can do to help moving this issue forward?

    @pppery pppery mannequin changed the title Crashes when tarfile contains a symlink and unpack directory contain it too The tarfile module crashes when tarfile contains a symlink and unpack directory contain it too May 4, 2016
    @gustaebel
    Copy link
    Mannequin

    gustaebel mannequin commented May 8, 2016

    TarFile.makelink() has a fallback mode in case the platform does not support links. Instead of a symlink or a hardlink it extracts the file it points to as long as it exists in the current archive.

    More precisely, makelink() calls os.symlink() and if one of the exceptions in the symlink_exception tuple is raised, it goes into fallback mode. r80944 introduced a regression because it replaced the WindowsError in symlink_exception with an OSError which is much less specific than a WindowsError. Since that change, the fallback is used everytime an OSError occurs, in Michael's case it is a FileExistsError, because the symlink is already there.

    The attached patch restores the old behavior. This might not be what you wanted, Michael, but at least, tarfile no longer crashes.

    @gustaebel
    Copy link
    Mannequin

    gustaebel mannequin commented May 8, 2016

    I suck :-) It is hg revision bb94f6222fef.

    @serhiy-storchaka serhiy-storchaka added type-bug An unexpected behavior, bug, or error and removed type-crash A hard crash of the interpreter, possibly with a core dump labels Jul 11, 2018
    @vadmium
    Copy link
    Member

    vadmium commented Dec 16, 2018

    The problem with WindowsError should only exist in 3.4+. 2.7 doesn’t support creating symlinks on Windows.

    Michael’s fix is the same as already done in 2.7 for bpo-10761 and (part of) bpo-12088. However I’m not sure that is the best approach for a bug fix. Also see bpo-19974 proposing to replace existing directory entries in all cases, including replacing empty subdirectories, and not just when extracting symlinks.

    I suspect Michael has only fixed the recursive loop on Unix. What happens if an exception is raised because symlinks are not supported (e.g. Windows)? Possible test case:

    data = BytesIO()
    writer = tarfile.TarFile(fileobj=data, mode='w')
    selflink = tarfile.TarInfo('self')
    selflink.size = 0
    selflink.type = tarfile.SYMTYPE
    selflink.linkname = selflink.name
    writer.addfile(selflink)
    writer.close()
    data.seek(0)
    tarfile.TarFile(fileobj=data).extractall()

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    Status: No status
    Development

    No branches or pull requests

    2 participants