Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tarfile insecure pathname extraction #45385

Closed
gustaebel mannequin opened this issue Aug 28, 2007 · 20 comments
Closed

tarfile insecure pathname extraction #45385

gustaebel mannequin opened this issue Aug 28, 2007 · 20 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@gustaebel
Copy link
Mannequin

gustaebel mannequin commented Aug 28, 2007

BPO 1044
Nosy @doko42, @gustaebel, @taleinat, @matejcik
Files
  • insecure_pathnames.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/gustaebel'
    closed_at = <Date 2007-08-30.20:28:31.686>
    created_at = <Date 2007-08-28.10:09:24.558>
    labels = ['type-bug', 'library']
    title = 'tarfile insecure pathname extraction'
    updated_at = <Date 2018-08-27.18:45:41.050>
    user = 'https://github.com/gustaebel'

    bugs.python.org fields:

    activity = <Date 2018-08-27.18:45:41.050>
    actor = 'taleinat'
    assignee = 'lars.gustaebel'
    closed = True
    closed_date = <Date 2007-08-30.20:28:31.686>
    closer = 'lars.gustaebel'
    components = ['Library (Lib)']
    creation = <Date 2007-08-28.10:09:24.558>
    creator = 'lars.gustaebel'
    dependencies = []
    files = ['8339']
    hgrepos = []
    issue_num = 1044
    keywords = ['patch']
    message_count = 6.0
    messages = ['55361', '55362', '55365', '55464', '55489', '55509']
    nosy_count = 4.0
    nosy_names = ['doko', 'lars.gustaebel', 'taleinat', 'matejcik']
    pr_nums = []
    priority = 'normal'
    resolution = 'works for me'
    stage = None
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue1044'
    versions = ['Python 2.6']

    @gustaebel
    Copy link
    Mannequin Author

    gustaebel mannequin commented Aug 28, 2007

    tarfile does not check pathnames or linknames on extraction. This can
    lead to data loss or attack scenarios when members with absolute
    pathnames or pathnames outside of the archive's scope overwrite or
    overlay existing files or directories.

    Example for a symlink attack against /etc/passwd:

    foo -> /etc
    foo/passwd

    @gustaebel gustaebel mannequin added the type-security A security issue label Aug 28, 2007
    @gustaebel gustaebel mannequin self-assigned this Aug 28, 2007
    @gustaebel gustaebel mannequin added the stdlib Python modules in the Lib dir label Aug 28, 2007
    @matejcik
    Copy link
    Mannequin

    matejcik mannequin commented Aug 28, 2007

    no change to extract() ?

    otherwise looks good to me. if you don't object, i am applying this to
    SUSE's python 2.5

    @gustaebel
    Copy link
    Mannequin Author

    gustaebel mannequin commented Aug 28, 2007

    In principle I do not object, but this is a preliminary patch. I am
    still not happy with the naming of the "check_paths" argument. Also, the
    patch was made against the trunk which means that it contains hunks with
    the new reStructuredText documentation. Please be patient.

    I do not change extract() because it has become more and more a
    low-level method over the years, that makes promises it cannot keep and
    should not be used at all. I try to discourage its use in the documentation.

    @gustaebel
    Copy link
    Mannequin Author

    gustaebel mannequin commented Aug 30, 2007

    After careful consideration and a private discussion with Martin I do no
    longer think that we have a security issue here. tarfile.py does nothing
    wrong, its behaviour conforms to the pax definition and pathname
    resolution guidelines in POSIX. There is no known or possible practical
    exploit.

    I update the documentation with a warning, that it might be dangerous to
    extract archives from untrusted sources. That is the only thing to be
    done IMO.

    @gustaebel gustaebel mannequin added type-bug An unexpected behavior, bug, or error and removed type-security A security issue labels Aug 30, 2007
    @matejcik
    Copy link
    Mannequin

    matejcik mannequin commented Aug 30, 2007

    if that can be considered "official stance", it's fine by me. feel free
    to close the bug.

    @gustaebel
    Copy link
    Mannequin Author

    gustaebel mannequin commented Aug 30, 2007

    I updated the documentation, r57764 (trunk) and r57765 (2.5).

    @gustaebel gustaebel mannequin closed this as completed Aug 30, 2007
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @sailinggalaxians
    Copy link

    Has anyone seen this?

    https://www.information-age.com/350000-open-source-projects-at-risk-due-to-15-year-old-vulnerability-123500024/

    If so, what's been done so far?
    Seems no official patches are released.
    I could be wrong though.

    @dgw
    Copy link

    dgw commented Sep 21, 2022

    Doing something you shouldn't with untrusted input isn't the language's fault.

    @FCLC
    Copy link

    FCLC commented Sep 22, 2022

    Also coming here after being pinged on this CVE, my case was sourced from https://www.bleepingcomputer.com/news/security/unpatched-15-year-old-python-bug-allows-code-execution-in-350k-projects/.

    Seems the concern is that even if the package is conformant to the spec, the spec allows for behavior that is exploitable?

    @jowagner
    Copy link

    jowagner commented Sep 22, 2022

    If you think this is not a security issue then call it a feature request. In this case, check_paths should be False for now, a future release can change the default to True, and it should be documented for when this change of behaviour is scheduled, e.g. v 3.12. (Probably, the temporary default should be None with special behaviour of performing the check but only printing a depreciated warning.)

    I'd argue that absolute paths should be ok if the target directory is '/'. We'd want an up-to-date restore utility to be able to restore a system backup that was created with an older version of the utility (assuming the old version puts absolute paths into the archive and the new version specifies '/' as the target directory).

    As to the naming of check_paths, something like confine_to_target_directory would be more specific as to what the option does. However, if pre-existing symlinks are accessed (the documentation does not say what extract*() do with existing entries) this naming could be misleading. Maybe this should be two args: reject_parent_paths and reject_absolute_paths. The latter should probably not default to True when the target path is /.

    A nice extension would be to allow check_paths to be a function that can return True/False to select members without having to run getmembers() first, which is expensive for large archives, or even to return an adjusted path, e.g. to extract the member /etc/passwd to a safe location or to lowercase filename extensions.

    @0xr1
    Copy link

    0xr1 commented Sep 22, 2022

    Does anyone know what sort of patch Trellix is applying on public GitHub projects to fix CVE-2007-4559?

    @Kasimir123
    Copy link

    Hello everyone,

    I am Kasimir, the project lead from Trellix for the research regarding CVE-2007-4559 and I just wanted to share some of our research here and answer possible questions from the community.

    Q: Does anyone know what sort of patch Trellix is applying on public GitHub projects to fix CVE-2007-4559?

    Our patch is based off of what is done in pip's source code. We created a wrapper function for extractall which takes in the tar object created by opening an archive, as well as any other parameters that would be passed to the extractall. Within the wrapper function we then check to see if the extracted files are all within the directory they should be extracted to. We chose this approach since we were worried that trying to filter out certain characters would increase the chances of a bypass. Not having a filter was also chosen since ".." is a legitimate use case and not a security concern if the extracted file does not escape outside of the directory that was specified as the extraction directory, our patch would not stop people from being able to do this.

    Q: Seems the concern is that even if the package is conformant to the spec, the spec allows for behavior that is exploitable?

    During our research we noticed that most other tools that allow you to extract tar files (including the default command line tool), do not allow tarfiles that attempt to write outside of the directory by default. Most of these tools aim to protect the user by default and allow knowledgable users to extract the files regardless with certain flags. The reason that we published our research was to bring our concern into the light since while analyzing open source repositories we discovered that most of the code did not do security checks despite being in situations where it should be done.

    We will be monitoring this thread and will attempt to answer any other questions that arise pertaining to our research!

    @0xr1
    Copy link

    0xr1 commented Sep 22, 2022

    Thanks, @Kasimir123. Can I get a link to one of your PRs or a sample source code where you've fixed this vulnerability?

    @Kasimir123
    Copy link

    @0xr1 Due to the large number of repositories that need patching we have been patching locally and will be starting to do the pull requests next week. The patches vary slightly based on each project since our patching tool attempts to match the syntax of whatever is being patched but what we are modifying is published here.

    @jowagner
    Copy link

    jowagner commented Sep 22, 2022

    Calling getmembers as in the Trellix approach is potentially very expensive (large files) and does not work with tar files that cannot be rewound, e.g. sys.stdin.buffer. As far as I understand, the OP's patch avoids this problem by extending the existing extraction loops to perform the test on-the-fly as each member is reached, requiring only a single pass over the tar file.

    The Trellix patch furthermore seems to assume that absolute paths are ok and written relative to the target directory. The tarfile module documentation does not say what behaviour is expected from the extract functions for absolute paths. The warning box suggests that the extraction functions would ignore path and write to the absolute path. If this is the case, the test would not detect this as it always joins the member.name with path, falsely assuming the absolute paths will be moved under the given path (like a chroot).

    If the extract functions also join member.name with path the documentation needs to be updated and this could be considered a bug if the expected POSIX behaviour is to (attempt to) restore the absolute paths to the absolute locations. While prioritising safety over conformance can be argued for, a parameter should be provided to switch on unsafe conformant behaviour.

    Edit:

    1. [Security] CVE-2007-4559: tarfile: Add absolute_path option to tarfile, disabled by default #73974 (comment) confirms that the tarfile module writes to the absolute path, i.e. '.' is not joined to it.
    2. The Trellix patch correctly rejects absolute paths if path != '/'. No '.' prefix is added to the member name if it is an absolute path as os.path.join() treats absolute path special, discarding previous path components.

    @6t8k
    Copy link
    Contributor

    6t8k commented Sep 23, 2022

    Doing something you shouldn't with untrusted input isn't the language's fault.

    I think it falls into the responsibility of CPython standard library devs to design good APIs, which includes effort to make it hard for users to use them in a way that nurtures security issues.

    GNU tar*, at least, strips leading slashes from file names, and rejects attempts to extracts files that contain a .. file name component, by default. [1]

    Perhaps safe_* functions could be introduced in the future, which would be preferred/encouraged as the default go-to, similar to how PyYAML has a safe_load. [2] Users could rest assured that their code won't have this security issue, without having to bother with the function arguments. There are more details that may be eligible to be taken care of by safe variants, like not overriding existing files (which both TarFile.extractall and TarFile.extract currently do, silently).

    The non-safe variants could be made to emit a suppressable warning, and users who positively know they need to write outside the target directory could resort to/keep using them.

    * According to the docs, tarfile explicitly supports ustar, pax and GNU tar formats

    @vstinner
    Copy link
    Member

    This issue is closed, I suggest continuing the discussion on the issue #73974.

    @JLLeitschuh
    Copy link

    Hi!

    I'm a Senior Software Security Researcher over at the Open Source Security Foundation under the Linux Foundation.

    I think it falls into the responsibility of CPython standard library devs to design good APIs, which includes effort to make it hard for users to use them in a way that nurtures security issues.

    IMHO, this is the responsibility of all API designers. But absolutely the python standard library.

    Perhaps safe_* functions could be introduced in the future, which would be preferred/encouraged as the default go-to, similar to how PyYAML has a safe_load. [2] Users could rest assured that their code won't have this security issue, without having to bother with the function arguments. There are more details that may be eligible to be taken care of by safe variants, like not overriding existing files (which both TarFile.extractall and TarFile.extract currently do, silently).

    PyYAML is currently using the safe_load as the default for load meaning that, even if you don't read the docs, the default load is safe. It is the option of almost all security practitioners that security should be opt-out, not opt-in.

    @6t8k
    Copy link
    Contributor

    6t8k commented Jan 29, 2023

    Hey there,

    PyYAML is currently using the safe_load as the default for load [...]

    That doesn't seem accurate: in the most recent version of PyYAML at the time of my previous post, and today, 6.0 (master's tip too), yaml.load() expects a Loader arg, and Loader=yaml.Loader or Loader=yaml.CLoader (both of which are unsafe, reading the code) could plausibly be a naive choice. The project's README.md even explicitly points to yaml.safe_load(stream) for untrusted input.

    It is the option of almost all security practitioners that security should be opt-out, not opt-in.

    Oh I don't doubt that security should be made the default to the extent possible/practical. This is easier to do for greenfield projects. tarfile is by no means a greenfield project,1 and compatibility with existing code is generally important to consider when making changes to CPython. I wanted to avoid breaking compatibility, but probably should have, in combination with giving it appropriate lead time. Moreover, such sibling functions are actually often a code smell – or "security smell", if you will. In short, I wouldn't propose the same today for a couple of reasons, and shouldn't have posted it back then.

    There are people who have put a much better effort into how to improve tarfile with respect to the issue at hand. I'm glad to see that things seem to start rolling (again). Relevant discussion should and does happen elsewhere (e.g. here, which seems to be the most promising attempt to date), so I'd like to avoid posting here further.

    Footnotes

    1. There are millions of people developing software in Python, see e.g. here (even more are using it in some way), the lion's share of which should be CPython users

    @JLLeitschuh
    Copy link

    JLLeitschuh commented Jan 31, 2023

    This is easier to do for greenfield projects. tarfile is by no means a greenfield project,1 and compatibility with existing code is generally important to consider when making changes to CPython. I wanted to avoid breaking compatibility, but probably should have, in combination with giving it appropriate lead time.

    Back in 2019, I pushed an initiative across the entire java ecosystem for the major artifact servers like Maven Central, and JCenter (what pip is to the python ecosystem) to formally drop support for HTTP in favor of HTTPS only on January 15th, 2020.

    This was due to a widespread common security vulnerability I discovered impacting the Java ecosystem: https://infosecwriteups.com/want-to-take-over-the-java-ecosystem-all-you-need-is-a-mitm-1fc329d898fb

    On January 15th, 2020, the build infrastructure of 20% of the java ecosystem was broken due to this change. It fixed a large security hole in the supply chain of the java ecosystem.

    This vulnerability has impacted tens of thousands of OSS projects (trellex just opened 65,000 pull requests attempting to fix this vulnerability). I recognize that there are negative impacts of API breaking changes, but it's important that ensuring API's aren't broken doesn't overshadow ensuring the users of those API are secure by default.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    9 participants