Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CVE-2015-2104] Urlparse insufficient validation leads to open redirect #67693

Open
yaaboukir mannequin opened this issue Feb 24, 2015 · 23 comments
Open

[CVE-2015-2104] Urlparse insufficient validation leads to open redirect #67693

yaaboukir mannequin opened this issue Feb 24, 2015 · 23 comments
Labels
3.8 only security fixes 3.9 only security fixes 3.10 only security fixes 3.11 only security fixes 3.12 bugs and security fixes 3.13 new features, bugs and security fixes stdlib Python modules in the Lib dir type-security A security issue

Comments

@yaaboukir
Copy link
Mannequin

yaaboukir mannequin commented Feb 24, 2015

BPO 23505
Nosy @orsenthil, @pitrou, @vstinner, @tiran, @benjaminp, @vadmium, @PaulMcMillan, @ztane, @epicfaace
Dependencies
  • bpo-22852: urllib.parse wrongly strips empty #fragment, ?query, //netloc
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2015-02-24.00:11:53.909>
    labels = ['type-security', '3.7', 'library']
    title = '[CVE-2015-2104] Urlparse insufficient validation leads to open redirect'
    updated_at = <Date 2019-10-24.10:32:56.296>
    user = 'https://bugs.python.org/yaaboukir'

    bugs.python.org fields:

    activity = <Date 2019-10-24.10:32:56.296>
    actor = 'vstinner'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2015-02-24.00:11:53.909>
    creator = 'yaaboukir'
    dependencies = ['22852']
    files = []
    hgrepos = []
    issue_num = 23505
    keywords = []
    message_count = 22.0
    messages = ['236470', '236471', '236472', '237088', '237090', '237093', '237096', '237097', '237106', '237149', '237164', '237200', '237411', '237412', '240191', '240207', '240237', '277328', '277342', '277350', '277354', '322676']
    nosy_count = 12.0
    nosy_names = ['orsenthil', 'pitrou', 'vstinner', 'christian.heimes', 'benjamin.peterson', 'python-dev', 'martin.panter', 'PaulMcMillan', 'ztane', 'soilandreyes', 'yaaboukir', 'epicfaace']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'security'
    url = 'https://bugs.python.org/issue23505'
    versions = ['Python 2.7', 'Python 3.5', 'Python 3.6', 'Python 3.7']

    Linked PRs

    @yaaboukir
    Copy link
    Mannequin Author

    yaaboukir mannequin commented Feb 24, 2015

    The module urlparse lacks proper validation of the input leading to open redirect vulnerability.

    The issue is that URLs do not survive the round-trip through urlunparse(urlparse(url)). Python sees /////foo.com as a URL with no hostname or scheme and a path of //foo.com, but when it reconstructs the URL after parsing, it becomes //foo.com.

    This can be practically exploited this way : http://example.com/login?next=/////evil.com

    The for fix this would be for urlunparse() to serialize paths with two leading slashes as '/%2F', at least when scheme and netloc are empty.

    @yaaboukir yaaboukir mannequin added stdlib Python modules in the Lib dir type-security A security issue labels Feb 24, 2015
    @vadmium
    Copy link
    Member

    vadmium commented Feb 24, 2015

    Perhaps you actually meant four input slashes, producing two output slashes. That seems more of a bug to me:

    >>> urlparse("////foo.com")
    ParseResult(scheme='', netloc='', path='//foo.com', params='', query='', fragment='')
    >>> urlunparse(_)
    '//foo.com'

    Solving bpo-22852, which proposes some flags including “has_netloc” on the ParseResult object, might help with this.

    @yaaboukir
    Copy link
    Mannequin Author

    yaaboukir mannequin commented Feb 24, 2015

    Yes! forgive my mistake I meant four slashes.

    @yaaboukir
    Copy link
    Mannequin Author

    yaaboukir mannequin commented Mar 2, 2015

    For your information, this security issue has been assigned a CVE ID : CVE-2015-2104

    @vstinner
    Copy link
    Member

    vstinner commented Mar 2, 2015

    This can be practically exploited this way : http://example.com/login?next=/////evil.com

    Can you please elaborate on the "exploit" part?

    In Firefox, the "////etc/passwd" link shows me my local file /etc/passwd. Ok, but how is it an issue?

    "//etc/passwd" also shows me file:////etc/passwd.

    The OWASP article on Open Redirect shows example to redirect to a different website. Can you should an example how redirect to a website and not a file:// URL?

    https://www.owasp.org/index.php/Open_redirect

    @yaaboukir
    Copy link
    Mannequin Author

    yaaboukir mannequin commented Mar 3, 2015

    Yes, exploiting this bug an attacker may redirect a specific vitim to a malicious website, in our case evil.com

    >> x = urlparse("////evil.com")

    ///evil.com will be parsed as relative-path URL which is the correct expected behaviour

    >> print x
    >> ParseResult(scheme='', netloc='', path='//evil.com', params='', query='', fragment='')

    As you see two slashes are removed and it is marked as a relative-path URL but when we reconstruct the URL using urlunparse() function, the URL is treated as an absolute URL to which you will be redirected.

    >>> x = urlunparse(urlparse("////evil.com"))
    >>> urlparse(x)
    ParseResult(scheme='', netloc='evil.com', path='', params='', query='', fragment='')

    @vstinner
    Copy link
    Member

    vstinner commented Mar 3, 2015

    >>> urlparse("//evil.com")
    ParseResult(scheme='', netloc='evil.com', path='', params='', query='', fragment='')

    I see evil.com in the netloc field, ok. But Firefox doesn't use Python to parse and url, and typing //evil.com in the address bar converts the address to file:////evil.com. Not a website, but a local file.

    So I don't understand the redirection part. Could you maybe write a vulnerable CGI script to demonstrate the bug?

    I wrote the following HTML file to try to understand the bug, but I was only able to show the content of my local file /etc/issue:

    <head>
    <META http-equiv="refresh" content="5;URL=////etc/issue">
    </head>
    <p><a href="////etc/issue">issue</a></p>

    @yaaboukir
    Copy link
    Mannequin Author

    yaaboukir mannequin commented Mar 3, 2015

    When you directly type //evil.com or ////evil.com in Firefox URL bar you will be redirect to evil.com and that is very known, read this :

    http://homakov.blogspot.com/2014/01/evolution-of-open-redirect-vulnerability.html

    Here is a video demonstration of the vulnerability : http://youtu.be/l0uDAqpRPpo

    @vadmium
    Copy link
    Member

    vadmium commented Mar 3, 2015

    Do you think it would be enough to ensure the urlparse() result remembers whether the empty “//” was present or not? In other words, something like the following mockup (based on the bpo-22852 proposal). An example vunerable program would help me understand this as well.

    >>> urlparse("////evil.com")
    ParseResult(scheme="", netloc="", has_netloc=True, path="//evil.com", ...)
    >>> urlunparse(_)
    "////evil.com"

    Or would we still need special handling of a path that starts with a double slash despite that; either URL-encoding the second slash, or maybe just raising an exception? Consider that the components are already supposed to be URL-encoded, and you can still generate unexpected valid URLs by giving other invalid components, such as

    >>> urlunparse(("", "netloc/with/path", "/more/path", "", "", ""))
    '//netloc/with/path/more/path'

    @yaaboukir
    Copy link
    Mannequin Author

    yaaboukir mannequin commented Mar 3, 2015

    I am not quiet sure about the first proposal but I strongly believe the appropriate method to fix this is by checking if the path starts with double slashes and then URL encoding the two leading slashes.

    @PaulMcMillan
    Copy link
    Mannequin

    PaulMcMillan mannequin commented Mar 3, 2015

    While some websites may use urlunparse(urlparse(url)) to validate a url, this is far from standard practice. Django, for instance, does not use this method. While I agree we should clean this behavior up, this is not a vulnerability in core python, and we need to invalidate the assigned CVE.

    @yaaboukir
    Copy link
    Mannequin Author

    yaaboukir mannequin commented Mar 4, 2015

    "Following the syntax specifications in RFC 1808, urlparse recognizes a netloc

    only if it is properly introduced by ‘//’. Otherwise the input is presumed to be

    a relative URL and thus to start with a path component."

    https://docs.python.org/2/library/urlparse.html

    2015-03-03 22:16 GMT+00:00 Paul McMillan <>:

    Yeah. I agree the lack of round trip is surprising, and I agree we
    should fix it.
    
    I think the underlying issue here is that urlparse has a pretty
    different view of the world when compared with the browsers. I know
    that bit me when I first started using python, and it periodically
    surfaces in cases like this, where the browser thinks that
    "//evil.com" is a url, but we've parsed it as part of a path.
    Backwards compatibility makes it hard to update urlparse to precisely
    match browser behavior, but there's probably room for a new library
    designed with browser compatibility as a primary feature.
    
    -Paul
    
    On Tue, Mar 3, 2015 at 10:07 PM, Antoine Pitrou <> wrote:
    >
    > Hi Paul,
    >
    > Le 03/03/2015 23:01, Paul McMillan a écrit :
    >> I understand how this works. You don't need to paste the example again.
    >>
    >> The documentation makes no guarantee that parse/unparse will do what
    >> you want them to do, and does explicitly lay out the specific rules
    >> used for separating the parts.
    >
    > Well, I don't know if it's a security issue, but failure to roundtrip
    > *is* surprising (and IMHO dangerous for that reason) behaviour to say
    > the least.
    >
    > Moreover, the urlunparse() documentation (in 3.x) says:
    > """
    > Construct a URL from a tuple as returned by urlparse(). [...] This may
    > result in a slightly different, but equivalent URL, if the URL that was
    > parsed originally had unnecessary delimiters
    > """
    > 
    

    (https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlunparse)
    >
    > which implies that any divergence when roundtripping should only consist
    > in cosmetic, not essential, differences ("equivalent URL").
    >
    > Regards
    >
    > Antoine.
    > -----------------------------
    > Python Security Response Team
    > Unsubscribe: https://mail.python.org/mailman/options/psrt/paul

    %40mcmillan.ws

    @yaaboukir
    Copy link
    Mannequin Author

    yaaboukir mannequin commented Mar 7, 2015

    From: cve-assign () mitre org
    Date: Thu, 5 Mar 2015 16:42:02 -0500 (EST)

    -----BEGIN PGP SIGNED MESSAGE-----
    Hash: SHA1

    We think that the issue reduces to the question of whether it's
    acceptable for urlparse to provide inconsistent information about the
    structure of a URL.

    https://docs.python.org/2/library/urlparse.html says:

       urlparse.urlparse(urlstring[, scheme[, allow_fragments]])
       Parse a URL into six components, returning a 6-tuple. This
       corresponds to the general structure of a URL:
       scheme://netloc/path;parameters?query#fragment.
    
    
       urlparse.urlunparse(parts)
       Construct a URL from a tuple as returned by urlparse(). The parts
       argument can be any six-item iterable. This may result in a
       slightly different, but equivalent URL, if the URL that was parsed
       originally had unnecessary delimiters (for example, a ? with an
       empty query; the RFC states that these are equivalent).

    The first issue is that the urlunparse documentation is ambiguous. We
    believe the reasonable interpretation is that there is a missing third
    sentence: "This ALWAYS results in a URL that is either identical or
    equivalent to the URL that was parsed originally." There's another
    interpretation that we believe is unreasonable: "This may result in a
    slightly different, but equivalent URL, if the URL that was parsed
    originally had unnecessary delimiters. If the URL that was parsed
    originally did not have unnecessary delimiters, then the behavior of
    urlunparse is UNDEFINED."

    So, our expectation is that urlunparse(urlparse(original_url)) should
    not have any significant effect on the meaning of original_url. We
    also think that a Python user should be able to rely on that property
    to make security-relevant decisions. To simply the situation, consider
    a case where the URL is used exclusively within Python code, and is
    never accessed by any web browser.

    The actual behavior is:

       >>> from urlparse import urlparse, urlunparse
       >>> print urlparse("////example.com")
       ParseResult(scheme='', netloc='', path='//example.com', params='', query='', fragment='')
       >>> print urlparse(urlunparse(urlparse("////example.com")))
       ParseResult(scheme='', netloc='example.com', path='', params='', query='', fragment='')
       >>> print urlparse(urlunparse(urlparse(urlunparse(urlparse("////example.com")))))
       ParseResult(scheme='', netloc='example.com', path='', params='', query='', fragment='')

    Here, urlparse(urlunparse(original_url)) does have a significant
    effect on the meaning of original_url. The Python user may have wanted
    to make a security-relevant decision based on whether netloc was an
    empty string. However, netloc is different depending on whether
    urlparse(urlunparse(original_url)) occurs at least once. The user's
    application (suppose it's called "PyNetlocExaminer") is affected in a
    security-relevant way.

    The next question is, if there is a CVE for a report of a
    security-relevant problem, what product is named as the primary
    affected product within that CVE. There is no perfect answer to this
    question. Especially in the case of a general-purpose language such as
    Python, there's an extremely wide range of bugs that might become
    security-relevant in some applications. What we usually try to do is
    make the CVE useful to users who may need to perform a software
    update. Specifically:

    1. If the language implementation is not ever going to be changed
      (for example: because the language maintainer believes the
      observed behavior has always been correct, or the language
      maintainer believes that it has retroactively become correct
      because any change would break compatibility with other
      applications), then the application is named as the primary
      affected product in the CVE. In other words, if the inconsistency
      between netloc='' and netloc='example.com' were actually the
      intended behavior all along, then PyNetlocExaminer would be named
      in the CVE. Here, realistically, the end user would need to
      update or manually fix PyNetlocExaminer.

    2. If the language implementation is incorrect and is planned to be
      changed at some point, and that would eliminate the
      security-relevant problem, then the language implementation is
      named in the CVE. (An application might also be named in the CVE,
      especially if there are very few affected applications.) This
      option occurs regardless of whether the language maintainer
      believes that it is a language vulnerability. (The language
      maintainer has the option of composing a dispute that would be
      appended to the CVE.) Here, the end user may ultimately decide to
      address the problem by updating their Python installation, not by
      updating PyNetlocExaminer.

    Again, this is imperfect. It works best in the relatively common case
    where a language bug has security relevance in many applications. It
    might work especially poorly in a case where a language bug has
    security relevance in exactly one application. However, it seems
    preferable to do the above consistently, rather than make the outcome
    depend on application populations, or depend on reaching universal
    agreement about what code should have been written differently.


    CVE assignment team, MITRE CVE Numbering Authority
    M/S M300
    202 Burlington Road, Bedford, MA 01730 USA
    [ PGP key available through http://cve.mitre.org/cve/request_id.html ]

    @yaaboukir
    Copy link
    Mannequin Author

    yaaboukir mannequin commented Mar 7, 2015

    From: Amos Jeffries <squid3 () treenet co nz>
    Date: Fri, 06 Mar 2015 14:09:55 +1300

    On 6/03/2015 10:42 a.m., cve-assign () mitre org wrote:

    We think that the issue reduces to the question of whether it's
    acceptable for urlparse to provide inconsistent information about the
    structure of a URL.
    
    https://docs.python.org/2/library/urlparse.html says:
    
           urlparse.urlparse(urlstring[, scheme[, allow_fragments]])
           Parse a URL into six components, returning a 6-tuple. This
           corresponds to the general structure of a URL:
           scheme://netloc/path;parameters?query#fragment.

    My 2c ... no it does not.

    There are 7 parts in a URL. What is called "netloc" in that description
    is actually two fields: [userinfo '@'] authority

    The userinfo field is very much alive and well in non-HTTP schemes.

    Ignoring the userinfo field leaves implementations open to attacks of
    the form:
    scheme://example.com () phishing com/path

    AYJ

    @yaaboukir
    Copy link
    Mannequin Author

    yaaboukir mannequin commented Apr 6, 2015

    Any updates concerning this issue ? is it going to be fixed or at least modify the documentation in order to warn developers about this behaviour ?

    @vadmium
    Copy link
    Member

    vadmium commented Apr 7, 2015

    FYI I posted a patch at bpo-22852 to retain the empty netloc “//” when appropriate. But even if there is interest in that patch, I guess it can still only be applied to the next version of Python (3.5 or whatever), being a new feature.

    Maybe you could suggest some wording or a patch to the documentation that could be applied to bugfix releases as well.

    @PaulMcMillan
    Copy link
    Mannequin

    PaulMcMillan mannequin commented Apr 7, 2015

    As Martin said, backporting a change for this wouldn't be appropriate
    for Python 2.7. The 2.7 docs could certainly be updated to make this
    clearer, but we can't introduce a breaking change like that to the
    stable release.

    @tiran
    Copy link
    Member

    tiran commented Sep 24, 2016

    What's the verdict on this bug? It's been dangling for almost one and half year.

    @tiran tiran added the 3.7 (EOL) end of life label Sep 24, 2016
    @vadmium
    Copy link
    Member

    vadmium commented Sep 24, 2016

    It is not clear what Yassine’s bug is. Maybe it is about round-tripping from urlparse() → urlunparse(). If so, it could be solved by fixing either of the following two problems:

    1. urlunparse() forgets the initial pair of slashes when netloc="". That might be addressed by bpo-22852, and documented as a limitation in the mean time.

    2. urlunparse() accepts invalid components, such as netloc="", path="//evil.com", which transforms the path into a hostname. Yassine preferred to percent-encode the path and pass it through, though I think an exception would be more sensible. Or just documenting that there is little or no validation.

    When considering the second problem of validation, you have to be aware that urlunparse() is documented to handle schemes like “mailto:” not listed in “uses_netloc”. According to RFC 6068, mailto://evil.com is valid syntax, and is decoded to netloc="", path="//evil.com". In this case, netloc="evil.com" would probably be invalid instead.

    @ztane
    Copy link
    Mannequin

    ztane mannequin commented Sep 25, 2016

    The problem is in urlunparse. If you check the RFC 2396, it has the following regular expression:

     ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
    

    where group 3 is the //netloc, and 4 is netloc substring, of an url http://netloc/foobar - and this is what Python should use to parse an URI using RFC 2396... but:

        >>> pat.fullmatch('////netloc').group(4)
        ''
        >>> pat.fullmatch('/relative').group(4)
        >>> 

    Someone took the shortcut. no netloc is different from netloc being the empty string '', but

        >>> urlparse('////netloc')
        ParseResult(scheme='', netloc='', path='//netloc', params='', query='', fragment='')
        >>> urlparse('/netloc')
        ParseResult(scheme='', netloc='', path='/netloc', params='', query='', fragment='')

    In the latter parsing result netloc should be *None*. Unfortunately I believe it is too late to change this either way.

    @ztane
    Copy link
    Mannequin

    ztane mannequin commented Sep 25, 2016

    *I mean the problem exists in urlparse, not only in urlunparse

    @vadmium
    Copy link
    Member

    vadmium commented Jul 30, 2018

    bpo-34276 was opened about a similar case for “file:” URLs. I believe both “file:” scheme and no-scheme cases are a regression and could be fixed by adding another pair of slashes (an empty “netloc” part):

    >>> urlparse("////foo.com")  # No change
    ParseResult(scheme='', netloc='', path='//foo.com', params='', query='', fragment='')
    >>> urlunparse(_)  # Knows to escape slashes with another double-slash
    '////foo.com'

    @vstinner vstinner changed the title Urlparse insufficient validation leads to open redirect [CVE-2015-2104] Urlparse insufficient validation leads to open redirect Oct 24, 2019
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @serhiy-storchaka
    Copy link
    Member

    #113563 fixes this issue. If the path starts with //, we should always add // and a netloc, even if the netlock is empty and the scheme does not use netloc.

    It is interesting, that f3963b1 was almost correct fix, but it used wrong boolean operator, so it broke a common case of file: scheme. Fixing it for that case made things worse and introduced this bug, when // is not added even when the scheme uses a netlog. See #78457 (comment) for some historical references of this code.

    @serhiy-storchaka serhiy-storchaka removed the 3.7 (EOL) end of life label Dec 29, 2023
    @serhiy-storchaka serhiy-storchaka added 3.11 only security fixes 3.10 only security fixes 3.9 only security fixes 3.8 only security fixes 3.12 bugs and security fixes 3.13 new features, bugs and security fixes labels Dec 29, 2023
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes 3.9 only security fixes 3.10 only security fixes 3.11 only security fixes 3.12 bugs and security fixes 3.13 new features, bugs and security fixes stdlib Python modules in the Lib dir type-security A security issue
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants