Message 237200 - Python tracker


Author	yaaboukir
Recipients	PaulMcMillan, benjamin.peterson, martin.panter, orsenthil, pitrou, python-dev, soilandreyes, vstinner, yaaboukir
Date	2015-03-04.18:41:29
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1425494489.7.0.699781300971.issue23505@psf.upfronthosting.co.za>
In-reply-to

Content
"Following the syntax specifications in RFC 1808, urlparse recognizes a netloc
only if it is properly introduced by ‘//’. Otherwise the input is presumed to be
a relative URL and thus to start with a path component."

https://docs.python.org/2/library/urlparse.html
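
To make the quoted rule concrete, here is a minimal sketch. It assumes Python 3, where the module is `urllib.parse` rather than the Python 2 `urlparse` module the quote refers to, and the input strings are purely illustrative:

```python
from urllib.parse import urlparse

# With a leading '//', the host-like text is recognized as the netloc.
with_slashes = urlparse("//evil.com/path")

# Without the leading '//', the whole string is presumed to be a
# relative URL, so it lands in the path and the netloc stays empty.
without_slashes = urlparse("evil.com/path")

print(repr(with_slashes.netloc), repr(with_slashes.path))
print(repr(without_slashes.netloc), repr(without_slashes.path))
```

The first call reports `evil.com` as the netloc; the second reports an empty netloc and keeps `evil.com/path` as the path.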

2015-03-03 22:16 GMT+00:00 Paul McMillan <>:

    Yeah. I agree the lack of round trip is surprising, and I agree we
    should fix it.

    I think the underlying issue here is that urlparse has a pretty
    different view of the world when compared with the browsers. I know
    that bit me when I first started using python, and it periodically
    surfaces in cases like this, where the browser thinks that
    "//evil.com" is a url, but we've parsed it as part of a path.
    Backwards compatibility makes it hard to update urlparse to precisely
    match browser behavior, but there's probably room for a new library
    designed with browser compatibility as a primary feature.

    -Paul

    On Tue, Mar 3, 2015 at 10:07 PM, Antoine Pitrou <> wrote:
    >
    > Hi Paul,
    >
    > On 03/03/2015 23:01, Paul McMillan wrote:
    >> I understand how this works. You don't need to paste the example again.
    >>
    >> The documentation makes no guarantee that parse/unparse will do what
    >> you want them to do, and does explicitly lay out the specific rules
    >> used for separating the parts.
    >
    > Well, I don't know if it's a security issue, but failure to roundtrip
    > *is* surprising (and IMHO dangerous for that reason) behaviour to say
    > the least.
    >
    > Moreover, the urlunparse() documentation (in 3.x) says:
    > """
    > Construct a URL from a tuple as returned by urlparse(). [...] This may
    > result in a slightly different, but equivalent URL, if the URL that was
    > parsed originally had unnecessary delimiters
    > """
    > (https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlunparse)
    >
    > which implies that any divergence when roundtripping should only consist
    > in cosmetic, not essential, differences ("equivalent URL").
    >
    > Regards
    >
    > Antoine.
    > -----------------------------
    > Python Security Response Team
    > Unsubscribe: https://mail.python.org/mailman/options/psrt/paul%40mcmillan.ws
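
The round-trip concern discussed in the thread can be checked with a small helper. This is only a sketch: `roundtrips` is a hypothetical name, and the `////evil.com` input is an assumed example chosen so that the host-like text ends up in the path, as Paul describes. Whether that input round-trips depends on the Python version in use, so no result is claimed for it here.

```python
from urllib.parse import urlparse, urlunparse


def roundtrips(url):
    """Return True if urlunparse(urlparse(url)) reproduces url exactly."""
    return urlunparse(urlparse(url)) == url


# Assumed example input: the first '//' introduces an empty netloc,
# so the remaining '//evil.com' is parsed as the path.
parts = urlparse("////evil.com")
print(repr(parts.netloc), repr(parts.path))

# An ordinary absolute URL is expected to survive the round trip.
print(roundtrips("http://example.com/a?b=1#c"))

# Version-dependent: inspect the result rather than assuming it.
print(roundtrips("////evil.com"))
```

If the last line prints `False`, the rebuilt string differs from the input, which is the kind of non-cosmetic divergence Antoine argues the urlunparse() documentation rules out.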

History
Date	User	Action	Args
2015-03-04 18:41:29	yaaboukir	set	recipients: + yaaboukir, orsenthil, pitrou, vstinner, benjamin.peterson, python-dev, martin.panter, PaulMcMillan, soilandreyes
2015-03-04 18:41:29	yaaboukir	set	messageid: <1425494489.7.0.699781300971.issue23505@psf.upfronthosting.co.za>
2015-03-04 18:41:29	yaaboukir	link	issue23505 messages
2015-03-04 18:41:29	yaaboukir	create