Message 32097 - Python tracker


Author	eigenlambda
Date	2007-05-20.22:35:20

Content
This is a conversation with the current Python interpreter.

>>> import urlparse
>>> urlparse.urlparse(urlparse.urlunparse(urlparse.urlparse("file:////usr/bin/python")))
('file', 'usr', '/bin/python', '', '', '')

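As you can see, the results are incorrect: the original URL parses to an empty netloc and the path '//usr/bin/python', but after the round trip through urlunparse the netloc is 'usr' and the path is '/bin/python'.  The problem is in the urlunsplit function: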

def urlunsplit((scheme, netloc, url, query, fragment)):
    if netloc or (scheme and scheme in uses_netloc and url[:2] != '//'):
        if url and url[:1] != '/': url = '/' + url
        url = '//' + (netloc or '') + url
    if scheme:
        url = scheme + ':' + url
    if query:
        url = url + '?' + query
    if fragment:
        url = url + '#' + fragment
    return url

RFC 1808 (see http://www.ietf.org/rfc/rfc1808.txt ) specifies that a URL shall have the following syntax:
<scheme>://<net_loc>/<path>;<params>?<query>#<fragment>
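
To make the slash count concrete, here is a minimal sketch that fills in the template above literally. The helper name compose_generic is hypothetical and is not part of urlparse; it assumes the path is passed without the separating slash, which is how the template reads.

```python
def compose_generic(scheme, net_loc, path, params="", query="", fragment=""):
    """Fill in the RFC 1808 template literally (hypothetical helper)."""
    # "//" introduces net_loc, and a further "/" separates it from the path.
    url = scheme + "://" + net_loc + "/" + path
    if params:
        url += ";" + params
    if query:
        url += "?" + query
    if fragment:
        url += "#" + fragment
    return url


# Empty net_loc: three slashes sit between the scheme and the path.
print(compose_generic("file", "", "usr/bin/python"))
# A path that itself starts with a slash yields four.
print(compose_generic("file", "", "/usr/bin/python"))
```

With an empty net_loc the first call prints file:///usr/bin/python and the second prints file:////usr/bin/python.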

The problem with the current version of urlunsplit is that it checks whether the 'url' section already starts with two slashes and, if it does, skips adding the '//' prefix.  This is incorrect because (1) the RFC 1808 syntax above puts '//' before the net_loc and another '/' before the path, so there are at least three slashes between the end of the scheme portion and the beginning of the path portion, and (2) the leading slashes of an arbitrary path portion, which may require those slashes, are then read as the net_loc delimiter when the URL is parsed again, so they are effectively stripped.  Removing the url[:2] != '//' test causes urlunsplit to behave correctly when dealing with URLs like file:////usr/bin/python .
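
As a rough sketch of the proposed change, the standalone function below reproduces the urlunsplit logic quoted earlier with the url[:2] != '//' test made switchable. It is written for Python 3, where the urlparse module lives at urllib.parse, and it is not the library's own code. The names unsplit, keep_slash_test and USES_NETLOC are made up for this illustration, and USES_NETLOC is only a small assumed subset of the module's real uses_netloc list.

```python
from urllib.parse import urlsplit  # Python 3 location of the old urlparse module

# Illustrative subset; the real uses_netloc list in the module is longer.
USES_NETLOC = ["file", "ftp", "http", "https"]


def unsplit(parts, keep_slash_test):
    """Reassemble a 5-tuple; keep_slash_test=True mirrors the quoted code."""
    scheme, netloc, url, query, fragment = parts
    add_netloc = bool(scheme) and scheme in USES_NETLOC
    if keep_slash_test:
        # The test this report proposes to remove.
        add_netloc = add_netloc and url[:2] != "//"
    if netloc or add_netloc:
        if url and url[:1] != "/":
            url = "/" + url
        url = "//" + (netloc or "") + url
    if scheme:
        url = scheme + ":" + url
    if query:
        url = url + "?" + query
    if fragment:
        url = url + "#" + fragment
    return url


# Components of file:////usr/bin/python: empty netloc, path with two slashes.
parts = ("file", "", "//usr/bin/python", "", "")
current = unsplit(parts, keep_slash_test=True)
proposed = unsplit(parts, keep_slash_test=False)
print(current)
print(proposed)
# Only the proposed variant parses back to the original components.
print(tuple(urlsplit(current)) == parts, tuple(urlsplit(proposed)) == parts)
```

With the test kept, the result is file://usr/bin/python, which parses back with 'usr' as the netloc. With the test removed, the result is file:////usr/bin/python, which parses back to the original components.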

History
Date	User	Action	Args
2007-08-23 14:54:01	admin	link	issue1722348 messages
2007-08-23 14:54:01	admin	create