Message 75851 - Python tracker

Author	orsenthil
Recipients	monk.e.boy, orsenthil
Date	2008-11-14.06:35:38
Message-id	<1226644541.68.0.268612231605.issue4191@psf.upfronthosting.co.za>

Content
This report almost seems like a bug with urlparse, but it is not. We
have to consider certain cases here.

1) First of all, we cannot equate URL parsing (urlparse, urlsplit,
urljoin) with the path normalization provided by posixpath.normpath.
URL syntax is defined strictly by RFCs, which differ from an operating
system's file and directory naming rules. So the expectation that
urlparse() should return the same result as posixpath.normpath() is
wrong. The most we can check is whether urlparse follows the guidelines
in RFC 1808 to start with, and in RFC 3986 (current).
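
To make the difference concrete, here is a minimal sketch. It assumes
Python 3, where the urlparse module lives at urllib.parse, and uses
posixpath.normpath for the filesystem-style normalization. The path
'/a//b/../c' is made up for illustration and is not from the report.

```python
import posixpath
from urllib.parse import urlsplit  # Python 3 location of urlparse (assumption)

# Hypothetical path, chosen only for illustration.
raw_path = '/a//b/../c'

# Filesystem-style normalization collapses '//' and resolves '..'.
normalized = posixpath.normpath(raw_path)

# URL splitting only separates components; the path is left untouched.
url_path = urlsplit('http://www.example.com' + raw_path).path

print(normalized)  # /a/c
print(url_path)    # /a//b/../c
```

The two functions answer different questions: one rewrites a path, the
other only reports where the path component starts and ends.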

2) Secondly, in a generic sense, it is better to follow the RFC-defined
parsing rules for URLs than to implement browser behavior, because
urlparse also needs to parse URLs of other schemes, say svn+ssh, where a
valid URL is svn+ssh://localhost///// and in this case '////' is the
name of the directory where I have the source code. Quite possible,
right? So it should not be converted to '/', which would be wrong.
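
As a rough check of this point, the sketch below splits that svn+ssh
URL. It assumes Python 3 (urllib.parse rather than the urlparse module);
the variable name is only illustrative.

```python
from urllib.parse import urlsplit  # Python 3 location of urlparse (assumption)

parts = urlsplit('svn+ssh://localhost/////')

print(parts.scheme)  # svn+ssh
print(parts.netloc)  # localhost
# One '/' separates the netloc from the path; the remaining '////'
# is the directory name, and none of the slashes are collapsed.
print(parts.path)    # /////
```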

3) Coming down to the specific examples presented in this report:
urlsplit treats a leading '//' as introducing the netloc (which may be
empty), so both a single '/' and '///' yield the path '/'.

>>> import urlparse
>>> urlparse.urlsplit('//')
SplitResult(scheme='', netloc='', path='', query='', fragment='')
>>> urlparse.urlsplit('/')
SplitResult(scheme='', netloc='', path='/', query='', fragment='')
>>> urlparse.urlsplit('///')
SplitResult(scheme='', netloc='', path='/', query='', fragment='')

With this in mind, consider the examples you provided:

print urlparse.urljoin('http://www.example.com///', '//')
print urlparse.urljoin('http://www.example.com///', '/')
print urlparse.urljoin('http://www.example.com///', '')

You will find that the results follow the parsing and joining rules
defined in RFC 1808 (http://www.faqs.org/rfcs/rfc1808.html).
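
For anyone who wants to reproduce this on a current interpreter, here is
a hedged Python 3 sketch of the same three joins (urllib.parse is assumed
in place of the urlparse module). No expected output is listed, because
the results for '//' and '/' may differ between Python versions; only the
empty reference is certain to return the base unchanged.

```python
from urllib.parse import urljoin, urlsplit  # Python 3 location (assumption)

base = 'http://www.example.com///'

# Resolve each reference from the report against the same base.
results = {ref: urljoin(base, ref) for ref in ('//', '/', '')}

for ref, joined in results.items():
    # Show how the reference itself splits, then what it resolves to.
    print(repr(ref), tuple(urlsplit(ref)), '->', joined)
```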

The same holds for the other examples, monk.e.boy.

If you see that a urlparse method has a problem, please point me to
the section of RFC 1808/RFC 3986 to which it does not conform, and I
shall work on a patch to fix it.
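
One way to look for such a non-conforming case is to run the reference
resolution examples from RFC 3986, section 5.4.1, through urljoin. The
sketch below assumes Python 3 (urllib.parse) and checks only a handful of
those examples; the helper name find_mismatches is hypothetical.

```python
from urllib.parse import urljoin  # Python 3 location of urlparse (assumption)

# Base URI and a few "normal examples" from RFC 3986, section 5.4.1.
BASE = 'http://a/b/c/d;p?q'
EXPECTED = {
    'g': 'http://a/b/c/g',
    './g': 'http://a/b/c/g',
    'g/': 'http://a/b/c/g/',
    '/g': 'http://a/g',
    '//g': 'http://g',
    '../g': 'http://a/b/g',
    '../../g': 'http://a/g',
}

def find_mismatches(base, expected):
    """Return (reference, actual, wanted) for every example that differs."""
    mismatches = []
    for ref, wanted in expected.items():
        actual = urljoin(base, ref)
        if actual != wanted:
            mismatches.append((ref, actual, wanted))
    return mismatches

mismatches = find_mismatches(BASE, EXPECTED)
print(mismatches or 'all listed examples match the RFC')
```

Any tuple reported here names a reference, and therefore an RFC section,
that can be cited in a bug report.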

This report is not a valid bug and can be closed.

History
Date	User	Action	Args
2008-11-14 06:35:41	orsenthil	set	recipients: + orsenthil, monk.e.boy
2008-11-14 06:35:41	orsenthil	set	messageid: <1226644541.68.0.268612231605.issue4191@psf.upfronthosting.co.za>
2008-11-14 06:35:40	orsenthil	link	issue4191 messages
2008-11-14 06:35:38	orsenthil	create