Message123898
Copy of issue 10696
This issue is exactly the same as issue 10696 except it affects a different function, urllib.parse.urlparse (instead of urllib.parse.urlsplit).
The function urllib.parse.urlparse does not return the port field.
REPRO STEPS:
>>> import urllib
>>> import urllib.parse
>>> urllib.parse.urlparse(r'http://foo.bar.com:80/blarg?a=1&b=2')
RETURNS:
ParseResult(scheme='http', netloc='foo.bar.com:80', path='/blarg', params='', query='a=1&b=2', fragment='')
EXPECTED:
ParseResult(scheme='http', netloc='foo.bar.com', path='/blarg', port='80', params='', query='a=1&b=2', fragment='')
END REPRO
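For comparison, in Python 3 the result object keeps netloc unsplit as one of its six tuple fields. It also exposes hostname and port as read-only attributes that are computed from netloc when accessed. Here is a minimal check. The values in the comments assume a current Python 3 release, where port is returned as an int rather than a string:

```python
from urllib.parse import urlparse

result = urlparse('http://foo.bar.com:80/blarg?a=1&b=2')

# The named-tuple fields: netloc still holds "host:port" as one string.
print(result._fields)   # ('scheme', 'netloc', 'path', 'params', 'query', 'fragment')
print(result.netloc)    # foo.bar.com:80

# Attributes derived from netloc on access (not part of the tuple itself).
print(result.hostname)  # foo.bar.com
print(result.port)      # 80
```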
The documentation at http://docs.python.org/py3k/library/urllib.parse.html#urllib.parse.urlsplit shows this as expected. What is the purpose of a port field if that field is never set?
According to RFC 1808, the syntactic components are:
<scheme>://<net_loc>/<path>;<params>?<query>#<fragment>
However, according to RFC 1738 (referenced by RFC 1808),
http://tools.ietf.org/html/rfc1738#section-3.1
the <net_loc> can be further separated into <host> and <port>.
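RFC 1738 section 3.1 gives the common Internet scheme syntax as //<user>:<password>@<host>:<port>/<url-path>. The following is a minimal sketch of splitting a netloc along those lines. `split_netloc` is a hypothetical helper, not a standard-library function. It assumes there are no bracketed IPv6 literals, which RFC 1738 predates, and it raises ValueError for a non-numeric port:

```python
from typing import Optional, Tuple


def split_netloc(netloc: str) -> Tuple[Optional[str], str, Optional[int]]:
    """Hypothetical helper: split '<user>:<password>@<host>:<port>'.

    Returns (userinfo, host, port). userinfo and port are None when absent.
    Assumes no bracketed IPv6 literals such as '[::1]:80'.
    """
    # rpartition on the last '@' separates the optional userinfo from host[:port].
    userinfo, sep, hostport = netloc.rpartition('@')
    if not sep:
        userinfo = None  # no '@': the whole string is host[:port]
    # The first ':' (if any) separates host from port.
    host, sep, port_str = hostport.partition(':')
    # An omitted or empty port means "use the scheme's default", so return None.
    port = int(port_str) if sep and port_str else None
    return userinfo, host, port
```

For example, `split_netloc('foo.bar.com:80')` yields `(None, 'foo.bar.com', 80)` and `split_netloc('user:pw@foo.bar.com')` yields `('user:pw', 'foo.bar.com', None)`.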
I guess a bigger, more general complaint is: why not make urlparse more useful by separating <host> and <port>?
I imagine this is a common need among users. I like standards, and doing a little extra to work with them makes those standards even more useful.
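Here is one way the proposed shape could look as a wrapper outside the standard library. `ParseResultWithPort` and `urlparse_with_port` are hypothetical names used only for illustration. The sketch relies on the hostname and port attributes that ParseResult already provides. It deliberately differs from the EXPECTED output in two ways: port is an int rather than the string '80', and any user:password@ prefix is dropped.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlparse


class ParseResultWithPort(NamedTuple):
    """Hypothetical result type with netloc split into host and port fields."""
    scheme: str
    host: Optional[str]
    port: Optional[int]
    path: str
    params: str
    query: str
    fragment: str


def urlparse_with_port(url: str) -> ParseResultWithPort:
    """Hypothetical wrapper around urllib.parse.urlparse (not a stdlib API)."""
    r = urlparse(url)
    # hostname is lowercased by urlparse; userinfo (user:password@) is discarded here.
    return ParseResultWithPort(r.scheme, r.hostname, r.port, r.path,
                               r.params, r.query, r.fragment)
```

Keeping this as a separate type leaves the existing six-field ParseResult, and any code that unpacks it, unchanged.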
Date | User | Action | Args
2010-12-13 20:17:20 | JTMoon79 | set | recipients: + JTMoon79
2010-12-13 20:17:20 | JTMoon79 | set | messageid: <1292271440.42.0.107945784803.issue10697@psf.upfronthosting.co.za>
2010-12-13 20:17:18 | JTMoon79 | link | issue10697 messages
2010-12-13 20:17:18 | JTMoon79 | create |