classification
Title: host and port attributes not documented well in function urllib.parse.urlparse and urlsplit
Type: behavior Stage: needs patch
Components: Documentation Versions: Python 3.2, Python 3.1, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: JTMoon79, docs@python, fdrake, orsenthil, r.david.murray
Priority: normal Keywords:

Created on 2010-12-13 20:17 by JTMoon79, last changed 2010-12-13 20:48 by JTMoon79.

Messages (4)
msg123898 - Author: JamesThomasMoon1979 (JTMoon79) Date: 2010-12-13 20:17
Copy of issue 10696
This issue is exactly the same as issue 10696 except it affects a different function, urllib.parse.urlparse (instead of urllib.parse.urlsplit).

The urlparse function (urllib.parse.urlparse) does not return the port field.
REPRO STEPS:
>>> import urllib
>>> import urllib.parse
>>> urllib.parse.urlparse(r'http://foo.bar.com:80/blarg?a=1&b=2')
RETURNS:
ParseResult(scheme='http', netloc='foo.bar.com:80', path='/blarg', params='', query='a=1&b=2', fragment='')
EXPECTED: 
ParseResult(scheme='http', netloc='foo.bar.com', path='/blarg', port='80', params='', query='a=1&b=2', fragment='')
END REPRO

The documentation at http://docs.python.org/py3k/library/urllib.parse.html#urllib.parse.urlsplit shows this as expected.  What is the purpose of a possible port parameter if that port parameter is not set?

According to RFC 1808, the syntactic components are
<scheme>://<net_loc>/<path>;<params>?<query>#<fragment>
However, according to RFC 1738 (which RFC 1808 references),
http://tools.ietf.org/html/rfc1738#section-3.1
the <net_loc> can be further separated into <host> and <port>.
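For illustration, a minimal sketch of how the six RFC 1808 components line up with the fields of the tuple that urlparse returns. The URL below, including its `;type=a` params and `#top` fragment, is made up for this example.

```python
from urllib.parse import urlparse

# Hypothetical URL chosen so that all six components are non-empty.
parts = urlparse('http://foo.bar.com:80/blarg;type=a?a=1&b=2#top')

print(parts.scheme)    # http
print(parts.netloc)    # foo.bar.com:80  (host and port are still joined here)
print(parts.path)      # /blarg
print(parts.params)    # type=a
print(parts.query)     # a=1&b=2
print(parts.fragment)  # top
```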

I guess a bigger, more general complaint about this is: why not make urlparse more useful by separating <host> and <port>?
I imagine this is a common need of users.  I like standards.  And doing a little extra to work with standards makes those standards even more useful.
msg123901 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2010-12-13 20:30
The repr gives the primary components defined by the URL.  The subfields are provided as attributes of the result.  This is documented in the example at the top of the chapter, but it is not, IMO, well documented in the rest of the chapter.
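A short sketch of the distinction, using the URL from the original report: the repr and the tuple carry only the primary components, while hostname and port are read as attributes on the same result object.

```python
from urllib.parse import urlparse, urlsplit

url = 'http://foo.bar.com:80/blarg?a=1&b=2'
result = urlparse(url)

# The tuple itself holds six primary components; netloc is kept whole.
print(len(result))      # 6
print(result.netloc)    # foo.bar.com:80

# The subfields are attributes derived from netloc.
print(result.hostname)  # foo.bar.com
print(result.port)      # 80  (an int, not the string '80')

# urlsplit exposes the same attributes on its five-field result.
split_result = urlsplit(url)
print(split_result.hostname, split_result.port)
```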

I'm not sure when this feature was introduced, so I'm leaving 3.1 in the versions for now.
msg123902 - Author: Fred L. Drake, Jr. (fdrake) (Python committer) Date: 2010-12-13 20:43
These attributes were added in Python 2.5.

Documentation improvements should be backported to 2.7 and 3.1.
msg123903 - Author: JamesThomasMoon1979 (JTMoon79) Date: 2010-12-13 20:48
Doh!  I feel a bit silly.
I didn't notice 'hostname' and 'port' in 
>>> dir(urllib.parse.urlparse(r'http://foo.bar.com:80/blarg?a=1&b=2'))
[... 'count', 'fragment', 'geturl', 'hostname', 'index',
 'netloc', 'params', 'password', 'path', 'port', 'query', 'scheme', 'username']

I agree, some clarity in the documentation for these overlapping fields (<net_loc>,<port>,<hostname>) would help.
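To make the overlap concrete, here is a small sketch of how the attributes relate to netloc. The URL with credentials and mixed-case host is a made-up example, not taken from the report.

```python
from urllib.parse import urlparse

# Hypothetical URL with userinfo, a mixed-case host, and an explicit port.
r = urlparse('http://User:Secret@Foo.Bar.com:8080/path')

print(r.netloc)    # User:Secret@Foo.Bar.com:8080  (the raw authority section)
print(r.username)  # User
print(r.password)  # Secret
print(r.hostname)  # foo.bar.com  (host only, lowercased)
print(r.port)      # 8080  (int)

# When the URL carries no explicit port, the attribute is None.
no_port = urlparse('http://foo.bar.com/blarg')
print(no_port.port)  # None
```

So netloc is the unparsed whole, while username, password, hostname, and port are views into it.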

-J_Tom_Moon_79
History
Date User Action Args
2010-12-13 20:48:37  JTMoon79  set  messages: + msg123903
2010-12-13 20:44:25  fdrake  set  versions: + Python 2.7
2010-12-13 20:43:43  fdrake  set  nosy: + fdrake
    messages: + msg123902
2010-12-13 20:30:44  r.david.murray  set  assignee: docs@python
    components: + Documentation, - Library (Lib)
    title: port not split in function urllib.parse.urlparse -> host and port attributes not documented well in function urllib.parse.urlparse and urlsplit
    nosy: + docs@python, r.david.murray, orsenthil
    versions: + Python 3.2
    messages: + msg123901
    stage: needs patch
2010-12-13 20:17:18  JTMoon79  create