classification
Title: host and port attributes not documented well in function urllib.parse.urlparse and urlsplit
Type: behavior Stage: resolved
Components: Documentation Versions: Python 3.1, Python 3.2, Python 2.7
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: docs@python Nosy List: JTMoon79, docs@python, fdrake, martin.panter, orsenthil, r.david.murray
Priority: normal Keywords:

Created on 2010-12-13 20:17 by JTMoon79, last changed 2016-07-14 05:54 by orsenthil. This issue is now closed.

Messages (6)
msg123898 - Author: JamesThomasMoon1979 (JTMoon79) Date: 2010-12-13 20:17
Copy of issue 10696
This issue is exactly the same as issue 10696 except it affects a different function, urllib.parse.urlparse (instead of urllib.parse.urlsplit).

urlparse function from urllib.parse.urlparse does not return the port field.
REPRO STEPS:
>>> import urllib
>>> import urllib.parse
>>> urllib.parse.urlparse(r'http://foo.bar.com:80/blarg?a=1&b=2')
RETURNS:
ParseResult(scheme='http', netloc='foo.bar.com:80', path='/blarg', params='', query='a=1&b=2', fragment='')
EXPECTED: 
ParseResult(scheme='http', netloc='foo.bar.com', path='/blarg', port='80', params='', query='a=1&b=2', fragment='')
END REPRO

The documentation at http://docs.python.org/py3k/library/urllib.parse.html#urllib.parse.urlsplit shows this as expected.  What is the purpose of a possible port parameter if that port parameter is not set?

According to RFC 1808 the syntactic components are
<scheme>://<net_loc>/<path>;<params>?<query>#<fragment>
However, according to referenced RFC 1738 (referenced by RFC 1808)
http://tools.ietf.org/html/rfc1738#section-3.1
the <net_loc> can be further separated to <host> and <port>.

I guess a bigger, more general complaint about this is: why not make urlparse more useful by separating <host> and <port>?
I imagine this is a common need of users.  I like standards.  And doing a little extra to work with standards makes those standards even more useful.
msg123901 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-12-13 20:30
The repr gives the primary components defined by the URL.  The subfields are provided as attributes of the result.  This is documented in the example at the top of the chapter, but it is not, IMO, well documented in the rest of the chapter.

I'm not sure when this feature was introduced, so I'm leaving 3.1 in the versions for now.
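
For readers of this thread, a minimal sketch of the behaviour described above, using the URL from the original report. The variable name `result` is only illustrative. The repr shows the six primary components, while `hostname` and `port` are read as attributes of the same result object.

```python
from urllib.parse import urlparse

result = urlparse('http://foo.bar.com:80/blarg?a=1&b=2')

# The repr (and the tuple) only carry the six primary components;
# the port stays embedded in netloc.
print(result.netloc)    # foo.bar.com:80

# The subfields of netloc are exposed as attributes, not tuple items.
print(result.hostname)  # foo.bar.com
print(result.port)      # 80 (an int, not the string '80')
```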
msg123902 - Author: Fred Drake (fdrake) (Python committer) Date: 2010-12-13 20:43
These attributes were added in Python 2.5.

Documentation improvements should be backported to 2.7 and 3.1.
msg123903 - Author: JamesThomasMoon1979 (JTMoon79) Date: 2010-12-13 20:48
Doh!  I feel a bit silly.
I didn't notice 'hostname' and 'port' in
>>> dir(urllib.parse.urlparse(r'http://foo.bar.com:80/blarg?a=1&b=2'))
[... 'count', 'fragment', 'geturl', 'hostname', 'index',
 'netloc', 'params', 'password', 'path', 'port', 'query', 'scheme', 'username']

I agree, some clarity in the documentation for these overlapping fields (<net_loc>,<port>,<hostname>) would help.

-J_Tom_Moon_79
msg235583 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-02-09 03:27
I don’t understand where the work needs to be done for this one. Even in the 3.1 and 2.7 documentation, the urlparse() and urlsplit() entries both list “port” as one of the returned attributes, and urlparse() has example code for it.
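
As a hedged illustration of the point that both functions expose the attribute, here is a short sketch with urlsplit(). The names `parts` and `no_port` are illustrative only. The same `hostname` and `port` attributes are available, and `port` is None when the URL does not specify one.

```python
from urllib.parse import urlsplit

parts = urlsplit('http://foo.bar.com:80/blarg?a=1&b=2')

# urlsplit() returns five components (no separate params field),
# but hostname and port are still available as attributes.
print(len(parts))       # 5
print(parts.hostname)   # foo.bar.com
print(parts.port)       # 80

# When the URL carries no explicit port, the attribute is None.
no_port = urlsplit('http://foo.bar.com/blarg')
print(no_port.port)     # None
```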
msg270372 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2016-07-14 05:54
I am unsure of the change too. I am willing to close this report, as the .port attribute is already documented.
History
Date User Action Args
2016-07-14 05:54:37 orsenthil set status: pending -> closed
    resolution: not a bug
    messages: + msg270372
    stage: needs patch -> resolved
2016-07-11 01:22:27 martin.panter set status: open -> pending
2015-02-09 03:27:48 martin.panter set nosy: + martin.panter
    messages: + msg235583
2010-12-13 20:48:37 JTMoon79 set messages: + msg123903
2010-12-13 20:44:25 fdrake set versions: + Python 2.7
2010-12-13 20:43:43 fdrake set nosy: + fdrake
    messages: + msg123902
2010-12-13 20:30:44 r.david.murray set assignee: docs@python
    components: + Documentation, - Library (Lib)
    title: port not split in function urllib.parse.urlparse -> host and port attributes not documented well in function urllib.parse.urlparse and urlsplit
    nosy: + docs@python, r.david.murray, orsenthil
    versions: + Python 3.2
    messages: + msg123901
    stage: needs patch
2010-12-13 20:17:18 JTMoon79 create