
classification
Title: urlparse() does not handle URLs with port numbers properly
Type: behavior
Stage:
Components: Library (Lib)
Versions: Python 2.6
process
Status: closed
Resolution: duplicate
Dependencies:
Superseder:
Assigned To: georg.brandl
Nosy List: ajaksu2, facundobatista, gawain, georg.brandl, orsenthil, tzot
Priority: normal
Keywords:

Created on 2008-02-26 22:31 by gawain, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (6)
msg63054 - Author: Gawain Bolton (gawain) Date: 2008-02-26 22:31
The urlparse() function in the urlparse module does not correctly handle
URLs that have a port number but no explicit scheme.

The following works as expected:
>>> urlparse.urlparse('foo.bar.com','http').scheme
'http'

But if the URL has a port number then the scheme is wrong:
>>> urlparse.urlparse('foo.bar.com:8080','http').scheme
'foo.bar.com'

I have read RFC 1808 and its description of the parsing of the scheme 
also has this bug.  From what I can figure, the parsing algorithm needs 
to search for the scheme before the substring "://" and not just ":".
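
A minimal sketch of the difference between the two separators, written as plain string operations. The helper names `scheme_before_colon` and `scheme_before_colon_slashes` are hypothetical and exist only for illustration; this is not how the standard library parses URLs.

```python
def scheme_before_colon(url):
    """Return the text before the first ':' (empty string if there is none)."""
    head, sep, _ = url.partition(":")
    return head if sep else ""


def scheme_before_colon_slashes(url):
    """Return the text before the first '://' (empty string if there is none)."""
    head, sep, _ = url.partition("://")
    return head if sep else ""


# Compare both rules on a full URL, a bare host:port, and a mailto: URL.
for url in ("http://foo.bar.com:8080", "foo.bar.com:8080", "mailto:user@example.com"):
    print(url, repr(scheme_before_colon(url)), repr(scheme_before_colon_slashes(url)))
```

Splitting on the first ":" reports the host name as the scheme for a bare host:port, while splitting on "://" reports no scheme at all, so a default could be applied. The "://" rule also finds no scheme in a `mailto:` URL, which is the trade-off of that approach.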
msg63070 - Author: Χρήστος Γεωργίου (Christos Georgiou) (tzot) * Date: 2008-02-27 11:31
RFC 1808 §2.1 suggests a "generic-RL" syntax that specifies '://' as the
separator, so Gawain's suggestion makes practical sense. However, also
as Gawain says, the RFC specifies that '//' is considered the first
part of a "net_path" and is not necessarily included (example:
"mailto:tzot@sil-tec.gr"; and yes, I actually welcome spammers :) ).

I believe that urlparse should stay as-is when not called with a
default_scheme argument, and be fixed as suggested when called with a
default_scheme argument (that's the point of providing default_scheme).
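
A rough sketch of that behavior as a wrapper, assuming Python 3's `urllib.parse` (the successor of the `urlparse` module discussed here). The helper name `split_with_default` is hypothetical; it is a workaround under stated assumptions, not a patch to the library.

```python
from urllib.parse import urlsplit


def split_with_default(url, default_scheme=""):
    """Split a URL, treating a scheme-less 'host:port' as a network location.

    Without a default scheme the URL goes to urlsplit() unchanged. With one,
    a URL lacking '://' gets a leading '//' so that the text before ':' is
    read as the host rather than as a scheme.
    """
    if default_scheme and "://" not in url and not url.startswith("//"):
        url = "//" + url
    return urlsplit(url, scheme=default_scheme)


parts = split_with_default("foo.bar.com:8080", "http")
print(parts.scheme, parts.hostname, parts.port)
```

With a default scheme supplied, this wrapper would also treat "mailto:tzot@sil-tec.gr" as a network location, so it only suits callers who know their input is a host or a full URL.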
msg63072 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2008-02-27 15:41
I don't think this is a valid issue. If you browse through RFC 1808,
you will find that:
1) For net_loc information it refers broadly to RFC 1738, and we won't
find any specific information on the port number there.
2) But have a look at the BNF representation of net_loc:

net_loc     =  *( pchar | ";" | "?" )
pchar       = uchar | ":" | "@" | "&" | "="

There it dismisses this issue.

The port number is a property of the scheme in the absolute URL
notation. So urlparse.urlparse('foo.bar.com',8088).scheme would give
you the port.

If someone can validate my reasoning, then we can close this issue.
msg63150 - Author: Gawain Bolton (gawain) Date: 2008-02-29 23:20
On the contrary, RFC 1738 does mention the port number in section 3.1,
Common Internet Scheme Syntax:

   While the syntax for the rest of the URL may vary depending on the
   particular scheme selected, URL schemes that involve the direct use
   of an IP-based protocol to a specified host on the Internet use a
   common syntax for the scheme-specific data:

        //<user>:<password>@<host>:<port>/<url-path>

   Some or all of the parts "<user>:<password>@", ":<password>",
   ":<port>", and "/<url-path>" may be excluded.  The scheme specific
   data start with a double slash "//" to indicate that it complies with
   the common Internet scheme syntax.

I agree with Christos Georgiou's suggestion that if a default 
scheme is passed AND the default scheme is a URL scheme, then the 
scheme should be identified as being before "://".
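
For reference, the common Internet scheme syntax quoted above can be taken apart with Python 3's `urllib.parse.urlsplit` once the leading "//" is present. This is only an illustrative sketch; the example URLs and credentials are made up.

```python
from urllib.parse import urlsplit

# All optional parts present: //<user>:<password>@<host>:<port>/<url-path>
full = urlsplit("//user:secret@foo.bar.com:8080/some/path", scheme="http")
print(full.scheme, full.username, full.password, full.hostname, full.port, full.path)

# Optional parts omitted: only the host remains after the double slash.
bare = urlsplit("//foo.bar.com", scheme="http")
print(bare.scheme, bare.username, bare.hostname, bare.port, repr(bare.path))
```

The double slash is what tells the parser that a network location follows, so the colon before the port is not mistaken for the scheme separator.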
msg65425 - Author: Daniel Diniz (ajaksu2) * (Python triager) Date: 2008-04-12 22:02
Discussed in #754016
msg69094 - Author: Facundo Batista (facundobatista) * (Python committer) Date: 2008-07-02 14:32
Duplicate of #754016.
History
Date                 User            Action  Args
2022-04-11 14:56:31  admin           set     github: 46448
2008-07-02 14:32:36  facundobatista  set     status: open -> closed; nosy: + facundobatista; resolution: duplicate; messages: + msg69094
2008-04-12 22:02:30  ajaksu2         set     nosy: + ajaksu2; messages: + msg65425; versions: + Python 2.6, - Python 2.5
2008-03-20 04:54:27  jafo            set     priority: normal; assignee: georg.brandl; nosy: + georg.brandl
2008-02-29 23:20:59  gawain          set     messages: + msg63150
2008-02-27 15:41:38  orsenthil       set     nosy: + orsenthil; messages: + msg63072
2008-02-27 11:31:42  tzot            set     nosy: + tzot; messages: + msg63070
2008-02-26 22:32:45  gawain          set     title: urlparse() -> urlparse() does not handle URLs with port numbers properly
2008-02-26 22:31:52  gawain          create