Title: urllib.parse.urlparse doesn't check port
Created on 2021-04-16 17:30 by palik, last changed 2021-05-01 09:34 by miguendes.

PR 25774 open miguendes, 2021-05-01 09:32
msg391238 - (view) Author: Alexei Pastuchov (palik) Date: 2021-04-16 17:30
It is possible to get a valid ParseResult from the urlparse function even for a non-numeric port value. It fails only when the port is requested[1].
Would it be an improvement if _checknetloc[2] validated the value of port properly?

// code snippet
Python 3.8.5 (default, Jan 27 2021, 15:41:15) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from urllib.parse import urlparse
>>> uri = 'xx://foo:bar'
>>> uri_parts = urlparse(uri)
>>> uri_parts.netloc
'foo:bar'
>>> uri_parts.port
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/urllib/", line 174, in port
    raise ValueError(message) from None
ValueError: Port could not be cast to integer value as 'bar'
// code snippet

msg391242 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2021-04-16 18:00
I guess moving port validation logic to parsing time is done as part of
msg391265 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2021-04-16 23:24
Treating this as a bug in itself might be a better idea than waiting for an IPv6 scope introduction, which had a few caveats.

> Would it be an improvement if _checknetloc[2] validates the value of port properly?

Yes, we could check if it is an int. That should be sufficient.
msg391282 - (view) Author: Alexei Pastuchov (palik) Date: 2021-04-17 10:45
Thank you for your swift response and your willingness to add port validation to _checknetloc.

I think the validation itself should cover both error branches implemented in port[3]:
* port is an int
* port is in the range
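
To make the two checks above concrete, here is a minimal sketch of validating the port at parse time. It is only an illustration: validate_port and checked_urlparse are hypothetical names, not part of urllib.parse, and the netloc splitting is a simplified assumption rather than the logic actually used by _checknetloc.

```python
from urllib.parse import urlparse


def validate_port(netloc):
    """Hypothetical helper: return the port as an int, None if absent.

    Raises ValueError for a non-numeric or out-of-range port.
    """
    # Drop any "user:password@" prefix so its colon is not mistaken for a port.
    _, _, hostinfo = netloc.rpartition('@')
    if hostinfo.startswith('['):
        # Bracketed IPv6 literal: the port, if any, follows the closing bracket.
        _, _, rest = hostinfo.partition(']')
        _, _, port = rest.partition(':')
    else:
        _, _, port = hostinfo.partition(':')
    if not port:
        return None
    # Check 1: port is an int (ASCII digits only).
    if not (port.isascii() and port.isdigit()):
        raise ValueError(f"Port could not be cast to integer value as {port!r}")
    value = int(port, 10)
    # Check 2: port is in the range.
    if not 0 <= value <= 65535:
        raise ValueError("Port out of range 0-65535")
    return value


def checked_urlparse(url):
    """Hypothetical wrapper: parse url and reject a bad port immediately."""
    parts = urlparse(url)
    validate_port(parts.netloc)
    return parts
```

With this sketch, checked_urlparse('xx://foo:bar') raises ValueError at parse time instead of deferring the failure to the first access of .port.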

msg392577 - (view) Author: Miguel Brito (miguendes) * Date: 2021-05-01 09:34
I also think the validation logic should be run as early as possible.

I gave it a shot and implemented it. 

I appreciate any reviews:

Got some ideas from
