This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: urllib.parse.urlparse doesn't check port
Type: Stage: patch review
Components: Versions: Python 3.10
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: orsenthil Nosy List: miguendes, orsenthil, palik, vstinner, xtreak
Priority: normal Keywords: patch

Created on 2021-04-16 17:30 by palik, last changed 2022-04-11 14:59 by admin.

Pull Requests
URL Status Linked Edit
PR 25774 open miguendes, 2021-05-01 09:32
Messages (5)
msg391238 - (view) Author: Alexei Pastuchov (palik) Date: 2021-04-16 17:30
It is possible to get valid ParseResult from the urlparse function even for a non-numeric port value. Only by requesting the port it fails[1].
Would it be an improvement if _checknetloc[2] validates the value of port properly?


// code snippet
Python 3.8.5 (default, Jan 27 2021, 15:41:15) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from urllib.parse import urlparse
>>> uri = 'xx://foo:bar'
>>> uri_parts = urlparse(uri)
>>> uri_parts.netloc
'foo:bar'
>>> uri_parts.port
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/urllib/parse.py", line 174, in port
    raise ValueError(message) from None
ValueError: Port could not be cast to integer value as 'bar'
// code snippet

[1] https://github.com/python/cpython/blob/e1903e11a3d42512effe336026e0c67f602e5848/Lib/urllib/parse.py#L172
[2] https://github.com/python/cpython/blob/e1903e11a3d42512effe336026e0c67f602e5848/Lib/urllib/parse.py#L416
msg391242 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2021-04-16 18:00
I guess moving port validation logic to parsing time is done as part of https://github.com/python/cpython/pull/16780
msg391265 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2021-04-16 23:24
Treating this as bug in itself might be a better idea than waiting for a ipv6 scope introduction, which had few caveats. 

> Would it be an improvement if _checknetloc[2] validates the value of port properly?

Yes, we could check if it is an int. That should be sufficient.
msg391282 - (view) Author: Alexei Pastuchov (palik) Date: 2021-04-17 10:45
Thank you for your swift response and your willingness to add port validation to _checknetloc.

I think the validation itself should compound both exceptional branches implemented in port[3]
* port is an int
* port is in the range

[3] https://github.com/python/cpython/blob/e1903e11a3d42512effe336026e0c67f602e5848/Lib/urllib/parse.py#L173-L178
msg392577 - (view) Author: Miguel Brito (miguendes) * Date: 2021-05-01 09:34
I also think the validation logic should be ran as early as possible.

I gave it a shot and implemented it. 

I appreciate any reviews: https://github.com/python/cpython/pull/25774

Got some ideas from https://github.com/python/cpython/pull/16780
History
Date User Action Args
2022-04-11 14:59:44adminsetgithub: 88037
2021-05-01 09:34:26miguendessetmessages: + msg392577
2021-05-01 09:32:38miguendessetkeywords: + patch
nosy: + miguendes

pull_requests: + pull_request24464
stage: patch review
2021-04-17 10:45:19paliksetmessages: + msg391282
2021-04-16 23:24:20orsenthilsetassignee: orsenthil
versions: + Python 3.10
2021-04-16 23:24:04orsenthilsetmessages: + msg391265
2021-04-16 18:00:41xtreaksetnosy: + xtreak, vstinner, orsenthil
messages: + msg391242
2021-04-16 17:30:08palikcreate