classification
Title: urllib.parse.urlparse is not parsing the url properly
Type:
Stage:
Components:
Versions: Python 3.9
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Gnanesh, neethun
Priority: normal
Keywords:

Created on 2021-06-10 11:29 by neethun, last changed 2021-06-10 11:52 by Gnanesh.

Messages (2)
msg395518 - Author: Neethu (neethun) Date: 2021-06-10 11:29
urllib.parse.urlparse does not correctly parse URLs that have a port number but no scheme. The host is returned as the scheme and the port as the path:

from urllib.parse import urlparse
print(urlparse("www.cwi.nl:80"))

ParseResult(scheme='www.cwi.nl', netloc='', path='80', params='', query='', fragment='')

Python version : 3.9.5
msg395522 - Author: Gnanesh (Gnanesh) Date: 2021-06-10 11:52
Hey neethu,

When the scheme is omitted, the URL needs a leading "//" so that urlparse treats what follows as the netloc.

Try:
> urlparse('//www.cwi.nl:80')

ParseResult(scheme='', netloc='www.cwi.nl:80', path='', params='', query='', fragment='')


Here's a comment from the docs (https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlparse): 
> Following the syntax specifications in RFC 1808, urlparse recognizes a netloc only if it is properly introduced by ‘//’. Otherwise the input is presumed to be a relative URL and thus to start with a path component.
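
If the inputs are known to be bare "host:port" strings, one possible workaround is to add the "//" prefix before parsing. The snippet below is only a minimal sketch: `parse_host_port` is a hypothetical helper name, not part of urllib.parse, and the check for "//" is an assumption that the input either has a full "scheme://" prefix or no scheme at all.

```python
from urllib.parse import urlparse


def parse_host_port(url):
    """Hypothetical helper: parse a URL, treating scheme-less input as a netloc.

    Assumption: input without "//" is a bare "host:port[/path]" string,
    so "//" is prepended to make urlparse recognize the netloc.
    """
    if "//" not in url:
        url = "//" + url
    return urlparse(url)


result = parse_host_port("www.cwi.nl:80")
print(result.netloc)    # www.cwi.nl:80
print(result.hostname)  # www.cwi.nl
print(result.port)      # 80
```

Input that already carries a scheme, such as "http://www.cwi.nl:80", is passed to urlparse unchanged. The check is deliberately simple and would misfire on scheme-less input that contains "//" later in its path.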
History
Date                 User     Action  Args
2021-06-10 11:52:24  Gnanesh  set     nosy: + Gnanesh
                                      messages: + msg395522
2021-06-10 11:29:34  neethun  create