Issue 44375: urllib.parse.urlparse is not parsing the url properly



This issue has been migrated to GitHub: https://github.com/python/cpython/issues/88541

classification

Title:	urllib.parse.urlparse is not parsing the url properly
Type:		Stage:
Components:		Versions:	Python 3.9

process

Created on 2021-06-10 11:29 by neethun, last changed 2022-04-11 14:59 by admin.

Messages (2)
msg395518 - (view)	Author: Neethu (neethun)	Date: 2021-06-10 11:29
urllib.parse.urlparse does not properly parse URLs that have a port number but no scheme.

    from urllib.parse import urlparse
    print(urlparse("www.cwi.nl:80"))
    ParseResult(scheme='www.cwi.nl', netloc='', path='80', params='', query='', fragment='')

Python version: 3.9.5
msg395522 - (view)	Author: Gnanesh (Gnanesh)	Date: 2021-06-10 11:52
Hey Neethu,

When a URL has no scheme, it needs a "//" prefix for urlparse to parse it correctly. Try:

    >>> urlparse('//www.cwi.nl:80')
    ParseResult(scheme='', netloc='www.cwi.nl:80', path='', params='', query='', fragment='')

Here's a comment from the docs (https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlparse):

> Following the syntax specifications in RFC 1808, urlparse recognizes a netloc only if it is properly introduced by ‘//’. Otherwise the input is presumed to be a relative URL and thus to start with a path component.
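
For anyone who receives scheme-less "host:port" strings and cannot change the input, the "//" advice above could be wrapped in a small helper. This is only a sketch: `parse_host_port` is a hypothetical name, not part of urllib, and the check for "://" is a simplifying assumption that will misclassify URLs such as "mailto:user@example.com".

```python
from urllib.parse import ParseResult, urlparse


def parse_host_port(url: str) -> ParseResult:
    """Hypothetical helper: parse a URL that may lack a scheme.

    Assumption: any input without "://" and without a leading "//"
    is a bare "host[:port][/path]" string, so "//" is prepended to
    make urlparse treat the first component as the netloc.
    """
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    return urlparse(url)


result = parse_host_port("www.cwi.nl:80")
print(result.netloc)    # www.cwi.nl:80
print(result.hostname)  # www.cwi.nl
print(result.port)      # 80
```

URLs that already carry a scheme or a leading "//" are passed to urlparse unchanged, so the helper only affects the scheme-less case described in this issue.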