Issue 37678: Incorrect behaviour for user@password URI pattern in urlparse

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/81859

classification

Title:	Incorrect behaviour for user@password URI pattern in urlparse
Type:	behavior
Stage:
Components:	Library (Lib)
Versions:	Python 3.7, Python 3.6, Python 3.5, Python 2.7

process

Created on 2019-07-25 09:49 by Sean.Wang, last changed 2022-04-11 14:59 by admin.

Messages (2)
msg348431 - (view)	Author: Sean Wang (Sean.Wang)	Date: 2019-07-25 09:49
When an IPv4 URL contains 'username:password' and the password includes special characters such as #, [, ], or ?, urlparse behaves unexpectedly. Example: urlparse('http://user:pass#?[word@example.com:80/path')
msg348593 - (view)	Author: Giovanni Cappellotto (potomak) *	Date: 2019-07-29 03:30
What do you mean when you say that urlparse behaves unexpectedly? I tried your example and I think urlparse's behavior is correct.

From RFC 1738:

> Octets must be encoded if they have no corresponding graphic
> character within the US-ASCII coded character set, if the use of the
> corresponding character is unsafe, or if the corresponding character
> is reserved for some other interpretation within the particular URL
> scheme.

Your example:

```
>>> from urllib.parse import urlparse
>>> urlparse('http://user:pass#?[word@example.com:80/path')
ParseResult(scheme='http', netloc='user:pass', path='', params='', query='', fragment='?[word@example.com:80/path')
```

Part of the password is parsed as the URL fragment because the character `#` has a special meaning:

> The character "#" is unsafe and should
> always be encoded because it is used in World Wide Web and in other
> systems to delimit a URL from a fragment/anchor identifier that might
> follow it.
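
For anyone hitting the same problem, the encoding the RFC asks for can be sketched as follows. This is a minimal illustration rather than an official recommendation: it assumes the caller holds the raw user name and password separately, and the variable names are hypothetical. Note that urlparse does not decode the userinfo, so the password is unquoted explicitly afterwards.

```python
from urllib.parse import quote, unquote, urlparse

# Raw credentials from the report (hypothetical variable names).
user = "user"
password = "pass#?[word"

# safe="" makes quote() percent-encode every reserved character,
# including '#', '?' and '[' (and also '/', ':' and '@').
userinfo = f"{quote(user, safe='')}:{quote(password, safe='')}"
url = f"http://{userinfo}@example.com:80/path"

parts = urlparse(url)

# The password now stays inside netloc instead of leaking into the fragment.
print(url)
print(parts.netloc, parts.path, repr(parts.fragment))

# urlparse returns the password still percent-encoded; decode it on demand.
print(unquote(parts.password))
```

With the password encoded this way, `#` no longer terminates the authority component, so the host, port, and path are parsed as the reporter presumably intended.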