classification
Title: Incorrect behaviour for user@password URI pattern in urlparse
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.7, Python 3.6, Python 3.5, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Sean.Wang, potomak
Priority: normal Keywords:

Created on 2019-07-25 09:49 by Sean.Wang, last changed 2019-07-29 03:30 by potomak.

Messages (2)
msg348431 - (view) Author: Sean Wang (Sean.Wang) Date: 2019-07-25 09:49
When an IPv4 URL contains 'username:password' and the password includes special characters such as #[]?, urlparse behaves unexpectedly.
example: 

urlparse('http://user:pass#?[word@example.com:80/path')
msg348593 - (view) Author: Giovanni Cappellotto (potomak) * Date: 2019-07-29 03:30
What do you mean when you say that urlparse acts unexpectedly?

I tried your example and I think urlparse's behavior is correct.

From the RFC 1738:

> Octets must be encoded if they have no corresponding graphic
> character within the US-ASCII coded character set, if the use of the
> corresponding character is unsafe, or if the corresponding character
> is reserved for some other interpretation within the particular URL
> scheme.

Your example:

```
>>> from urllib.parse import urlparse
>>> urlparse('http://user:pass#?[word@example.com:80/path')
ParseResult(scheme='http', netloc='user:pass', path='', params='', query='', fragment='?[word@example.com:80/path')
```

Part of the password is parsed as the URL fragment because the character `#` has a special meaning:

> The character "#" is unsafe and should
> always be encoded because it is used in World Wide Web and in other
> systems to delimit a URL from a fragment/anchor identifier that might
> follow it.
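
Following that rule, one possible way to handle such a password is to percent-encode the userinfo before building the URL and decode it after parsing. The snippet below is only a sketch: the variable names are illustrative, and it assumes the caller controls how the URL is assembled.

```python
from urllib.parse import quote, unquote, urlparse

# Illustrative credentials; the password contains the reserved characters '#', '?' and '['.
user = "user"
password = "pass#?[word"

# safe="" makes quote() percent-encode every reserved character.
url = "http://{}:{}@example.com:80/path".format(
    quote(user, safe=""), quote(password, safe="")
)

parsed = urlparse(url)

# urlparse does not decode the userinfo, so unquote() it explicitly.
decoded_password = unquote(parsed.password)

print(parsed.netloc)
print(decoded_password)
```

With the password encoded, the whole `user:password@host:port` part stays in `netloc`, the path is `/path`, and the fragment is empty.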
History
Date                 User       Action  Args
2019-07-29 03:30:11  potomak    set     nosy: + potomak; messages: + msg348593
2019-07-25 09:49:26  Sean.Wang  create