Message375109
tl;dr: '#', '?' and a few other characters should be URL-encoded/%-encoded when they appear in userinfo; URLs encoded this way already parse correctly.
---
Following up on what Martin said, RFC 3986 specifies how these examples should be parsed. Its grammar for userinfo is:
userinfo    = *( unreserved / pct-encoded / sub-delims / ":" )
unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded = "%" HEXDIG HEXDIG
sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
            / "*" / "+" / "," / ";" / "="
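As a rough illustration, the userinfo rule can be transcribed into a regular expression. This is a minimal sketch and not part of urllib.parse; the helper name is_valid_userinfo is hypothetical. ALPHA and DIGIT are ASCII-only in ABNF, so explicit ASCII ranges are used.

```python
import re

# Character classes transcribed from the RFC 3986 ABNF above.
UNRESERVED = r"A-Za-z0-9\-._~"          # ALPHA / DIGIT / "-" / "." / "_" / "~"
SUB_DELIMS = r"!$&'()*+,;="             # sub-delims
PCT_ENCODED = r"%[0-9A-Fa-f]{2}"        # "%" HEXDIG HEXDIG

# userinfo = *( unreserved / pct-encoded / sub-delims / ":" )
USERINFO_RE = re.compile(rf"(?:[{UNRESERVED}{SUB_DELIMS}:]|{PCT_ENCODED})*")


def is_valid_userinfo(userinfo: str) -> bool:
    """Hypothetical helper: True if the whole string matches the userinfo rule."""
    return USERINFO_RE.fullmatch(userinfo) is not None


print(is_valid_userinfo("auser:secr%23et"))  # True
print(is_valid_userinfo("auser:secr#et"))    # False: raw "#" is not allowed
```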
Notably, apart from ":" (which the userinfo rule allows explicitly), the gen-delims are _not_ among the allowed characters, and neither are non-ASCII characters.
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
These characters, and any others not listed above, should be URL-encoded/%-encoded if they appear in the user name or password.
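One way to do that encoding in Python is urllib.parse.quote with safe='', which (on Python 3.7+) leaves only letters, digits and "-", ".", "_", "~" unencoded, so "/" is encoded too, unlike the default safe='/'. Encoding sub-delims as well is stricter than the grammar requires but still valid. The sketch assumes an http URL, and build_url is a hypothetical helper, not a urllib function.

```python
from urllib.parse import quote, unquote, urlsplit


def build_url(user: str, password: str, host: str, path: str = "/") -> str:
    """Hypothetical helper: build an http URL with percent-encoded userinfo.

    safe="" makes quote() encode every reserved character, including "/";
    non-ASCII characters are encoded as UTF-8 by default.
    """
    userinfo = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"http://{userinfo}@{host}{path}"


url = build_url("auser", "secr#et", "192.168.0.1:8080", "/a/b/c.html")
print(url)  # http://auser:secr%23et@192.168.0.1:8080/a/b/c.html

parts = urlsplit(url)
print(parts.hostname, parts.port)  # 192.168.0.1 8080
print(unquote(parts.password))     # secr#et
```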
Taking the first example:
>>> from urllib.parse import urlparse, unquote
>>> u = 'http://auser:secr%23et@192.168.0.1:8080/a/b/c.html'
>>> urlparse(u)
ParseResult(scheme='http', netloc='auser:secr%23et@192.168.0.1:8080', path='/a/b/c.html', params='', query='', fragment='')
>>> unquote(urlparse(u).password)
'secr#et'
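For contrast, here is the same URL with the "#" in the password left unencoded. Because "#" ends the authority component, urlparse treats everything after it as the fragment, so the user name ends up being read as the host.

```python
from urllib.parse import urlparse

# Same URL as the example above, but with a raw "#" in the password.
raw = "http://auser:secr#et@192.168.0.1:8080/a/b/c.html"
parts = urlparse(raw)

print(parts.netloc)    # auser:secr  (authority stops at "#")
print(parts.fragment)  # et@192.168.0.1:8080/a/b/c.html
print(parts.password)  # None  (no "@" is left in the netloc)
print(parts.hostname)  # auser (the user name is mistaken for the host)
```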

Date | User | Action | Args
2020-08-10 13:17:32 | david.six | set | recipients: + david.six, orsenthil, martin.panter, dmi.baranov, madison.may, anh.le
2020-08-10 13:17:32 | david.six | set | messageid: <1597065452.82.0.867791381068.issue18140@roundup.psfhosted.org>
2020-08-10 13:17:32 | david.six | link | issue18140 messages
2020-08-10 13:17:32 | david.six | create |