Message 375109 - Python tracker



Author	david.six
Recipients	anh.le, david.six, dmi.baranov, madison.may, martin.panter, orsenthil
Date	2020-08-10.13:17:32
Message-id	<1597065452.82.0.867791381068.issue18140@roundup.psfhosted.org>
In-reply-to

Content

tl;dr: '#', '?' and a few other characters should be URL-encoded (%-encoded) when they appear in userinfo; URLs encoded that way already parse correctly.

---

Following up on what Martin said, RFC 3986 has the specifications for how these examples should be parsed.

userinfo      = *( unreserved / pct-encoded / sub-delims / ":" )

unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded   = "%" HEXDIG HEXDIG
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                 / "*" / "+" / "," / ";" / "="
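As a rough illustration, the userinfo rule above can be transcribed into a regular expression. This is a minimal sketch, assuming Python's `re` module; `USERINFO_RE` and `is_valid_userinfo` are hypothetical helpers, not part of `urllib.parse` (which does not validate userinfo itself).

```python
import re

# Character classes transcribed from the RFC 3986 ABNF quoted above.
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="

# userinfo = *( unreserved / pct-encoded / sub-delims / ":" )
USERINFO_RE = re.compile(
    rf"(?:[{UNRESERVED}{SUB_DELIMS}:]|%[0-9A-Fa-f]{{2}})*"
)


def is_valid_userinfo(userinfo: str) -> bool:
    """Return True if the whole string matches the RFC 3986 userinfo rule."""
    return USERINFO_RE.fullmatch(userinfo) is not None
```

With this, `is_valid_userinfo('auser:secr%23et')` is true, while `'auser:secr#et'` (raw gen-delim) and `'café'` (non-ASCII) are rejected.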

Notably, the gen-delims are _not_ among the allowed characters (apart from ":", which the userinfo rule lists explicitly), and neither are non-ASCII characters.

gen-delims    = ":" / "/" / "?" / "#" / "[" / "]" / "@"

These characters, and any others not listed above, should be URL-encoded/%-encoded if they appear in the password.

Taking the first example:

>>> from urllib.parse import urlparse, unquote
>>> u = 'http://auser:secr%23et@192.168.0.1:8080/a/b/c.html'
>>> urlparse(u)
ParseResult(scheme='http', netloc='auser:secr%23et@192.168.0.1:8080', path='/a/b/c.html', params='', query='', fragment='')
>>> unquote(urlparse(u).password)
'secr#et'
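For contrast, a short sketch of what happens when the '#' is left unencoded. It uses `urlsplit`, which splits the netloc the same way `urlparse` does: the raw '#' is taken as the start of the fragment, so the netloc is cut short and the password and host are lost.

```python
from urllib.parse import urlsplit, unquote

encoded = "http://auser:secr%23et@192.168.0.1:8080/a/b/c.html"
raw = "http://auser:secr#et@192.168.0.1:8080/a/b/c.html"

good = urlsplit(encoded)
print(unquote(good.password), good.hostname, good.port)  # secr#et 192.168.0.1 8080

bad = urlsplit(raw)
# The unencoded "#" ends the netloc and starts the fragment.
print(bad.netloc)    # auser:secr
print(bad.password)  # None (no "@" left in the netloc)
print(bad.fragment)  # et@192.168.0.1:8080/a/b/c.html
```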
