classification
Title: urllib IPv6 parsing fails with special characters in passwords
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.6, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: benaryorg, martin.panter, metaperl, tjollans, vstinner
Priority: normal Keywords:

Created on 2018-04-23 13:44 by benaryorg, last changed 2019-10-15 17:08 by vstinner.

Messages (7)
msg315668 - Author: benaryorg (benaryorg) Date: 2018-04-23 13:44
The documentation says that urllib.parse follows RFC 2396 (https://tools.ietf.org/html/rfc2396.html), but urllib.parse.urlsplit (https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlsplit) fails to parse a user:password@host URL when the password contains a '[' character.
This is because the urlsplit code does not strip the userinfo part of the authority (everything from index 0 up to and including the last '@') before checking whether the hostname contains '[' to detect an IPv6 address (https://github.com/python/cpython/blob/8a6f4b4bba950fb8eead1b176c58202d773f2f70/Lib/urllib/parse.py#L416-L418).
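To illustrate the check described above, here is a minimal sketch of a bracket test that looks only at the host part of the netloc. The helper name host_has_unbalanced_bracket is hypothetical and is not part of urllib.parse; it only shows the idea of dropping the userinfo first, and is not the actual fix.

```python
def host_has_unbalanced_bracket(netloc):
    """Hypothetical helper: test for a lone '[' or ']' in the host part only."""
    # Drop the userinfo ("user:password@") by splitting on the last '@'.
    # If there is no '@', rpartition returns the whole string as element 2.
    hostinfo = netloc.rpartition("@")[2]
    # Unbalanced means exactly one of the two bracket characters is present.
    return ("[" in hostinfo) != ("]" in hostinfo)


print(host_has_unbalanced_bracket("user:[@host"))    # bracket is in the password
print(host_has_unbalanced_bracket("user:pw@[::1"))   # bracket is in the host
print(host_has_unbalanced_bracket("[::1]:80"))       # well-formed IPv6 host
```

With this approach a '[' in the password would no longer be mistaken for the start of an IPv6 literal, while a genuinely malformed host would still be detected.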
msg317119 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2018-05-19 13:49
I presume this is about parsing a URL like

>>> urlsplit("//user:[@host")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/proj/python/cpython/Lib/urllib/parse.py", line 431, in urlsplit
    raise ValueError("Invalid IPv6 URL")
ValueError: Invalid IPv6 URL

Ideally the square bracket should be escaped as %5B. Related reports about parsing unescaped delimiters in a URL password are Issue 18140 (fragment #, query ?) and Issue 23328 (slash /).
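As a sketch of the escaping suggested above, the password can be percent-encoded with urllib.parse.quote before the URL is assembled. The user name, password and host below are made-up values for illustration.

```python
from urllib.parse import quote, urlsplit

password = "["
# safe="" makes quote() escape every reserved character, including '/'.
url = "//user:" + quote(password, safe="") + "@host"

parts = urlsplit(url)
print(url)             # //user:%5B@host
print(parts.hostname)  # host
print(parts.password)  # %5B (urlsplit does not decode the userinfo)
```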
msg327239 - Author: Thomas Jollans (tjollans) Date: 2018-10-06 09:43
RFC 2396 explicitly excludes the use of [ and ] in URLs. RFC 2732 <https://www.ietf.org/rfc/rfc2732.txt> defines the syntax for IPv6 URLs, and allows [ and ] ONLY in the host part.

So I'd say that the behaviour is arguably correct (if somewhat unfortunate)
msg334273 - Author: Terrence Brannon (metaperl) Date: 2019-01-23 21:37
I would like to add to this bug: if the password field of the URL contains a pound sign (#) or a question mark (?), the parser splits the URL incorrectly, as this gist demonstrates - https://gist.github.com/metaperl/fc6f43bf6b9a9f874b8f27e29695e68c
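The following is a small sketch of the behaviour described here, not the contents of the gist. The URL and credentials are made-up values. An unescaped '#' in the password ends the netloc early, so the rest of the URL lands in the fragment; escaping it as %23 avoids that.

```python
from urllib.parse import quote, urlsplit

# Unescaped '#': everything after it is treated as the fragment.
raw = urlsplit("http://user:pa#ss@host/path")
print(raw.netloc)    # user:pa
print(raw.fragment)  # ss@host/path

# Escaped password: the netloc and path are split as intended.
escaped = urlsplit("http://user:" + quote("pa#ss", safe="") + "@host/path")
print(escaped.netloc)    # user:pa%23ss@host
print(escaped.hostname)  # host
print(escaped.path)      # /path
```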
msg334302 - Author: Terrence Brannon (metaperl) Date: 2019-01-24 15:55
Also, if SQLAlchemy gives any guidance, note that SA unquotes both the username and password of the URL:

https://github.com/sqlalchemy/sqlalchemy/blob/master/lib/sqlalchemy/engine/url.py#L274
msg334303 - Author: Terrence Brannon (metaperl) Date: 2019-01-24 15:59
Regarding "RFC 2396 explicitly excludes the use of [ and ] in URLs. RFC 2732 <https://www.ietf.org/rfc/rfc2732.txt> defines the syntax for IPv6 URLs, and allows [ and ] ONLY in the host part.

So I'd say that the behaviour is arguably correct (if somewhat unfortunate)"

I would say that a square bracket CAN be used in the password, but that it should be urlencoded and that this library should perform a urldecode for both username and password, just as SQLAlchemy does.
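A minimal sketch of the decoding proposed above, done by the caller on top of urlsplit. The function name decoded_credentials is hypothetical; urlsplit itself returns the username and password still percent-encoded.

```python
from urllib.parse import unquote, urlsplit


def decoded_credentials(url):
    """Hypothetical helper: return (username, password) with %XX escapes decoded."""
    parts = urlsplit(url)
    # username and password are None when the URL carries no userinfo.
    username = unquote(parts.username) if parts.username is not None else None
    password = unquote(parts.password) if parts.password is not None else None
    return username, password


print(decoded_credentials("//user:%5B%23%3F@host"))  # ('user', '[#?')
print(decoded_credentials("//host"))                 # (None, None)
```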
msg354745 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-15 17:08
I modified my PR 16780 to also fix this issue. The PR was written for bpo-36338.
History
Date                 User           Action  Args
2019-10-15 17:08:55  vstinner       set     messages: + msg354745
2019-10-15 16:24:08  xtreak         set     nosy: + vstinner
2019-01-24 15:59:49  metaperl       set     messages: + msg334303
2019-01-24 15:55:55  metaperl       set     messages: + msg334302
2019-01-23 21:37:03  metaperl       set     nosy: + metaperl; messages: + msg334273
2018-10-06 09:43:45  tjollans       set     nosy: + tjollans; messages: + msg327239
2018-05-19 13:49:36  martin.panter  set     nosy: + martin.panter; messages: + msg317119
2018-04-23 13:44:45  benaryorg      set     type: behavior
2018-04-23 13:44:30  benaryorg      create