classification
Title: urlparse of urllib returns wrong hostname
Type: security
Stage: patch review
Components: Library (Lib)
Versions: Python 3.8, Python 3.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Xianbo Wang, martin.panter, matrixise, orsenthil, ronaldoussoren, xtreak
Priority: normal
Keywords: patch

Created on 2019-03-18 08:06 by Xianbo Wang, last changed 2019-05-15 12:01 by inada.naoki.

Files
File name          Uploaded                     Description
test_bug_36338.py  matrixise, 2019-03-18 08:26  Unittest for this issue.
Messages (6)
msg338171 - Author: Xianbo Wang (Xianbo Wang) Date: 2019-03-18 08:06
The urlparse function in Python's urllib returns the wrong hostname when parsing a URL crafted by a malicious user. This may be caused by incorrect handling of IPv6 addresses. The bug could lead to an open redirect in web applications that rely on urlparse to extract and validate the domain of a redirection URL.

The test case is as follows:

>>> from urllib.parse import urlparse
>>> urlparse('http://benign.com\[attacker.com]').hostname
'attacker.com'

The correct behavior should be raising an invalid URL exception.
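
For illustration, here is a minimal sketch of the kind of redirect check that is affected, written defensively so that it does not depend on which hostname urlparse reports for such input. The function name is_safe_redirect and the host trusted.example are hypothetical, and rejecting every URL that contains a backslash is an assumption of this sketch, not something urllib does:

```python
from urllib.parse import urlsplit


def is_safe_redirect(url, allowed_hosts):
    """Return True only if url is an http(s) URL whose host is allow-listed."""
    # Assumption of this sketch: a backslash in a redirect target is never
    # legitimate, so such URLs are rejected before any parsing happens.
    if "\\" in url:
        return False
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        # Treat anything the parser refuses as unsafe.
        return False
    if parts.scheme not in ("http", "https"):
        return False
    return host is not None and host in allowed_hosts
```

With this check, is_safe_redirect('http://attacker.example\\[trusted.example]', {'trusted.example'}) is rejected regardless of how the parser splits the netloc.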
msg338172 - Author: Stéphane Wirtel (matrixise) * (Python committer) Date: 2019-03-18 08:17
I can confirm this with 3.7.2 on Fedora 29.
msg338173 - Author: Stéphane Wirtel (matrixise) * (Python committer) Date: 2019-03-18 08:26
Here is a unit test for this issue, along with the results on Python 3.8.0a2 and 3.7.2:

>>> 3.8.0a2
./python /tmp/test_bug_36338.py
/tmp/test_bug_36338.py:8: SyntaxWarning: invalid escape sequence \[
  url = 'http://demo.com\[attacker.com]'
3.8.0a2+ (heads/master:23581c018f, Mar 18 2019, 09:17:05) 
[GCC 8.3.1 20190223 (Red Hat 8.3.1-2)]
test_bad_url (__main__.TestUrlparse) ... FAIL

======================================================================
FAIL: test_bad_url (__main__.TestUrlparse)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/test_bug_36338.py", line 13, in test_bad_url
    self.assertEqual(hostname, expected_hostname)
AssertionError: 'attacker.com' != 'demo.com'
- attacker.com
+ demo.com


----------------------------------------------------------------------
Ran 1 test in 0.001s

FAILED (failures=1)

>>> 3.7.2
python /tmp/test_bug_36338.py
3.7.2 (default, Jan 16 2019, 19:49:22) 
[GCC 8.2.1 20181215 (Red Hat 8.2.1-6)]
test_bad_url (__main__.TestUrlparse) ... FAIL

======================================================================
FAIL: test_bad_url (__main__.TestUrlparse)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/test_bug_36338.py", line 13, in test_bad_url
    self.assertEqual(hostname, expected_hostname)
AssertionError: 'attacker.com' != 'demo.com'
- attacker.com
+ demo.com


----------------------------------------------------------------------
Ran 1 test in 0.000s

FAILED (failures=1)
msg338599 - Author: Karthikeyan Singaravelan (xtreak) * (Python triager) Date: 2019-03-22 11:27
See also issue20271, which discusses the related form http://[::1]spam, where ::1 is returned as the hostname. urlparse tries to parse the hostname as an IPv6 address whenever it contains [, and parses up to ] (see [0]), so "benign.com\[attacker.com]" is treated as a URL in which attacker.com is the IPv6 hostname. I am not sure what the correct behavior is. FWIW, at least Java and golang return "benign.com[attacker.com]", and Ruby raises an exception saying that this is a bad URL.

Java

> (.getHost (java.net.URL. "http://benign.com\\[attacker.com]"))
"benign.com\\[attacker.com]"

golang: https://play.golang.org/p/q8pTo9ySLby


[0] https://github.com/python/cpython/blob/c5c6cdada3d41148bdeeacfe7528327b481c5d18/Lib/urllib/parse.py#L199
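
The bracket handling described above (take whatever sits between [ and ] as the host) can be sketched roughly as follows. This is a simplified paraphrase for illustration, not the actual code at [0]; the function name hostname_from_netloc is hypothetical, and port handling is left out:

```python
def hostname_from_netloc(netloc):
    """Simplified paraphrase of the bracket handling described above."""
    # Drop any "user:password@" prefix.
    _, _, hostinfo = netloc.rpartition("@")
    # If a "[" appears anywhere, everything up to the next "]" becomes the host,
    # no matter what precedes the "[" or follows the "]".
    _, have_open_bracket, bracketed = hostinfo.partition("[")
    if have_open_bracket:
        hostname, _, _ = bracketed.partition("]")
    else:
        hostname, _, _ = hostinfo.partition(":")
    return hostname.lower() or None
```

Because nothing checks that the [ is the first character of the host or that the bracketed text is an IPv6 address, both "benign.com\\[attacker.com]" and "[::1]spam" yield the bracketed part only.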
msg338960 - Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2019-03-27 13:33
Given a quick scan of RFC 3986[1] I'd say that the behaviour of Ruby seems to be the most correct. That said, I'd also check what the major browsers do in this case (FWIW both FF and Safari use 'benign.com' as the hostname in this case).


[1] https://tools.ietf.org/html/rfc3986#page-17
msg338972 - Author: Karthikeyan Singaravelan (xtreak) * (Python triager) Date: 2019-03-27 16:00
I found this page to be useful: https://url.spec.whatwg.org/#host-parsing. Following its steps, it seems that this should raise an error, since the 7th step says that asciiDomain must not contain a forbidden host code point, which includes "[" and "]". As another data point, 'new URL("http://benign.com[attacker.com]")' in a browser's JavaScript console also raises an exception saying that this is a bad URL. Even if Python assumes attacker.com to be the correct host, it never validates that it is an IPv6 address, at which point parsing should fail.

Ruby seems to use a regex: https://github.com/ruby/ruby/blob/trunk/lib/uri/rfc3986_parser.rb#L6
Java parseurl: http://hg.openjdk.java.net/jdk/jdk/file/c4c225b49c5f/src/java.base/share/classes/java/net/URLStreamHandler.java#l124
golang: https://github.com/golang/go/blob/50bd1c4d4eb4fac8ddeb5f063c099daccfb71b26/src/net/url/url.go#L587
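
As a rough sketch of the two checks discussed above (reject forbidden host code points, and require a bracketed host to be a real IPv6 address), something like the following could work. The name check_host is hypothetical, the character set is an assumed subset of the WHATWG list rather than a copy of it, and this is not what any of the linked implementations or proposed patches do:

```python
import ipaddress

# Assumed subset of the WHATWG forbidden host code points (space plus some
# printable ASCII characters); the spec's list also covers control characters.
FORBIDDEN_HOST_CHARS = frozenset(' #/:<>?@[\\]^|')


def check_host(host):
    """Return the lowercased host, or raise ValueError if it looks invalid."""
    if not host:
        raise ValueError("empty host")
    if host.startswith("["):
        if not host.endswith("]"):
            raise ValueError(f"text after closing bracket or no bracket: {host!r}")
        # Raises ipaddress.AddressValueError (a ValueError subclass) when the
        # bracketed text is not a valid IPv6 address.
        ipaddress.IPv6Address(host[1:-1])
        return host.lower()
    bad = FORBIDDEN_HOST_CHARS.intersection(host)
    if bad:
        raise ValueError(f"forbidden characters {sorted(bad)} in host: {host!r}")
    return host.lower()
```

Under these assumptions, "benign.com\\[attacker.com]", "[attacker.com]" and "[::1]spam" all raise ValueError, while "[::1]" and ordinary domain names pass.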
History
Date                 User            Action  Args
2019-05-15 12:01:19  inada.naoki     set     pull_requests: - pull_request13146
2019-05-10 18:42:37  pierreglaser    set     pull_requests: + pull_request13146
2019-03-27 16:00:26  xtreak          set     messages: + msg338972
2019-03-27 13:57:01  xtreak          set     pull_requests: - pull_request12526
2019-03-27 13:33:00  ronaldoussoren  set     nosy: + ronaldoussoren; messages: + msg338960
2019-03-27 10:17:02  pierreglaser    set     pull_requests: + pull_request12526
2019-03-27 10:15:52  xtreak          set     pull_requests: - pull_request12525
2019-03-27 10:13:09  pierreglaser    set     stage: patch review; pull_requests: + pull_request12525
2019-03-22 11:27:43  xtreak          set     nosy: + xtreak; messages: + msg338599; stage: patch review -> (no value)
2019-03-21 15:11:58  xtreak          set     pull_requests: - pull_request12435
2019-03-21 15:09:27  pierreglaser    set     keywords: + patch; stage: patch review; pull_requests: + pull_request12435
2019-03-18 08:54:34  matrixise       set     versions: + Python 3.8
2019-03-18 08:34:57  xtreak          set     nosy: + martin.panter
2019-03-18 08:26:29  matrixise       set     files: + test_bug_36338.py; messages: + msg338173
2019-03-18 08:17:10  matrixise       set     nosy: + matrixise, orsenthil; messages: + msg338172
2019-03-18 08:06:11  Xianbo Wang     create