[security] urllib connects to a wrong host #74685

postmasters · 2017-05-29T04:04:12Z

BPO	30500
Nosy	@vstinner, @larryhastings, @ned-deily, @vadmium, @postmasters, @serhiy-storchaka
PRs	bpo-30500: Fix urllib.parse.splithost() to parse correctly fragments #1849 [2.7] bpo-30500: urllib: Simplify splithost by calling into urlparse. #1850 [3.6] bpo-30500: urllib: Simplify splithost by calling into urlparse. (#1849) #2289 [3.5] bpo-30500: urllib: Simplify splithost by calling into urlparse. (#1849) #2290 [3.4] bpo-30500: urllib: Simplify splithost by calling into urlparse. (#1849) #2291 [3.3] bpo-30500: urllib: Simplify splithost by calling into urlparse. (#1849) #2292 bpo-30500: Fix the NEWS entry #2293 [3.5] bpo-30500: Fix the NEWS entry #2295 [3.6] bpo-30500: Fix the NEWS entry #2296 [2.7] bpo-30500: urllib: Simplify splithost by calling into urlparse. (#1849) #2294

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/vstinner'
closed_at = <Date 2017-07-26.02:48:01.344>
created_at = <Date 2017-05-29.04:04:12.437>
labels = ['type-security', '3.7', 'library']
title = '[security] urllib connects to a wrong host'
updated_at = <Date 2019-05-10.17:59:50.407>
user = 'https://github.com/postmasters'

bugs.python.org fields:

activity = <Date 2019-05-10.17:59:50.407>
actor = 'ned.deily'
assignee = 'vstinner'
closed = True
closed_date = <Date 2017-07-26.02:48:01.344>
closer = 'ned.deily'
components = ['Library (Lib)']
creation = <Date 2017-05-29.04:04:12.437>
creator = 'Nam.Nguyen'
dependencies = []
files = []
hgrepos = []
issue_num = 30500
keywords = []
message_count = 20.0
messages = ['294667', '294671', '294694', '295318', '296422', '296426', '296427', '296430', '296431', '296432', '296433', '296436', '296439', '296442', '296446', '296448', '296455', '297939', '298211', '299187']
nosy_count = 6.0
nosy_names = ['vstinner', 'larry', 'ned.deily', 'martin.panter', 'Nam.Nguyen', 'serhiy.storchaka']
pr_nums = ['1849', '1850', '2289', '2290', '2291', '2292', '2293', '2295', '2296', '2294']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'security'
url = 'https://bugs.python.org/issue30500'
versions = ['Python 2.7', 'Python 3.3', 'Python 3.4', 'Python 3.5', 'Python 3.6', 'Python 3.7']

postmasters · 2017-05-29T04:04:12Z

Reported by Orange Tsai:

==========
Hi, Python Security Team

import urllib
from urlparse import urlparse

url = 'http://127.0.0.1#@evil.com/'
print urlparse(url).netloc          # 127.0.0.1
print urllib.urlopen(url).read()    # will access evil.com

I have tested on the latest version of Python 2.7.13.
==========

vadmium · 2017-05-29T05:06:10Z

See also bpo-18140, where it looks like people _want_ the hash (#) to be part of the username and/or password.

Another option may be to raise an exception.

postmasters · 2017-05-29T13:24:15Z

I think the best behavior is to do what popular web browsers do. Chrome and Firefox, for example, parses this is host 127.0.0.1, path /, fragment #@evil.com.

If the code does want to support username/password, it should do a custom opener (with basic HTTP authentication) instead.

vstinner · 2017-06-07T08:30:25Z

I think the best behavior is to do what popular web browsers do. Chrome and Firefox, for example, parses this is host 127.0.0.1, path /, fragment #@evil.com.

I agree that in case of doubt, it's better to follow the implementation of most popular web browser which indirectly define the "standard".

vstinner · 2017-06-20T13:23:26Z

When porting the change to Python 3.4, I found this older change:

     if _hostprog is None:
-        _hostprog = re.compile('^//([^/?]*)(.*)$')
+        _hostprog = re.compile('//([^/?]*)(.*)', re.DOTALL)
 
     match = _hostprog.match(url)
     if match:
-        host_port = match.group(1)
-        path = match.group(2)
-        if path and not path.startswith('/'):
+        host_port, path = match.groups()
+        if path and path[0] != '/':
             path = '/' + path
         return host_port, path

made by:

commit 44eceb6
Author: Serhiy Storchaka <storchaka@gmail.com>
Date: Tue Mar 3 20:21:35 2015 +0200

Issue bpo-23563: Optimized utility functions in urllib.parse.

With this change, newlines are now accepted in URLs.

@serhiy: Was it a deliberate change?

vstinner · 2017-06-20T13:33:37Z

New changeset 4899d84 by Victor Stinner in branch '3.5':
bpo-30500: urllib: Simplify splithost by calling into urlparse. (bpo-1849) (bpo-2290)
4899d84

vstinner · 2017-06-20T13:37:27Z

New changeset 536c1f1 by Victor Stinner in branch '3.6':
bpo-30500: urllib: Simplify splithost by calling into urlparse. (bpo-1849) (bpo-2289)
536c1f1

vstinner · 2017-06-20T14:09:08Z

New changeset 82acabd by Victor Stinner in branch '3.6':
bpo-30500: Fix the NEWS entry (bpo-2296)
82acabd

vstinner · 2017-06-20T14:09:09Z

New changeset 4108606 by Victor Stinner in branch '3.5':
bpo-30500: Fix the NEWS entry (bpo-2295)
4108606

vstinner · 2017-06-20T14:09:25Z

New changeset 8457706 by Victor Stinner in branch 'master':
bpo-30500: Fix the NEWS entry (bpo-2293)
8457706

serhiy-storchaka · 2017-06-20T14:11:19Z

Oh, I didn't expected that newlines can be in a host name. In any case if newlines are a problem, it is better to check explicitly whether a host name contains CR, LF or other special characters. And it is better to do such checks when format a request rather than when parse an URL.

vstinner · 2017-06-20T14:20:39Z

New changeset d4324ba by Victor Stinner in branch '2.7':
bpo-30500: urllib: Simplify splithost by calling into urlparse. (bpo-1849) (bpo-2294)
d4324ba

vstinner · 2017-06-20T14:31:26Z

New changeset d4324ba by Victor Stinner in branch '2.7':

Oh, I was too fast. I wanted to see an agreement on DOTALL before merging this one. I missed that the 2.7 change also added DOTALL.

I had to handle many branches (2.7, 3.3, 3.4, 3.5, 3.6, master), conflicts (especially in 2.7!), and there was a mistake in the NEWS entry (http://... => //...). Sorry, I missed the DOTALL in 2.7 :-p

vstinner · 2017-06-20T14:39:54Z

I tested my system python2 (Python 2.7.13 on Fedora 25):

haypo@selma$ python2
Python 2.7.13 (default, May 10 2017, 20:04:28) 
>>> urllib.splithost('//hostname/url')
('hostname', '/url')
>>> urllib.splithost('//host\nname/url')  # newline in hostname, accepted
('host\nname', '/url')
>>> print(urllib.splithost('//host\nname/url')[0])  # newline in hostname, accepted
host
name
>>> urllib.splithost('//hostname/ur\nl')  # newline in URL, rejected
(None, '//hostname/ur\nl')

=> Newline is accepted in the hostname, but not in the URL path.

With my change (adding DOTALL), newlines are accepted in the hostname and in the URL:

haypo@selma$ ./python
Python 2.7.13+ (heads/2.7:b39a748, Jun 19 2017, 18:07:19) 
>>> import urllib
>>> urllib.splithost("//hostname/url")
('hostname', '/url')
>>> urllib.splithost("//host\nname/url")  # newline in hostname, accepted
('host\nname', '/url')
>>> urllib.splithost("//hostname/ur\nl")  # newline in URL, accepted
('hostname', '/ur\nl')

More generally, it seems like the urllib module doesn't try to validate URL content. It just try to "split" them.

Who is responsible to validate URLs? Is it the responsability of the application developer to implement a whitelist?

serhiy-storchaka · 2017-06-20T14:57:28Z

The urllib package consists of two parts: urllib.parse and urllib.request. I think urllib.request is responsible for making valid requests and validate arguments if needed.

serhiy-storchaka · 2017-06-20T14:58:59Z

Or more low-level modules used by urllib.request: http, ftplib, etc.

vstinner · 2017-06-20T15:26:27Z

I created bpo-30713: "Reject newline character (U+000A) in URLs in urllib.parse", to discuss how to handle newlines in URLs.

ned-deily · 2017-07-08T04:51:41Z

New changeset b0fba88 by Ned Deily (Victor Stinner) in branch '3.6':
bpo-30500: urllib: Simplify splithost by calling into urlparse. (bpo-1849) (bpo-2289)
b0fba88

larryhastings · 2017-07-12T12:51:48Z

New changeset cc54c1c by larryhastings (Victor Stinner) in branch '3.4':
bpo-30500: urllib: Simplify splithost by calling into urlparse. (bpo-1849) (bpo-2291)
cc54c1c

ned-deily · 2017-07-26T02:43:55Z

New changeset 052f9d6 by Ned Deily (Victor Stinner) in branch '3.3':
[3.3] bpo-30500: urllib: Simplify splithost by calling into urlparse. (bpo-1849) (bpo-2292)
052f9d6

postmasters mannequin added 3.7 (EOL) end of life stdlib Python modules in the Lib dir type-security A security issue labels May 29, 2017

vstinner changed the title ~~urllib connects to a wrong host~~ [security] urllib connects to a wrong host Jun 7, 2017

ned-deily closed this as completed Jul 26, 2017

ned-deily assigned vstinner Jul 26, 2017

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[security] urllib connects to a wrong host #74685

[security] urllib connects to a wrong host #74685

postmasters mannequin commented May 29, 2017

postmasters mannequin commented May 29, 2017

vadmium commented May 29, 2017

postmasters mannequin commented May 29, 2017

vstinner commented Jun 7, 2017

vstinner commented Jun 20, 2017

vstinner commented Jun 20, 2017

vstinner commented Jun 20, 2017

vstinner commented Jun 20, 2017

vstinner commented Jun 20, 2017

vstinner commented Jun 20, 2017

serhiy-storchaka commented Jun 20, 2017

vstinner commented Jun 20, 2017

vstinner commented Jun 20, 2017

vstinner commented Jun 20, 2017

serhiy-storchaka commented Jun 20, 2017

serhiy-storchaka commented Jun 20, 2017

vstinner commented Jun 20, 2017

ned-deily commented Jul 8, 2017

larryhastings commented Jul 12, 2017

ned-deily commented Jul 26, 2017

[security] urllib connects to a wrong host #74685

[security] urllib connects to a wrong host #74685

Comments

postmasters mannequin commented May 29, 2017

postmasters mannequin commented May 29, 2017

vadmium commented May 29, 2017

postmasters mannequin commented May 29, 2017

vstinner commented Jun 7, 2017

vstinner commented Jun 20, 2017

vstinner commented Jun 20, 2017

vstinner commented Jun 20, 2017

vstinner commented Jun 20, 2017

vstinner commented Jun 20, 2017

vstinner commented Jun 20, 2017

serhiy-storchaka commented Jun 20, 2017

vstinner commented Jun 20, 2017

vstinner commented Jun 20, 2017

vstinner commented Jun 20, 2017

serhiy-storchaka commented Jun 20, 2017

serhiy-storchaka commented Jun 20, 2017

vstinner commented Jun 20, 2017

ned-deily commented Jul 8, 2017

larryhastings commented Jul 12, 2017

ned-deily commented Jul 26, 2017