New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[security] urllib connects to a wrong host #74685
Comments
Reported by Orange Tsai: ========== import urllib
from urlparse import urlparse
url = 'http://127.0.0.1#@evil.com/'
print urlparse(url).netloc # 127.0.0.1
print urllib.urlopen(url).read() # will access evil.com I have tested on the latest version of Python 2.7.13. |
See also bpo-18140, where it looks like people _want_ the hash (#) to be part of the username and/or password. Another option may be to raise an exception. |
I think the best behavior is to do what popular web browsers do. Chrome and Firefox, for example, parses this is host 127.0.0.1, path /, fragment #@evil.com. If the code does want to support username/password, it should do a custom opener (with basic HTTP authentication) instead. |
I agree that in case of doubt, it's better to follow the implementation of most popular web browser which indirectly define the "standard". |
When porting the change to Python 3.4, I found this older change: if _hostprog is None:
- _hostprog = re.compile('^//([^/?]*)(.*)$')
+ _hostprog = re.compile('//([^/?]*)(.*)', re.DOTALL)
match = _hostprog.match(url)
if match:
- host_port = match.group(1)
- path = match.group(2)
- if path and not path.startswith('/'):
+ host_port, path = match.groups()
+ if path and path[0] != '/':
path = '/' + path
return host_port, path made by: commit 44eceb6
With this change, newlines are now accepted in URLs. @serhiy: Was it a deliberate change? |
Oh, I didn't expected that newlines can be in a host name. In any case if newlines are a problem, it is better to check explicitly whether a host name contains CR, LF or other special characters. And it is better to do such checks when format a request rather than when parse an URL. |
Oh, I was too fast. I wanted to see an agreement on DOTALL before merging this one. I missed that the 2.7 change also added DOTALL. I had to handle many branches (2.7, 3.3, 3.4, 3.5, 3.6, master), conflicts (especially in 2.7!), and there was a mistake in the NEWS entry (http://... => //...). Sorry, I missed the DOTALL in 2.7 :-p |
I tested my system python2 (Python 2.7.13 on Fedora 25): haypo@selma$ python2
Python 2.7.13 (default, May 10 2017, 20:04:28)
>>> urllib.splithost('//hostname/url')
('hostname', '/url')
>>> urllib.splithost('//host\nname/url') # newline in hostname, accepted
('host\nname', '/url')
>>> print(urllib.splithost('//host\nname/url')[0]) # newline in hostname, accepted
host
name
>>> urllib.splithost('//hostname/ur\nl') # newline in URL, rejected
(None, '//hostname/ur\nl') => Newline is accepted in the hostname, but not in the URL path. With my change (adding DOTALL), newlines are accepted in the hostname and in the URL: haypo@selma$ ./python
Python 2.7.13+ (heads/2.7:b39a748, Jun 19 2017, 18:07:19)
>>> import urllib
>>> urllib.splithost("//hostname/url")
('hostname', '/url')
>>> urllib.splithost("//host\nname/url") # newline in hostname, accepted
('host\nname', '/url')
>>> urllib.splithost("//hostname/ur\nl") # newline in URL, accepted
('hostname', '/ur\nl') More generally, it seems like the urllib module doesn't try to validate URL content. It just try to "split" them. Who is responsible to validate URLs? Is it the responsability of the application developer to implement a whitelist? |
The urllib package consists of two parts: urllib.parse and urllib.request. I think urllib.request is responsible for making valid requests and validate arguments if needed. |
Or more low-level modules used by urllib.request: http, ftplib, etc. |
I created bpo-30713: "Reject newline character (U+000A) in URLs in urllib.parse", to discuss how to handle newlines in URLs. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: