Issue 14036: urlparse insufficient port property validation

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/58244

classification

Title:	urlparse insufficient port property validation
Type:	behavior	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 3.2, Python 3.3, Python 2.7

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	orsenthil	Nosy List:	ezio.melotti, martin.panter, ncoghlan, orsenthil, python-dev, r.david.murray, zulla
Priority:	normal	Keywords:	patch

Created on 2012-02-16 23:40 by zulla, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
urlparse.py	zulla, 2012-02-17 01:54
urlparse.diff	zulla, 2012-02-17 02:18

Messages (18)
msg153512 - (view)	Author: zulla (zulla)	Date: 2012-02-16 23:40
The "port" component of a URL is not properly be sanitized or validated. This may lead to the evasion of netloc/hostname based filters or exceptions.
msg153522 - (view)	Author: zulla (zulla)	Date: 2012-02-17 01:27
The "port" and "netloc" component of a ParsedResult-object is not properly sanitized or validated. This may lead to bypass-able hostname-based filters. Remote Crash vulnerabilities be be also possible.
msg153523 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2012-02-17 01:45
Did you upload urlparse.py to the issue by accident? Can you please provide some examples of where you think the current code is producing incorrect results?
msg153524 - (view)	Author: zulla (zulla)	Date: 2012-02-17 01:47
Hi. No, it's a patched version. It won't crash under circumstances like that [1] and won't succeed with invalid input: >>> import urlparse >>> urlparse.urlparse("http://www.google.com:foo") ParseResult(scheme='http', netloc='www.google.com:foo', path='', params='', query='', fragment='') >>> urlparse.urlparse("http://www.google.com:foo").port Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urlparse.py", line 105, in port port = int(netloc.split(':')[1], 10) ValueError: invalid literal for int() with base 10: 'foo' >>>
msg153525 - (view)	Author: zulla (zulla)	Date: 2012-02-17 01:54
Whops. I forgot an int() :-) Here's the right patch.
msg153527 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2012-02-17 02:05
It's not a patch if it is the whole file. A diff would be much more useful, since then we could see the changes easily. This kind of change would require a bit of discussion. I'm doubtful that it would be applied as a bug fix, and we might even want the validation to be optional and not the default. Part of the issue is that urlparse was originally based on the older standards, as you can see from the docstring. You may find others to agree with you, but personally it doesn't look to me like this rises to the level of a security issue.
msg153528 - (view)	Author: zulla (zulla)	Date: 2012-02-17 02:18
I understand your point of view, but I disagree. Various libraries and projects rely on urlparse.urlparse and urllib.parse.urlparse. This bug just blew up in my face. I'm working with Cython and PyQt4. When a developer relies on ParseResult().netloc being a valid netloc, and .port being None [bool(False)] or a integer between 1-65535 really bad things can happen in a environment that has 0-tolerance for security issues (like C/C++ mixed in python). I agree that the if self.scheme == "http": return 80 elif self.scheme == "https": [...] part of my patch is debetable, but we should _at least_ ensure that IF there is a ParseResult().port, the developer can be sure that it is a valid port between 1-65545. i apologize for upload the whole file; i attached the diff now. regards, dan
msg153545 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2012-02-17 13:58
Could you provide some failing examples? The suggestion also seems to run slightly at odds with itself - in one part, silently replacing an invalid port specification with a different value, in another adding additional validation checks. Also, rather than hardcoding default port numbers for selected protocols, it would make more sense to just look them up via socket.getservbyname(scheme) (and still return None if the scheme isn't recognised). However, I'll need to think about that one for a while - urlparse is designed to be almost purely a string parsing library. Looking up default port numbers if one isn't specified really isn't its job. (For one thing, you'd break the ability to exactly recreate the original URL text from the parsed version). There should definitely by a try/except around that conversion to int(), though - it's just that I'm not yet sure what an appropriate return value is at that point. The raw port string? None? Should there be a separate "raw_port" descriptor that always returns some kind of string, with the port descriptor promising to always return a valid 16-bit port number or None?
msg154833 - (view)	Author: zulla (zulla)	Date: 2012-03-03 12:38
>>> u("http://www.google.com:99999999999999999999999999").port 99999999999999999999999999L
msg155204 - (view)	Author: Senthil Kumaran (orsenthil) *	Date: 2012-03-09 03:07
Couple of points: 1. On your last example, which webserver treats 'L' as part of port number? I can't of anything. 2. Can you write a "real application" which is listening to beyond 65535? Which platform would it be? Current way of handling invalid port like, int('foo') by raising ValueError seems to be a better than returning a None. A better error message could be desirable, but that does not make it a security issue. Additionally, for the example of int changing long integer to 'L' appended one would a 2.x case as it is no longer the behavior in 3.x Also, I would advice to look at getPort function in a C library or a Java library and see what it does. The only difference I see is, they return -1 where Python returns None. I am changing the request type to an enhancement, because there is not a valid argument to support that it is a security issue.
msg161275 - (view)	Author: Senthil Kumaran (orsenthil) *	Date: 2012-05-21 14:51
I am not sure if anything should be done to this request. Saying that int("99999999999999999999999999",10) is converting to 99999999999999999999999999L in Python2.7 it is a bug/security issue is hypothetical. Practically, such high port numbers cannot exist. I suggest, we close this issue as won't fix. Ezio, I noticed that you changed from pending to open. If you had any ideas for this issue, please share, otherwise you can consider closing this too. Thanks!
msg161276 - (view)	Author: zulla (zulla)	Date: 2012-05-21 15:06
Your comment is completely senseless, sorry. Of course such high port numbers do not exist. An attacker is counting on that. Imagine something like that pass_to_cython(urlparse("http://google.de:999999**999999[to be calculated]").port)
msg161277 - (view)	Author: zulla (zulla)	Date: 2012-05-21 15:09
we should at least check if the .port attribute is an intereger >= 1 and <= 65535. _because_ this is the only valid port range. otherwise, it is no valid port. but it may be a integer overflow attack attempt when a developer uses .port, he is counting on the result being valid
msg161278 - (view)	Author: Senthil Kumaran (orsenthil) *	Date: 2012-05-21 15:23
pass_to_cython(urlparse("http://google.de:999999999999[to be calculated]").port) is no different than sending pass_to_cython(999999999999[to be calculated]) In that case, would the former make a security loop hole in urlparse? Looks pretty contrived to me as an example for .port bug. However, I agree with one point in your assertion, namely that port be checked that it is within the range integer >= 1 and <= 65535. If it is not, return None as a response in port.
msg161279 - (view)	Author: Ezio Melotti (ezio.melotti) *	Date: 2012-05-21 15:32
> Ezio, I noticed that you changed from pending to open. That was an accident, I just meant to add my self to the nosy. I didn't have time yet to read all the messages and comment on the issue.
msg161506 - (view)	Author: Roundup Robot (python-dev)	Date: 2012-05-24 13:57
New changeset 988903cf24c5 by Senthil Kumaran in branch '2.7': Issue #14036: return None when port in urlparse cross 65535 http://hg.python.org/cpython/rev/988903cf24c5 New changeset d769e64aed79 by Senthil Kumaran in branch '3.2': Issue #14036: return None when port in urlparse cross 65535 http://hg.python.org/cpython/rev/d769e64aed79 New changeset b4d257c64db7 by Senthil Kumaran in branch 'default': Issue #14036: return None when port in urlparse cross 65535 http://hg.python.org/cpython/rev/b4d257c64db7
msg161507 - (view)	Author: Senthil Kumaran (orsenthil) *	Date: 2012-05-24 13:59
This is taken care. I was not really convinced on the need as likely seemed a non issue from "urlparse" standpoint, But still there is no harm in returning valid port as semantically the attribute stands for a port. Thanks!
msg235661 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-02-10 03:16
See Issue 20059 proposing to change this to raise ValueError

History
Date	User	Action	Args
2022-04-11 14:57:26	admin	set	github: 58244
2015-02-10 03:16:40	martin.panter	set	nosy: + martin.panter messages: + msg235661
2012-05-24 13:59:22	orsenthil	set	status: open -> closed type: enhancement -> behavior messages: + msg161507 assignee: orsenthil resolution: fixed stage: resolved
2012-05-24 13:57:55	python-dev	set	nosy: + python-dev messages: + msg161506
2012-05-21 15:32:01	ezio.melotti	set	messages: + msg161279
2012-05-21 15:23:16	orsenthil	set	messages: + msg161278
2012-05-21 15:09:21	zulla	set	messages: + msg161277
2012-05-21 15:06:40	zulla	set	messages: + msg161276
2012-05-21 14:51:03	orsenthil	set	messages: + msg161275
2012-05-19 14:17:33	ezio.melotti	set	status: pending -> open nosy: + ezio.melotti
2012-03-09 03:07:01	orsenthil	set	status: open -> pending type: security -> enhancement messages: + msg155204
2012-03-03 12:38:43	zulla	set	messages: + msg154833
2012-02-17 13:58:32	ncoghlan	set	nosy: + ncoghlan messages: + msg153545 versions: - Python 3.1, Python 3.4
2012-02-17 02:18:54	zulla	set	files: + urlparse.diff keywords: + patch messages: + msg153528
2012-02-17 02:05:52	r.david.murray	set	nosy: + orsenthil messages: + msg153527
2012-02-17 01:54:33	zulla	set	files: - urlparse.py
2012-02-17 01:54:09	zulla	set	files: + urlparse.py messages: + msg153525
2012-02-17 01:47:03	zulla	set	messages: + msg153524
2012-02-17 01:45:08	r.david.murray	set	nosy: + r.david.murray messages: + msg153523
2012-02-17 01:27:49	zulla	set	messages: + msg153522
2012-02-16 23:40:38	zulla	create