This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: urlparse insufficient port property validation
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.2, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: orsenthil Nosy List: ezio.melotti, martin.panter, ncoghlan, orsenthil, python-dev, r.david.murray, zulla
Priority: normal Keywords: patch

Created on 2012-02-16 23:40 by zulla, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
urlparse.py zulla, 2012-02-17 01:54
urlparse.diff zulla, 2012-02-17 02:18
Messages (18)
msg153512 - (view) Author: zulla (zulla) Date: 2012-02-16 23:40
The "port" component of a URL is not properly be sanitized or validated. This may lead to the evasion of netloc/hostname based filters or exceptions.
msg153522 - (view) Author: zulla (zulla) Date: 2012-02-17 01:27
The "port" and "netloc" component of a ParsedResult-object is not properly sanitized or validated. This may lead to bypass-able hostname-based filters. Remote Crash vulnerabilities be be also possible.
msg153523 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-02-17 01:45
Did you upload urlparse.py to the issue by accident?

Can you please provide some examples of where you think the current code is producing incorrect results?
msg153524 - (view) Author: zulla (zulla) Date: 2012-02-17 01:47
Hi. No, it's a patched version. It won't crash under circumstances like that [1] and won't succeed with invalid input:


>>> import urlparse
>>> urlparse.urlparse("http://www.google.com:foo")
ParseResult(scheme='http', netloc='www.google.com:foo', path='', params='', query='', fragment='')
>>> urlparse.urlparse("http://www.google.com:foo").port
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urlparse.py", line 105, in port
    port = int(netloc.split(':')[1], 10)
ValueError: invalid literal for int() with base 10: 'foo'
>>>
msg153525 - (view) Author: zulla (zulla) Date: 2012-02-17 01:54
Whops. I forgot an int() :-)

Here's the right patch.
msg153527 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-02-17 02:05
It's not a patch if it is the whole file.  A diff would be much more useful, since then we could see the changes easily.

This kind of change would require a bit of discussion.  I'm doubtful that it would be applied as a bug fix, and we might even want the validation to be optional and not the default.  Part of the issue is that urlparse was originally based on the older standards, as you can see from the docstring.

You may find others to agree with you, but personally it doesn't look to me like this rises to the level of a security issue.
msg153528 - (view) Author: zulla (zulla) Date: 2012-02-17 02:18
I understand your point of view, but I disagree.

Various libraries and projects rely on urlparse.urlparse and urllib.parse.urlparse.

This bug just blew up in my face. I'm working with Cython and PyQt4.

When a developer relies on ParseResult().netloc being a valid netloc, and .port being None [bool(False)] or a integer between 1-65535 really bad things can happen in a environment that has 0-tolerance for security issues (like C/C++ mixed in python).

I agree that the 

if self.scheme == "http":
    return 80
elif self.scheme == "https":
    [...]

part of my patch is debetable, but we should _at least_ ensure that IF there is a ParseResult().port, the developer can be sure that it is a valid port between 1-65545.

i apologize for upload the whole file; i attached the diff now.

regards,
dan
msg153545 - (view) Author: Alyssa Coghlan (ncoghlan) * (Python committer) Date: 2012-02-17 13:58
Could you provide some failing examples?

The suggestion also seems to run slightly at odds with itself - in one part, silently replacing an invalid port specification with a different value, in another adding additional validation checks.

Also, rather than hardcoding default port numbers for selected protocols, it would make more sense to just look them up via socket.getservbyname(scheme) (and still return None if the scheme isn't recognised). However, I'll need to think about that one for a while - urlparse is designed to be almost purely a string *parsing* library. Looking up default port numbers if one isn't specified really isn't its job. (For one thing, you'd break the ability to exactly recreate the original URL text from the parsed version).

There should definitely by a try/except around that conversion to int(), though - it's just that I'm not yet sure what an appropriate return value is at that point. The raw port string? None? Should there be a separate "raw_port" descriptor that always returns some kind of string, with the port descriptor promising to always return a valid 16-bit port number or None?
msg154833 - (view) Author: zulla (zulla) Date: 2012-03-03 12:38
>>> u("http://www.google.com:99999999999999999999999999").port
99999999999999999999999999L
msg155204 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2012-03-09 03:07
Couple of points:

1. On your last example, which webserver treats 'L' as part of port number? I can't of anything.

2. Can you write a "real application" which is listening to beyond 65535? Which platform would it be?

Current way of handling invalid port like, int('foo') by raising ValueError seems to be a better than returning a None.  A better error message could be desirable, but that does not make it a security issue.

Additionally, for the example of int changing long integer to 'L' appended one would a 2.x case as it is no longer the behavior in 3.x

Also, I would advice to look at getPort function in a C library or a Java library and see what it does. The only difference I see is, they return -1 where Python returns None.

I am changing the request type to an enhancement, because there is not a valid argument to support that it is a security issue.
msg161275 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2012-05-21 14:51
I am not sure if anything should be done to this request. Saying that 
int("99999999999999999999999999",10) is converting to 99999999999999999999999999L in Python2.7 it is a bug/security issue is hypothetical. Practically, such high port numbers cannot exist.

I suggest, we close this issue as won't fix. 

Ezio, I noticed that you changed from pending to open. If you had any ideas for this issue, please share, otherwise you can consider closing this too. 

Thanks!
msg161276 - (view) Author: zulla (zulla) Date: 2012-05-21 15:06
Your comment is completely senseless, sorry.
Of course such high port numbers do not exist.

An attacker is counting on that. Imagine something like that

pass_to_cython(urlparse("http://google.de:999999**999999[to be calculated]").port)
msg161277 - (view) Author: zulla (zulla) Date: 2012-05-21 15:09
we should at least check if the .port attribute is an intereger >= 1 and <= 65535. _because_ this is the only valid port range. otherwise, it is no valid port. but it may be a integer overflow attack attempt

when a developer uses .port, he is counting on the result being valid
msg161278 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2012-05-21 15:23
pass_to_cython(urlparse("http://google.de:999999**999999[to be calculated]").port)

is no different than sending

pass_to_cython(999999**999999[to be calculated])

In that case, would the former make a security loop hole in urlparse? Looks pretty contrived to me as an example for .port bug. 

However, I agree with one point in your assertion, namely that port be checked that it is within the range integer >= 1 and <= 65535. If it is not, return None as a response in port.
msg161279 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-05-21 15:32
> Ezio, I noticed that you changed from pending to open.

That was an accident, I just meant to add my self to the nosy.
I didn't have time yet to read all the messages and comment on the issue.
msg161506 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-05-24 13:57
New changeset 988903cf24c5 by Senthil Kumaran in branch '2.7':
Issue #14036: return None when port in urlparse cross 65535
http://hg.python.org/cpython/rev/988903cf24c5

New changeset d769e64aed79 by Senthil Kumaran in branch '3.2':
Issue #14036: return None when port in urlparse cross 65535
http://hg.python.org/cpython/rev/d769e64aed79

New changeset b4d257c64db7 by Senthil Kumaran in branch 'default':
Issue #14036: return None when port in urlparse cross 65535
http://hg.python.org/cpython/rev/b4d257c64db7
msg161507 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2012-05-24 13:59
This is taken care. I was not really convinced on the need as likely seemed a non issue from "urlparse" standpoint, But still there is no harm in returning valid port as semantically the attribute stands for a port.

Thanks!
msg235661 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-02-10 03:16
See Issue 20059 proposing to change this to raise ValueError
History
Date User Action Args
2022-04-11 14:57:26adminsetgithub: 58244
2015-02-10 03:16:40martin.pantersetnosy: + martin.panter
messages: + msg235661
2012-05-24 13:59:22orsenthilsetstatus: open -> closed
type: enhancement -> behavior
messages: + msg161507

assignee: orsenthil
resolution: fixed
stage: resolved
2012-05-24 13:57:55python-devsetnosy: + python-dev
messages: + msg161506
2012-05-21 15:32:01ezio.melottisetmessages: + msg161279
2012-05-21 15:23:16orsenthilsetmessages: + msg161278
2012-05-21 15:09:21zullasetmessages: + msg161277
2012-05-21 15:06:40zullasetmessages: + msg161276
2012-05-21 14:51:03orsenthilsetmessages: + msg161275
2012-05-19 14:17:33ezio.melottisetstatus: pending -> open
nosy: + ezio.melotti
2012-03-09 03:07:01orsenthilsetstatus: open -> pending
type: security -> enhancement
messages: + msg155204
2012-03-03 12:38:43zullasetmessages: + msg154833
2012-02-17 13:58:32ncoghlansetnosy: + ncoghlan

messages: + msg153545
versions: - Python 3.1, Python 3.4
2012-02-17 02:18:54zullasetfiles: + urlparse.diff
keywords: + patch
messages: + msg153528
2012-02-17 02:05:52r.david.murraysetnosy: + orsenthil
messages: + msg153527
2012-02-17 01:54:33zullasetfiles: - urlparse.py
2012-02-17 01:54:09zullasetfiles: + urlparse.py

messages: + msg153525
2012-02-17 01:47:03zullasetmessages: + msg153524
2012-02-17 01:45:08r.david.murraysetnosy: + r.david.murray
messages: + msg153523
2012-02-17 01:27:49zullasetmessages: + msg153522
2012-02-16 23:40:38zullacreate