Reject newline character (U+000A) in URLs in urllib.parse #74898

vstinner · 2017-06-20T15:20:33Z

BPO	30713
Nosy	@gpshead, @vstinner, @vadmium, @postmasters, @serhiy-storchaka
PRs	bpo-30713: Reject newline in urllib.parse #2301 [Security] bpo-30713: Reject newline in urllib.parse #2303

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2017-07-26.02:43:37.898>
created_at = <Date 2017-06-20.15:20:33.115>
labels = ['type-security', '3.7']
title = 'Reject newline character (U+000A) in URLs in urllib.parse'
updated_at = <Date 2021-04-23.19:36:58.331>
user = 'https://github.com/vstinner'

bugs.python.org fields:

activity = <Date 2021-04-23.19:36:58.331>
actor = 'gregory.p.smith'
assignee = 'none'
closed = True
closed_date = <Date 2017-07-26.02:43:37.898>
closer = 'vstinner'
components = []
creation = <Date 2017-06-20.15:20:33.115>
creator = 'vstinner'
dependencies = []
files = []
hgrepos = []
issue_num = 30713
keywords = []
message_count = 10.0
messages = ['296453', '296454', '296457', '296459', '296467', '296472', '297154', '297512', '298856', '299186']
nosy_count = 6.0
nosy_names = ['gregory.p.smith', 'vstinner', 'martin.panter', 'Nam.Nguyen', 'serhiy.storchaka', 'ecbftw']
pr_nums = ['2301', '2303']
priority = 'normal'
resolution = 'rejected'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'security'
url = 'https://bugs.python.org/issue30713'
versions = ['Python 3.7']

vstinner · 2017-06-20T15:20:33Z

Spin-off of the bpo-30500: modify the urllib.parse module to reject the newline character (U+000A) in URLS. Modify 3 functions:

splittype()
splithost()
splitport()

vstinner · 2017-06-20T15:25:59Z

I chose to not change how the \r newline character (U+000D) is handled: it is still accepted, even if it is used in SMTP and HTTP as newline separator.

serhiy-storchaka · 2017-06-20T15:28:50Z

See also bpo-29606. I think that fixing the modules implementing Internet protocols is more appropriate way than fixing just a parsing utility.

vstinner · 2017-06-20T15:30:21Z

See also bpo-29606. I think that fixing the modules implementing Internet protocols is more appropriate way than fixing just a parsing utility.

IMHO we can/should fix ftplib (ftplib, httplib, etc.) *and* urllib.

vstinner · 2017-06-20T16:13:07Z

I tried to be more strict, and I was bitten by tests: test_urllib fails on splittype("data:...") where (...) contains newlines characters. One example:

======================================================================

ERROR: test_read_text_base64 (test.test_urllib.urlopen_DataTests)

----------------------------------------------------------------------

Traceback (most recent call last):

File "/home/travis/build/python/cpython/Lib/test/test_urllib.py", line 511, in setUp

    self.image_url_resp = urllib.request.urlopen(self.image_url)

File "/home/travis/build/python/cpython/Lib/urllib/request.py", line 222, in urlopen

    return opener.open(url, data, timeout)

File "/home/travis/build/python/cpython/Lib/urllib/request.py", line 510, in open

    req = Request(fullurl, data)

File "/home/travis/build/python/cpython/Lib/urllib/request.py", line 328, in __init__

    self.full_url = url

File "/home/travis/build/python/cpython/Lib/urllib/request.py", line 354, in full_url

    self._parse()

File "/home/travis/build/python/cpython/Lib/urllib/request.py", line 383, in _parse

raise ValueError("unknown url type: %r" % self.full_url)

ValueError: unknown url type: 'data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAIAAAABCAIAAAB7\nQOjdAAAAAXNSR0IArs4c6QAAAA9JREFUCNdj%0AYGBg%2BP//PwAGAQL%2BCm8 vHgAAAABJRU5ErkJggg%3D%3D%0A%20'

I modified splittype() to reject newlines before the type, but accept them after the type.

postmasters · 2017-06-20T16:48:00Z

Just being nosy here that we should not continue down the path with regex. A proper grammar parser according to spec would be much more appropriate and eliminate these whack-a-mole issues.

serhiy-storchaka · 2017-06-28T05:24:27Z

First, I think urllib.parse is not the best place for doing such checks. Even if add some checks in urllib.parse, they should be added also at lower level in urllib.request or concrete protocol implementations.

Second, PR 2303 actually doesn't reject arguments with '\n'. splithost('example.org\n') will return a tuple ('example.org\n', None), etc.

vadmium · 2017-07-02T11:44:04Z

It might help if you explained why you want to make these changes. Otherwise I have to guess. Is a compromise between strictly rejecting all non-URL characters (not just control characters), versus leaving it up to user applications to validate their URLs?

I guess it could partially prevent some newline injection problems like bpo-29606 (FTP) and bpo-30458 (HTTP). But how do we know it closes more security holes than it opens?

I don’t understand the focus on these three functions. They are undocumented and more-or-less deprecated (bpo-27485). Why not focus on the “urlsplit” and “urlparse” functions first?

Some of the changes seem to go too far, e.g. in the splithost("//hostname/u\nrl") test case, the hostname is fine, but it is not recognized. This would partially conflict the patch in bpo-13359, with proposes to percent-encode newlines after passing through “splithost”. And it would make the URL look like a relative URL, which is a potential security hole and reminds me of the open redirect bug report (bpo-23505).

ecbftw · 2017-07-22T14:46:06Z

I'm the guy that did the original security research on this issue. I've been a pentester for over 12 years, where I am regularly helping developers understand how to best correct their injection flaws. Please carefully consider what I'm trying to tell you. I've been trying to get this message through to Python for nearly 2 years now. =(

Serhiy has the right idea. His email here shows a firm understanding of the issue and the correct remediation approach:
https://mail.python.org/pipermail/python-dev/2017-July/148715.html

It seems every developer initially assumes input validation is the way you deal with injection flaws. But this creates a false sense of a "trade off" between security and usability. Rejecting evil-looking things up front just creates more problems for you. Also, you can't be sure every code path leading from user input to the injection point is addressed. (What other ways can new lines reach the FTP library?)

Instead, the problem needs to be dealt with in the lines of code just before the injection occurs. Ideally, any special character should be encoded/escaped, in which case you get the best of both worlds: security and full usability. However, when you're dealing with a syntax that was poorly designed and doesn't have a standard way to escape special characters, then you are forced to reject them. Still, it is better to do this rejection right before the commands are dropped onto the TCP stream. That way, if you later decide that there's an acceptable way to encode new lines for *specific* commands, then you can perform that encoding right there and relax the restrictions in specific cases.

I like Serhiy's idea of optionally supporting escaping via the syntax defined in RFC 2640. This should be disabled by default, and a short-term security patch shouldn't try to tackle it. But in a new Python release, a switch could be added that causes new lines to be escaped for specific commands, such as the CWD and GET commands. It is likely that this escaping is only going to be useful in specific commands, based on what FTP servers actually support. Of course none of this is possible if you use a heavy-handed approach of blocking %0a in URLs up front.

vstinner · 2017-07-26T02:43:38Z

https://bugs.python.org/issue29606 was fixed in ftplib. urllib is not the right place to reject invalid inputs.

vstinner added 3.7 (EOL) end of life type-security A security issue labels Jun 20, 2017

vstinner closed this as completed Jul 26, 2017

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reject newline character (U+000A) in URLs in urllib.parse #74898

Reject newline character (U+000A) in URLs in urllib.parse #74898

vstinner commented Jun 20, 2017

vstinner commented Jun 20, 2017

vstinner commented Jun 20, 2017

serhiy-storchaka commented Jun 20, 2017

vstinner commented Jun 20, 2017

vstinner commented Jun 20, 2017

postmasters mannequin commented Jun 20, 2017

serhiy-storchaka commented Jun 28, 2017

vadmium commented Jul 2, 2017

ecbftw mannequin commented Jul 22, 2017

vstinner commented Jul 26, 2017

Reject newline character (U+000A) in URLs in urllib.parse #74898

Reject newline character (U+000A) in URLs in urllib.parse #74898

Comments

vstinner commented Jun 20, 2017

vstinner commented Jun 20, 2017

vstinner commented Jun 20, 2017

serhiy-storchaka commented Jun 20, 2017

vstinner commented Jun 20, 2017

vstinner commented Jun 20, 2017

postmasters mannequin commented Jun 20, 2017

serhiy-storchaka commented Jun 28, 2017

vadmium commented Jul 2, 2017

ecbftw mannequin commented Jul 22, 2017

vstinner commented Jul 26, 2017