Issue 30713: Reject newline character (U+000A) in URLs in urllib.parse

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/74898

classification

Title:	Reject newline character (U+000A) in URLs in urllib.parse
Type:	security	Stage:	resolved
Components:		Versions:	Python 3.7

process

Status:	closed	Resolution:	rejected
Dependencies:		Superseder:
Assigned To:		Nosy List:	Nam.Nguyen, ecbftw, gregory.p.smith, martin.panter, serhiy.storchaka, vstinner
Priority:	normal	Keywords:

Created on 2017-06-20 15:20 by vstinner, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL	Status	Linked	Edit
PR 2301	closed	vstinner, 2017-06-20 15:22
PR 2303	closed	vstinner, 2017-06-20 16:11

Messages (10)
msg296453 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-06-20 15:20
Spin-off of the bpo-30500: modify the urllib.parse module to reject the newline character (U+000A) in URLS. Modify 3 functions: * splittype() * splithost() * splitport()
msg296454 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-06-20 15:25
I chose to not change how the \r newline character (U+000D) is handled: it is still accepted, even if it is used in SMTP and HTTP as newline separator.
msg296457 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2017-06-20 15:28
See also issue29606. I think that fixing the modules implementing Internet protocols is more appropriate way than fixing just a parsing utility.
msg296459 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-06-20 15:30
> See also issue29606. I think that fixing the modules implementing Internet protocols is more appropriate way than fixing just a parsing utility. IMHO we can/should fix ftplib (ftplib, httplib, etc.) and urllib.
msg296467 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-06-20 16:13
I tried to be more strict, and I was bitten by tests: test_urllib fails on splittype("data:...") where (...) contains newlines characters. One example: ====================================================================== ERROR: test_read_text_base64 (test.test_urllib.urlopen_DataTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/travis/build/python/cpython/Lib/test/test_urllib.py", line 511, in setUp self.image_url_resp = urllib.request.urlopen(self.image_url) File "/home/travis/build/python/cpython/Lib/urllib/request.py", line 222, in urlopen return opener.open(url, data, timeout) File "/home/travis/build/python/cpython/Lib/urllib/request.py", line 510, in open req = Request(fullurl, data) File "/home/travis/build/python/cpython/Lib/urllib/request.py", line 328, in __init__ self.full_url = url File "/home/travis/build/python/cpython/Lib/urllib/request.py", line 354, in full_url self._parse() File "/home/travis/build/python/cpython/Lib/urllib/request.py", line 383, in _parse raise ValueError("unknown url type: %r" % self.full_url) ValueError: unknown url type: 'data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAIAAAABCAIAAAB7\nQOjdAAAAAXNSR0IArs4c6QAAAA9JREFUCNdj%0AYGBg%2BP//PwAGAQL%2BCm8 vHgAAAABJRU5ErkJggg%3D%3D%0A%20' I modified splittype() to reject newlines before the type, but accept them after the type.
msg296472 - (view)	Author: Nam Nguyen (Nam.Nguyen) *	Date: 2017-06-20 16:47
Just being nosy here that we should not continue down the path with regex. A proper grammar parser according to spec would be much more appropriate and eliminate these whack-a-mole issues.
msg297154 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2017-06-28 05:24
First, I think urllib.parse is not the best place for doing such checks. Even if add some checks in urllib.parse, they should be added also at lower level in urllib.request or concrete protocol implementations. Second, PR 2303 actually doesn't reject arguments with '\n'. splithost('example.org\n') will return a tuple ('example.org\n', None), etc.
msg297512 - (view)	Author: Martin Panter (martin.panter) *	Date: 2017-07-02 11:44
It might help if you explained why you want to make these changes. Otherwise I have to guess. Is a compromise between strictly rejecting all non-URL characters (not just control characters), versus leaving it up to user applications to validate their URLs? I guess it could partially prevent some newline injection problems like Issue 29606 (FTP) and Issue 30458 (HTTP). But how do we know it closes more security holes than it opens? I don’t understand the focus on these three functions. They are undocumented and more-or-less deprecated (Issue 27485). Why not focus on the “urlsplit” and “urlparse” functions first? Some of the changes seem to go too far, e.g. in the splithost("//hostname/u\nrl") test case, the hostname is fine, but it is not recognized. This would partially conflict the patch in Issue 13359, with proposes to percent-encode newlines after passing through “splithost”. And it would make the URL look like a relative URL, which is a potential security hole and reminds me of the open redirect bug report (Issue 23505).
msg298856 - (view)	Author: (ecbftw)	Date: 2017-07-22 14:46
I'm the guy that did the original security research on this issue. I've been a pentester for over 12 years, where I am regularly helping developers understand how to best correct their injection flaws. Please carefully consider what I'm trying to tell you. I've been trying to get this message through to Python for nearly 2 years now. =( Serhiy has the right idea. His email here shows a firm understanding of the issue and the correct remediation approach: https://mail.python.org/pipermail/python-dev/2017-July/148715.html It seems every developer initially assumes input validation is the way you deal with injection flaws. But this creates a false sense of a "trade off" between security and usability. Rejecting evil-looking things up front just creates more problems for you. Also, you can't be sure every code path leading from user input to the injection point is addressed. (What other ways can new lines reach the FTP library?) Instead, the problem needs to be dealt with in the lines of code just before the injection occurs. Ideally, any special character should be encoded/escaped, in which case you get the best of both worlds: security and full usability. However, when you're dealing with a syntax that was poorly designed and doesn't have a standard way to escape special characters, then you are forced to reject them. Still, it is better to do this rejection right before the commands are dropped onto the TCP stream. That way, if you later decide that there's an acceptable way to encode new lines for specific commands, then you can perform that encoding right there and relax the restrictions in specific cases. I like Serhiy's idea of optionally supporting escaping via the syntax defined in RFC 2640. This should be disabled by default, and a short-term security patch shouldn't try to tackle it. But in a new Python release, a switch could be added that causes new lines to be escaped for specific commands, such as the CWD and GET commands. It is likely that this escaping is only going to be useful in specific commands, based on what FTP servers actually support. Of course none of this is possible if you use a heavy-handed approach of blocking %0a in URLs up front.
msg299186 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-07-26 02:43
https://bugs.python.org/issue29606 was fixed in ftplib. urllib is not the right place to reject invalid inputs.

History
Date	User	Action	Args
2022-04-11 14:58:47	admin	set	github: 74898
2021-04-23 19:36:58	gregory.p.smith	set	nosy: + gregory.p.smith
2017-07-26 02:43:37	vstinner	set	status: open -> closed resolution: rejected messages: + msg299186 stage: resolved
2017-07-22 14:46:06	ecbftw	set	nosy: + ecbftw messages: + msg298856
2017-07-02 11:44:04	martin.panter	set	nosy: + martin.panter messages: + msg297512
2017-06-28 05:24:27	serhiy.storchaka	set	messages: + msg297154
2017-06-20 16:47:59	Nam.Nguyen	set	nosy: + Nam.Nguyen messages: + msg296472
2017-06-20 16:13:07	vstinner	set	messages: + msg296467
2017-06-20 16:11:26	vstinner	set	pull_requests: + pull_request2350
2017-06-20 15:30:21	vstinner	set	messages: + msg296459
2017-06-20 15:28:49	serhiy.storchaka	set	nosy: + serhiy.storchaka messages: + msg296457
2017-06-20 15:25:58	vstinner	set	messages: + msg296454
2017-06-20 15:22:56	vstinner	set	pull_requests: + pull_request2348
2017-06-20 15:20:33	vstinner	create