This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Title: Reject newline character (U+000A) in URLs in urllib.parse
Type: security Stage: resolved
Components: Versions: Python 3.7
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: Nam.Nguyen, ecbftw, gregory.p.smith, martin.panter, serhiy.storchaka, vstinner
Priority: normal Keywords:

Created on 2017-06-20 15:20 by vstinner, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 2301 closed vstinner, 2017-06-20 15:22
PR 2303 closed vstinner, 2017-06-20 16:11
Messages (10)
msg296453 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-06-20 15:20
Spin-off of the bpo-30500: modify the urllib.parse module to reject the newline character (U+000A) in URLS. Modify 3 functions:

* splittype()
* splithost()
* splitport()
msg296454 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-06-20 15:25
I chose to not change how the \r newline character (U+000D) is handled: it is still accepted, even if it is used in SMTP and HTTP as newline separator.
msg296457 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-06-20 15:28
See also issue29606. I think that fixing the modules implementing Internet protocols is more appropriate way than fixing just a parsing utility.
msg296459 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-06-20 15:30
> See also issue29606. I think that fixing the modules implementing Internet protocols is more appropriate way than fixing just a parsing utility.

IMHO we can/should fix ftplib (ftplib, httplib, etc.) *and* urllib.
msg296467 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-06-20 16:13
I tried to be more strict, and I was bitten by tests: test_urllib fails on splittype("data:...") where (...) contains newlines characters. One example:


ERROR: test_read_text_base64 (test.test_urllib.urlopen_DataTests)


Traceback (most recent call last):

  File "/home/travis/build/python/cpython/Lib/test/", line 511, in setUp

    self.image_url_resp = urllib.request.urlopen(self.image_url)

  File "/home/travis/build/python/cpython/Lib/urllib/", line 222, in urlopen

    return, data, timeout)

  File "/home/travis/build/python/cpython/Lib/urllib/", line 510, in open

    req = Request(fullurl, data)

  File "/home/travis/build/python/cpython/Lib/urllib/", line 328, in __init__

    self.full_url = url

  File "/home/travis/build/python/cpython/Lib/urllib/", line 354, in full_url


  File "/home/travis/build/python/cpython/Lib/urllib/", line 383, in _parse

    raise ValueError("unknown url type: %r" % self.full_url)

ValueError: unknown url type: '\nQOjdAAAAAXNSR0IArs4c6QAAAA9JREFUCNdj%0AYGBg%2BP//PwAGAQL%2BCm8 vHgAAAABJRU5ErkJggg%3D%3D%0A%20'

I modified splittype() to reject newlines before the type, but accept them after the type.
msg296472 - (view) Author: Nam Nguyen (Nam.Nguyen) * Date: 2017-06-20 16:47
Just being nosy here that we should not continue down the path with regex. A proper grammar parser according to spec would be much more appropriate and eliminate these whack-a-mole issues.
msg297154 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-06-28 05:24
First, I think urllib.parse is not the best place for doing such checks. Even if add some checks in urllib.parse, they should be added also at lower level in urllib.request or concrete protocol implementations.

Second, PR 2303 actually doesn't reject arguments with '\n'. splithost('\n') will return a tuple ('\n', None), etc.
msg297512 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2017-07-02 11:44
It might help if you explained why you want to make these changes. Otherwise I have to guess. Is a compromise between strictly rejecting all non-URL characters (not just control characters), versus leaving it up to user applications to validate their URLs?

I guess it could partially prevent some newline injection problems like Issue 29606 (FTP) and Issue 30458 (HTTP). But how do we know it closes more security holes than it opens?

I don’t understand the focus on these three functions. They are undocumented and more-or-less deprecated (Issue 27485). Why not focus on the “urlsplit” and “urlparse” functions first?

Some of the changes seem to go too far, e.g. in the splithost("//hostname/u\nrl") test case, the hostname is fine, but it is not recognized. This would partially conflict the patch in Issue 13359, with proposes to percent-encode newlines after passing through “splithost”. And it would make the URL look like a relative URL, which is a potential security hole and reminds me of the open redirect bug report (Issue 23505).
msg298856 - (view) Author: (ecbftw) Date: 2017-07-22 14:46
I'm the guy that did the original security research on this issue.  I've been a pentester for over 12 years, where I am regularly helping developers understand how to best correct their injection flaws. Please carefully consider what I'm trying to tell you. I've been trying to get this message through to Python for nearly 2 years now. =(

Serhiy has the right idea.  His email here shows a firm understanding of the issue and the correct remediation approach:

It seems every developer initially assumes input validation is the way you deal with injection flaws. But this creates a false sense of a "trade off" between security and usability. Rejecting evil-looking things up front just creates more problems for you. Also, you can't be sure every code path leading from user input to the injection point is addressed. (What other ways can new lines reach the FTP library?)

Instead, the problem needs to be dealt with in the lines of code just before the injection occurs. Ideally, any special character should be encoded/escaped, in which case you get the best of both worlds: security and full usability. However, when you're dealing with a syntax that was poorly designed and doesn't have a standard way to escape special characters, then you are forced to reject them. Still, it is better to do this rejection right before the commands are dropped onto the TCP stream. That way, if you later decide that there's an acceptable way to encode new lines for *specific* commands, then you can perform that encoding right there and relax the restrictions in specific cases.

I like Serhiy's idea of optionally supporting escaping via the syntax defined in RFC 2640. This should be disabled by default, and a short-term security patch shouldn't try to tackle it. But in a new Python release, a switch could be added that causes new lines to be escaped for specific commands, such as the CWD and GET commands.  It is likely that this escaping is only going to be useful in specific commands, based on what FTP servers actually support. Of course none of this is possible if you use a heavy-handed approach of blocking %0a in URLs up front.
msg299186 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-07-26 02:43 was fixed in ftplib. urllib is not the right place to reject invalid inputs.
Date User Action Args
2022-04-11 14:58:47adminsetgithub: 74898
2021-04-23 19:36:58gregory.p.smithsetnosy: + gregory.p.smith
2017-07-26 02:43:37vstinnersetstatus: open -> closed
resolution: rejected
messages: + msg299186

stage: resolved
2017-07-22 14:46:06ecbftwsetnosy: + ecbftw
messages: + msg298856
2017-07-02 11:44:04martin.pantersetnosy: + martin.panter
messages: + msg297512
2017-06-28 05:24:27serhiy.storchakasetmessages: + msg297154
2017-06-20 16:47:59Nam.Nguyensetnosy: + Nam.Nguyen
messages: + msg296472
2017-06-20 16:13:07vstinnersetmessages: + msg296467
2017-06-20 16:11:26vstinnersetpull_requests: + pull_request2350
2017-06-20 15:30:21vstinnersetmessages: + msg296459
2017-06-20 15:28:49serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg296457
2017-06-20 15:25:58vstinnersetmessages: + msg296454
2017-06-20 15:22:56vstinnersetpull_requests: + pull_request2348
2017-06-20 15:20:33vstinnercreate