Issue 36274: http.client cannot send non-ASCII request lines

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/80455

classification

Title:	http.client cannot send non-ASCII request lines
Type:	behavior	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 3.8, Python 3.7

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	benjamin.peterson, eric.snow, jaraco, larry, mcepl, miss-islington, ned.deily, orsenthil, tburke, webknjaz, xtreak
Priority:	normal	Keywords:	patch

Created on 2019-03-12 20:33 by tburke, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL	Status	Linked	Edit
PR 12314	closed	tburke, 2019-03-13 23:59
PR 12315	closed	tburke, 2019-03-13 23:59
PR 16321	closed	jaraco, 2019-09-21 08:55
PR 16448	merged	jaraco, 2019-09-28 01:36
PR 16460	merged	miss-islington, 2019-09-28 12:32
PR 16461	merged	jaraco, 2019-09-28 12:46
PR 16462	merged	jaraco, 2019-09-28 12:48
PR 16475	merged	jaraco, 2019-09-29 13:22
PR 16476	merged	jaraco, 2019-09-29 13:39

Messages (19)
msg337802 - (view)	Author: Tim Burke (tburke) *	Date: 2019-03-12 20:33
While the RFCs are rather clear that non-ASCII data would be out of spec, * that doesn't prevent a poorly-behaved client from sending non-ASCII bytes on the wire, which means * as an application developer, it's useful to be able to mimic such a client to verify expected behavior while still using stdlib to handle things like header parsing, particularly since * this worked perfectly well on Python 2. The two most-obvious ways (to me, anyway) to try to send a request for /你好 (for example) are # Assume it will get UTF-8 encoded, as that's the default encoding # for urllib.parse.quote() conn.putrequest('GET', '/\u4f60\u597d') # Assume it will get Latin-1 encoded, as # * that's the encoding used in http.client.parse_headers(), # * that's the encoding used for PEP-3333, and # * it has a one-to-one mapping with bytes conn.putrequest('GET', '/\xe4\xbd\xa0\xe5\xa5\xbd') both fail with something like UnicodeEncodeError: 'ascii' codec can't encode characters in position ... Trying to pre-encode like conn.putrequest('GET', b'/\xe4\xbd\xa0\xe5\xa5\xbd') at least doesn't raise an error, but still does not do what was intended; rather than a request line like GET /你好 HTTP/1.1 (or /ä½ å¥½ depending on how you choose to interpret the bytes), the server gets GET b'/\xe4\xbd\xa0\xe5\xa5\xbd' HTTP/1.1 The trouble comes down to https://github.com/python/cpython/blob/v3.7.2/Lib/http/client.py#L1104-L1107 -- we don't actually have any control over what the caller passes as the url (so the assumption doesn't hold), nor do we know anything about the encoding that was intended. One of three fixes seems warranted: * Switch to using Latin-1 to encode instead of ASCII (again, leaning on the precedent set in parse_headers and PEP-3333). This may make it too easy to write an out-of-spec client, however. * Continue to use ASCII to encode, but include errors='surrogateescape' to give callers an escape hatch. This seems like a reasonably high bar to ensure that the caller actually intends to send unquoted data. * Accept raw bytes and actually use them (rather than their repr()), allowing the caller to decide upon an appropriate encoding.
msg351834 - (view)	Author: Jason R. Coombs (jaraco) *	Date: 2019-09-11 11:50
Thank you Tim for the reasoned issue and proposed solutions. After reviewing these proposals with @eric.snow, we've decided that this approach is dangerous in that the proposed approaches has the potential to expose users unexpectedly to non-compliant behavior, where as currently they are assured compliance. Instead, we would like to see a more explicit opt-in, such as through a separate method or through a setting on the call and/or client object. Consider instead a solution that implements both `.putrequest` and `.putrequest_raw` or `.putrequest_allow_invalid_bytes` that sends a clear signal to the user that they're bypassing the default protections. Or consider another approach where HTTPConnection implements an `_encode_request()` method that a subclass with a specialized need could override. Would either of those approaches suit your use-case?
msg352023 - (view)	Author: Tim Burke (tburke) *	Date: 2019-09-11 20:55
Fair enough. Seems kinda related to https://bugs.python.org/issue30458 -- looks like it was a fun one ;-) I think either approach would work for me; my existing work-around doesn't preclude either, particularly since I want it purely for testing purposes. For a bit of context, I work on a large-ish project (few hundred kloc if you include tests) that recently finished porting from python 2.7 to 3.6 & 3.7. As part of that process I discovered https://bugs.python.org/issue33973 and worked around it in https://github.com/openstack/swift/commit/93b49c5. That was a prerequisite for running tests under py2 against a server running py3, but this bug complicated us running the tests under py3 as well. I eventually landed on https://github.com/openstack/swift/commit/c0ae48b as a work-around. I could probably take another stab at a fix if you like, though I'm not entirely sure when I'll get to it at the moment.
msg352340 - (view)	Author: Jason R. Coombs (jaraco) *	Date: 2019-09-13 14:26
I believe this issue is more recent and widespread than I originally thought. I realized today that CherryPy is affected by this issue and [disabled nightly tests](https://github.com/cherrypy/cherrypy/issues/1781#issuecomment-507836873) as a result. It looks like the changed behavior was introduced in Python 3.7.4, so I expect stable releases to start failing also now. In that case, the web framework wishes to test that a null byte transmitted by the client is handled properly in the server, and without a patch, it's not possible for a project like cherrypy to transmit the invalid request to ensure a proper invalid response. I see now that the previously proposed solution wouldn't even address this use-case, as the null byte would still have been excluded.
msg352342 - (view)	Author: Jason R. Coombs (jaraco) *	Date: 2019-09-13 14:56
I've started drafting a patch at https://github.com/python/cpython/tree/feature/putrequest-hooks
msg352452 - (view)	Author: Jason R. Coombs (jaraco) *	Date: 2019-09-14 20:43
As I considered a patch for this, I realized there are actually two issues, the one in the title "http.client cannot send non-ASCII request lines" but also "the protection for invalid requests prevents usage to generate invalid requests". The former issue was new with Python 3 while the latter was introduced with issue30458. I can't decide if we should handle these two issues separately, or address them together in this ticket.
msg352597 - (view)	Author: Ned Deily (ned.deily) *	Date: 2019-09-17 05:01
See msg352596 in Issue30458 for discussion of whether the regression should be considered a "release blocker" for the imminent 3.7.5 and 3.5.8 releases.
msg352711 - (view)	Author: Karthikeyan Singaravelan (xtreak) *	Date: 2019-09-18 08:42
Is there a reason why cherrypy doesn't URL encode the null byte in this test as per comment : https://github.com/cherrypy/cherrypy/issues/1781#issuecomment-504476658 ? The original fix was adopted from golang (https://github.com/golang/go/commit/829c5df58694b3345cb5ea41206783c8ccf5c3ca) and as per the cherrypy test the golang client also should be forbidding invalid control character : https://play.golang.org/p/FTNtRmvlWrn Also as per the discussion one of our own stdlib tests were depending on this behavior in the http client and had to be changed : https://github.com/python/cpython/pull/12755#issuecomment-481617878 The change made in Issue30458 also affects urllib3 test in 3.7.4 causing CI failure and the related discussion to encode URLs : https://github.com/urllib3/urllib3/pull/1671#discussion_r318804155 . urllib3 issue : https://github.com/urllib3/urllib3/issues/1664 . IIRC urllib3 was also patched for this vulnerability in similar manner as part of larger refactor : https://github.com/urllib3/urllib3/issues/1553 . I didn't verify yet, it's unreleased and how urllib3 client behaves before and after patch for the CRLF characters and similar tests that use urllib3 as client. Applying the URL encoding patch to cherrypy I can verify that the behavior is also tested diff --git a/cherrypy/test/test_static.py b/cherrypy/test/test_static.py index cdd821ae..85a51cf3 100644 --- a/cherrypy/test/test_static.py +++ b/cherrypy/test/test_static.py @@ -398,7 +398,8 @@ class StaticTest(helper.CPWebCase): self.assertInBody("I couldn't find that thing") def test_null_bytes(self): - self.getPage('/static/\x00') + from urllib.parse import quote + self.getPage(quote('/static/\x00')) self.assertStatus('404 Not Found') @classmethod After patch the test passes and ValueError is also raised properly as the null byte is decoded in the server and using it in os.stat to resolve null byte path. Breakages were expected in the discussion as adopting the fix from golang. Giving an option like a parameter to bypass the validation was also discussed in the linked but giving an option would also mean it could be abused or missed.
msg352725 - (view)	Author: Sviatoslav Sydorenko (webknjaz) *	Date: 2019-09-18 13:05
@xtreak the encoded null-byte test would be an extra test case to consider. It is reasonable to test as many known invalid sequences as possible. Changing that byte to encoded notation would just replace one test with another effectively changing the semantics of it. To me, it's quite weird that it's considered a CVE at all: it's happening on the client side and it doesn't prevent the user from just feeding the proper bytes right into the socket so why overcomplicate things?
msg352739 - (view)	Author: Jason R. Coombs (jaraco) *	Date: 2019-09-18 16:15
I've created issue38216 to address the issue of sending invalid bytes in the request line, separate from the original intention and title issue about sending non-ASCII.
msg352743 - (view)	Author: Karthikeyan Singaravelan (xtreak) *	Date: 2019-09-18 16:40
> @xtreak the encoded null-byte test would be an extra test case to consider. It is reasonable to test as many known invalid sequences as possible. Changing that byte to encoded notation would just replace one test with another effectively changing the semantics of it. Agreed that it's slightly different but I also admit I was also relying on the fact that encoded request will also be decoded by the server as done in most web frameworks to act upon the decoded payload. There could be systems where raw null byte might need to be transferred where the receiving system might not decode the payload but it seemed to be a workaround to ensure the required exception of serving null byte included static file was handled properly by CherryPy and also seemed to be adopted by others like urllib3. > To me, it's quite weird that it's considered a CVE at all: it's happening on the client side and it doesn't prevent the user from just feeding the proper bytes right into the socket so why overcomplicate things? The impacted function in http is used by functions like urllib.urlopen which is often feeded with input from the user that is not validated properly with some expectation that the function itself might handle the URL parsing and is safe enough. In this case it could lead to things like SSRF where malicious input could end up executing arbitrary code in some of the systems like Redis as demonstrated in the report. As for the CVE in question it was originally reported to golang and the same attack was also found out to be vulnerable. The fix adopted also seemed to have address few other security reports of similar nature. Thanks Jason, as issue38216 is now a separate issue we can discuss around there. I guess this issue then is not a 3.7 only regression anymore since the original issue reported predates 3.7.
msg352750 - (view)	Author: Jason R. Coombs (jaraco) *	Date: 2019-09-18 18:43
That's right - no longer a 3.7 regression here.
msg352980 - (view)	Author: Jason R. Coombs (jaraco) *	Date: 2019-09-22 15:54
I should say, though, this issue is a long-standing regression from Python 3.0. Although intentional, the inability for a client to override the encoding of the request line does make it impossible without replacing all of .putrequest to override that behavior, so although it would be a feature, it might justify a backport to supported bugfix Pythons to facilitate migration from Python 2.
msg353448 - (view)	Author: Jason R. Coombs (jaraco) *	Date: 2019-09-28 12:32
New changeset 7774d7831e8809795c64ce27f7df52674581d298 by Jason R. Coombs in branch 'master': bpo-38216, bpo-36274: Allow subclasses to separately override validation and encoding behavior (GH-16448) https://github.com/python/cpython/commit/7774d7831e8809795c64ce27f7df52674581d298
msg353450 - (view)	Author: Jason R. Coombs (jaraco) *	Date: 2019-09-28 13:15
New changeset 80dd66ac278ecbabbf843526e3a56f5031da9562 by Jason R. Coombs in branch '3.7': [3.7] bpo-38216, bpo-36274: Allow subclasses to separately override validation and encoding behavior (GH-16448) (GH-16461) https://github.com/python/cpython/commit/80dd66ac278ecbabbf843526e3a56f5031da9562
msg353454 - (view)	Author: miss-islington (miss-islington)	Date: 2019-09-28 14:23
New changeset 8f478b489ae11633d2609dff0ef21d0e1a857417 by Miss Islington (bot) in branch '3.8': bpo-38216, bpo-36274: Allow subclasses to separately override validation and encoding behavior (GH-16448) https://github.com/python/cpython/commit/8f478b489ae11633d2609dff0ef21d0e1a857417
msg353465 - (view)	Author: Ned Deily (ned.deily) *	Date: 2019-09-28 16:44
New changeset 5b18ce60b432d1dfa6f6988be07dd55646201a9b by Ned Deily (Jason R. Coombs) in branch '3.6': [3.6] bpo-38216, bpo-36274: Allow subclasses to separately override validation and encoding behavior (GH-16448) (GH-16462) https://github.com/python/cpython/commit/5b18ce60b432d1dfa6f6988be07dd55646201a9b
msg354160 - (view)	Author: Benjamin Peterson (benjamin.peterson) *	Date: 2019-10-08 02:00
New changeset f5b1abbb3b0083381925dcd5898ae6d019224826 by Benjamin Peterson (Jason R. Coombs) in branch '2.7': [2.7] bpo-38216, bpo-36274: Allow subclasses to separately override validation and encoding behavior (GH-16476) https://github.com/python/cpython/commit/f5b1abbb3b0083381925dcd5898ae6d019224826
msg354226 - (view)	Author: Larry Hastings (larry) *	Date: 2019-10-08 17:36
New changeset 2784e78dc3445c6dd59e915d86c336374c1fa09a by larryhastings (Jason R. Coombs) in branch '3.5': [3.5] bpo-38216, bpo-36274: Allow subclasses to separately override validation and encoding behavior (GH-16448) (#16475) https://github.com/python/cpython/commit/2784e78dc3445c6dd59e915d86c336374c1fa09a

History
Date	User	Action	Args
2022-04-11 14:59:12	admin	set	github: 80455
2019-10-12 12:10:23	larry	set	status: open -> closed resolution: fixed stage: patch review -> resolved
2019-10-08 17:36:57	larry	set	nosy: + larry messages: + msg354226
2019-10-08 02:00:05	benjamin.peterson	set	nosy: + benjamin.peterson messages: + msg354160
2019-09-29 13:39:49	jaraco	set	pull_requests: + pull_request16062
2019-09-29 13:22:40	jaraco	set	pull_requests: + pull_request16060
2019-09-28 16:44:19	ned.deily	set	messages: + msg353465
2019-09-28 14:23:38	miss-islington	set	nosy: + miss-islington messages: + msg353454
2019-09-28 13:15:09	jaraco	set	messages: + msg353450
2019-09-28 12:48:28	jaraco	set	pull_requests: + pull_request16047
2019-09-28 12:46:45	jaraco	set	pull_requests: + pull_request16045
2019-09-28 12:32:16	miss-islington	set	pull_requests: + pull_request16043
2019-09-28 12:32:04	jaraco	set	messages: + msg353448
2019-09-28 01:36:08	jaraco	set	pull_requests: + pull_request16028
2019-09-22 15:54:53	jaraco	set	messages: + msg352980
2019-09-21 08:55:08	jaraco	set	pull_requests: + pull_request15899
2019-09-18 18:43:20	jaraco	set	keywords: - 3.7regression messages: + msg352750
2019-09-18 16:40:38	xtreak	set	messages: + msg352743
2019-09-18 16:15:39	jaraco	set	messages: + msg352739
2019-09-18 13:05:58	webknjaz	set	messages: + msg352725
2019-09-18 08:42:16	xtreak	set	messages: + msg352711
2019-09-17 05:01:27	ned.deily	set	keywords: + 3.7regression nosy: + ned.deily messages: + msg352597
2019-09-15 10:53:15	xtreak	set	nosy: + xtreak
2019-09-14 21:23:32	webknjaz	set	nosy: + webknjaz
2019-09-14 20:43:36	jaraco	set	messages: + msg352452
2019-09-13 21:37:27	mcepl	set	nosy: + mcepl
2019-09-13 14:56:40	jaraco	set	messages: + msg352342
2019-09-13 14:26:06	jaraco	set	messages: + msg352340
2019-09-11 20:55:44	tburke	set	messages: + msg352023
2019-09-11 11:50:34	jaraco	set	nosy: + eric.snow, jaraco messages: + msg351834
2019-03-13 23:59:52	tburke	set	pull_requests: + pull_request12289
2019-03-13 23:59:19	tburke	set	keywords: + patch stage: patch review pull_requests: + pull_request12288
2019-03-13 10:18:31	SilentGhost	set	nosy: + orsenthil versions: - Python 3.4, Python 3.5, Python 3.6
2019-03-12 20:33:23	tburke	create