# HG changeset patch # Parent 48f8148428d7b745983378b22125c99c2a5e9f14 Issue #12319: Always send file request bodies using chunked encoding The previous attempt to determine the file’s Content-Length gave a false positive for pipes on Windows. Also, drop the special case for sending zero-length iterable bodies. diff -r 48f8148428d7 Doc/library/http.client.rst --- a/Doc/library/http.client.rst Thu Aug 25 01:22:44 2016 -0400 +++ b/Doc/library/http.client.rst Thu Aug 25 05:28:26 2016 +0000 @@ -240,17 +240,17 @@ The *headers* argument should be a mapping of extra HTTP headers to send with the request. - If *headers* contains neither Content-Length nor Transfer-Encoding, a - Content-Length header will be added automatically if possible. If + If *headers* contains neither Content-Length nor Transfer-Encoding, + but there is a request body, one of those + header fields will be added automatically. If *body* is ``None``, the Content-Length header is set to ``0`` for methods that expect a body (``PUT``, ``POST``, and ``PATCH``). If - *body* is a string or bytes-like object, the Content-Length header is - set to its length. If *body* is a binary :term:`file object` - supporting :meth:`~io.IOBase.seek`, this will be used to determine - its size. Otherwise, the Content-Length header is not added - automatically. In cases where determining the Content-Length up - front is not possible, the body will be chunk-encoded and the - Transfer-Encoding header will automatically be set. + *body* is a string or a bytes-like object that is not also a + :term:`file `, the Content-Length header is + set to its length. Any other type of *body* (files + and iterables in general) will be chunk-encoded and the + Transfer-Encoding header will automatically be set instead of + Content-Length. The *encode_chunked* argument is only relevant if Transfer-Encoding is specified in *headers*. If *encode_chunked* is ``False``, the @@ -260,19 +260,18 @@ .. note:: Chunked transfer encoding has been added to the HTTP protocol version 1.1. Unless the HTTP server is known to handle HTTP 1.1, - the caller must either specify the Content-Length or must use a - body representation whose length can be determined automatically. + the caller must either specify the Content-Length, or must pass a + :class:`str` or bytes-like object that is not also a file as the + body representation. .. versionadded:: 3.2 *body* can now be an iterable. .. versionchanged:: 3.6 If neither Content-Length nor Transfer-Encoding are set in - *headers* and Content-Length cannot be determined, *body* will now - be automatically chunk-encoded. The *encode_chunked* argument - was added. - The Content-Length for binary file objects is determined with seek. - No attempt is made to determine the Content-Length for text file + *headers*, file and iterable *body* objects are now chunk-encoded. + The *encode_chunked* argument was added. + No attempt is made to determine the Content-Length for file objects. .. method:: HTTPConnection.getresponse() diff -r 48f8148428d7 Doc/library/urllib.request.rst --- a/Doc/library/urllib.request.rst Thu Aug 25 01:22:44 2016 -0400 +++ b/Doc/library/urllib.request.rst Thu Aug 25 05:28:26 2016 +0000 @@ -187,12 +187,11 @@ server, or ``None`` if no such data is needed. Currently HTTP requests are the only ones that use *data*. The supported object types include bytes, file-like objects, and iterables. If no - ``Content-Length`` header has been provided, :class:`HTTPHandler` will - try to determine the length of *data* and set this header accordingly. - If this fails, ``Transfer-Encoding: chunked`` as specified in - :rfc:`7230`, Section 3.3.1 will be used to send the data. See - :meth:`http.client.HTTPConnection.request` for details on the - supported object types and on how the content length is determined. + ``Content-Length`` nor ``Transfer-Encoding`` header field + has been provided, :class:`HTTPHandler` will set these headers according + to the type of *data*. ``Content-Length`` will be used to send + bytes objects, while ``Transfer-Encoding: chunked`` as specified in + :rfc:`7230`, Section 3.3.1 will be used to send files and other iterables. For an HTTP POST request method, *data* should be a buffer in the standard :mimetype:`application/x-www-form-urlencoded` format. The @@ -256,8 +255,8 @@ .. versionchanged:: 3.6 Do not raise an error if the ``Content-Length`` has not been - provided and could not be determined. Fall back to use chunked - transfer encoding instead. + provided and *data* is neither ``None`` nor a bytes object. + Fall back to use chunked transfer encoding instead. .. class:: OpenerDirector() diff -r 48f8148428d7 Doc/whatsnew/3.6.rst --- a/Doc/whatsnew/3.6.rst Thu Aug 25 01:22:44 2016 -0400 +++ b/Doc/whatsnew/3.6.rst Thu Aug 25 05:28:26 2016 +0000 @@ -512,8 +512,8 @@ urllib.request -------------- -If a HTTP request has a non-empty body but no Content-Length header -and the content length cannot be determined up front, rather than +If a HTTP request has a file or iterable body (other than a +bytes object) but no Content-Length header, rather than throwing an error, :class:`~urllib.request.AbstractHTTPHandler` now falls back to use chunked transfer encoding. (Contributed by Demian Brecht and Rolf Krahl in :issue:`12319`.) @@ -868,6 +868,13 @@ This behavior has also been backported to earlier Python versions by Setuptools 26.0.0. +* In the :mod:`urllib.request` module and the + :meth:`http.client.HTTPConnection.request` method, if no Content-Length + header field has been specified and the request body is a file object, + it is now sent with HTTP 1.1 chunked encoding. If a file object has to + be sent to a HTTP 1.0 server, the Content-Length value now has to be + specified by the caller. See :issue:`12319`. + Changes in the C API -------------------- diff -r 48f8148428d7 Lib/http/client.py --- a/Lib/http/client.py Thu Aug 25 01:22:44 2016 -0400 +++ b/Lib/http/client.py Thu Aug 25 05:28:26 2016 +0000 @@ -805,35 +805,21 @@ def _get_content_length(body, method): """Get the content-length based on the body. - If the body is "empty", we set Content-Length: 0 for methods - that expect a body (RFC 7230, Section 3.3.2). If the body is - set for other methods, we set the header provided we can - figure out what the length is. + If the body is None, we set Content-Length: 0 for methods that expect + a body (RFC 7230, Section 3.3.2). We also set the Content-Length for + any method if the body is a str or bytes-like object and not a file. """ - if not body: + if body is None: # do an explicit check for not None here to distinguish # between unset and set but empty - if method.upper() in _METHODS_EXPECTING_BODY or body is not None: + if method.upper() in _METHODS_EXPECTING_BODY: return 0 else: return None if hasattr(body, 'read'): # file-like object. - if HTTPConnection._is_textIO(body): - # text streams are unpredictable because it depends on - # character encoding and line ending translation. - return None - else: - # Is it seekable? - try: - curpos = body.tell() - sz = body.seek(0, io.SEEK_END) - except (TypeError, AttributeError, OSError): - return None - else: - body.seek(curpos) - return sz - curpos + return None try: # does it implement the buffer protocol (bytes, bytearray, array)? @@ -1266,8 +1252,7 @@ # the caller passes encode_chunked=True or the following # conditions hold: # 1. content-length has not been explicitly set - # 2. the length of the body cannot be determined - # (e.g. it is a generator or unseekable file) + # 2. the body is a file or iterable, but not a str or bytes-like # 3. Transfer-Encoding has NOT been explicitly set by the caller if 'content-length' not in header_names: @@ -1280,7 +1265,7 @@ encode_chunked = False content_length = self._get_content_length(body, method) if content_length is None: - if body: + if body is not None: if self.debuglevel > 0: print('Unable to determine size of %r' % body) encode_chunked = True diff -r 48f8148428d7 Lib/test/test_httplib.py --- a/Lib/test/test_httplib.py Thu Aug 25 01:22:44 2016 -0400 +++ b/Lib/test/test_httplib.py Thu Aug 25 05:28:26 2016 +0000 @@ -381,6 +381,16 @@ # same request self.assertNotIn('content-length', [k.lower() for k in headers]) + def test_empty_body(self): + # Zero-length iterable should be treated like any other iterable + conn = client.HTTPConnection('example.com') + conn.sock = FakeSocket(b'') + conn.request('POST', '/', ()) + _, headers, body = self._parse_request(conn.sock.data) + self.assertEqual(headers['Transfer-Encoding'], 'chunked') + self.assertNotIn('content-length', [k.lower() for k in headers]) + self.assertEqual(body, b"0\r\n\r\n") + def _make_body(self, empty_lines=False): lines = self.expected_body.split(b' ') for idx, line in enumerate(lines): @@ -652,7 +662,9 @@ def test_send_file(self): expected = (b'GET /foo HTTP/1.1\r\nHost: example.com\r\n' - b'Accept-Encoding: identity\r\nContent-Length:') + b'Accept-Encoding: identity\r\n' + b'Transfer-Encoding: chunked\r\n' + b'\r\n') with open(__file__, 'rb') as body: conn = client.HTTPConnection('example.com') @@ -1717,7 +1729,7 @@ self.assertEqual("5", message.get("content-length")) self.assertEqual(b'body\xc1', f.read()) - def test_file_body(self): + def test_text_file_body(self): self.addCleanup(support.unlink, support.TESTFN) with open(support.TESTFN, "w") as f: f.write("body") @@ -1726,10 +1738,8 @@ message, f = self.get_headers_and_fp() self.assertEqual("text/plain", message.get_content_type()) self.assertIsNone(message.get_charset()) - # Note that the length of text files is unpredictable - # because it depends on character encoding and line ending - # translation. No content-length will be set, the body - # will be sent using chunked transfer encoding. + # No content-length will be determined for files; the body + # will be sent using chunked transfer encoding instead. self.assertIsNone(message.get("content-length")) self.assertEqual("chunked", message.get("transfer-encoding")) self.assertEqual(b'4\r\nbody\r\n0\r\n\r\n', f.read()) @@ -1743,8 +1753,9 @@ message, f = self.get_headers_and_fp() self.assertEqual("text/plain", message.get_content_type()) self.assertIsNone(message.get_charset()) - self.assertEqual("5", message.get("content-length")) - self.assertEqual(b'body\xc1', f.read()) + self.assertEqual("chunked", message.get("Transfer-Encoding")) + self.assertNotIn("Content-Length", message) + self.assertEqual(b'5\r\nbody\xc1\r\n0\r\n\r\n', f.read()) class HTTPResponseTest(TestCase): diff -r 48f8148428d7 Lib/test/test_urllib2.py --- a/Lib/test/test_urllib2.py Thu Aug 25 01:22:44 2016 -0400 +++ b/Lib/test/test_urllib2.py Thu Aug 25 05:28:26 2016 +0000 @@ -913,40 +913,50 @@ self.assertEqual(req.unredirected_hdrs["Spam"], "foo") def test_http_body_file(self): - # A regular file - Content Length is calculated unless already set. + # A regular file - chunked encoding is used unless Content Length is + # already set. h = urllib.request.AbstractHTTPHandler() o = h.parent = MockOpener() file_obj = tempfile.NamedTemporaryFile(mode='w+b', delete=False) file_path = file_obj.name - file_obj.write(b"Something\nSomething\nSomething\n") file_obj.close() + self.addCleanup(os.unlink, file_path) - for headers in {}, {"Content-Length": 30}: - with open(file_path, "rb") as f: - req = Request("http://example.com/", f, headers) - newreq = h.do_request_(req) - self.assertEqual(int(newreq.get_header('Content-length')), 30) + with open(file_path, "rb") as f: + req = Request("http://example.com/", f, {}) + newreq = h.do_request_(req) + te = newreq.get_header('Transfer-encoding') + self.assertEqual(te, "chunked") + self.assertFalse(newreq.has_header('Content-length')) - os.unlink(file_path) + with open(file_path, "rb") as f: + req = Request("http://example.com/", f, {"Content-Length": 30}) + newreq = h.do_request_(req) + self.assertEqual(int(newreq.get_header('Content-length')), 30) + self.assertFalse(newreq.has_header("Transfer-encoding")) def test_http_body_fileobj(self): - # A file object - Content Length is calculated unless already set. + # A file object - chunked encoding is used + # unless Content Length is already set. # (Note that there are some subtle differences to a regular # file, that is why we are testing both cases.) h = urllib.request.AbstractHTTPHandler() o = h.parent = MockOpener() + file_obj = io.BytesIO() - file_obj = io.BytesIO() - file_obj.write(b"Something\nSomething\nSomething\n") + req = Request("http://example.com/", file_obj, {}) + newreq = h.do_request_(req) + self.assertEqual(newreq.get_header('Transfer-encoding'), 'chunked') + self.assertFalse(newreq.has_header('Content-length')) - for headers in {}, {"Content-Length": 30}: - file_obj.seek(0) - req = Request("http://example.com/", file_obj, headers) - newreq = h.do_request_(req) - self.assertEqual(int(newreq.get_header('Content-length')), 30) + headers = {"Content-Length": 30} + req = Request("http://example.com/", file_obj, headers) + newreq = h.do_request_(req) + self.assertEqual(int(newreq.get_header('Content-length')), 30) + self.assertFalse(newreq.has_header("Transfer-encoding")) file_obj.close() @@ -959,9 +969,7 @@ h = urllib.request.AbstractHTTPHandler() o = h.parent = MockOpener() - cmd = [sys.executable, "-c", - r"import sys; " - r"sys.stdout.buffer.write(b'Something\nSomething\nSomething\n')"] + cmd = [sys.executable, "-c", r"pass"] for headers in {}, {"Content-Length": 30}: with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc: req = Request("http://example.com/", proc.stdout, headers) @@ -983,8 +991,6 @@ def iterable_body(): yield b"one" - yield b"two" - yield b"three" for headers in {}, {"Content-Length": 11}: req = Request("http://example.com/", iterable_body(), headers) @@ -996,6 +1002,14 @@ else: self.assertEqual(int(newreq.get_header('Content-length')), 11) + def test_http_body_empty_seq(self): + # Zero-length iterable body should be treated like any other iterable + h = urllib.request.AbstractHTTPHandler() + h.parent = MockOpener() + req = h.do_request_(Request("http://example.com/", ())) + self.assertEqual(req.get_header("Transfer-encoding"), "chunked") + self.assertFalse(req.has_header("Content-length")) + def test_http_body_array(self): # array.array Iterable - Content Length is calculated diff -r 48f8148428d7 Misc/NEWS --- a/Misc/NEWS Thu Aug 25 01:22:44 2016 -0400 +++ b/Misc/NEWS Thu Aug 25 05:28:26 2016 +0000 @@ -49,10 +49,9 @@ - Issue #12319: Chunked transfer encoding support added to http.client.HTTPConnection requests. The urllib.request.AbstractHTTPHandler class does not enforce a Content-Length - header any more. If a HTTP request has a non-empty body, but no - Content-Length header, and the content length cannot be determined - up front, rather than throwing an error, the library now falls back - to use chunked transfer encoding. + header any more. If a HTTP request has a file or iterable body, but no + Content-Length header, the library now falls back to use chunked transfer- + encoding. - A new version of typing.py from https://github.com/python/typing: - Collection (only for 3.6) (Issue #27598)