diff -r 9eb5edfcf604 Doc/library/http.client.rst --- a/Doc/library/http.client.rst Tue Aug 09 13:58:10 2016 +1000 +++ b/Doc/library/http.client.rst Tue Aug 09 14:00:57 2016 +0200 @@ -219,39 +219,64 @@ :class:`HTTPConnection` instances have the following methods: -.. method:: HTTPConnection.request(method, url, body=None, headers={}) +.. method:: HTTPConnection.request(method, url, body=None, headers={}, \ + encode_chunked=False) This will send a request to the server using the HTTP request method *method* and the selector *url*. If *body* is specified, the specified data is sent after the headers are - finished. It may be a string, a :term:`bytes-like object`, an open - :term:`file object`, or an iterable of :term:`bytes-like object`\s. If - *body* is a string, it is encoded as ISO-8851-1, the default for HTTP. If - it is a bytes-like object the bytes are sent as is. If it is a :term:`file - object`, the contents of the file is sent; this file object should support - at least the ``read()`` method. If the file object has a ``mode`` - attribute, the data returned by the ``read()`` method will be encoded as - ISO-8851-1 unless the ``mode`` attribute contains the substring ``b``, - otherwise the data returned by ``read()`` is sent as is. If *body* is an - iterable, the elements of the iterable are sent as is until the iterable is - exhausted. + finished. It may be a :class:`str`, a :term:`bytes-like object`, an + open :term:`file object`, or an iterable of :class:`bytes`. If *body* + is a string, it is encoded as ISO-8859-1, the default for HTTP. If it + is a bytes-like object, the bytes are sent as is. If it is a :term:`file + object`, the contents of the file are sent; this file object should + support at least the ``read()`` method. 
If the file object has a + ``mode`` attribute, the data returned by the ``read()`` method will be + encoded as ISO-8859-1 unless the ``mode`` attribute contains the + substring ``b``, in which case the data returned by ``read()`` is sent + as is. If *body* is an iterable, the elements of the iterable are sent as + is until the iterable is exhausted. - The *headers* argument should be a mapping of extra HTTP - headers to send with the request. + The *headers* argument should be a mapping of extra HTTP headers to send + with the request. - If *headers* does not contain a Content-Length item, one is added - automatically if possible. If *body* is ``None``, the Content-Length header - is set to ``0`` for methods that expect a body (``PUT``, ``POST``, and - ``PATCH``). If *body* is a string or bytes object, the Content-Length - header is set to its length. If *body* is a :term:`file object` and it - works to call :func:`~os.fstat` on the result of its ``fileno()`` method, - then the Content-Length header is set to the ``st_size`` reported by the - ``fstat`` call. Otherwise no Content-Length header is added. + If *headers* contains neither Content-Length nor Transfer-Encoding, a + Content-Length header will be added automatically if possible. If + *body* is ``None``, the Content-Length header is set to ``0`` for + methods that expect a body (``PUT``, ``POST``, and ``PATCH``). If + *body* is a string or bytes-like object, the Content-Length header is + set to its length. If *body* is a binary :term:`file object` + supporting :meth:`~io.IOBase.seek`, that method is used to determine + its size. Otherwise, the Content-Length header is not added + automatically. In cases where determining the Content-Length up + front is not possible, the body will be chunk-encoded and the + Transfer-Encoding header will automatically be set. + + The *encode_chunked* argument is only relevant if Transfer-Encoding is + specified in *headers*. 
If *encode_chunked* is ``False``, the + HTTPConnection object assumes that all encoding is handled by the + calling code. If it is ``True``, the body will be chunk-encoded. + + .. note:: + Chunked transfer encoding was added in HTTP protocol + version 1.1. Unless the HTTP server is known to handle HTTP 1.1, + the caller must either specify the Content-Length or use a + body representation whose length can be determined automatically. .. versionadded:: 3.2 *body* can now be an iterable. + .. versionchanged:: 3.6 + If neither Content-Length nor Transfer-Encoding is set in + *headers* and Content-Length cannot be determined, *body* will now + be automatically chunk-encoded. The *encode_chunked* argument + was added. + The *headers* parameter now defaults to None to prevent unintended + side effects when :meth:`~request` is called repeatedly with + user-supplied headers. The Content-Length for file objects is determined + with seek. + .. method:: HTTPConnection.getresponse() Should be called after a request is sent to get the response from the server. @@ -336,13 +361,31 @@ an argument. -.. method:: HTTPConnection.endheaders(message_body=None) +.. method:: HTTPConnection.endheaders(message_body=None, encode_chunked=False) Send a blank line to the server, signalling the end of the headers. The optional *message_body* argument can be used to pass a message body - associated with the request. The message body will be sent in the same - packet as the message headers if it is string, otherwise it is sent in a - separate packet. + associated with the request. + + If *encode_chunked* is ``True``, the result of each iteration of + *message_body* will be chunk-encoded as specified in :rfc:`7230`, + Section 3.3.1. How the data is encoded depends on the type of + *message_body*. If *message_body* + implements the :ref:`buffer interface <bufferobjects>`, or is a + :class:`str`, the encoding will result in a single chunk. 
If *message_body* is + a :class:`collections.Iterable`, each iteration of *message_body* will result in + a chunk. If *message_body* is a :term:`file object`, each call to ``.read()`` + will result in a chunk. The method automatically signals the end of + the chunk-encoded data immediately after *message_body*. + + .. note:: Due to the chunked encoding specification, empty chunks + yielded by an iterator body will be ignored by the chunk-encoder. + This avoids premature termination of the read of the request by + the target server due to malformed encoding. + + .. versionadded:: 3.6 + Chunked encoding support. The *encode_chunked* parameter was + added. + .. method:: HTTPConnection.send(data) diff -r 9eb5edfcf604 Doc/library/urllib.request.rst --- a/Doc/library/urllib.request.rst Tue Aug 09 13:58:10 2016 +1000 +++ b/Doc/library/urllib.request.rst Tue Aug 09 14:00:57 2016 +0200 @@ -30,18 +30,9 @@ Open the URL *url*, which can be either a string or a :class:`Request` object. - *data* must be a bytes object specifying additional data to be sent to the - server, or ``None`` if no such data is needed. *data* may also be an - iterable object and in that case Content-Length value must be specified in - the headers. Currently HTTP requests are the only ones that use *data*; the - HTTP request will be a POST instead of a GET when the *data* parameter is - provided. - - *data* should be a buffer in the standard - :mimetype:`application/x-www-form-urlencoded` format. The - :func:`urllib.parse.urlencode` function takes a mapping or sequence of - 2-tuples and returns an ASCII text string in this format. It should - be encoded to bytes before being used as the *data* parameter. + *data* must be an object specifying additional data to be sent to the + server, or ``None`` if no such data is needed. See :class:`Request` + for details. urllib.request module uses HTTP/1.1 and includes ``Connection:close`` header in its HTTP requests. 
@@ -192,14 +183,23 @@ *url* should be a string containing a valid URL. 
- *data* must be a bytes object specifying additional data to send to the - server, or ``None`` if no such data is needed. Currently HTTP requests are - the only ones that use *data*; the HTTP request will be a POST instead of a - GET when the *data* parameter is provided. *data* should be a buffer in the - standard :mimetype:`application/x-www-form-urlencoded` format. - The :func:`urllib.parse.urlencode` function takes a mapping or sequence of - 2-tuples and returns an ASCII string in this format. It should be - encoded to bytes before being used as the *data* parameter. + *data* must be an object specifying additional data to send to the + server, or ``None`` if no such data is needed. Currently HTTP + requests are the only ones that use *data*. The supported object + types include bytes, string, file-like objects, and iterables. If + no ``Content-Length`` header has been provided, :class:`HTTPHandler` + will try to determine the length of *data* and set this header + accordingly. If this fails, ``Transfer-Encoding: chunked`` as + specified in :rfc:`7230`, Section 3.3.1 will be used to send the + data. See :meth:`http.client.HTTPConnection.request` for details on + the supported object types and on how the content length is + determined. + + For an HTTP POST request method, *data* should be a buffer in the + standard :mimetype:`application/x-www-form-urlencoded` format. The + :func:`urllib.parse.urlencode` function takes a mapping or sequence + of 2-tuples and returns an ASCII string in this format. It should + be encoded to bytes before being used as the *data* parameter. *headers* should be a dictionary, and will be treated as if :meth:`add_header` was called with each key and value as arguments. @@ -211,8 +211,10 @@ :mod:`urllib`'s default user agent string is ``"Python-urllib/2.6"`` (on Python 2.6). - An example of using ``Content-Type`` header with *data* argument would be - sending a dictionary like ``{"Content-Type": "application/x-www-form-urlencoded"}``. 
+ An appropriate ``Content-Type`` header should be included if the *data* + argument is present. If this header has not been provided and *data* + is not None, ``Content-Type: application/x-www-form-urlencoded`` will + be added as a default. The final two arguments are only of interest for correct handling of third-party HTTP cookies: @@ -235,15 +237,30 @@ *method* should be a string that indicates the HTTP request method that will be used (e.g. ``'HEAD'``). If provided, its value is stored in the :attr:`~Request.method` attribute and is used by :meth:`get_method()`. - Subclasses may indicate a default method by setting the + The default is ``'GET'`` if *data* is ``None`` or ``'POST'`` otherwise. + Subclasses may indicate a different default method by setting the :attr:`~Request.method` attribute in the class itself. + .. note:: + The request will not work as expected if the data object is unable + to deliver its content more than once (e.g. a file or an iterable + that can produce the content only once) and the request is retried + for HTTP redirects or authentication. The *data* is sent to the + HTTP server right away after the headers. There is no support for + a 100-continue expectation in the library. + .. versionchanged:: 3.3 :attr:`Request.method` argument is added to the Request class. .. versionchanged:: 3.4 Default :attr:`Request.method` may be indicated at the class level. + .. versionchanged:: 3.6 + Allow all object types in *data* that are supported in + :class:`http.client.HTTPConnection`. + Do not raise an error if the ``Content-Length`` has not been + provided and could not be determined. Fall back to use chunked + transfer encoding instead. .. class:: OpenerDirector() diff -r 9eb5edfcf604 Doc/whatsnew/3.6.rst --- a/Doc/whatsnew/3.6.rst Tue Aug 09 13:58:10 2016 +1000 +++ b/Doc/whatsnew/3.6.rst Tue Aug 09 14:00:57 2016 +0200 @@ -309,6 +309,15 @@ :issue:`23848`.) 
+http.client +----------- + +* :meth:`~http.client.HTTPConnection.request` and + :meth:`~http.client.HTTPConnection.send` both now support chunked encoding + request bodies. + (Contributed by Demian Brecht and Rolf Krahl in :issue:`12319`.) + + idlelib and IDLE ---------------- @@ -458,6 +467,18 @@ (Contributed by Amit Saha in :issue:`26323`.) +urllib.request +-------------- + +If an HTTP request has a non-empty body but no Content-Length header +and the content length cannot be determined up front, rather than +throwing an error, :class:`~urllib.request.AbstractHTTPHandler` now +falls back to using chunked transfer encoding. As a side effect, this +module no longer makes assumptions about the type of the body, so +everything supported by http.client is now allowed. +(Contributed by Demian Brecht and Rolf Krahl in :issue:`12319`.) + + urllib.robotparser ------------------ diff -r 9eb5edfcf604 Lib/http/client.py --- a/Lib/http/client.py Tue Aug 09 13:58:10 2016 +1000 +++ b/Lib/http/client.py Tue Aug 09 14:00:57 2016 +0200 @@ -795,6 +795,58 @@ auto_open = 1 debuglevel = 0 + @staticmethod + def _is_textIO(stream): + """Test whether a file-like object is a text or a binary stream. + """ + return isinstance(stream.read(0), str) + + @staticmethod + def _get_content_length(body, method): + """Get the content-length based on the body. + + If the body is "empty", we set Content-Length: 0 for methods + that expect a body (RFC 7230, Section 3.3.2). If the body is + set for other methods, we set the header provided we can + figure out what the length is. + """ + if not body: + # do an explicit check for not None here to distinguish + # between unset and set but empty + if method.upper() in _METHODS_EXPECTING_BODY or body is not None: + return 0 + else: + return None + + if hasattr(body, 'read'): + # file-like object. + if HTTPConnection._is_textIO(body): + # the length of a text stream is unpredictable because it + # depends on character encoding and line ending translation. 
+ return None + else: + # Is it seekable? + try: + curpos = body.tell() + sz = body.seek(0, io.SEEK_END) + except (TypeError, AttributeError, OSError): + return None + else: + body.seek(curpos) + return sz - curpos + + try: + # does it implement the buffer protocol (bytes, bytearray, array)? + mv = memoryview(body) + return mv.nbytes + except TypeError: + pass + + if isinstance(body, str): + return len(body) + + return None + def __init__(self, host, port=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT, source_address=None): self.timeout = timeout @@ -933,18 +985,9 @@ if hasattr(data, "read") : if self.debuglevel > 0: print("sendIng a read()able") - encode = False - try: - mode = data.mode - except AttributeError: - # io.BytesIO and other file-like objects don't have a `mode` - # attribute. - pass - else: - if "b" not in mode: - encode = True - if self.debuglevel > 0: - print("encoding file using iso-8859-1") + encode = self._is_textIO(data) + if encode and self.debuglevel > 0: + print("encoding file using iso-8859-1") while 1: datablock = data.read(blocksize) if not datablock: @@ -970,7 +1013,26 @@ """ self._buffer.append(s) - def _send_output(self, message_body=None): + def _read_readable(self, readable): + blocksize = 8192 + if self.debuglevel > 0: + print("sendIng a read()able") + encode = self._is_textIO(readable) + if encode and self.debuglevel > 0: + print("encoding file using iso-8859-1") + while True: + datablock = readable.read(blocksize) + if not datablock: + break + if encode: + datablock = datablock.encode("iso-8859-1") + yield datablock + + def _read_iterable(self, iterable): + for line in iterable: + yield line + + def _send_output(self, message_body=None, encode_chunked=False): """Send the currently buffered request and clear the buffer. Appends an extra \\r\\n to the buffer. 
@@ -979,10 +1041,51 @@ self._buffer.extend((b"", b"")) msg = b"\r\n".join(self._buffer) del self._buffer[:] + self.send(msg) - self.send(msg) if message_body is not None: - self.send(message_body) + + # create a consistent interface to the message_body + if hasattr(message_body, 'read'): + # Let file-like take precedence over byte-like. This + # is needed to allow the current position of mmap'ed + # files to be taken into account. + read = self._read_readable + else: + try: + # this is solely to check to see if message_body + # implements the buffer API. it /would/ be easier + # to capture if PyObject_CheckBuffer was exposed + # to Python. + memoryview(message_body) + except TypeError: + if isinstance(message_body, str): + read = lambda data: (_encode(data),) + elif isinstance(message_body, collections.Iterable): + read = self._read_iterable + else: + raise TypeError("message_body should be a bytes-like " + "object or an iterable, got %r" + % type(message_body)) + else: + # the object implements the buffer interface and + # can be passed directly into socket methods + read = lambda data: (data,) + + for line in read(message_body): + if not line: + if self.debuglevel > 0: + print('Zero length line ignored') + continue + + if encode_chunked and self._http_vsn == 11: + # chunked encoding + line = f'{len(line):X}\r\n'.encode('ascii') + line + b'\r\n' + self.send(line) + + if encode_chunked and self._http_vsn == 11: + # end chunked transfer + self.send(b'0\r\n\r\n') def putrequest(self, method, url, skip_host=0, skip_accept_encoding=0): """Send a request to the server. @@ -1135,52 +1238,27 @@ header = header + b': ' + value self._output(header) - def endheaders(self, message_body=None): + def endheaders(self, message_body=None, encode_chunked=False): """Indicate that the last header line has been sent to the server. This method sends the request to the server. The optional message_body argument can be used to pass a message body associated with the - request. 
The message body will be sent in the same packet as the - message headers if it is a string, otherwise it is sent as a separate - packet. + request. """ if self.__state == _CS_REQ_STARTED: self.__state = _CS_REQ_SENT else: raise CannotSendHeader() - self._send_output(message_body) + self._send_output(message_body, encode_chunked=encode_chunked) - def request(self, method, url, body=None, headers={}): + def request(self, method, url, body=None, headers={}, + encode_chunked=False): """Send a complete request to the server.""" - self._send_request(method, url, body, headers) + self._send_request(method, url, body, headers, encode_chunked) - def _set_content_length(self, body, method): - # Set the content-length based on the body. If the body is "empty", we - # set Content-Length: 0 for methods that expect a body (RFC 7230, - # Section 3.3.2). If the body is set for other methods, we set the - # header provided we can figure out what the length is. - thelen = None - method_expects_body = method.upper() in _METHODS_EXPECTING_BODY - if body is None and method_expects_body: - thelen = '0' - elif body is not None: - try: - thelen = str(len(body)) - except TypeError: - # If this is a file-like object, try to - # fstat its file descriptor - try: - thelen = str(os.fstat(body.fileno()).st_size) - except (AttributeError, OSError): - # Don't send a length if this failed - if self.debuglevel > 0: print("Cannot stat!!") - - if thelen is not None: - self.putheader('Content-Length', thelen) - - def _send_request(self, method, url, body, headers): + def _send_request(self, method, url, body, headers, encode_chunked): # Honor explicitly requested Host: and Accept-Encoding: headers. 
- header_names = dict.fromkeys([k.lower() for k in headers]) + header_names = {k.lower(): k for k in headers} skips = {} if 'host' in header_names: skips['skip_host'] = 1 @@ -1189,15 +1267,73 @@ self.putrequest(method, url, **skips) + # chunked encoding will happen if HTTP/1.1 is used and either + # the caller passes encode_chunked=True or the following + # conditions hold: + # 1. content-length has not been explicitly set + # 2. the length of the body cannot be determined + # (e.g. it is a generator or a not seekable file) + # 3. Transfer-Encoding has NOT been explicitly set by the caller + if 'content-length' not in header_names: - self._set_content_length(body, method) + # only chunk body if not explicitly set for backwards + # compatibility, assuming the client code is already handling the + # chunking + if 'transfer-encoding' not in header_names: + # if content-length cannot be automatically determined, fall + # back to chunked encoding + encode_chunked = False + content_length = self._get_content_length(body, method) + if content_length is None: + if body: + if self.debuglevel > 0: + print('Unable to determine size of %r' % body) + encode_chunked = True + self.putheader('Transfer-Encoding', 'chunked') + else: + self.putheader('Content-Length', str(content_length)) + else: + # transfer-encoding is specified, do some validation + + # RFC 7230, Section 3.3.1 + # A sender MUST NOT apply chunked more than once to a + # message body (i.e., chunking an already chunked message + # is not allowed). + enc = headers[header_names['transfer-encoding']].split(',') + if len([e for e in enc if e == 'chunked']) > 1: + raise ValueError( + 'Multiple chunked encodings found. Expected 1.') + + # RFC 7230, Section 3.3.1 + # If any transfer coding other than + # chunked is applied to a request payload body, the sender + # MUST apply chunked as the final transfer coding to ensure + # that the message is properly framed. 
+ if enc[-1] != 'chunked': + raise ValueError( + 'Chunked encoding expected as the final ' + 'Transfer-Encoding.') + else: + # content-length is specified. + + # RFC 7230, Section 3.3.2 + # A sender MUST NOT send a Content-Length header field in + # any message that contains a Transfer-Encoding header + # field. + if 'transfer-encoding' in header_names: + raise ValueError( + 'Content-Length and Transfer-Encoding ' + 'must not both be set.') + else: + encode_chunked = False + for hdr, value in headers.items(): self.putheader(hdr, value) if isinstance(body, str): # RFC 2616 Section 3.7.1 says that text default has a # default charset of iso-8859-1. body = _encode(body, 'body') - self.endheaders(body) + self.endheaders(body, encode_chunked) def getresponse(self): """Get the response from the server. diff -r 9eb5edfcf604 Lib/test/test_httplib.py --- a/Lib/test/test_httplib.py Tue Aug 09 13:58:10 2016 +1000 +++ b/Lib/test/test_httplib.py Tue Aug 09 14:00:57 2016 +0200 @@ -314,6 +314,125 @@ conn.putheader(name, value) +class TransferEncodingTest(TestCase): + expected_body = b"It's just a flesh wound" + + def test_chunked(self): + conn = client.HTTPConnection('example.com') + conn.sock = FakeSocket(None) + conn.send(self._make_body(), encode_chunked=True) + + body = self._parse_chunked(conn.sock.data) + self.assertEqual(body, self.expected_body) + + def test_explicit_headers(self): + # explicit chunked + conn = client.HTTPConnection('example.com') + conn.sock = FakeSocket(None) + # this shouldn't actually be automatically chunk-encoded because the + # calling code has explicitly stated that it's taking care of it + conn.request( + 'POST', '/', self._make_body(), {'Transfer-Encoding': 'chunked'}) + + _, headers, body = self._parse_request(conn.sock.data) + self.assertNotIn('content-length', [k.lower() for k in headers.keys()]) + self.assertEqual(headers['Transfer-Encoding'], 'chunked') + self.assertEqual(body, self.expected_body) + + # explicit chunked, string body + conn 
= client.HTTPConnection('example.com') + conn.sock = FakeSocket(None) + conn.request( + 'POST', '/', self.expected_body.decode('latin-1'), + {'Transfer-Encoding': 'chunked'}) + + _, headers, body = self._parse_request(conn.sock.data) + self.assertNotIn('content-length', [k.lower() for k in headers.keys()]) + self.assertEqual(headers['Transfer-Encoding'], 'chunked') + self.assertEqual(body, self.expected_body) + + # invalid ordering + conn = client.HTTPConnection('example.com') + conn.sock = FakeSocket(None) + with self.assertRaises(ValueError): + conn.request( + 'POST', '/', self._make_body(), + {'Transfer-Encoding': 'chunked,gzip'}) + + # multiple chunk-encodings found + conn = client.HTTPConnection('example.com') + conn.sock = FakeSocket(None) + with self.assertRaises(ValueError): + conn.request( + 'POST', '/', self._make_body(), + {'Transfer-Encoding': 'chunked,gzip,chunked'}) + + def test_request(self): + for val in (False, True,): + conn = client.HTTPConnection('example.com') + conn.sock = FakeSocket(None) + conn.request( + 'POST', '/', self._make_body(empty_lines=val)) + + _, headers, body = self._parse_request(conn.sock.data) + body = self._parse_chunked(body) + self.assertEqual(body, self.expected_body) + + # Content-Length and Transfer-Encoding SHOULD not be sent in the + # same request + self.assertNotIn( + b'content-length', [h.lower() for h in headers.keys()]) + + def _make_body(self, empty_lines=False): + lines = self.expected_body.split(b' ') + for idx, line in enumerate(lines): + # for testing handling empty lines + if empty_lines and idx % 2: + yield b'' + if idx < len(lines) - 1: + yield line + b' ' + else: + yield line + + def _parse_request(self, data): + lines = data.split(b'\r\n') + request = lines[0] + headers = {} + n = 1 + while n < len(lines) and len(lines[n]) > 0: + key, val = lines[n].split(b':') + headers[key.decode('latin-1')] = val.decode('latin-1').strip() + n += 1 + + return request, headers, b'\r\n'.join(lines[n + 1:]) + + def 
_parse_chunked(self, data): + body = [] + trailers = {} + n = 0 + lines = data.split(b'\r\n') + # parse body + while True: + size, chunk = lines[n:n+2] + size = int(size, 16) + + if size == 0: + n += 1 + break + + self.assertEqual(size, len(chunk)) + body.append(chunk) + + n += 2 + # we /should/ hit the end chunk, but check against the size of + # lines so we're not stuck in an infinite loop should we get + # malformed data + if n > len(lines): + break + + return b''.join(body) + + class BasicTest(TestCase): def test_status_lines(self): # Test HTTP status lines @@ -565,10 +684,20 @@ yield 'data_two' class UpdatingFile(): - mode = 'r' - d = data() - def read(self, blocksize=-1): - return self.d.__next__() + def __init__(self): + self.mode = 'r' + self.d = data() + self.buffer = '' + def read(self, size=-1): + if not self.buffer: + self.buffer = next(self.d, None) + if self.buffer is None: + return None + if size < 0 or size > len(self.buffer): + size = len(self.buffer) + res = self.buffer[:size] + self.buffer = self.buffer[size:] + return res expected = b'data' @@ -1535,6 +1664,26 @@ message = client.parse_headers(f) return message, f + def test_list_body(self): + # Note that no content-length is automatically calculated for + # an iterable. The request will fall back to send chunked + # transfer encoding. 
+ cases = ( + ([b'foo', b'bar'], b'3\r\nfoo\r\n3\r\nbar\r\n0\r\n\r\n'), + ((b'foo', b'bar'), b'3\r\nfoo\r\n3\r\nbar\r\n0\r\n\r\n'), + ) + for body, expected in cases: + with self.subTest(body): + self.conn = client.HTTPConnection('example.com') + self.conn.sock = self.sock = FakeSocket('') + + self.conn.request('PUT', '/url', body) + msg, f = self.get_headers_and_fp() + self.assertNotIn('Content-Type', msg) + self.assertIsNone(msg.get_charset()) + self.assertEqual(msg.get('Transfer-Encoding'), 'chunked') + self.assertEqual(expected, f.read()) + def test_manual_content_length(self): # Set an incorrect content-length so that we can verify that # it will not be over-ridden by the library. @@ -1577,8 +1726,13 @@ message, f = self.get_headers_and_fp() self.assertEqual("text/plain", message.get_content_type()) self.assertIsNone(message.get_charset()) - self.assertEqual("4", message.get("content-length")) - self.assertEqual(b'body', f.read()) + # Note that the length of text files is unpredictable + # because it depends on character encoding and line ending + # translation. No content-length will be set, the body + # will be sent using chunked transfer encoding. 
+ self.assertIsNone(message.get("content-length")) + self.assertEqual("chunked", message.get("transfer-encoding")) + self.assertEqual(b'4\r\nbody\r\n0\r\n\r\n', f.read()) def test_binary_file_body(self): self.addCleanup(support.unlink, support.TESTFN) diff -r 9eb5edfcf604 Lib/test/test_urllib2.py --- a/Lib/test/test_urllib2.py Tue Aug 09 13:58:10 2016 +1000 +++ b/Lib/test/test_urllib2.py Tue Aug 09 14:00:57 2016 +0200 @@ -7,6 +7,8 @@ import socket import array import sys +import tempfile +import subprocess import urllib.request # The proxy bypass method imported below has logic specific to the OSX @@ -335,7 +337,8 @@ else: self._tunnel_headers.clear() - def request(self, method, url, body=None, headers=None): + def request(self, method, url, body=None, headers=None, + encode_chunked=False): self.method = method self.selector = url if headers is not None: @@ -343,6 +346,7 @@ self.req_headers.sort() if body: self.data = body + self.encode_chunked = encode_chunked if self.raise_on_endheaders: raise OSError() @@ -908,7 +912,75 @@ self.assertEqual(req.unredirected_hdrs["Host"], "baz") self.assertEqual(req.unredirected_hdrs["Spam"], "foo") - # Check iterable body support + def test_http_body_file(self): + # A regular file - Content Length is calculated unless already set. + + h = urllib.request.AbstractHTTPHandler() + o = h.parent = MockOpener() + + file_obj = tempfile.NamedTemporaryFile(mode='w+b', delete=False) + file_path = file_obj.name + file_obj.write(b"Something\nSomething\nSomething\n") + file_obj.close() + + for headers in {}, {"Content-Length": 30}: + with open(file_path, "rb") as f: + req = Request("http://example.com/", f, headers) + newreq = h.do_request_(req) + self.assertEqual(int(newreq.get_header('Content-length')), 30) + + os.unlink(file_path) + + def test_http_body_fileobj(self): + # A file object - Content Length is calculated unless already set. 
+ # (Note that there are some subtle differences to a regular + # file, that is why we are testing both cases.) + + h = urllib.request.AbstractHTTPHandler() + o = h.parent = MockOpener() + + file_obj = io.BytesIO() + file_obj.write(b"Something\nSomething\nSomething\n") + + for headers in {}, {"Content-Length": 30}: + file_obj.seek(0) + req = Request("http://example.com/", file_obj, headers) + newreq = h.do_request_(req) + self.assertEqual(int(newreq.get_header('Content-length')), 30) + + file_obj.close() + + def test_http_body_pipe(self): + # A file reading from a pipe. + # A pipe cannot be seek'ed. There is no way to determine the + # content length up front. Thus, do_request_() should fall + # back to Transfer-encoding chunked. + + h = urllib.request.AbstractHTTPHandler() + o = h.parent = MockOpener() + + cmd = [sys.executable, "-c", + r"import sys; " + r"sys.stdout.buffer.write(b'Something\nSomething\nSomething\n')"] + for headers in {}, {"Content-Length": 30}: + with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc: + req = Request("http://example.com/", proc.stdout, headers) + newreq = h.do_request_(req) + if not headers: + self.assertEqual(newreq.get_header('Content-length'), None) + self.assertEqual(newreq.get_header('Transfer-encoding'), + 'chunked') + else: + self.assertEqual(int(newreq.get_header('Content-length')), + 30) + + def test_http_body_iterable(self): + # Generic iterable. There is no way to determine the content + # length up front. Fall back to Transfer-encoding chunked. 
+ + h = urllib.request.AbstractHTTPHandler() + o = h.parent = MockOpener() + def iterable_body(): yield b"one" yield b"two" @@ -916,32 +988,19 @@ for headers in {}, {"Content-Length": 11}: req = Request("http://example.com/", iterable_body(), headers) + newreq = h.do_request_(req) if not headers: - # Having an iterable body without a Content-Length should - # raise an exception - self.assertRaises(ValueError, h.do_request_, req) + self.assertEqual(newreq.get_header('Content-length'), None) + self.assertEqual(newreq.get_header('Transfer-encoding'), + 'chunked') else: - newreq = h.do_request_(req) + self.assertEqual(int(newreq.get_header('Content-length')), 11) - # A file object. - # Test only Content-Length attribute of request. + def test_http_body_array(self): + # array.array Iterable - Content Length is calculated - file_obj = io.BytesIO() - file_obj.write(b"Something\nSomething\nSomething\n") - - for headers in {}, {"Content-Length": 30}: - req = Request("http://example.com/", file_obj, headers) - if not headers: - # Having an iterable body without a Content-Length should - # raise an exception - self.assertRaises(ValueError, h.do_request_, req) - else: - newreq = h.do_request_(req) - self.assertEqual(int(newreq.get_header('Content-length')), 30) - - file_obj.close() - - # array.array Iterable - Content Length is calculated + h = urllib.request.AbstractHTTPHandler() + o = h.parent = MockOpener() iterable_array = array.array("I",[1,2,3,4]) diff -r 9eb5edfcf604 Lib/urllib/request.py --- a/Lib/urllib/request.py Tue Aug 09 13:58:10 2016 +1000 +++ b/Lib/urllib/request.py Tue Aug 09 14:00:57 2016 +0200 @@ -141,17 +141,9 @@ *, cafile=None, capath=None, cadefault=False, context=None): '''Open the URL url, which can be either a string or a Request object. - *data* must be a bytes object specifying additional data to be sent to the - server, or None if no such data is needed. 
data may also be an iterable - object and in that case Content-Length value must be specified in the - headers. Currently HTTP requests are the only ones that use data; the HTTP - request will be a POST instead of a GET when the data parameter is - provided. - - *data* should be a buffer in the standard application/x-www-form-urlencoded - format. The urllib.parse.urlencode() function takes a mapping or sequence - of 2-tuples and returns an ASCII text string in this format. It should be - encoded to bytes before being used as the data parameter. + *data* must be an object specifying additional data to be sent to + the server, or None if no such data is needed. See Request for + details. urllib.request module uses HTTP/1.1 and includes a "Connection:close" header in its HTTP requests. @@ -1235,6 +1227,11 @@ def set_http_debuglevel(self, level): self._debuglevel = level + def _get_content_length(self, request): + return http.client.HTTPConnection._get_content_length( + request.data, + request.get_method()) + def do_request_(self, request): host = request.host if not host: @@ -1250,17 +1247,15 @@ request.add_unredirected_header( 'Content-type', 'application/x-www-form-urlencoded') - if not request.has_header('Content-length'): - try: - mv = memoryview(data) - except TypeError: - if isinstance(data, collections.Iterable): - raise ValueError("Content-Length should be specified " - "for iterable data of type %r %r" % (type(data), - data)) + if (not request.has_header('Content-length') + and not request.has_header('Transfer-encoding')): + content_length = self._get_content_length(request) + if content_length is not None: + request.add_unredirected_header( + 'Content-length', str(content_length)) else: request.add_unredirected_header( - 'Content-length', '%d' % (len(mv) * mv.itemsize)) + 'Transfer-encoding', 'chunked') sel_host = host if request.has_proxy(): @@ -1316,7 +1311,8 @@ try: try: - h.request(req.get_method(), req.selector, req.data, headers) + 
h.request(req.get_method(), req.selector, req.data, headers, + encode_chunked=req.has_header('Transfer-encoding')) except OSError as err: # timeout error raise URLError(err) r = h.getresponse() diff -r 9eb5edfcf604 Misc/NEWS --- a/Misc/NEWS Tue Aug 09 13:58:10 2016 +1000 +++ b/Misc/NEWS Tue Aug 09 14:00:57 2016 +0200 @@ -93,6 +93,16 @@ - Issue 26988: Add AutoEnum. +- Issue #12319: Chunked transfer encoding support added to + http.client.HTTPConnection requests. + urllib.request.AbstractHTTPHandler does not enforce a Content-Length + header any more. If an HTTP request has a non-empty body, but no + Content-Length header, and the content length cannot be determined + up front, rather than throwing an error, this class now falls back + to using chunked transfer encoding. As a side effect, this class + no longer makes assumptions about the type of the body, so everything + supported by http.client is now allowed. + Tests -----
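Reviewer note: the two mechanisms this patch relies on, chunk framing as done in _send_output() and the seek-based length probe in _get_content_length(), can be sketched in isolation. This is a minimal sketch under stated assumptions, not the patch's implementation: the helper names `encode_chunks` and `remaining_length` are hypothetical and do not exist in http.client, and the parts passed to `encode_chunks` are assumed to be plain bytes objects.

```python
import io


def encode_chunks(parts):
    """Yield the wire form of each non-empty part, then the terminator.

    Hypothetical helper for illustration; `parts` is assumed to be an
    iterable of bytes objects.
    """
    for part in parts:
        if not part:
            # A zero-length chunk would signal end-of-body, so skip it.
            continue
        # Chunk size in hexadecimal, CRLF, chunk data, CRLF.
        yield f"{len(part):X}\r\n".encode("ascii") + part + b"\r\n"
    # Last chunk (size 0) followed by an empty trailer section.
    yield b"0\r\n\r\n"


def remaining_length(stream):
    """Return the bytes left in a seekable binary stream, or None.

    Hypothetical helper for illustration; mirrors the tell/seek idea
    only, without the text-stream check the patch performs first.
    """
    try:
        current = stream.tell()
        end = stream.seek(0, io.SEEK_END)
    except (AttributeError, OSError):
        return None
    stream.seek(current)  # restore the caller's position
    return end - current


wire = b"".join(encode_chunks([b"foo", b"", b"bar"]))
print(wire)

buf = io.BytesIO(b"Something\nSomething\nSomething\n")
buf.read(10)
print(remaining_length(buf))
```

The first call produces the same byte sequence that test_list_body expects for the body [b'foo', b'bar']; the empty part is dropped rather than encoded, which is the behaviour the endheaders() note describes. Measuring from the current position instead of from the start of the stream matches the `sz - curpos` computation in the patch.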