diff -r 2126e8cbc12f Doc/library/http.client.rst --- a/Doc/library/http.client.rst Tue Jun 14 15:25:36 2016 +0300 +++ b/Doc/library/http.client.rst Tue Jun 14 16:42:05 2016 +0200 @@ -190,6 +190,15 @@ Previously, :exc:`BadStatusLine`\ ``('')`` was raised. +.. exception:: EncodingError + + A subclass of :exc:`HTTPException`. Raised if HTTP-specific encoding errors + are encountered, such as multiple "chunked" entries in a "Transfer-Encoding" + header. + + .. versionadded:: 3.6 + + The constants defined in this module are: .. data:: HTTP_PORT @@ -219,39 +228,54 @@ :class:`HTTPConnection` instances have the following methods: -.. method:: HTTPConnection.request(method, url, body=None, headers={}) +.. method:: HTTPConnection.request(method, url, body=None, headers=None, \ + encode_chunked=False) This will send a request to the server using the HTTP request method *method* and the selector *url*. - If *body* is specified, the specified data is sent after the headers are - finished. It may be a string, a :term:`bytes-like object`, an open - :term:`file object`, or an iterable of :term:`bytes-like object`\s. If - *body* is a string, it is encoded as ISO-8851-1, the default for HTTP. If - it is a bytes-like object the bytes are sent as is. If it is a :term:`file - object`, the contents of the file is sent; this file object should support - at least the ``read()`` method. If the file object has a ``mode`` - attribute, the data returned by the ``read()`` method will be encoded as - ISO-8851-1 unless the ``mode`` attribute contains the substring ``b``, - otherwise the data returned by ``read()`` is sent as is. If *body* is an - iterable, the elements of the iterable are sent as is until the iterable is - exhausted. 
+ If the *body* argument is present, it can be any of the following + types: *string*, *bytes*, iterables comprised of either *string* or + *bytes*, :term:`file object` (the object should support the ``read()`` + method), or objects implementing the :ref:`buffer interface + <bufferobjects>` such as :class:`array.array`. - The *headers* argument should be a mapping of extra HTTP - headers to send with the request. + Unencoded *string* objects are encoded as ISO-8859-1 (latin-1), the + default charset for HTTP. To use other encodings, *bytes* objects must + be used. - If *headers* does not contain a Content-Length item, one is added - automatically if possible. If *body* is ``None``, the Content-Length header - is set to ``0`` for methods that expect a body (``PUT``, ``POST``, and - ``PATCH``). If *body* is a string or bytes object, the Content-Length - header is set to its length. If *body* is a :term:`file object` and it - works to call :func:`~os.fstat` on the result of its ``fileno()`` method, - then the Content-Length header is set to the ``st_size`` reported by the - ``fstat`` call. Otherwise no Content-Length header is added. + The *headers* argument should be a mapping of extra HTTP headers to + send with the request. + + If *headers* contains neither Content-Length nor Transfer-Encoding, + a Content-Length header will be added automatically if possible. If + *body* is ``None``, the Content-Length header is set to ``0`` for + methods that expect a body (``PUT``, ``POST``, and ``PATCH``). If + *body* is a string or bytes object, the Content-Length header is set to + its length. If *body* supports the buffer interface, the length is + calculated with the help of :class:`memoryview`. If *body* is a + :term:`file object` supporting :meth:`~io.IOBase.seek`, this will be + used to determine its size. Otherwise, the Content-Length header is + not added automatically.
In cases where determining the Content-Length + up front is not possible, the body will be chunk encoded and the + Transfer-Encoding header will automatically be set. + + The *encode_chunked* argument is only relevant if Transfer-Encoding + is specified in *headers*. If *encode_chunked* is ``False``, the + client assumes that all encoding is handled by the calling code. If it + is ``True``, the body will be chunk encoded. .. versionadded:: 3.2 *body* can now be an iterable. + .. versionadded:: 3.6 + If neither Content-Length nor Transfer-Encoding is set in + *headers* and Content-Length cannot be determined, *body* will now + be automatically chunk encoded. The *encode_chunked* argument + was added. + *headers* now defaults to ``None`` to prevent unintended side effects when + :meth:`~request` is called repeatedly with user-supplied headers. + .. method:: HTTPConnection.getresponse() Should be called after a request is sent to get the response from the server. @@ -336,7 +360,7 @@ an argument. -.. method:: HTTPConnection.endheaders(message_body=None) +.. method:: HTTPConnection.endheaders(message_body=None, encode_chunked=False) Send a blank line to the server, signalling the end of the headers. The optional *message_body* argument can be used to pass a message body @@ -344,12 +368,37 @@ packet as the message headers if it is string, otherwise it is sent in a separate packet. -.. method:: HTTPConnection.send(data) + The *encode_chunked* flag is passed directly into :meth:`send` (see + :meth:`send` documentation for details). + + .. versionadded:: 3.6 + The *encode_chunked* parameter was added. + + +.. method:: HTTPConnection.send(data, encode_chunked=False) Send data to the server. This should be used directly only after the :meth:`endheaders` method has been called and before :meth:`getresponse` is called. + If *encode_chunked* is ``True``, the result of each iteration of *data* will + be chunk encoded as specified in :rfc:`7230`, Section 3.3.1.
How the data is + encoded depends on the type of *data*. If *data* implements the + :ref:`buffer interface <bufferobjects>`, or is a :class:`str`, the encoding + will result in a single chunk. If *data* is a :class:`collections.Iterable`, + each iteration of *data* will result in a chunk. If *data* is a + :term:`file object`, each call to ``.read()`` will result in a chunk. + :meth:`send` automatically signals the end of the chunk encoded data + immediately after *data*. + + .. note:: Due to the chunked encoding spec, empty chunks yielded by an + iterator body will be ignored by the chunk encoder. This is to avoid + premature termination of the read of the request by the target server due + to malformed encoding. + + .. versionadded:: 3.6 + Chunked encoding support. + .. _httpresponse-objects: diff -r 2126e8cbc12f Doc/library/urllib.request.rst --- a/Doc/library/urllib.request.rst Tue Jun 14 15:25:36 2016 +0300 +++ b/Doc/library/urllib.request.rst Tue Jun 14 16:42:05 2016 +0200 @@ -30,18 +30,9 @@ Open the URL *url*, which can be either a string or a :class:`Request` object. - *data* must be a bytes object specifying additional data to be sent to the - server, or ``None`` if no such data is needed. *data* may also be an - iterable object and in that case Content-Length value must be specified in - the headers. Currently HTTP requests are the only ones that use *data*; the - HTTP request will be a POST instead of a GET when the *data* parameter is - provided. - - *data* should be a buffer in the standard - :mimetype:`application/x-www-form-urlencoded` format. The - :func:`urllib.parse.urlencode` function takes a mapping or sequence of - 2-tuples and returns an ASCII text string in this format. It should - be encoded to bytes before being used as the *data* parameter. + *data* must be an object specifying additional data to be sent to the + server, or ``None`` if no such data is needed. See :class:`Request` + for details.
urllib.request module uses HTTP/1.1 and includes ``Connection:close`` header in its HTTP requests. @@ -182,14 +173,21 @@ *url* should be a string containing a valid URL. - *data* must be a bytes object specifying additional data to send to the - server, or ``None`` if no such data is needed. Currently HTTP requests are - the only ones that use *data*; the HTTP request will be a POST instead of a - GET when the *data* parameter is provided. *data* should be a buffer in the - standard :mimetype:`application/x-www-form-urlencoded` format. - The :func:`urllib.parse.urlencode` function takes a mapping or sequence of - 2-tuples and returns an ASCII string in this format. It should be - encoded to bytes before being used as the *data* parameter. + *data* must be an object specifying additional data to send to the + server, or ``None`` if no such data is needed. The supported + object types include bytes, string, file-like objects, and + iterables. If no ``Content-Length`` header has been provided, + :class:`HTTPHandler` will try to determine the length of *data* and + set this header accordingly. If this fails, ``Transfer-Encoding: + chunked`` as specified in :rfc:`7230`, Section 3.3.1 will be used + to send the data. + + Currently HTTP requests are the only ones that use *data*. For a + POST request method, *data* should be a buffer in the standard + :mimetype:`application/x-www-form-urlencoded` format. The + :func:`urllib.parse.urlencode` function takes a mapping or sequence + of 2-tuples and returns an ASCII string in this format. It should + be encoded to bytes before being used as the *data* parameter. *headers* should be a dictionary, and will be treated as if :meth:`add_header` was called with each key and value as arguments. @@ -234,6 +232,10 @@ .. versionchanged:: 3.4 Default :attr:`Request.method` may be indicated at the class level. + .. versionadded:: 3.6 + Do not raise an error if the ``Content-Length`` has not been + provided and could not be determined.
Fall back to using + ``Transfer-Encoding: chunked`` instead. .. class:: OpenerDirector() diff -r 2126e8cbc12f Doc/whatsnew/3.6.rst --- a/Doc/whatsnew/3.6.rst Tue Jun 14 15:25:36 2016 +0300 +++ b/Doc/whatsnew/3.6.rst Tue Jun 14 16:42:05 2016 +0200 @@ -291,6 +291,15 @@ In compensation, the eventual result with be that some idlelib classes will be easier to use, with better APIs and docstrings explaining them. Additional useful information will be added to idlelib when available. +http.client +----------- + +* :meth:`~http.client.HTTPConnection.request` and + :meth:`~http.client.HTTPConnection.send` now both support chunked encoding + of request bodies. + (Contributed by Demian Brecht and Rolf Krahl in :issue:`12319`.) + + os -- @@ -401,6 +410,18 @@ (Contributed by Nikolay Bogoychev in :issue:`16099`.) +urllib.request +-------------- + +If an HTTP request has a non-empty body but no Content-Length header +and the content length cannot be determined up front, rather than +raising an error, :class:`~urllib.request.AbstractHTTPHandler` now +falls back to using chunked transfer encoding. As a side effect, this +module no longer makes assumptions about the type of the body, so +everything supported by http.client is now allowed. +(Contributed by Demian Brecht and Rolf Krahl in :issue:`12319`.)
+ + warnings -------- diff -r 2126e8cbc12f Lib/http/client.py --- a/Lib/http/client.py Tue Jun 14 15:25:36 2016 +0300 +++ b/Lib/http/client.py Tue Jun 14 16:42:05 2016 +0200 @@ -86,7 +86,7 @@ "IncompleteRead", "InvalidURL", "ImproperConnectionState", "CannotSendRequest", "CannotSendHeader", "ResponseNotReady", "BadStatusLine", "LineTooLong", "RemoteDisconnected", "error", - "responses"] + "responses", "EncodingError"] HTTP_PORT = 80 HTTPS_PORT = 443 @@ -98,6 +98,7 @@ _CS_REQ_STARTED = 'Request-started' _CS_REQ_SENT = 'Request-sent' +_DEFAULT_ENCODING = 'latin-1' # hack to maintain backwards compatibility globals().update(http.HTTPStatus.__members__) @@ -795,6 +796,57 @@ auto_open = 1 debuglevel = 0 + @staticmethod + def get_content_length(body, method): + """Get the content-length based on the body. + + If the body is "empty", we set Content-Length: 0 for methods + that expect a body (RFC 7230, Section 3.3.2). If the body is + set for other methods, we set the header provided we can + figure out what the length is. + """ + if not body: + # do an explicit check for not None here to distinguish + # between unset and set but empty + if method.upper() in _METHODS_EXPECTING_BODY or body is not None: + return 0 + else: + return None + + try: + # does it implement the buffer protocol (bytes, bytearray, array)? + mv = memoryview(body) + return len(mv) * mv.itemsize + except TypeError: + pass + + if isinstance(body, str): + return len(body) + + if hasattr(body, 'read'): + # file like object. Is it seekable? + try: + curpos = body.tell() + sz = body.seek(0, io.SEEK_END) + except (TypeError, AttributeError, OSError): + return None + else: + body.seek(curpos) + return sz - curpos + + if isinstance(body, collections.Sequence): + # A sequence. Assume it to be sequence of bytes or str. + # Note that we do not allow generic Iterable here, because + # there is no guarantee that it can produce the same + # sequence twice. 
But for a Sequence that even supports + # random access, this should be ok. + try: + return sum(len(line) for line in body) + except TypeError: + return None + + return None + def __init__(self, host, port=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT, source_address=None): self.timeout = timeout @@ -915,7 +967,37 @@ self.__response = None response.close() - def send(self, data): + def _read_readable(self, readable): + blocksize = 8192 + if self.debuglevel > 0: + print("sendIng a read()able") + encode = False + try: + mode = readable.mode + except AttributeError: + # io.BytesIO and other file-like objects don't have a `mode` + # attribute. + pass + else: + if "b" not in mode: + encode = True + if self.debuglevel > 0: + print("encoding file using iso-8859-1") + while True: + datablock = readable.read(blocksize) + if not datablock: + break + if encode: + datablock = datablock.encode(_DEFAULT_ENCODING) + yield datablock + + def _read_iterable(self, iterable): + for line in iterable: + if isinstance(line, str): + line = line.encode(_DEFAULT_ENCODING) + yield line + + def send(self, data, encode_chunked=False): """Send `data' to the server. ``data`` can be a string object, a bytes object, an array object, a file-like object that supports a .read() method, or an iterable object. @@ -929,39 +1011,45 @@ if self.debuglevel > 0: print("send:", repr(data)) - blocksize = 8192 - if hasattr(data, "read") : - if self.debuglevel > 0: - print("sendIng a read()able") - encode = False - try: - mode = data.mode - except AttributeError: - # io.BytesIO and other file-like objects don't have a `mode` - # attribute. 
- pass - else: - if "b" not in mode: - encode = True - if self.debuglevel > 0: - print("encoding file using iso-8859-1") - while 1: - datablock = data.read(blocksize) - if not datablock: - break - if encode: - datablock = datablock.encode("iso-8859-1") - self.sock.sendall(datablock) - return + + # create a consistent interface to the data try: - self.sock.sendall(data) + # this is solely to check to see if data implements the buffer API. + # it /would/ be easier to capture if PyObject_CheckBuffer was + # exposed to Python + memoryview(data) except TypeError: - if isinstance(data, collections.Iterable): - for d in data: - self.sock.sendall(d) + if isinstance(data, str): + read = lambda data: (data.encode(_DEFAULT_ENCODING),) + elif hasattr(data, 'read'): + read = self._read_readable + elif isinstance(data, collections.Iterable): + read = self._read_iterable else: raise TypeError("data should be a bytes-like object " "or an iterable, got %r" % type(data)) + else: + # the object implements the buffer interface and can be passed + # directly into socket methods + read = lambda data: (data,) + + for line in read(data): + if not line: + if self.debuglevel > 0: + print('Zero length line ignored') + continue + + if encode_chunked and self._http_vsn == 11: + # chunked encoding + line = b'\r\n'.join(( + format(len(line), 'X').encode('ascii'), + line, + b'')) + self.sock.sendall(line) + + if encode_chunked and self._http_vsn == 11: + # end chunked transfer + self.sock.sendall(b'0\r\n\r\n') def _output(self, s): """Add a line of output to the current request buffer. @@ -970,7 +1058,7 @@ """ self._buffer.append(s) - def _send_output(self, message_body=None): + def _send_output(self, message_body=None, encode_chunked=False): """Send the currently buffered request and clear the buffer. Appends an extra \\r\\n to the buffer. 
@@ -982,7 +1070,7 @@ self.send(msg) if message_body is not None: - self.send(message_body) + self.send(message_body, encode_chunked=encode_chunked) def putrequest(self, method, url, skip_host=0, skip_accept_encoding=0): """Send a request to the server. @@ -1135,7 +1223,7 @@ header = header + b': ' + value self._output(header) - def endheaders(self, message_body=None): + def endheaders(self, message_body=None, encode_chunked=False): """Indicate that the last header line has been sent to the server. This method sends the request to the server. The optional message_body @@ -1148,39 +1236,16 @@ self.__state = _CS_REQ_SENT else: raise CannotSendHeader() - self._send_output(message_body) + self._send_output(message_body, encode_chunked=encode_chunked) - def request(self, method, url, body=None, headers={}): + def request(self, method, url, body=None, headers=None, + encode_chunked=False): """Send a complete request to the server.""" - self._send_request(method, url, body, headers) + self._send_request(method, url, body, headers or {}, encode_chunked) - def _set_content_length(self, body, method): - # Set the content-length based on the body. If the body is "empty", we - # set Content-Length: 0 for methods that expect a body (RFC 7230, - # Section 3.3.2). If the body is set for other methods, we set the - # header provided we can figure out what the length is. 
- thelen = None - method_expects_body = method.upper() in _METHODS_EXPECTING_BODY - if body is None and method_expects_body: - thelen = '0' - elif body is not None: - try: - thelen = str(len(body)) - except TypeError: - # If this is a file-like object, try to - # fstat its file descriptor - try: - thelen = str(os.fstat(body.fileno()).st_size) - except (AttributeError, OSError): - # Don't send a length if this failed - if self.debuglevel > 0: print("Cannot stat!!") - - if thelen is not None: - self.putheader('Content-Length', thelen) - - def _send_request(self, method, url, body, headers): + def _send_request(self, method, url, body, headers, encode_chunked): # Honor explicitly requested Host: and Accept-Encoding: headers. - header_names = dict.fromkeys([k.lower() for k in headers]) + header_names = {k.lower(): k for k in headers.keys()} skips = {} if 'host' in header_names: skips['skip_host'] = 1 @@ -1189,15 +1254,69 @@ self.putrequest(method, url, **skips) + # chunked encoding will happen if HTTP/1.1 is used and either + # the caller passes encode_chunked=True or the following + # conditions hold: + # 1. content-length has not been explicitly set + # 2. the length of the body cannot be determined + # (e.g. it is a generator or a not seekable file) + # 3. 
Transfer-Encoding has NOT been explicitly set by the caller + if 'content-length' not in header_names: - self._set_content_length(body, method) + # only chunk body if not explicitly set for backwards + # compatibility, assuming the client code is already handling the + # chunking + if 'transfer-encoding' not in header_names: + # if content-length cannot be automatically determined, fall + # back to chunked encoding + encode_chunked = False + content_length = self.get_content_length(body, method) + if content_length is None: + if body: + if self.debuglevel > 0: + print('Unable to determine size of %r' % body) + encode_chunked = True + self.putheader('Transfer-Encoding', 'chunked') + else: + self.putheader('Content-Length', str(content_length)) + else: + # transfer-encoding is specified, do some validation + + # RFC 7230, Section 3.3.1 + # A sender MUST NOT apply chunked more than once to a + # message body (i.e., chunking an already chunked message + # is not allowed). + enc = headers[header_names['transfer-encoding']].split(',') + if len([e for e in enc if e == 'chunked']) > 1: + raise EncodingError( + 'Multiple chunked encodings found. Expected 1.') + + # RFC 7230, Section 3.3.1 + # If any transfer coding other than + # chunked is applied to a request payload body, the sender + # MUST apply chunked as the final transfer coding to ensure + # that the message is properly framed. + if enc[-1] != 'chunked': + raise EncodingError( + 'Chunked encoding expected as the final ' + 'Transfer-Encoding.') + else: + # content-length is specified. + + # RFC 7230, Section 3.3.2 + # A sender MUST NOT send a Content-Length header field in + # any message that contains a Transfer-Encoding header + # field. 
+ if 'transfer-encoding' in header_names: + raise EncodingError( + 'Content-Length and Transfer-Encoding ' + 'must not both be set.') + else: + encode_chunked = False + for hdr, value in headers.items(): self.putheader(hdr, value) - if isinstance(body, str): - # RFC 2616 Section 3.7.1 says that text default has a - # default charset of iso-8859-1. - body = _encode(body, 'body') - self.endheaders(body) + self.endheaders(body, encode_chunked) def getresponse(self): """Get the response from the server. @@ -1383,5 +1502,10 @@ BadStatusLine.__init__(self, "") ConnectionResetError.__init__(self, *pos, **kw) + +class EncodingError(HTTPException): + pass + + # for backwards compatibility error = HTTPException diff -r 2126e8cbc12f Lib/test/test_httplib.py --- a/Lib/test/test_httplib.py Tue Jun 14 15:25:36 2016 +0300 +++ b/Lib/test/test_httplib.py Tue Jun 14 16:42:05 2016 +0200 @@ -314,6 +314,125 @@ conn.putheader(name, value) +class TransferEncodingTest(TestCase): + expected_body = b"It's just a flesh wound" + + def test_chunked(self): + conn = client.HTTPConnection('example.com') + conn.sock = FakeSocket(None) + conn.send(self._make_body(), encode_chunked=True) + + body = self._parse_chunked(conn.sock.data) + self.assertEqual(body, self.expected_body) + + def test_explicit_headers(self): + # explicit chunked + conn = client.HTTPConnection('example.com') + conn.sock = FakeSocket(None) + # this shouldn't actually be automatically chunk encoded because the + # calling code has explicitly stated that it's taking care of it + conn.request( + 'POST', '/', self._make_body(), {'Transfer-Encoding': 'chunked'}) + + _, headers, body = self._parse_request(conn.sock.data) + self.assertNotIn('content-length', [k.lower() for k in headers.keys()]) + self.assertEqual(headers['Transfer-Encoding'], 'chunked') + self.assertEqual(body, self.expected_body) + + # explicit chunked, string body + conn = client.HTTPConnection('example.com') + conn.sock = FakeSocket(None) + conn.request( + 'POST', 
'/', self.expected_body.decode('latin-1'), + {'Transfer-Encoding': 'chunked'}) + + _, headers, body = self._parse_request(conn.sock.data) + self.assertNotIn('content-length', [k.lower() for k in headers.keys()]) + self.assertEqual(headers['Transfer-Encoding'], 'chunked') + self.assertEqual(body, self.expected_body) + + # invalid ordering + conn = client.HTTPConnection('example.com') + conn.sock = FakeSocket(None) + with self.assertRaises(client.EncodingError): + conn.request( + 'POST', '/', self._make_body(), + {'Transfer-Encoding': 'chunked,gzip'}) + + # multiple chunk encodings found + conn = client.HTTPConnection('example.com') + conn.sock = FakeSocket(None) + with self.assertRaises(client.EncodingError): + conn.request( + 'POST', '/', self._make_body(), + {'Transfer-Encoding': 'chunked,gzip,chunked'}) + + def test_request(self): + for val in (False, True,): + conn = client.HTTPConnection('example.com') + conn.sock = FakeSocket(None) + conn.request( + 'POST', '/', self._make_body(empty_lines=val)) + + _, headers, body = self._parse_request(conn.sock.data) + body = self._parse_chunked(body) + self.assertEqual(body, self.expected_body) + + # Content-Length and Transfer-Encoding MUST NOT be sent in the + # same request (header names are parsed to str, so compare as str) + self.assertNotIn( + 'content-length', [h.lower() for h in headers.keys()]) + + def _make_body(self, empty_lines=False): + lines = self.expected_body.split(b' ') + for idx, line in enumerate(lines): + # for testing handling empty lines + if empty_lines and idx % 2: + yield b'' + if idx < len(lines) - 1: + yield line + b' ' + else: + yield line + + def _parse_request(self, data): + lines = data.split(b'\r\n') + request = lines[0] + headers = {} + n = 1 + while n < len(lines) and len(lines[n]) > 0: + key, val = lines[n].split(b':') + headers[key.decode('latin-1')] = val.decode('latin-1').strip() + n += 1 + + return request, headers, b'\r\n'.join(lines[n + 1:]) + + def _parse_chunked(self, data): + body = [] + trailers = {} + n = 0 + lines =
data.split(b'\r\n') + # parse body + while True: + size, chunk = lines[n:n+2] + size = int(size, 16) + + if size == 0: + n += 1 + break + + self.assertEqual(size, len(chunk)) + body.append(chunk) + + n += 2 + # we /should/ hit the end chunk, but check against the size of + # lines so we're not stuck in an infinite loop should we get + # malformed data + if n > len(lines): + break + + return b''.join(body) + + class BasicTest(TestCase): def test_status_lines(self): # Test HTTP status lines @@ -1535,6 +1654,25 @@ message = client.parse_headers(f) return message, f + def test_list_body(self): + cases = ( + ([b'foo', b'bar'], b'foobar'), + ((b'foo', b'bar'), b'foobar'), + ((b'foo', 'bar'), b'foobar'), + ([b'foo', 'bar'], b'foobar'), + ) + for body, expected in cases: + with self.subTest(body): + self.conn = client.HTTPConnection('example.com') + self.conn.sock = self.sock = FakeSocket('') + + self.conn.request('PUT', '/url', body) + msg, f = self.get_headers_and_fp() + self.assertNotIn('Content-Type', msg) + self.assertIsNone(msg.get_charset()) + self.assertEqual(len(expected), int(msg.get('content-length'))) + self.assertEqual(expected, f.read()) + def test_manual_content_length(self): # Set an incorrect content-length so that we can verify that # it will not be over-ridden by the library. 
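The wire format that ``send(..., encode_chunked=True)`` produces, and that ``_parse_chunked`` checks in the tests above, can be sketched in isolation. The helper names ``encode_chunks`` and ``decode_chunks`` are hypothetical and not part of this patch, and the decoder assumes a body without chunk extensions or trailers. This is a minimal illustration of RFC 7230 chunk framing, not the patch's implementation.

```python
def encode_chunks(pieces):
    """Frame an iterable of bytes objects as a chunked body (hypothetical helper)."""
    out = []
    for piece in pieces:
        if not piece:
            # A zero-length chunk marks the end of the body, so empty
            # pieces are skipped rather than encoded.
            continue
        size_line = format(len(piece), 'X').encode('ascii')
        out.append(size_line + b'\r\n' + piece + b'\r\n')
    # Terminating chunk followed by an empty trailer section.
    out.append(b'0\r\n\r\n')
    return b''.join(out)


def decode_chunks(data):
    """Reassemble a chunked body; assumes no chunk extensions or trailers."""
    body = []
    pos = 0
    while True:
        eol = data.index(b'\r\n', pos)
        size = int(data[pos:eol], 16)
        if size == 0:
            break
        start = eol + 2
        body.append(data[start:start + size])
        # Skip the chunk data and the CRLF that follows it.
        pos = start + size + 2
    return b''.join(body)


wire = encode_chunks([b"It's ", b'', b'just'])
print(wire)
print(decode_chunks(wire))
```

Unlike the test helper, the decoder reads each chunk by its declared size instead of splitting on ``\r\n``, so it also copes with chunk data that contains a CRLF.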
diff -r 2126e8cbc12f Lib/test/test_urllib2.py --- a/Lib/test/test_urllib2.py Tue Jun 14 15:25:36 2016 +0300 +++ b/Lib/test/test_urllib2.py Tue Jun 14 16:42:05 2016 +0200 @@ -2,11 +2,14 @@ from test import support from test import test_urllib +import sys import os import io import socket import array import sys +import tempfile +import subprocess import urllib.request # The proxy bypass method imported below has logic specific to the OSX @@ -335,7 +338,8 @@ else: self._tunnel_headers.clear() - def request(self, method, url, body=None, headers=None): + def request(self, method, url, body=None, headers=None, + encode_chunked=False): self.method = method self.selector = url if headers is not None: @@ -343,6 +347,7 @@ self.req_headers.sort() if body: self.data = body + self.encode_chunked = encode_chunked if self.raise_on_endheaders: raise OSError() @@ -908,7 +913,61 @@ self.assertEqual(req.unredirected_hdrs["Host"], "baz") self.assertEqual(req.unredirected_hdrs["Spam"], "foo") - # Check iterable body support + # A regular file - Content Length is calculated unless already set. + + file_no, file_path = tempfile.mkstemp() + os.write(file_no, b"Something\nSomething\nSomething\n") + os.close(file_no) + + for headers in {}, {"Content-Length": 30}: + f = open(file_path, "rb") + req = Request("http://example.com/", f, headers) + newreq = h.do_request_(req) + self.assertEqual(int(newreq.get_header('Content-length')), 30) + f.close() + + os.unlink(file_path) + + # A file object - Content Length is calculated unless already set. + # (Note that there are some subtle differences to a regular + # file, that is why we are testing both cases.) 
+ + file_obj = io.BytesIO() + file_obj.write(b"Something\nSomething\nSomething\n") + + for headers in {}, {"Content-Length": 30}: + file_obj.seek(0) + req = Request("http://example.com/", file_obj, headers) + newreq = h.do_request_(req) + self.assertEqual(int(newreq.get_header('Content-length')), 30) + + file_obj.close() + + # A file reading from a pipe. + # A pipe cannot be seek'ed. There is no way to determine the + # content length up front. Thus, do_request_() should fall + # back to Transfer-encoding chunked. + + cmd = r"print('Something\nSomething\nSomething')" + for headers in {}, {"Content-Length": 30}: + proc = subprocess.Popen([sys.executable, "-c", cmd], + stdout=subprocess.PIPE) + req = Request("http://example.com/", proc.stdout, headers) + newreq = h.do_request_(req) + if not headers: + self.assertEqual(newreq.get_header('Content-length'), None) + self.assertEqual(newreq.get_header('Transfer-encoding'), + 'chunked') + else: + self.assertEqual(int(newreq.get_header('Content-length')), 30) + # Drain the pipe and reap the child. + proc.stdout.read() + proc.stdout.close() + proc.wait() + + # Generic iterable. There is no way to determine the content + # length up front. Fall back to Transfer-encoding chunked. + def iterable_body(): yield b"one" yield b"two" @@ -916,30 +975,13 @@ for headers in {}, {"Content-Length": 11}: req = Request("http://example.com/", iterable_body(), headers) + newreq = h.do_request_(req) if not headers: - # Having an iterable body without a Content-Length should - # raise an exception - self.assertRaises(ValueError, h.do_request_, req) + self.assertEqual(newreq.get_header('Content-length'), None) + self.assertEqual(newreq.get_header('Transfer-encoding'), + 'chunked') else: - newreq = h.do_request_(req) - - # A file object. - # Test only Content-Length attribute of request. 
- - file_obj = io.BytesIO() - file_obj.write(b"Something\nSomething\nSomething\n") - - for headers in {}, {"Content-Length": 30}: - req = Request("http://example.com/", file_obj, headers) - if not headers: - # Having an iterable body without a Content-Length should - # raise an exception - self.assertRaises(ValueError, h.do_request_, req) - else: - newreq = h.do_request_(req) - self.assertEqual(int(newreq.get_header('Content-length')), 30) - - file_obj.close() + self.assertEqual(int(newreq.get_header('Content-length')), 11) # array.array Iterable - Content Length is calculated diff -r 2126e8cbc12f Lib/urllib/request.py --- a/Lib/urllib/request.py Tue Jun 14 15:25:36 2016 +0300 +++ b/Lib/urllib/request.py Tue Jun 14 16:42:05 2016 +0200 @@ -1235,6 +1235,11 @@ def set_http_debuglevel(self, level): self._debuglevel = level + def _get_content_length(self, request): + return http.client.HTTPConnection.get_content_length( + request.data, + request.get_method()) + def do_request_(self, request): host = request.host if not host: @@ -1250,17 +1255,15 @@ request.add_unredirected_header( 'Content-type', 'application/x-www-form-urlencoded') - if not request.has_header('Content-length'): - try: - mv = memoryview(data) - except TypeError: - if isinstance(data, collections.Iterable): - raise ValueError("Content-Length should be specified " - "for iterable data of type %r %r" % (type(data), - data)) + if (not request.has_header('Content-length') + and not request.has_header('Transfer-encoding')): + content_length = self._get_content_length(request) + if content_length is not None: + request.add_unredirected_header( + 'Content-length', str(content_length)) else: request.add_unredirected_header( - 'Content-length', '%d' % (len(mv) * mv.itemsize)) + 'Transfer-encoding', 'chunked') sel_host = host if request.has_proxy(): @@ -1316,7 +1319,8 @@ try: try: - h.request(req.get_method(), req.selector, req.data, headers) + h.request(req.get_method(), req.selector, req.data, headers, + 
encode_chunked=req.has_header('Transfer-encoding')) except OSError as err: # timeout error raise URLError(err) r = h.getresponse() diff -r 2126e8cbc12f Misc/NEWS --- a/Misc/NEWS Tue Jun 14 15:25:36 2016 +0300 +++ b/Misc/NEWS Tue Jun 14 16:42:05 2016 +0200 @@ -26,6 +26,16 @@ locale encoding, and fix get_begidx() and get_endidx() to return code point indexes. +- Issue #12319: Chunked transfer encoding support added to + http.client.HTTPConnection requests. + urllib.request.AbstractHTTPHandler does not enforce a Content-Length + header any more. If an HTTP request has a non-empty body, but no + Content-Length header, and the content length cannot be determined + up front, rather than raising an error, this class now falls back + to using chunked transfer encoding. As a side effect, this class + no longer makes assumptions about the type of the body, so everything + supported by http.client is now allowed. + IDLE ----
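
The framing decision described in the NEWS entry can be sketched as a standalone function. The names ``guess_content_length`` and ``framing_header`` are hypothetical. The logic is a simplified reading of ``get_content_length`` from this patch: it omits the method check and the ``Sequence`` case, and uses ``memoryview.nbytes`` in place of ``len(mv) * mv.itemsize``. It is an illustration under those assumptions, not the patch's code.

```python
import io


def guess_content_length(body):
    """Return the body size in bytes, or None if it cannot be known up front."""
    if body is None:
        return None
    try:
        # Buffer-protocol objects: bytes, bytearray, array.array, ...
        return memoryview(body).nbytes
    except TypeError:
        pass
    if isinstance(body, str):
        # Strings are sent as latin-1, one byte per character.
        return len(body)
    if hasattr(body, 'read'):
        try:
            pos = body.tell()
            end = body.seek(0, io.SEEK_END)
            body.seek(pos)
        except (AttributeError, OSError, TypeError):
            # Pipes and sockets are not seekable.
            return None
        return end - pos
    return None


def framing_header(body):
    """Pick the framing header for a body when the caller set neither one."""
    length = guess_content_length(body)
    if length is not None:
        return ('Content-Length', str(length))
    if body:
        return ('Transfer-Encoding', 'chunked')
    return None


def pieces():
    yield b'one'
    yield b'two'


print(framing_header(io.BytesIO(b'Something\nSomething\nSomething\n')))
print(framing_header(pieces()))
```

A generator cannot be measured without consuming it, which is why it ends up chunked, while the ``BytesIO`` object is measured by seeking to its end and back.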