classification
Title: [http.client] HTTPConnection.putrequest not support "chunked" Transfer-Encodings to send data
Type: enhancement
Stage: patch review
Components: Library (Lib)
Versions: Python 3.5
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: orsenthil
Nosy List: Rotkraut, demian.brecht, harobed, orsenthil, petri.lehtinen, piotr.dobrogost, pitrou, vadmium, whitemice
Priority: normal
Keywords: needs review, patch

Created on 2011-06-12 10:47 by harobed, last changed 2015-04-01 23:56 by demian.brecht.

Files
File name           Uploaded                         Description
chunkedhttp.py      Rotkraut, 2014-08-28 08:45       A custom module implementing upload with chunked transfer encoding for urllib
issue12319.patch    demian.brecht, 2015-03-06 00:34
issue12319_1.patch  demian.brecht, 2015-03-06 00:39  Removing unused imports
issue12319_2.patch  demian.brecht, 2015-03-17 16:56
issue12319_3.patch  demian.brecht, 2015-03-24 16:25
issue12319_4.patch  demian.brecht, 2015-03-31 23:31
issue12319_5.patch  demian.brecht, 2015-04-01 23:56
Messages (23)
msg138203 - (view) Author: harobed (harobed) Date: 2011-06-12 10:47
Hi,

HTTPConnection.putrequest does not support the "chunked" Transfer-Encoding for sending data.

For example, I can't do a PUT request with chunked transfer.

Regards,
Stephane
msg138242 - (view) Author: Petri Lehtinen (petri.lehtinen) * (Python committer) Date: 2011-06-13 13:01
What's the use case? Do you have an iterable that yields data whose size is unknown?

AFAIK, most web servers don't even support chunked uploads.

(Removing Python 2.7 from versions as this is clearly a feature request.)
msg138258 - (view) Author: harobed (harobed) Date: 2011-06-13 15:35
I use http.client in a WebDAV client.

The Mac OS X Finder WebDAV client performs all its requests in "chunked" mode: PUT and GET.

Here, I use http.client to simulate the Mac OS X Finder WebDAV client.

Regards,
Stephane
msg138296 - (view) Author: Petri Lehtinen (petri.lehtinen) * (Python committer) Date: 2011-06-14 06:15
harobed wrote:
> I use http.client in a WebDAV client.
>
> The Mac OS X Finder WebDAV client performs all its requests in "chunked" mode: PUT and GET.
>
> Here, I use http.client to simulate the Mac OS X Finder WebDAV client.

Now I'm confused. Per the HTTP specification, GET requests don't have
a body, so "Transfer-Encoding: chunked" doesn't apply to them.

Are you sure you aren't confusing this with the response that the
server sends? In responses, "Transfer-Encoding: chunked" is very common.
msg138357 - (view) Author: harobed (harobed) Date: 2011-06-15 07:39
> Now I'm confused. Per the HTTP specification, GET requests don't have
> a body, so "Transfer-Encoding: chunked" doesn't apply to them.

> Are you sure you aren't confusing this with the response that the
> server sends? In responses, "Transfer-Encoding: chunked" is very common.

Sorry, yes: GET requests have "Transfer-Encoding: chunked" in the server response.
PUT requests can send body data in chunked transfer-encoding mode.
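
For readers unfamiliar with the wire format being discussed, here is a minimal sketch of what a chunked request body looks like, following the chunked transfer coding in RFC 7230. The helper name `encode_chunked` is made up for illustration and is not part of http.client.

```python
def encode_chunked(chunks):
    """Yield the wire form of each chunk, then the terminating last-chunk."""
    for chunk in chunks:
        if not chunk:
            continue  # a zero-length chunk would be read as the end of the body
        # Chunk size in hexadecimal, CRLF, the data itself, CRLF.
        yield format(len(chunk), "X").encode("ascii") + b"\r\n" + chunk + b"\r\n"
    yield b"0\r\n\r\n"  # last-chunk with no trailer fields


body = b"".join(encode_chunked([b"hello", b"", b"world!"]))
```

The sender never needs to know the total length up front, which is the property the PUT use case relies on.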
msg138691 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2011-06-20 08:13
We added support for chunked transfer encoding for the POST method recently, which is exposed via the urllib2 wrapper function. PUT is not exposed via urllib2, and users should use httplib. Chunked transfer can be added for PUT by accepting the body of the message as an iterable.
msg171268 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-09-25 13:14
> We added support for chunked transfer encoding for the POST method recently,
> which is exposed via the urllib2 wrapper function.

I couldn't find what you're talking about.
If I look at AbstractHTTPHandler.do_request_, it actually mandates a Content-Length header for POST data (hence no chunked encoding).
msg226012 - (view) Author: Rolf Krahl (Rotkraut) Date: 2014-08-28 08:45
I'd like to support the request.  I have a use case where I definitely need this feature: I maintain a Python client for a scientific metadata catalogue, see [1] for details.  The client also features the upload of the data files.  The files may come in as a data stream from another process, so my client takes a file-like object as input.  The files may be large (several GB), so buffering them is not an option; they must be streamed to the server as they come in.  Therefore, there is no way to know the Content-Length of the upload beforehand.
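
As a rough illustration of this streaming constraint (a sketch only, not code from the attached module), a file-like object of unknown total size can be consumed in bounded blocks so that memory use stays constant. The name `read_blocks` and the default block size are assumptions.

```python
import io


def read_blocks(fileobj, blocksize=8192):
    """Yield successive blocks from a binary file-like object until EOF."""
    while True:
        block = fileobj.read(blocksize)
        if not block:
            break  # read() returns b"" at end of stream
        yield block


blocks = list(read_blocks(io.BytesIO(b"abcdefg"), blocksize=3))
```

Each block could then be sent as one chunk, so the total size never has to be known.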

I implemented chunked transfer encoding in a custom module that monkey patches the library, see the attached file.  This works fine, but of course it's an awkward hack, as I must meddle deeply with the internals of urllib and http.client to get this working.  This module is tested to work with Python 2.7, 3.1, 3.2, 3.3, and 3.4 (for Python 3 you need to pass it through 2to3 first).  I really would like to see this feature in the standard library in order to get rid of this hack in my package.  I would be happy to transform my module into a patch to the library if such a patch would have a chance of being accepted.

[1]: https://code.google.com/p/icatproject/wiki/PythonIcat
msg226018 - (view) Author: Piotr Dobrogost (piotr.dobrogost) Date: 2014-08-28 11:28
@Rotkraut

The truth is http in stdlib is dead.
Your best option is to use 3rd party libs like requests or urllib3.
Authors of these libs plan to get rid of httplib entirely; see "Moving away from httplib" (https://github.com/shazow/urllib3/issues/58)
msg226024 - (view) Author: Rolf Krahl (Rotkraut) Date: 2014-08-28 13:54
Thanks for the notice!  As far as I read from the link you cite, getting rid of the current httplib in urllib3 is planned but far from being done.  Furthermore, I don't like packages with too many 3rd party dependencies.  Since my package is working fine with the current standard lib, even though using an ugly hack in one place, I'd consider switching to urllib3 as soon as the latter makes it into the standard lib.

I still believe that adding chunked transfer encoding to http.client and urllib in the current standard lib would only require a rather small change that can easily be done such that the lib remains fully compatible with existing code.  I am still waiting for feedback on whether such a patch is welcome.
msg236024 - (view) Author: Martin Panter (vadmium) * Date: 2015-02-15 05:58
One interesting question is how to convey data to the chunked encoder. There are two sets of options in my mind, a pull interface:

* iterable: seems to be the popular way among commenters here
* file reader object: encoder calls into stream’s read() method

and a push interface:

* chunked encoder is a file writer object: user calls encoder’s write() and close() methods. This would suit APIs like saxutils.XMLGenerator and TextIOWrapper.
* chunked encoder has a “feed parser” interface, codecs.IncrementalEncoder interface, or something else.

The advantage of the push interface is that you could fairly easily feed data from an iterable or file reader into it, simply by calling shutil.copyfileobj() or equivalent. But adapting the pull interface to a push interface would require “asyncio” support or a separate thread or something to invert the flow of control. So I think building the encoder with a push interface would be best. Rolf’s ChunkedHTTPConnectionMixin class appears to only support the pull interface (iterables and stream readers).
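
A small sketch of that asymmetry, assuming the push-style sink is any object with a write() method (io.BytesIO stands in for an encoder here, and both helper names are made up): pull-style sources can be drained into it with a plain loop or shutil.copyfileobj(), with no inversion of control.

```python
import io
import shutil


def push_from_iterable(iterable, sink):
    """Feed every item of an iterable of bytes into a push-style sink."""
    for item in iterable:
        sink.write(item)


def push_from_reader(reader, sink, blocksize=8192):
    """Feed a file reader into a push-style sink, one block at a time."""
    shutil.copyfileobj(reader, sink, blocksize)


sink = io.BytesIO()  # stand-in for a chunked encoder with write()
push_from_iterable((b"ab", b"cd"), sink)
push_from_reader(io.BytesIO(b"efgh"), sink)
```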

I would welcome support for chunked uploading in Python’s “http.client” module, especially with push or stream writer support. I don’t think overriding _send_request should be necessary; just call putrequest(), putheader() etc manually, and then call send() for each chunk. Perhaps there is scope for sharing the code with the “http.server” module (for encoding chunked responses).
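
The manual approach described above might look roughly like the following sketch. It only uses the existing HTTPConnection methods putrequest(), putheader(), endheaders(), send() and getresponse(); the function name `put_chunked`, the host and the path are hypothetical.

```python
import http.client


def put_chunked(conn, url, chunks, headers=None):
    """Send a PUT request whose body is chunk-encoded by hand."""
    conn.putrequest("PUT", url)
    for name, value in (headers or {}).items():
        conn.putheader(name, value)
    conn.putheader("Transfer-Encoding", "chunked")
    conn.endheaders()
    for chunk in chunks:
        if chunk:  # skip empty chunks, which would end the body early
            size_line = format(len(chunk), "X").encode("ascii") + b"\r\n"
            conn.send(size_line + chunk + b"\r\n")
    conn.send(b"0\r\n\r\n")  # last-chunk, no trailers
    return conn.getresponse()


# Usage (hypothetical host and path):
# conn = http.client.HTTPConnection("dav.example.org")
# response = put_chunked(conn, "/upload/data.bin", [b"first", b"second"])
```

Whether the server accepts a chunked request body is a separate question; this only shows the client side.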
msg236037 - (view) Author: Rolf Krahl (Rotkraut) Date: 2015-02-15 13:10
The design goal for my implementation was compatibility.  My version can be used as a drop-in replacement for urllib's existing HTTPHandler.  The only thing that needs to be changed in the calling code is that it must call build_opener() to select the chunked handler in place of the default HTTPHandler.  After this, the calling code can use the returned opener in the very same way as usual.

I guess, switching to a push interface would require major changes in the calling code.  

In principle, you could certainly support both schemes at the same time: you could change the internal design to a push interface and then wrap this in a pull interface for compatibility with existing code.  But I'm not sure whether this would be worth the effort.  If, as Piotr suggests, the current urllib is going to be replaced by urllib3, then I guess it's questionable whether it makes sense to add major design changes that are incompatible with existing code to the current standard lib.
msg237309 - (view) Author: Demian Brecht (demian.brecht) * Date: 2015-03-06 00:22
I've attached a patch that implements full Transfer-Encoding support for requests as specified in RFC 7230.
msg237312 - (view) Author: Demian Brecht (demian.brecht) * Date: 2015-03-06 00:34
I hit "submit" a little too soon.

The intention of the patch is to adhere to all aspects of Transfer-Encoding as specified in the RFC and to make best guesses as to encoding that should be used based on the data type of the given body.

This will break backwards compatibility for cases where users are manually chunking the request bodies prior to passing them in and explicitly setting the Transfer-Encoding header. Additionally, if Transfer-Encoding was previously specified, but not chunked, the patch will automatically chunk the body.

Otherwise, the patch should only be additive.
msg237313 - (view) Author: Demian Brecht (demian.brecht) * Date: 2015-03-06 00:37
Also note that this patch includes the changes in #23350 as it's directly relevant.
msg237427 - (view) Author: Demian Brecht (demian.brecht) * Date: 2015-03-07 07:46
FWIW, so far I've tested this change against:

cherrypy 3.6.0
uwsgi 2.0.9 (--http-raw-body)
nginx 1.6.2 (chunked_transfer_encoding on, proxy_buffering off;) + uwsgi 2.0.9 (--http-raw-body)

The chunked body works as expected. Unfortunately, all implementations seem to be ignorant of the trailer part. So it seems that although RFC-compliant (and I can definitely see the use case for it), the trailer implementation may not be overly practical. I still think that it's worthwhile keeping it, but perhaps adding a note that it may not be supported at this point.

Relevant gists: https://gist.github.com/demianbrecht/3fd60994eceeb3da8f13
msg237509 - (view) Author: Demian Brecht (demian.brecht) * Date: 2015-03-08 04:17
After sleeping on this, I think that the best route to go would be to drop the trailer implementation (if it's not practical it doesn't belong in the standard library).

Also, to better preserve backwards compatibility it may be better to circumvent the automatic chunking if transfer-encoding headers are present in the request call. That way, no changes would need to be made to existing code that already supports it at a higher level.
msg238314 - (view) Author: Demian Brecht (demian.brecht) * Date: 2015-03-17 16:56
Updated patch changes the following:

+ Removes support for trailers in requests as they're not supported
+ If Transfer-Encoding is explicitly set by the client, it's assumed that the caller will handle all encoding (backwards compatibility)
+ Fixed a bug where chunk size was being sent as decimal instead of hex
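
To make the last point concrete: the chunked format reads the size line as hexadecimal, so a size written in decimal is misinterpreted by the receiver. A tiny worked example (the variable names are illustrative only):

```python
size = 26                       # actual number of bytes in the chunk
hex_line = format(size, "X")    # what the chunked format expects
dec_line = str(size)            # what a decimal size line would contain
misread = int(dec_line, 16)     # how a receiver would interpret that line
```

Here the receiver would wait for `misread` bytes instead of `size`, so the framing of every following chunk breaks.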
msg239119 - (view) Author: Martin Panter (vadmium) * Date: 2015-03-24 13:18
I left a few comments on Rietveld, mainly about the documentation and API design.

However I understand Rolf specifically wanted chunked encoding to work with the existing urlopen() framework, at least after constructing a separate opener object. I think that should be practical with the existing HTTPConnection implementation. Here is some pseudocode of how I might write a urlopen() handler class, and an encoder class that should be usable for both clients and servers:

import io
import shutil

class ChunkedHandler(...):
    def http_request(...):
        # Like AbstractHTTPHandler, but don’t force Content-Length

    def default_open(...):
        # Like AbstractHTTPHandler, but instead of calling h.request():
        encoder = ChunkedEncoder(h.send)
        h.putrequest(req.get_method(), req.selector)
        for item in headers:
            h.putheader(*item)
        h.putheader("Transfer-Encoding", encoder.encoding)
        h.endheaders()
        # Push the body through the encoder, one block per chunk
        shutil.copyfileobj(req.data, encoder)
        encoder.close()

class ChunkedEncoder(io.BufferedIOBase):
    # Hook output() up to either http.client.HTTPConnection.send()
    # or http.server.BaseHTTPRequestHandler.wfile.write()

    encoding = "chunked"

    def __init__(self, output):
        self.output = output

    def write(self, b):
        if not b:
            return 0  # An empty chunk would terminate the body early
        self.output("{:X}\r\n".format(len(b)).encode("ascii"))
        self.output(b)
        self.output(b"\r\n")
        return len(b)

    def close(self):
        if not self.closed:
            self.output(b"0\r\n\r\n")  # Last chunk, no trailers
        super().close()
msg239151 - (view) Author: Demian Brecht (demian.brecht) * Date: 2015-03-24 16:26
Thanks for the review Martin.

> However I understand Rolf specifically wanted chunked encoding to work with the existing urlopen() framework, at least after constructing a separate opener object. I think that should be practical with the existing HTTPConnection implementation.

The original issue was that http.client doesn't support chunked encoding. With this patch, chunked encoding should more or less come for free with urlopen. There's no reason why HTTPConnection should not support transfer encoding out of the box, given that it's part of the HTTP/1.1 spec. I do understand that some modifications are needed in urllib.request in order to support the changes here, but I didn't include those so as not to conflate the two in this patch. I think that it's also reasonable to open a new issue to address the feature in urllib.request rather than piggy-backing on this one.
msg239769 - (view) Author: Martin Panter (vadmium) * Date: 2015-04-01 04:27
Perhaps you should make a table of some potential body object types, and figure out what the behaviour should be for request() with and without relevant headers, and endheaders() and send() with and without encode_chunked=True:

* Add/don’t add Content-Length/Transfer-Encoding
* Send with/without chunked encoding
* Raise exception
* Not supported or undefined behaviour

Potential body types:

* None with GET/POST request
* bytes()
* Latin-1/non Latin-1 str()
* BytesIO/StringIO
* Ordinary binary/Latin-1/other file object
* File object reading a special file like a pipe (st_size == 0)
* File object wrapping a pipe or similar that does not support fileno() (ESPIPE)
* Custom file object not supporting fileno() nor seeking
* File object at non-zero offset
* GzipFile object, where fileno() corresponds to the compressed size
* GzipFile not supporting fileno(), where seeking is possible but expensive
* Iterator yielding bytes() and/or strings
* Generator
* File object considered as an iterable of lines
* List/tuple of bytes/text
* Other sequences of bytes/text
* Other iterables of bytes/text, e.g. set(), OrderedDict.values()

This list could go on and on. I would rather have a couple of simple rules, or explicit APIs for the various modes so you don’t have to guess which mode your particular body type will trigger.
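
Purely as an illustration of what "a couple of simple rules" could look like (this is a hypothetical sketch, not the behaviour of any attached patch), the framing decision might be reduced to the body's basic kind plus whether the caller already set a framing header. The function name and the returned labels are made up.

```python
def choose_framing(body, headers):
    """Return a label describing how the request body would be framed."""
    names = {name.lower() for name in headers}
    if "transfer-encoding" in names or "content-length" in names:
        return "caller"          # caller set a framing header: do not interfere
    if body is None:
        return "none"            # no body to frame
    if isinstance(body, (str, bytes, bytearray, memoryview)):
        return "content-length"  # size is known (str: after encoding to bytes)
    return "chunked"             # files, iterators, generators: size unknown
```

With rules this coarse, a caller can tell in advance which mode a given body will trigger; finer cases such as seekable files would need an explicit API instead of guessing.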
msg239771 - (view) Author: Demian Brecht (demian.brecht) * Date: 2015-04-01 05:24
Agreed. However, I'm wondering if that should belong in a new issue geared towards further clarifying behaviour of request body types. The patch introduces the behaviour this specific issue was looking for, with the largest change being that iterators may now result in chunked transfer encoding with the data currently handled by the library. I'd prefer to move forward with incremental improvements rather than keep building on each issue before the work's merged (there's still a /good/ amount of work to be done in this module).
msg239793 - (view) Author: Martin Panter (vadmium) * Date: 2015-04-01 12:11
The incremental improvement thing sounds good. Here are some things which I think are orthogonal to sensible chunked encoding:

* Automagic seeking to determine Content-Length
* Setting Content-Length for iterables that are neither strings, iterators nor files (Issue 23350)
* Latin-1 encoding of iterated items
History
Date                 User             Action  Args
2015-04-01 23:56:39  demian.brecht    set     files: + issue12319_5.patch
2015-04-01 12:11:15  vadmium          set     messages: + msg239793
2015-04-01 05:24:32  demian.brecht    set     messages: + msg239771
2015-04-01 04:27:08  vadmium          set     messages: + msg239769
2015-03-31 23:31:11  demian.brecht    set     files: + issue12319_4.patch
2015-03-24 16:26:07  demian.brecht    set     messages: + msg239151
2015-03-24 16:25:12  demian.brecht    set     files: + issue12319_3.patch
2015-03-24 13:18:37  vadmium          set     messages: + msg239119
2015-03-17 16:56:49  demian.brecht    set     files: + issue12319_2.patch; messages: + msg238314
2015-03-08 04:17:56  demian.brecht    set     messages: + msg237509
2015-03-07 07:46:32  demian.brecht    set     messages: + msg237427
2015-03-06 00:39:54  demian.brecht    set     files: + issue12319_1.patch
2015-03-06 00:37:14  demian.brecht    set     messages: + msg237313
2015-03-06 00:34:49  demian.brecht    set     versions: + Python 3.5, - Python 3.3
2015-03-06 00:34:08  demian.brecht    set     keywords: + needs review, patch; files: + issue12319.patch; messages: + msg237312; stage: needs patch -> patch review
2015-03-06 00:22:29  demian.brecht    set     messages: + msg237309
2015-02-15 13:10:20  Rotkraut         set     messages: + msg236037
2015-02-15 05:58:27  vadmium          set     messages: + msg236024
2014-08-29 22:23:41  vadmium          set     nosy: + vadmium
2014-08-28 13:54:29  Rotkraut         set     messages: + msg226024
2014-08-28 11:28:08  piotr.dobrogost  set     messages: + msg226018
2014-08-28 08:45:48  Rotkraut         set     files: + chunkedhttp.py; nosy: + Rotkraut; messages: + msg226012
2014-07-26 01:15:14  demian.brecht    set     nosy: + demian.brecht
2014-07-26 00:10:59  whitemice        set     nosy: + whitemice
2012-10-10 21:51:32  piotr.dobrogost  set     nosy: + piotr.dobrogost
2012-09-25 13:14:48  pitrou           set     nosy: + pitrou; messages: + msg171268
2011-06-20 08:13:35  orsenthil        set     assignee: orsenthil; messages: + msg138691; stage: needs patch
2011-06-15 07:39:49  harobed          set     messages: + msg138357
2011-06-14 13:03:49  pitrou           set     nosy: + orsenthil
2011-06-14 06:15:57  petri.lehtinen   set     messages: + msg138296
2011-06-13 15:35:20  harobed          set     messages: + msg138258
2011-06-13 13:01:35  petri.lehtinen   set     versions: - Python 2.7; nosy: + petri.lehtinen; messages: + msg138242; type: enhancement
2011-06-12 10:47:13  harobed          create