classification
Title: [http.client] HTTPConnection.request not support "chunked" Transfer-Encoding to send data
Type: enhancement Stage: patch review
Components: Library (Lib) Versions: Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: orsenthil Nosy List: Rotkraut, demian.brecht, harobed, martin.panter, orsenthil, petri.lehtinen, piotr.dobrogost, pitrou, whitemice
Priority: normal Keywords: needs review, patch

Created on 2011-06-12 10:47 by harobed, last changed 2015-05-23 01:38 by martin.panter.

Files
File name           Uploaded                         Description
chunkedhttp.py      Rotkraut, 2014-08-28 08:45       A custom module implementing upload with chunked transfer encoding for urllib
issue12319.patch    demian.brecht, 2015-03-06 00:34
issue12319_1.patch  demian.brecht, 2015-03-06 00:39  Removing unused imports
issue12319_2.patch  demian.brecht, 2015-03-17 16:56
issue12319_3.patch  demian.brecht, 2015-03-24 16:25
issue12319_4.patch  demian.brecht, 2015-03-31 23:31
issue12319_5.patch  demian.brecht, 2015-04-01 23:56
issue12319_6.patch  demian.brecht, 2015-05-21 05:55
chunkedhttp-2.py    Rotkraut, 2015-05-22 08:48       Illustration of the situation after applying issue12319_6.patch from user's point of view
Messages (34)
msg138203 - Author: harobed (harobed) Date: 2011-06-12 10:47
Hi,

HTTPConnection.putrequest does not support "chunked" Transfer-Encoding for sending data.

For example, I can't do a PUT request with chunked transfer.

Regards,
Stephane
msg138242 - Author: Petri Lehtinen (petri.lehtinen) * (Python committer) Date: 2011-06-13 13:01
What's the use case? Do you have an iterable that yields data whose size is unknown?

AFAIK, most web servers don't even support chunked uploads.

(Removing Python 2.7 from versions as this is clearly a feature request.)
msg138258 - Author: harobed (harobed) Date: 2011-06-13 15:35
I use http.client in a WebDAV client.

The Mac OS X Finder WebDAV client performs all its requests in "chunked" mode: PUT and GET.

Here, I use http.client to simulate the Mac OS X Finder WebDAV client.

Regards,
Stephane
msg138296 - Author: Petri Lehtinen (petri.lehtinen) * (Python committer) Date: 2011-06-14 06:15
harobed wrote:
> I use http.client in a WebDAV client.
>
> The Mac OS X Finder WebDAV client performs all its requests in "chunked" mode: PUT and GET.
>
> Here, I use http.client to simulate the Mac OS X Finder WebDAV client.

Now I'm confused. Per the HTTP specification, GET requests don't have
a body, so "Transfer-Encoding: chunked" doesn't apply to them.

Are you sure you aren't confusing this with the response that the server
sends? In responses, "Transfer-Encoding: chunked" is very common.
msg138357 - Author: harobed (harobed) Date: 2011-06-15 07:39
> Now I'm confused. Per the HTTP specification, GET requests don't have
> a body, so "Transfer-Encoding: chunked" doesn't apply to them.

> Are you sure you aren't confusing this with the response that the server
> sends? In responses, "Transfer-Encoding: chunked" is very common.

Sorry, yes: for GET requests, "Transfer-Encoding: chunked" is in the server response.
PUT requests can send body data in chunked transfer-encoding mode.
msg138691 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2011-06-20 08:13
We recently had support for chunked transfer encoding added for the POST method, which is exposed via the urllib2 wrapper function. PUT is not exposed via urllib2, so users should use httplib. Chunked transfer could be added for PUT as well by taking the message body as an iterable.
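Taking the body as an iterable only needs the chunked framing from RFC 7230: each non-empty piece goes out as its size in hexadecimal, CRLF, the data and CRLF, and a zero-size chunk ends the body. A minimal sketch, assuming a hypothetical helper name `chunked_frames` and an arbitrary 8192-byte block size; it also accepts a binary file-like object with read(), the other common source of data whose length is unknown:

```python
import functools
import io


def chunked_frames(body, blocksize=8192):
    """Yield the wire bytes of a chunked-encoded body (hypothetical helper).

    body is either a binary file-like object with read() or an iterable of
    bytes-like objects; its total length never needs to be known up front.
    """
    if hasattr(body, "read"):
        # Turn a file reader into an iterable of blocks that stops at EOF (b"")
        body = iter(functools.partial(body.read, blocksize), b"")
    for data in body:
        data = bytes(data)
        if not data:
            continue  # a zero-size chunk would be read as the end of the body
        yield b"%X\r\n" % len(data)  # chunk size in hexadecimal
        yield data
        yield b"\r\n"
    yield b"0\r\n\r\n"  # last-chunk, then the empty line ending the body


print(b"".join(chunked_frames([b"Hello, ", b"", b"world!"])))
# b'7\r\nHello, \r\n6\r\nworld!\r\n0\r\n\r\n'
print(b"".join(chunked_frames(io.BytesIO(b"x" * 10), blocksize=4)))
# b'4\r\nxxxx\r\n4\r\nxxxx\r\n2\r\nxx\r\n0\r\n\r\n'
```

Empty pieces are skipped rather than sent, because an empty chunk is indistinguishable from the terminator on the wire.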
msg171268 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-09-25 13:14
> We had support for chunked transfer encoding for POST method recently, 
> which is exposed via urllib2 wrapper function.

I couldn't find what you're talking about.
If I look at AbstractHTTPHandler.do_request_, it actually mandates a Content-Length header for POST data (hence no chunked encoding).
msg226012 - Author: Rolf Krahl (Rotkraut) Date: 2014-08-28 08:45
I'd like to support the request. I have a use case where I definitely need this feature: I maintain a Python client for a scientific metadata catalogue; see [1] for details. The client also supports uploading data files. The files may come in as a data stream from another process, so my client takes a file-like object as input. The files may be large (several GB), so buffering them is not an option; they must be streamed to the server as they come in. Therefore, there is no way to know the Content-length of the upload beforehand.

I implemented chunked transfer encoding in a custom module that monkey patches the library; see the attached file. This works fine, but of course it's an awkward hack, as I must meddle deeply in the internals of urllib and http.client to get it working. This module is tested to work with Python 2.7, 3.1, 3.2, 3.3, and 3.4 (for Python 3 you need to pass it through 2to3 first). I really would like to see this feature in the standard library so that I can get rid of this hack in my package. I would be happy to turn my module into a patch to the library if such a patch would have a chance of being accepted.

[1]: https://code.google.com/p/icatproject/wiki/PythonIcat
msg226018 - Author: Piotr Dobrogost (piotr.dobrogost) Date: 2014-08-28 11:28
@Rotkraut

The truth is http in stdlib is dead.
Your best option is to use 3rd party libs like requests or urllib3.
Authors of these libs plan to get rid of httplib entirely; see "Moving away from httplib" (https://github.com/shazow/urllib3/issues/58)
msg226024 - Author: Rolf Krahl (Rotkraut) Date: 2014-08-28 13:54
Thanks for the notice! As far as I can tell from the link you cite, getting rid of the current httplib in urllib3 is planned but far from done. Furthermore, I don't like packages with too many 3rd-party dependencies. Since my package works fine with the current standard lib, even though it uses an ugly hack in one place, I'd consider switching to urllib3 as soon as the latter makes it into the standard lib.

I still believe that adding chunked transfer encoding to http.client and urllib in the current standard lib would only require a rather small change that can easily be done such that the lib remains fully compatible with existing code. I'm still waiting for feedback on whether such a patch is welcome.
msg236024 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-02-15 05:58
One interesting question is how to convey data to the chunked encoder. There are two sets of options in my mind, a pull interface:

* iterable: seems to be the popular way among commenters here
* file reader object: encoder calls into stream’s read() method

and a push interface:

* chunked encoder is a file writer object: user calls encoder’s write() and close() methods. This would suit APIs like saxutils.XMLGenerator and TextIOWrapper.
* chunked encoder has a “feed parser” interface, codecs.IncrementalEncoder interface, or something else.

The advantage of the push interface is that you could fairly easily feed data from an iterable or file reader into it simply by doing shutil.copyfileobj() or equivalent. But adapting the pull interface to a push interface would require “asyncio” support or a separate thread or something to invert the flow of control. So I think building the encoder with a push interface would be best. Rolf’s ChunkedHTTPConnectionMixin class appears to only support the pull interface (iterables and stream readers).
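To see why a pull-only encoder is awkward for push-style producers (anything that only knows how to call write() and close(), like XMLGenerator or TextIOWrapper), here is a minimal thread-based adapter that turns such a producer into an iterator. `QueueWriter` and `push_to_pull` are hypothetical names used only for illustration:

```python
import queue
import threading

_EOF = object()  # sentinel that marks the end of the written stream


class QueueWriter:
    """Minimal file-like writer (hypothetical helper): each write() is queued."""

    def __init__(self, maxsize=16):
        # Bounded queue: a fast producer blocks until the consumer catches up
        self._queue = queue.Queue(maxsize)

    def write(self, data):
        if data:
            self._queue.put(bytes(data))
        return len(data)

    def close(self):
        self._queue.put(_EOF)

    def __iter__(self):
        # Pull side: yield queued blocks until close() has been called
        while True:
            item = self._queue.get()
            if item is _EOF:
                return
            yield item


def push_to_pull(producer):
    """Run producer(writer) in a thread and yield what it writes (hypothetical).

    The thread starts on the first next() call; an exception raised by the
    producer is re-raised in the consumer after the blocks written so far.
    """
    writer = QueueWriter()
    errors = []

    def run():
        try:
            producer(writer)
        except Exception as exc:
            errors.append(exc)
        finally:
            writer.close()

    threading.Thread(target=run, daemon=True).start()
    yield from writer
    if errors:
        raise errors[0]


def produce(out):
    # A push-style producer: it only knows how to call out.write()
    for i in range(3):
        out.write(b"line %d\n" % i)


print(b"".join(push_to_pull(produce)))  # b'line 0\nline 1\nline 2\n'
```

The extra thread, the queue and the error hand-off are exactly the overhead a push-style encoder avoids. Note also that if the consumer stops iterating early, the producer thread stays blocked on the full queue.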

I would welcome support for chunked uploading in Python’s “http.client” module, especially with push or stream writer support. I don’t think overwriting _send_request should be necessary; just call putrequest(), putheader() etc manually, and then call send() for each chunk. Perhaps there is scope for sharing the code with the “http.server” module (for encoding chunked responses).
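The manual approach can be sketched with only the documented low-level HTTPConnection methods (putrequest(), putheader(), endheaders(), send(), getresponse()). The function `send_chunked` is hypothetical, and `chunks` may be any iterable of bytes:

```python
def send_chunked(conn, method, url, chunks, headers=()):
    """Send a request with a hand-encoded chunked body (hypothetical helper).

    conn is an http.client.HTTPConnection, or any object with the same
    putrequest/putheader/endheaders/send/getresponse methods.
    """
    conn.putrequest(method, url)
    for name, value in headers:
        conn.putheader(name, value)
    conn.putheader("Transfer-Encoding", "chunked")  # and no Content-Length
    conn.endheaders()
    for data in chunks:
        if data:  # skip empty pieces: a zero-size chunk ends the body
            conn.send(b"%X\r\n" % len(data) + bytes(data) + b"\r\n")
    conn.send(b"0\r\n\r\n")  # last-chunk and empty trailer
    return conn.getresponse()


# Example use against a server that accepts chunked request bodies:
#   import http.client
#   conn = http.client.HTTPConnection("localhost", 8000)
#   resp = send_chunked(conn, "PUT", "/upload", [b"part 1", b"part 2"])
#   print(resp.status)
```

Nothing here overrides private methods such as _send_request; the framing is done entirely by the caller.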
msg236037 - Author: Rolf Krahl (Rotkraut) Date: 2015-02-15 13:10
The design goal for my implementation was compatibility. My version can be used as a drop-in replacement for the existing urllib HTTPHandler. The only thing that needs to be changed in the calling code is that it must call build_opener() to select the chunked handler in place of the default HTTPHandler. After this, the calling code can use the returned opener in the very same way as usual.

I guess switching to a push interface would require major changes in the calling code.

In principle, you could certainly support both schemes at the same time: you could change the internal design to a push interface and then wrap it in a pull interface for compatibility with existing code. But I'm not sure whether this would be worth the effort. If, as Piotr suggests, the current urllib is going to be replaced by urllib3, then I guess it's questionable whether it makes sense to add major design changes that are incompatible with existing code to the current standard lib.
msg237309 - Author: Demian Brecht (demian.brecht) * Date: 2015-03-06 00:22
I've attached a patch that implements full Transfer-Encoding support for requests as specified in RFC 7230.
msg237312 - Author: Demian Brecht (demian.brecht) * Date: 2015-03-06 00:34
I hit "submit" a little too soon.

The intention of the patch is to adhere to all aspects of Transfer-Encoding as specified in the RFC and to make best guesses as to the encoding that should be used based on the data type of the given body.

This will break backwards compatibility for cases where users are manually chunking the request bodies prior to passing them in and explicitly setting the Transfer-Encoding header. Additionally, if Transfer-Encoding was previously specified, but not chunked, the patch will automatically chunk the body.

Otherwise, the patch should only be additive.
msg237313 - Author: Demian Brecht (demian.brecht) * Date: 2015-03-06 00:37
Also note that this patch includes the changes in #23350 as it's directly relevant.
msg237427 - Author: Demian Brecht (demian.brecht) * Date: 2015-03-07 07:46
FWIW, so far I've tested this change against:

cherrypy 3.6.0
uwsgi 2.0.9 (--http-raw-body)
nginx 1.6.2 (chunked_transfer_encoding on, proxy_buffering off;) + uwsgi 2.0.9 (--http-raw-body)

The chunked body works as expected. Unfortunately, all implementations seem to ignore the trailer part. So although the trailer implementation is RFC-compliant (and I can definitely see the use case for it), it may not be overly practical. I still think that it's worthwhile keeping it, but perhaps with a note that it may not be supported at this point.

Relevant gists: https://gist.github.com/demianbrecht/3fd60994eceeb3da8f13
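For reference, the trailer part that these servers ignored sits between the zero-size last chunk and the final empty line of a chunked body (RFC 7230, section 4.1), and each trailer field is formatted like a header field. A minimal sketch, where `chunked_with_trailer` is a hypothetical helper and "X-Example" is only a placeholder field name:

```python
def chunked_with_trailer(chunks, trailer=None):
    """Return a complete chunked body as bytes, with optional trailer fields.

    Hypothetical helper for illustration; trailer maps field names to values.
    """
    out = bytearray()
    for data in chunks:
        if data:  # a zero-size chunk would end the body early
            out += b"%X\r\n" % len(data) + bytes(data) + b"\r\n"
    out += b"0\r\n"  # last-chunk
    for name, value in (trailer or {}).items():
        # Each trailer field uses the same "Name: value" syntax as a header
        out += "{}: {}\r\n".format(name, value).encode("latin-1")
    out += b"\r\n"  # empty line ends the trailer part and the message body
    return bytes(out)


print(chunked_with_trailer([b"abc"], {"X-Example": "1"}))
# b'3\r\nabc\r\n0\r\nX-Example: 1\r\n\r\n'
```

A sender is also expected to announce such fields up front in a "Trailer" header; without trailers the body simply ends with b"0\r\n\r\n".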
msg237509 - Author: Demian Brecht (demian.brecht) * Date: 2015-03-08 04:17
After sleeping on this, I think that the best route to go would be to drop the trailer implementation (if it's not practical it doesn't belong in the standard library).

Also, to better preserve backwards compatibility it may be better to circumvent the automatic chunking if transfer-encoding headers are present in the request call. That way, no changes would need to be made to existing code that already supports it at a higher level.
msg238314 - Author: Demian Brecht (demian.brecht) * Date: 2015-03-17 16:56
Updated patch changes the following:

+ Removes support for trailers in requests as they're not supported
+ If Transfer-Encoding is explicitly set by the client, it's assumed that the caller will handle all encoding (backwards compatibility)
+ Fixed a bug where chunk size was being sent as decimal instead of hex
msg239119 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-03-24 13:18
I left a few comments on Rietveld, mainly about the documentation and API design.

However I understand Rolf specifically wanted chunked encoding to work with the existing urlopen() framework, at least after constructing a separate opener object. I think that should be practical with the existing HTTPConnection implementation. Here is some pseudocode of how I might write a urlopen() handler class, and an encoder class that should be usable for both clients and servers:

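import io
import shutil
import urllib.request
from http.client import HTTPConnection

class ChunkedHandler(urllib.request.HTTPHandler):
    # Sketch only: plain HTTP, no proxies or tunnels; req.data must be
    # a readable binary file object
    def http_request(self, req):
        # Like AbstractHTTPHandler, but don't force Content-Length
        return req

    def http_open(self, req):
        # Like AbstractHTTPHandler.do_open, but instead of calling h.request():
        h = HTTPConnection(req.host, timeout=req.timeout)
        encoder = ChunkedEncoder(h.send)
        h.putrequest(req.get_method(), req.selector)
        for item in req.header_items():
            h.putheader(*item)
        h.putheader("Transfer-Encoding", encoder.encoding)
        h.endheaders()
        shutil.copyfileobj(req.data, encoder)
        encoder.close()
        r = h.getresponse()
        r.url, r.msg = req.get_full_url(), r.reason  # as do_open() does
        return r

class ChunkedEncoder(io.BufferedIOBase):
    # Hook output() up to either http.client.HTTPConnection.send()
    # or http.server.BaseHTTPRequestHandler.wfile.write()

    encoding = "chunked"

    def __init__(self, output):
        self.output = output

    def writable(self):
        return True

    def write(self, b):
        if self.closed:
            raise ValueError("write to closed encoder")
        b = bytes(b)
        if b:  # an empty chunk would mark the end of the body
            self.output("{:X}\r\n".format(len(b)).encode("ascii"))
            self.output(b)
            self.output(b"\r\n")
        return len(b)

    def close(self):
        if not self.closed:
            self.output(b"0\r\n\r\n")
        super().close()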
msg239151 - Author: Demian Brecht (demian.brecht) * Date: 2015-03-24 16:26
Thanks for the review, Martin.

> However I understand Rolf specifically wanted chunked encoding to work with the existing urlopen() framework, at least after constructing a separate opener object. I think that should be practical with the existing HTTPConnection implementation.

The original issue was that http.client doesn't support chunked encoding. With this patch, chunked encoding should more or less come for free with urlopen. There's absolutely no reason why HTTPConnection should not support transfer encoding out of the box, given that it's part of the HTTP/1.1 spec. I do understand that some modifications are needed in urllib.request in order to support the changes here, but I didn't include those in the patch so as not to conflate it. I think it's also reasonable to open a new issue to address the feature in urllib.request rather than piggy-backing on this one.
msg239769 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-04-01 04:27
Perhaps you should make a table of some potential body object types, and figure out what the behaviour should be for request() with and without relevant headers, and endheaders() and send() with and without encode_chunked=True:

* Add/don’t add Content-Length/Transfer-Encoding
* Send with/without chunked encoding
* Raise exception
* Not supported or undefined behaviour

Potential body types:

* None with GET/POST request
* bytes()
* Latin-1/non Latin-1 str()
* BytesIO/StringIO
* Ordinary binary/Latin-1/other file object
* File object reading a special file like a pipe (st_size == 0)
* File object wrapping a pipe or similar that does not support fileno() (ESPIPE)
* Custom file object not supporting fileno() nor seeking
* File object at non-zero offset
* GzipFile object, where fileno() corresponds to the compressed size
* GzipFile not supporting fileno(), where seeking is possible but expensive
* Iterator yielding bytes() and/or strings
* Generator
* File object considered as an iterable of lines
* List/tuple of bytes/text
* Other sequences of bytes/text
* Other iterables of bytes/text, e.g. set(), OrderedDict.values()

This list could go on and on. I would rather have a couple of simple rules, or explicit APIs for the various modes so you don’t have to guess which mode your particular body type will trigger.
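One possible pair of "simple rules", sketched purely as an illustration (the helper `choose_framing` is hypothetical, not the patch's API): anything exposing the buffer protocol, plus Latin-1 str, gets a Content-Length; everything else (files, iterators, generators, other iterables) is chunked; and an absent body only gets "Content-Length: 0" for methods that normally carry one. Under these rules an mmap counts as bytes-like, and a file object is always chunked even when its size could be determined:

```python
def choose_framing(method, body):
    """Pick the body framing header (hypothetical helper, not http.client API).

    Returns ("Content-Length", n), ("Transfer-Encoding", "chunked") or None.
    """
    if body is None:
        # Only methods that normally carry a body get an explicit zero length
        if method.upper() in ("POST", "PUT", "PATCH"):
            return ("Content-Length", "0")
        return None
    if isinstance(body, str):
        # Assumes str bodies go out as Latin-1; other text raises UnicodeEncodeError
        return ("Content-Length", str(len(body.encode("iso-8859-1"))))
    try:
        view = memoryview(body)  # bytes, bytearray, array, mmap, ...
    except TypeError:
        # Files, iterators, generators, lists, sets: length is not guessed
        return ("Transfer-Encoding", "chunked")
    return ("Content-Length", str(view.nbytes))


for body in (None, b"abc", "h\xe9llo", [b"a", b"b"], (b for b in [b"x"])):
    print(choose_framing("PUT", body))
```

Two rules are easy to document and test; the cost is that some bodies whose length is knowable (a regular file) are sent chunked anyway.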
msg239771 - Author: Demian Brecht (demian.brecht) * Date: 2015-04-01 05:24
Agreed. However, I'm wondering if that should belong in a new issue geared towards further clarifying the behaviour of request body types. The patch introduces the behaviour this specific issue was looking for, the largest change being that iterators may now result in chunked transfer encoding, with the data currently handled by the library. I'd rather move forward with incremental improvements than keep building on each issue before the work's merged (there's still a /good/ amount of work to be done in this module).
msg239793 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-04-01 12:11
The incremental improvement thing sounds good. Here are some things which I think are orthogonal to sensible chunked encoding:

* Automagic seeking to determine Content-Length
* Setting Content-Length for iterables that are neither strings, iterators nor files (Issue 23350)
* Latin-1 encoding of iterated items
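"Automagic seeking" usually means something like the sketch below: measure the bytes remaining from the current position by seeking to the end and back. It only illustrates why this is a separate concern; `remaining_length` is a hypothetical helper. It returns None for unseekable streams such as pipes, and for a compressed reader such as a GzipFile, seeking to the end may mean decompressing the whole stream, while seeking back may mean rewinding and decompressing again:

```python
import io


def remaining_length(f):
    """Bytes left between f's current position and its end, or None.

    Hypothetical helper: None means "unknown", so the caller should chunk.
    """
    try:
        pos = f.tell()
        f.seek(0, io.SEEK_END)  # may be costly, e.g. for compressed files
        end = f.tell()
        f.seek(pos)  # restore the original position
    except (AttributeError, OSError, ValueError):
        # No tell/seek, or not seekable (pipes, sockets, custom readers);
        # io.UnsupportedOperation is a subclass of both OSError and ValueError
        return None
    return end - pos


buf = io.BytesIO(b"0123456789")
buf.seek(3)  # a file object at a non-zero offset
print(remaining_length(buf), buf.tell())  # 7 3
```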
msg243518 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-05-18 18:53
What's the status on this one? It looks like some review comments need addressing.
msg243601 - Author: Demian Brecht (demian.brecht) * Date: 2015-05-19 15:43
> What's the status on this one? It looks like some review comments need addressing.

That's about it. Unfortunately I've been pretty tied up over the last month and a bit. I'll try to close this out over the next few days.
msg243730 - Author: Demian Brecht (demian.brecht) * Date: 2015-05-21 05:55
Latest patch should address all outstanding comments.
msg243731 - Author: Demian Brecht (demian.brecht) * Date: 2015-05-21 05:56
BTW, thanks for the reviews, Martin, and the nudge, Antoine!
msg243747 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-05-21 11:21
I left some new comments.

However I remain concerned at how complicated and overloaded the API is becoming. It certainly makes it hard to review for correctness. I could easily have missed some corner case that is broken by the changes. There are a lot of odd body objects apparently permitted, e.g. GzipFile objects can always seek to the end but may not be able to go backwards, mmap() objects are both bytes-like and file-like.
msg243751 - Author: Rolf Krahl (Rotkraut) Date: 2015-05-21 13:59
Hi again,

First of all, sorry for not contributing to the discussion for such a long time. I was quite busy lately.

I tested the patch with Python 3.5.0a3. It works nicely for my use case. Thanks a lot!

I have one issue, though: urllib's HTTPHandler and HTTPSHandler still try to enforce that a Content-length is set (in AbstractHTTPHandler.do_request_()). But for chunked transfer encoding, the Content-length must not be set. With this patch, HTTPConnection also checks for the Content-length in _send_request() and sets it if missing. AFAICS, HTTPConnection now does a far better job of checking this than HTTPHandler and, most importantly, does it in a way that is compatible with chunked transfer encoding. So I would suggest that there is no need for HTTPHandler to care about the content length and that it should just leave this header alone.

E.g., I suggest that the "if not request.has_header('Content-length'): [...]" statement should completely be removed from AbstractHTTPHandler.do_request_() in urllib/request.py.
msg243762 - Author: Demian Brecht (demian.brecht) * Date: 2015-05-21 16:30
Thanks for the report, Rolf. I'll look into your suggestion for this patch.

Antoine: Given beta is slated for the 24th and Martin's legitimate concerns, I think it might be a little hasty to get this in before feature freeze. Knocking it back to 3.6 (obviously feel free to change if you think it should remain for any reason).
msg243800 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-05-22 02:43
This bug and Demian’s patch were originally about changing the low-level http.client.HTTPConnection class. I suggest opening a separate issue for changing urllib.request.urlopen() or similar, to avoid confusion.

It should actually be possible to support chunking in urlopen() without any new support from HTTPConnection; just use the lower-level putrequest(), putheader(), endheaders(), and send() methods. However, consideration should be given to how it interacts with handlers like HTTPBasicAuthHandler that resend POST data after getting 401 Unauthorized (a similar scenario to Issue 5038).
msg243823 - Author: Rolf Krahl (Rotkraut) Date: 2015-05-22 08:48
I disagree. I believe that the suggested modification of AbstractHTTPHandler.do_request_() belongs in this change set for the following reasons:

1. "This module [http.client] defines classes which implement the client side of the HTTP and HTTPS protocols.  It is normally not used directly — the module urllib.request uses it to handle URLs that use HTTP and HTTPS."  Quote from the http.client documentation.  urllib.request is the high level API for HTTP requests.  Both modules must fit together.  Since urllib's HTTPHandler directly calls HTTPConnection, it can and should rely on the abilities of HTTPConnection.

2. The code in AbstractHTTPHandler is based on the assumption that each HTTP request with a non-empty body must have a Content-length header set. The truth is that an HTTP request must either have a Content-length header or use chunked transfer encoding (and then must not have a Content-length header). As long as the underlying low-level module did not support chunked transfer encoding anyway, this assumption might have been acceptable. Now that this change set introduces support for chunked transfer encoding, this assumption is plain wrong and the resulting code is simply faulty.

3. This change set introduces a sophisticated determination of the correct content length, covering several different cases, including file-like objects and iterables. There is no longer any need for the high-level API to care about the content length if this is already done in the low-level method. Even worse, all the efforts of HTTPConnection to determine the proper content length are essentially overridden by the rather blunt method in the high-level API, which takes priority and essentially insists that the body be a buffer-like object. This is strange.

4. The very purpose of this change set is to implement chunked transfer encoding. But this is essentially disabled by a high-level API that insists on a Content-length header being set. This is plain silly.

Just to illustrate the current situation, I attach a modified version of my old chunkedhttp.py module, adapted to Demian's patch, which I have used to test it. It shows how a user would need to monkey patch the high-level API in order to use the features implemented by this change in the low-level module.


I wouldn't mind filing another issue against urllib.request.HTTPHandler if this makes things easier. But what I really would like to avoid is having any Python version in the wild that has this issue fixed but HTTPHandler still broken. Having to support a wide range of Python versions is difficult enough for third-party library authors like me. Adding a switch to distinguish between 3.6 (I can use the standard lib right away) and older versions (I need to replace it with my own old chunkedhttp implementation) is OK. But having to support a third case in between (the low-level HTTPConnection module would work, but I need to monkey patch the high-level API in order to use it) would make things awkward. That's why I would prefer to see the fix for HTTPHandler in the same change set.
msg243837 - Author: Demian Brecht (demian.brecht) * Date: 2015-05-22 16:27
FWIW, I was intending to address the issues Rolf raised. Also, I agree that a patch shouldn't knowingly have negative knock-on effects to dependents.
msg243878 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-05-23 01:38
Okay, perhaps a new issue at this stage isn’t the best idea, especially if you want both modules updated at the same time.

The urlopen() API currently has no explicit support for file objects. So you either have to make sure they get treated like any other iterable, or add explicit support.
History
Date                 User             Action  Args
2015-05-23 01:38:36  martin.panter    set     messages: + msg243878
2015-05-22 16:27:23  demian.brecht    set     messages: + msg243837
2015-05-22 08:48:48  Rotkraut         set     files: + chunkedhttp-2.py
                                              messages: + msg243823
2015-05-22 02:43:11  martin.panter    set     messages: + msg243800
                                              title: [http.client] HTTPConnection.putrequest not support "chunked" Transfer-Encodings to send data -> [http.client] HTTPConnection.request not support "chunked" Transfer-Encoding to send data
2015-05-21 16:30:41  demian.brecht    set     messages: + msg243762
                                              versions: + Python 3.6, - Python 3.5
2015-05-21 13:59:03  Rotkraut         set     messages: + msg243751
2015-05-21 11:21:07  martin.panter    set     messages: + msg243747
2015-05-21 05:56:25  demian.brecht    set     messages: + msg243731
2015-05-21 05:55:49  demian.brecht    set     files: + issue12319_6.patch
                                              messages: + msg243730
2015-05-19 15:43:01  demian.brecht    set     messages: + msg243601
2015-05-18 18:53:16  pitrou           set     messages: + msg243518
2015-04-01 23:56:39  demian.brecht    set     files: + issue12319_5.patch
2015-04-01 12:11:15  martin.panter    set     messages: + msg239793
2015-04-01 05:24:32  demian.brecht    set     messages: + msg239771
2015-04-01 04:27:08  martin.panter    set     messages: + msg239769
2015-03-31 23:31:11  demian.brecht    set     files: + issue12319_4.patch
2015-03-24 16:26:07  demian.brecht    set     messages: + msg239151
2015-03-24 16:25:12  demian.brecht    set     files: + issue12319_3.patch
2015-03-24 13:18:37  martin.panter    set     messages: + msg239119
2015-03-17 16:56:49  demian.brecht    set     files: + issue12319_2.patch
                                              messages: + msg238314
2015-03-08 04:17:56  demian.brecht    set     messages: + msg237509
2015-03-07 07:46:32  demian.brecht    set     messages: + msg237427
2015-03-06 00:39:54  demian.brecht    set     files: + issue12319_1.patch
2015-03-06 00:37:14  demian.brecht    set     messages: + msg237313
2015-03-06 00:34:49  demian.brecht    set     versions: + Python 3.5, - Python 3.3
2015-03-06 00:34:08  demian.brecht    set     keywords: + needs review, patch
                                              files: + issue12319.patch
                                              messages: + msg237312
                                              stage: needs patch -> patch review
2015-03-06 00:22:29  demian.brecht    set     messages: + msg237309
2015-02-15 13:10:20  Rotkraut         set     messages: + msg236037
2015-02-15 05:58:27  martin.panter    set     messages: + msg236024
2014-08-29 22:23:41  martin.panter    set     nosy: + martin.panter
2014-08-28 13:54:29  Rotkraut         set     messages: + msg226024
2014-08-28 11:28:08  piotr.dobrogost  set     messages: + msg226018
2014-08-28 08:45:48  Rotkraut         set     files: + chunkedhttp.py
                                              nosy: + Rotkraut
                                              messages: + msg226012
2014-07-26 01:15:14  demian.brecht    set     nosy: + demian.brecht
2014-07-26 00:10:59  whitemice        set     nosy: + whitemice
2012-10-10 21:51:32  piotr.dobrogost  set     nosy: + piotr.dobrogost
2012-09-25 13:14:48  pitrou           set     nosy: + pitrou
                                              messages: + msg171268
2011-06-20 08:13:35  orsenthil        set     assignee: orsenthil
                                              messages: + msg138691
                                              stage: needs patch
2011-06-15 07:39:49  harobed          set     messages: + msg138357
2011-06-14 13:03:49  pitrou           set     nosy: + orsenthil
2011-06-14 06:15:57  petri.lehtinen   set     messages: + msg138296
2011-06-13 15:35:20  harobed          set     messages: + msg138258
2011-06-13 13:01:35  petri.lehtinen   set     versions: - Python 2.7
                                              nosy: + petri.lehtinen
                                              messages: + msg138242
                                              type: enhancement
2011-06-12 10:47:13  harobed          create