classification
Title: http.client PUT method ignores error responses sent immediatly after headers
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 3.6, Python 3.5
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: SilentGhost, Wiktor Niesiobedzki, martin.panter
Priority: normal Keywords: patch

Created on 2015-12-21 20:20 by Wiktor Niesiobedzki, last changed 2016-01-11 17:19 by r.david.murray.

Files
File name Uploaded Description Edit
http.client.put.fix.patch Wiktor Niesiobedzki, 2015-12-22 20:17
http.client.put.fix.patch Wiktor Niesiobedzki, 2015-12-22 23:01
early_http_response_tests.patch Wiktor Niesiobedzki, 2016-01-10 19:32 review
Messages (13)
msg256808 - (view) Author: Wiktor Niesiobedzki (Wiktor Niesiobedzki) * Date: 2015-12-21 20:20
It looks like, when doing PUT together with a file-like object to send, http.client will try to send the whole file before analysing the response from the server.

If you do the following:
$ dd if=/dev/zero of=/tmp/300mb.zero bs=1M count=300

And then run following code in Python 3.4 (tested 3.4.3 on Linux and FreeBSD):
import http.client
conn = http.client.HTTPSConnection('api.onedrive.com')
req = conn.request(
    method='PUT',
    url='https://api.onedrive.com/v1.0/drives/me/root:/test.file:/content',
    body=open("/tmp/300mb.zero", "rb"))
resp = conn.getresponse()

You'll notice the hang. After some time, it will aborted with BrokenPipeError: [Errno 32] Broken pipe. If you run the following code within pdb debugger, and interrupt, you'll probably interrupt somewhere within write loop. You can call self.read() and see, that HTTP 413 is waiting to be interpreted.

Doing similar action with curl:
$ $ curl -v -T /tmp/300mb.zero https://api.onedrive.com/v1.0/drives/me/root:/test.file:/content

Gives you immediate HTTP 413 error. Can we have the same behaviour in http.client library?
msg256809 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-12-21 20:26
Perhaps by doing non-blocking reads between the writes[*]? I suppose it is possible, but it might complicate the code considerably.

[*] or re-write it using asyncio, but that is definitely out of scope :)
msg256859 - (view) Author: Wiktor Niesiobedzki (Wiktor Niesiobedzki) * Date: 2015-12-22 20:17
Maybe something like this? Doesn't look too complicated and I haven't noticed any breakage yet.
msg256862 - (view) Author: SilentGhost (SilentGhost) * (Python triager) Date: 2015-12-22 20:35
There is a test suite which can be run to test for breakages:

./python -m test test_httplib

If you do that you'll notice that some things are broken, I got error in 5 different tests related to you select.select call:

TypeError: argument must be an int, or have a fileno() method. 

Your test code seems to be running fine now, though.
msg256863 - (view) Author: SilentGhost (SilentGhost) * (Python triager) Date: 2015-12-22 20:43
That was a testing issue, apparently test.test_httplib.FakeSocket is not fake enough.
msg256878 - (view) Author: Wiktor Niesiobedzki (Wiktor Niesiobedzki) * Date: 2015-12-22 23:01
Here is revised patch. Also covers changes to tests.
msg256906 - (view) Author: SilentGhost (SilentGhost) * (Python triager) Date: 2015-12-23 10:39
All the tests pass now, not sure why your patch doesn't get associated rietveld link, it applies cleanly using hg patch --no-commit
msg257430 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-01-03 22:00
Curl uses “Expect: 100-continue”. This is the relevant output from it:

$ curl -v -T 300mb --no-fail https://api.onedrive.com/v1.0/drives/me/root:/test.file:/content
[. . .]
> PUT /v1.0/drives/me/root:/test.file:/content HTTP/1.1
> Host: api.onedrive.com
> Content-Range: bytes 0-314572799/314572800
> User-Agent: curl/7.43.0
> Accept: */*
> Connection: TE
> TE: gzip
> Content-Length: 314572800
> Expect: 100-continue
> 
< HTTP/1.1 413 Request Entity Too Large
< Content-Length: 0
< Server: Microsoft-HTTPAPI/2.0
< P3P: CP="BUS CUR CONo FIN IVDo ONL OUR PHY SAMo TELo"
< X-MSNSERVER: DM2302____PAP179
< Strict-Transport-Security: max-age=31536000; includeSubDomains
< X-ThrowSite: 7c11.e25d
< Date: Sun, 03 Jan 2016 20:04:27 GMT
* HTTP error before end of send, stop sending
< 
* Closing connection 0

See Issue 1346874 about support for 100-continue mode.

When I tried your test code on Linux, it hung for two minutes, and then raised ConnectionResetError. I guess the server refused to receive data, and then timed the connection out. The patch should handle this in most cases, but I think there could still a race between select() returning an empty rlist, and then the server sending a request and causing the upload to hang. Also, there is nothing to protect from sendall() partially uploading a chunk, and then hanging.

It should be more robust to check for both send and receive sides of the socket with select(), and only use send(), not sendall(), to avoid blocking.

I would consider this a new feature, so I don’t think it should go into 3.5. It would be good to add documentation and test cases as well.

Wiktor’s problem is similar to the situations in Issue 5542. There, the original poster seemed happy to wait for a broken pipe error, and then inspect a 401 Unauthorized response. This apparently was made to work with plain HTTP over TCP, but not HTTPS.

I wonder if having request() silently return is the best API design. Is there a possibility that the caller would want to distinguish a successful upload from an aborted one? I can’t think of one, but it seems plausible.

Another related enhancement I would like is the ability to detect an error response (e.g. 408 Request Timeout) or dropped connection _before_ even sending a request. This is useful with persistent connections; see items 1 and 2 in <https://bugs.python.org/issue9740#msg240459>.
msg257431 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-01-03 22:45
For non-blocking SSL sockets, the Python documentation suggests the results of select() do not directly indicate if data can be sent or received: <https://docs.python.org/dev/library/ssl.html#ssl-nonblocking>. I guess this would also apply to blocking SSL sockets.
msg257486 - (view) Author: Wiktor Niesiobedzki (Wiktor Niesiobedzki) * Date: 2016-01-04 20:13
I've checked how it behaves, when I do:
$ openssl s_client -host api.onedrive.com -port 443

The server then works like that (using curl debug log style):
> PUT /v1.0/drives/me/root:/test.file:/content HTTP/1.1
> Host: api.onedrive.com
> Content-Range: bytes 0-314572799/314572800
> User-Agent: curl/7.43.0
> Accept: */*
> Content-Length: 314572800
> 
< HTTP/1.1 413 Request Entity Too Large
< Content-Length: 0
< Server: Microsoft-HTTPAPI/2.0
< P3P: CP="BUS CUR CONo FIN IVDo ONL OUR PHY SAMo TELo"
< X-MSNSERVER: DM2304____PAP130
< Strict-Transport-Security: max-age=31536000; includeSubDomains
< X-ThrowSite: 7c11.e25d
< Date: Mon, 04 Jan 2016 18:44:28 GMT
> 0000000000000000000000000000000000000000000000000000
[...]

And I may continue sending the data.

As for 100-continue - it looks like Microsoft server is not supporting Expect: 100-continue header, and it just waits for data, when content is shorter (like 100 bytes).

I do not see clear indications how the client should behave in such situation (response received before sending complete request).

I do not understand, where you see the race condition. As we are sending in blocks of 8kb, if we missed the select on first block, we will catch something in next select. If server would close down the socket immediately, we should receive failure in sock.sendall() -> writing to socket that is closed for writing by remote party.

As I understand, right now my patch breaks uploads, when server returns HTTP 100 response. Contrary to Issue 1346874 I'd recommend to check - if server returned 100 response, then continue the upload of the file or prepare error response for get_response(). I do not see why this behaviour should be implemented in code using http.client.

Status of upload should be (and it is right now) done using get_response() anyway, as any upload request might end in HTTP error response, so I vouch for empty return value - it is complaint with current interface.

I'm in progress of preparation of the test cases for such scenario using SocketServer, to keep it close to how network behaves, but maybe I'll manage to prepare testcases using FakeSocket.
msg257508 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-01-05 00:50
As I see it, if the client receives an early response (when 100-continue is not used) but needs to make more requests, the only choices are:

* Complete sending the request if possible, after which the connection may still be usable for a second request. But I’m not sure how many servers would support this; for your Microsoft server this would not be useful.

* Abort the connection and start a new one.

Perhaps the race condition would be more obvious if you sent the upload as a single 300 MiB bytes() object rather than a file. Then the sendall() call will be a 300 MiB chunk rather than 8 KiB. If the response is received after sendall() starts, I expect you will see the original two minute delay again. If this were plain TCP, you could replace sendall() with fine-grained select() and send() calls in a loop.

Here is a demonstration of the SSL renegotiation issue. I used separate client and server terminal windows. I don’t know if there is a way to force renegotiation purely in Python’s SSL module, so I used an external Open SSL server.

1: In the server window, start an SSL server on port 4433:
$ openssl s_server -cert Lib/test/keycert.pem 

2: In the client window, start a request in the Python interpreter. It will pause to read stdin:
>>> import http.client, ssl, sys
>>> context = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
>>> context.load_verify_locations("/usr/lib/python3.5/test/keycert.pem")
>>> conn = http.client.HTTPSConnection('localhost', 4433, context=context)
>>> conn.request("PUT", "/", headers={"Content-Length": "4"}, body=sys.stdin.buffer)

3: In the server, the request header is received. Type lowercase “r” to force a renegotiation:
Secure Renegotiation IS supported
PUT / HTTP/1.1
Host: localhost:4433
Accept-Encoding: identity
Content-Length: 8

r
SSL_do_handshake -> 1

4: In Python, type in upload data and repeat Ctrl+D to signal EOF until the request() call stops reading from stdin and you get the Python prompt back:
abc  <== 3 letters + newline + Ctrl+D
>>> 

5: Notice that the no body has been uploaded to the server.
msg257929 - (view) Author: Wiktor Niesiobedzki (Wiktor Niesiobedzki) * Date: 2016-01-10 19:32
Here are the test cases that I've come up with.

test_response_after_headers - tests for the case that I'm missing
test_ssl_renegotiation - tests for the case that Martin point out

As in stock ssl library there is no way to force renegotiation, I've just separated the receiving of standard HTTP status message into few parts and checking, if the http.client is still sending the contents.

So - without my patch, first case is failing, with my patch - the second case is failing.  I'll try to find some better solution here, which will cover both cases.
msg257945 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-01-10 23:07
Sorry for making a seemingly easy problem harder :)

Your version of test_ssl_renegotiation() would actually be harder to fix IMO, since that would require the client to receive response data (the “H”) and continue to transmit request data concurrently. It would probably require sending and receiving in separate coroutines or threads, like asyncio.

There may still be a way to avoid that if we don’t need to send and receive HTTP data at the same time. We only need the low-level SSL messages to work properly. The following might work:

self.sock.setblocking(False)
try:
    ...  # Get chunks of request body to send
    while datablock:
        select((self._reader,), (self.sock,), (), self._timeout)
        if self._reader.peek():
            ...  # Early response received
        try:
            amount = self.sock.send(datablock)
        except (BlockingIOError, SSLWantRead, SSLWantWrite):
            continue
        datablock = datablock[amount:]
finally:
    self.sock.settimeout(self.timeout)

This might require the following problems investigated and fixed first:

* Need to maintain the buffered socket reader of self.sock.makefile(), before the response is actually read. Could use my patch for Issue 23377.

* Allow a buffered socket reader to be non-blocking without upsetting its buffer. The documentation <https://docs.python.org/release/3.4.3/library/socket.html#socket.socket.makefile> disallows non-blocking mode, but I suspect that is just a Python 2 relic and there is no restriction in the Python 3 implementation. I thought there was a bug open for this but I can only find mentions in other bugs.

* Allow using BufferedReader.peek() to see if data is ready to be read. The current documentation does not guarantee it will return any bytes (Issue 5811), but I understand the implementation normally does return ≥ 1 byte.

* Clarify what peek() should do if no non-blocking data is available. Currently I think it returns b"", but that is inconsistent with other APIs, and does not differentiate from EOF. I proposed to change peek() to return None in Issue 13322.
History
Date User Action Args
2016-01-11 17:19:28r.david.murraysetnosy: - r.david.murray
2016-01-10 23:07:48martin.pantersetmessages: + msg257945
2016-01-10 19:32:46Wiktor Niesiobedzkisetfiles: + early_http_response_tests.patch

messages: + msg257929
2016-01-05 00:50:07martin.pantersetmessages: + msg257508
2016-01-04 20:13:02Wiktor Niesiobedzkisetmessages: + msg257486
2016-01-03 22:45:35martin.pantersetmessages: + msg257431
2016-01-03 22:00:14martin.pantersetnosy: + martin.panter

messages: + msg257430
stage: patch review
2015-12-25 17:29:02serhiy.storchakasettitle: htp.client PUT method ignores error responses sent immediatly after headers -> http.client PUT method ignores error responses sent immediatly after headers
2015-12-23 10:39:28SilentGhostsetmessages: + msg256906
2015-12-22 23:01:49Wiktor Niesiobedzkisetfiles: + http.client.put.fix.patch

messages: + msg256878
2015-12-22 20:43:47SilentGhostsetmessages: + msg256863
2015-12-22 20:35:29SilentGhostsetnosy: + SilentGhost
messages: + msg256862
2015-12-22 20:17:50Wiktor Niesiobedzkisetfiles: + http.client.put.fix.patch
keywords: + patch
messages: + msg256859
2015-12-21 20:49:14SilentGhostsetcomponents: + Library (Lib), - IO
versions: + Python 3.5, Python 3.6, - Python 3.4
2015-12-21 20:26:29r.david.murraysetnosy: + r.david.murray
messages: + msg256809
2015-12-21 20:20:23Wiktor Niesiobedzkicreate