classification
Title: Allow buffering for HTTPResponse
Type: performance Stage: resolved
Components: Library (Lib) Versions: Python 2.7
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: Nosy List: amaury.forgeotdarc, benjamin.peterson, gregory.p.smith, kristjan.jonsson, pitrou, r.david.murray
Priority: normal Keywords: patch

Created on 2009-01-08 10:43 by kristjan.jonsson, last changed 2009-08-24 20:58 by gregory.p.smith. This issue is now closed.

Files
File name Uploaded Description Edit
nobuffer.patch kristjan.jonsson, 2009-01-08 10:43 Suggested patch to enable readline buffering for HTTP headers
Messages (16)
msg79406 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-01-08 10:43
This is related to issue 4336.  While that issue finally revolved around 
fixing the Nagle problem for xmlrpc and other http clients, here I 
propose to fix another performance bottleneck with xmlrpc, reading the 
HTTP Headers.
HTTPResponse creates a socket.fileobject() with zero buffering which 
means that the readline() operations used to read the headers become 
very inefficient since individual socket.recv() calls are used for each 
character.
The reason for this is to support users who subsequently do 
socket.recv() on the underlying socket, and so we don't want to read too 
much of the headers. Note that there is no example of this in the 
standard library.
In the provided patch, I have removed some vestigial code from 
xmrlpclib.py which attempted to use recv() even though it never did 
(because the connection has been closed when HTTPConnection returns the 
response).  Even so, Guido has expressed his concern that we need to 
keep the support for this recv() behaviour in place.
Therefore, I have added a new optional argument, nobuffer=True, which 
users that promise not to call recv() can set to false.  This will then 
cause the generated fileobject from HTTPResponse to have default 
buffering, greatly increasing the performance of the reading of the 
headers.
msg79552 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-01-10 16:27
Why not name the parameter buffering=False rather than nobuffer=True?
Sounds more "natural" to me.
msg79601 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-01-11 16:19
Yes, if you think so.  But my intention was to indicate that 
the "nobuffering" is a special feature the user can turn off to resort 
to what could be considered normal, buffered, behaviour.

But either way is fine, and I'll be happy to change it.
msg79602 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-01-11 16:24
Checked in as revision: 68532
msg80064 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2009-01-18 00:18
Kristján, could you merge this change into the Py3k branch, please?
msg80254 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-01-20 14:17
Adding crossref to http://bugs.python.org/issue4448 regarding the 
porting of this feature to 3.1
msg80946 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-02-02 16:04
Checked in r69209 to py3k
msg81020 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-02-03 00:34
This causes failures in test_urllib2net.

The fix is easy: a handful of
-        self.assertTrue(u.fp._sock.gettimeout() is None)
+        self.assertTrue(u.fp.raw._sock.gettimeout() is None)

But doesn't this show a backward-incompatible change?
or is the _sock member an implementation detail anyway?
msg81035 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-02-03 09:08
The socket.socket.makefile() now returns a quite different kind of 
object, namely a SocketIO thing.  This comes as a result of the IO 
refactoring in 3.0.
The good side to this is that old and naughty apps have been forbidden 
to access the _sock member directly.  This was the reason that we had 
to disable read buffering in 2.x:  Some clients would read directly 
from the socket, even though it was "hidden".

As for the test failure, I admit that I didn't enable the "network" 
resource for the testsuite.  Will fix.

Btw, test_xmlrpc_net.py fails, because time.xmlrpc.com doesn't 
resolve.  Are there any other good and stable rpc servers out there?
msg81036 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-02-03 09:14
I'm afraid I misunderstood the problem.
Yes, since u.fp is now a io.BufferedReader() rather than a socket.Socket
(), the _sock member has moved.

But looking at the other assertions in test_urllib2net.py, you can see 
the different ways the code uses to get at the socket:  There is 
u.fp.raw._sock and there is u.fp.fp.raw._sock.

IMHO I think we have to assume the _sock member to be an implementation 
detail that is not part of the interface.
msg91597 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2009-08-15 06:46
I believe there will be a problem with the patch committed in r68532. 
If getresponse(buffering=True) is called, extra data on the socket may
be consumed by the socket.makefile() buffer which will cause problems if
the connection is not closed immediately after this response.

I mentioned this in comments on http://bugs.python.org/issue2576 which
happens to have a way of fixing this already proposed as they are trying
to solve an almost identical issue to this without yet having seen this
issue.
msg91621 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2009-08-15 22:39
trunk r74463 now forces the HTTPResponse to close afterwards when
buffering=True to avoid the issue.
msg91683 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-08-18 12:42
And that update causes failures in test_xmlrpc.
msg91684 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-08-18 12:47
I think I'll open a new ticket instead.
msg91914 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-08-24 11:34
Gregory, please revert your change (74463).
There is no "extra data" after the response, since such data can only 
be generated as a result of a new "request".

Your change has disabled the HTTP/1.1 keepalive capability, causing 
test failurres in xmlrpc.  Also showing perhaps that we need test cases 
for keepalive in regular httplib.py testsuite.

If you think there are problems, you a new defect should be created.
msg91939 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2009-08-24 20:58
Already reverted in

r74522 | gregory.p.smith | 2009-08-18 22:33:48 -0700 (Tue, 18 Aug 2009)

for that reason.
History
Date User Action Args
2009-08-24 20:58:42gregory.p.smithsetkeywords: patch, patch

messages: + msg91939
2009-08-24 11:34:47kristjan.jonssonsetkeywords: patch, patch

messages: + msg91914
2009-08-18 12:47:38r.david.murraysetstatus: open -> closed
keywords: patch, patch
messages: + msg91684

stage: commit review -> resolved
2009-08-18 12:43:52r.david.murraysetstatus: closed -> open
priority: normal
keywords: patch, patch
stage: commit review
2009-08-18 12:42:16r.david.murraysetkeywords: patch, patch
nosy: + r.david.murray
messages: + msg91683

2009-08-15 22:39:58gregory.p.smithsetkeywords: patch, patch

messages: + msg91621
2009-08-15 06:46:12gregory.p.smithsetkeywords: patch, patch
nosy: + gregory.p.smith
messages: + msg91597

2009-02-03 09:14:59kristjan.jonssonsetkeywords: patch, patch
messages: + msg81036
2009-02-03 09:08:57kristjan.jonssonsetkeywords: patch, patch
messages: + msg81035
2009-02-03 00:34:31amaury.forgeotdarcsetkeywords: patch, patch
nosy: + amaury.forgeotdarc
messages: + msg81020
2009-02-02 16:04:55kristjan.jonssonsetkeywords: patch, patch
messages: + msg80946
2009-01-20 14:17:07kristjan.jonssonsetkeywords: patch, patch
messages: + msg80254
2009-01-18 00:18:12benjamin.petersonsetkeywords: patch, patch
nosy: + benjamin.peterson
messages: + msg80064
2009-01-11 16:24:57kristjan.jonssonsetstatus: open -> closed
keywords: patch, patch
resolution: accepted
messages: + msg79602
2009-01-11 16:19:51kristjan.jonssonsetkeywords: patch, patch
messages: + msg79601
2009-01-10 16:27:46pitrousetkeywords: patch, patch
nosy: + pitrou
messages: + msg79552
2009-01-08 14:32:53kristjan.jonssonsetkeywords: patch, patch
title: Allo buffering for HTTPResponse -> Allow buffering for HTTPResponse
2009-01-08 10:43:55kristjan.jonssoncreate