classification
Title: Fix performance issues in xmlrpclib
Type: performance Stage:
Components: Extension Modules Versions: Python 2.7
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: jhylton Nosy List: haypo, hodgestar, jhylton, kristjan.jonsson
Priority: normal Keywords: easy, patch

Created on 2008-11-17 14:33 by kristjan.jonsson, last changed 2009-01-19 10:29 by kristjan.jonsson. This issue is now closed.

Files
File name Uploaded Description Edit
xmlrpc.patch kristjan.jonsson, 2008-11-17 14:33
xmlrpc.patch kristjan.jonsson, 2008-11-25 11:13 A new version of the patch, taking into account various concerns raised.
xmlrpc.patch kristjan.jonsson, 2008-11-27 16:06
xmlrpc.patch kristjan.jonsson, 2008-11-28 10:53
xmlrpc_1.patch kristjan.jonsson, 2008-12-01 10:12 a patch that focuses on the Nagle problem only
httplib.patch kristjan.jonsson, 2008-12-08 15:33 a suggested improvement to the endheaders(data) feature
htmllibclients.patch kristjan.jonsson, 2008-12-08 15:34 use endheaders(body) in stead of separate conn.send(body) in users of the httplib.py
Messages (19)
msg75962 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2008-11-17 14:33
There are two performance problems in xmlrpclib.py:
POST requests use two send() calls, one to send the headers and one to 
send the data.  This can invoke the Nagle/Delayed ACK performance 
problem.  On many socket implementations (including some windows 
platforms) it can introduce delays of up to 200ms when talking over the 
wire (as opposed to localhost)
The second is the use of no buffering when reading the xmlrpc 
response.  It is done using the readline() call on a file-like object 
constructed from the socket.  When no buffering is in effect, this 
causes a separate recv() call for each character.

This patch fixes these issues separately:
1) We provide a new getheaderdata() function in httplib which just 
returns the data for the headers instead of sending it.  This can be 
used instead of endheaders() if the intention is to subsequently send 
more data.  xmlrpclib then uses this method of framing its POST 
messages.
2) We provide the named artgument bufsize=0 to the HTTPResponse class 
in httplib, and also so on for the HTTPConnection.getresponse() and 
HTTP.getreply(). xmlrpclib then passes bufsize=-1 to HTTP.getreply() to 
get default buffering for the response fileobject.

This patch focuses on fixing the problems with xmlrpclib.  but issue 1) 
above also has a number of other manifestations in the lib, there are 
other places where we can use getheaderdata() and send() instead of 
endheaders() to avoid possible Nagle/Ack problems, e.g. in urllib.py, 
distutils.py and loggin/handlers.py
msg76174 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2008-11-21 10:52
Just a thought here:
Maybe it would be better just to change socket._fileobject to always 
use a minimum of 8k readbuffer for readline() just as read() already 
does.  The read() documentation states that recv(1) is very 
inefficient, and so it is.
The intent, when calling makefile(bufsize=0) is probably to make sure 
that buffering for write() is disabled.
Any thoughts?
msg76343 - (view) Author: Jeremy Hylton (jhylton) Date: 2008-11-24 17:32
This patch makes sense in principle, but some of the details need to
change.  The _send_output() method is used by some clients, merely
because it can be used :-(.  It's fairly easy to preserve this API for
backwards compatibility.

I am also worried about this new api call getheaderdata().  It
complicates the API.  Even if it were necessary, it needs a better name,
because getheaderdata() doesn't sound like a method that changes the
connection state or consumes buffered header data.

I think it would be better to have the new behavior exposed only through
HTTPConnection and not HTTP, since that's a Python 1.5.2 compatibility
API(!).  We can make some small changes to xmlrpclib to use the newer
API.  It would probably be a good change merely because py3k now uses
the newer API.

I've got a working local change, but it's still a little ugly.
msg76346 - (view) Author: Jeremy Hylton (jhylton) Date: 2008-11-24 17:47
Just wanted to mention that the best solution is to update as much code
as possible to use HTTPConnection instead of HTTP.  I'm not sure how
easy it is to do for xmlrpclib, since it exposes methods like
send_content().  I guess we can preserve those APIs somehow, but it
won't be pretty.
msg76349 - (view) Author: Jeremy Hylton (jhylton) Date: 2008-11-24 17:59
Actually, the patch doesn't work :-).  The httplib API allows you to
pass a file or a string for the body of the request.  The
getheaderdata() + body case only works for the string.  Easy to fix,
but I worry that he still run into Nagle problems with the file case,
which was presumably added for efficiency.

Jeremy

On Mon, Nov 24, 2008 at 12:32 PM, Jeremy Hylton <report@bugs.python.org> wrote:
>
> Jeremy Hylton <jeremy@alum.mit.edu> added the comment:
>
> This patch makes sense in principle, but some of the details need to
> change.  The _send_output() method is used by some clients, merely
> because it can be used :-(.  It's fairly easy to preserve this API for
> backwards compatibility.
>
> I am also worried about this new api call getheaderdata().  It
> complicates the API.  Even if it were necessary, it needs a better name,
> because getheaderdata() doesn't sound like a method that changes the
> connection state or consumes buffered header data.
>
> I think it would be better to have the new behavior exposed only through
> HTTPConnection and not HTTP, since that's a Python 1.5.2 compatibility
> API(!).  We can make some small changes to xmlrpclib to use the newer
> API.  It would probably be a good change merely because py3k now uses
> the newer API.
>
> I've got a working local change, but it's still a little ugly.
>
> ----------
> assignee:  -> jhylton
> nosy: +jhylton
> status: open -> pending
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue4336>
> _______________________________________
> _______________________________________________
> Python-bugs-list mailing list
> Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/jeremy%40alum.mit.edu
>
>
msg76406 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2008-11-25 11:13
I have addressed some issues mentioned:
1) I have retained the _send_output() method.
2) the endheaders() method now takes an optional argument, send_data 
that defaults to True.  It also returns any unsent data as a string.  
This simplifies any code wishing to send more data.
3) The old HTTP class needs no changes anymore.
4) I've fixed the test_xmlrpc.py test case to run all the tests on 
windows.  The old concern with "WSAEWOULDBLOCK" seems to be no longer 
valid.
5) concatenating the result from endheaders() with the request body is 
valid, in xmlrpclib, because the assumption has already been made in 
send_content() that request_body is a string (str(len(request_body)).  
However, in httplib's request() method, we now special case this.  I 
don't want to complicate the code for what appears to be a rare case.

I chose not to mess with xmlrpclib and make it continue to use the old 
HTTP class in order to minimise the extent of this patch.  A simple and 
clear performance patch has in my experience a much greater chance of 
finding its way into the codebase than a more extensive rewrite :)

You may have concerns regarding point 3 above, and I am open to 
suggestions.

Now, what remains is the question of the read buffering for the socket 
fileobject.  Do you think that it is feasible to simply change the 
default read buffering for fileobjects to use buffering for readline() 
same as they do for read()?  Then we wouldn't need the second half of 
this patch and we could make a separate "socket" performance patch.
msg76407 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2008-11-25 11:15
Sorry, I meant : "you may have concerns regarding point 2) above"
msg76489 - (view) Author: Jeremy Hylton (jhylton) Date: 2008-11-26 21:53
I think we're making progress, but I'm still not sure about the new
httplib api.  My current worry is that endheaders() behaves very
differently when send_data is false.  My chief concern is that the
__state variable is going to indicate that the request has been sent
when we're really depending on the client to send the header.  In
general, I don't like the public send() method since it's exposing a
raw socket that could be called at any time regardless of the _state
of the object.

What if endheaders() takes body as an optional argument and sends it
along with the headers?  This accomplishes the same basic goal as
returning the header from endheaders(), but keeps the actual sending
of data encapsulated within the http connection.

Jeremy

On Tue, Nov 25, 2008 at 6:13 AM, Kristján Valur Jónsson
<report@bugs.python.org> wrote:
>
> Kristján Valur Jónsson <kristjan@ccpgames.com> added the comment:
>
> I have addressed some issues mentioned:
> 1) I have retained the _send_output() method.
> 2) the endheaders() method now takes an optional argument, send_data
> that defaults to True.  It also returns any unsent data as a string.
> This simplifies any code wishing to send more data.
> 3) The old HTTP class needs no changes anymore.
> 4) I've fixed the test_xmlrpc.py test case to run all the tests on
> windows.  The old concern with "WSAEWOULDBLOCK" seems to be no longer
> valid.
> 5) concatenating the result from endheaders() with the request body is
> valid, in xmlrpclib, because the assumption has already been made in
> send_content() that request_body is a string (str(len(request_body)).
> However, in httplib's request() method, we now special case this.  I
> don't want to complicate the code for what appears to be a rare case.
>
> I chose not to mess with xmlrpclib and make it continue to use the old
> HTTP class in order to minimise the extent of this patch.  A simple and
> clear performance patch has in my experience a much greater chance of
> finding its way into the codebase than a more extensive rewrite :)
>
> You may have concerns regarding point 3 above, and I am open to
> suggestions.
>
> Now, what remains is the question of the read buffering for the socket
> fileobject.  Do you think that it is feasible to simply change the
> default read buffering for fileobjects to use buffering for readline()
> same as they do for read()?  Then we wouldn't need the second half of
> this patch and we could make a separate "socket" performance patch.
>
> Added file: http://bugs.python.org/file12127/xmlrpc.patch
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue4336>
> _______________________________________
> _______________________________________________
> Python-bugs-list mailing list
> Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/jeremy%40alum.mit.edu
>
>
msg76502 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2008-11-27 16:06
I like the suggestion of having endheaders() accept any extra data.
I have uploaded a new patch which implements this idea.  It simplifies 
things a lot.

The issue with the read buffer size is still open.  I have sent an 
email to python-dev requesting comments.
msg76523 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2008-11-28 10:53
Guido pointed out the problem with _fileobject.readline() being 
followed by socket.recv() if readline uses read buffering.  
xmlrpclib.py was attempting to directly use the underlying socket, 
although in actual fact it never did, (since HTTPConnectio had closed 
it when it returned the response from getresponse()) Never the less, it 
is prudent to make sure that we don't attempt this.
There really should be no need to use the socket directly, a buffered 
read() call is just as efficient.
msg76578 - (view) Author: Jeremy Hylton (jhylton) Date: 2008-11-29 01:52
I did the simple part of the patch, where the request and headers are
sent at the same time.  The applied patch didn't pass the test suite,
and I want to think about the buffering change a bit more.  It's
definitely tricky.

Jeremy

On Fri, Nov 28, 2008 at 5:53 AM, Kristján Valur Jónsson
<report@bugs.python.org> wrote:
>
> Kristján Valur Jónsson <kristjan@ccpgames.com> added the comment:
>
> Guido pointed out the problem with _fileobject.readline() being
> followed by socket.recv() if readline uses read buffering.
> xmlrpclib.py was attempting to directly use the underlying socket,
> although in actual fact it never did, (since HTTPConnectio had closed
> it when it returned the response from getresponse()) Never the less, it
> is prudent to make sure that we don't attempt this.
> There really should be no need to use the socket directly, a buffered
> read() call is just as efficient.
>
> Added file: http://bugs.python.org/file12145/xmlrpc.patch
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue4336>
> _______________________________________
> _______________________________________________
> Python-bugs-list mailing list
> Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/jeremy%40alum.mit.edu
>
>
msg76682 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2008-12-01 10:12
Really?
Works for me.  Here is a simple patch where we leave the read buffering.  
passes test_xmlrpclib.py and test_httplib.py
msg76697 - (view) Author: Jeremy Hylton (jhylton) Date: 2008-12-01 15:26
I submitted r67442, which combines the headers and body in a single
send() call.  We should look at the buffering issue now, although I
probably won't have any time to check on it until Friday.
msg77024 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2008-12-05 15:06
I think it would have been better to have endheaders() (and then perhaps 
_send_output())  deal with the non-string (i.e. filebuffer) case, so 
that endheaders(body) is semantically equivalent to endheaders(); 
send(body).   The version you checked in makes it necessary that the 
user of HTTPConnection.endheaders() is aware of the distinction.
msg77309 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2008-12-08 15:33
Please consider applying this patch.  It moves the logic handling of 
non-string message_body inside of endheaders(), allowind clients of 
HTTPConnection to substitute endheader();send(data) with 
endheaders(data) without fear.
msg77310 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2008-12-08 15:34
In light of my previous patch, here is a an improvement to urllib.py and 
logging that uses endheaders(body)
msg79407 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-01-08 10:49
Reopening this, since I am not entirely happy with the implementation 
that was checked in.  Please see my comments from Dec 5th onwards.
msg79502 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-01-09 20:27
Checked in revision: 68458 and  
revision: 68459
msg80153 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-01-19 10:29
note, this has been ported to Py3k in http://svn.python.org/view?
view=rev&rev=68458
History
Date User Action Args
2009-01-19 10:29:43kristjan.jonssonsetkeywords: patch, patch, easy
messages: + msg80153
2009-01-09 20:27:40kristjan.jonssonsetstatus: open -> closed
resolution: accepted
messages: + msg79502
keywords: patch, patch, easy
2009-01-08 10:49:01kristjan.jonssonsetstatus: pending -> open
keywords: patch, patch, easy
messages: + msg79407
2008-12-08 15:34:59kristjan.jonssonsetkeywords: patch, patch, easy
files: + htmllibclients.patch
messages: + msg77310
2008-12-08 15:33:56kristjan.jonssonsetkeywords: patch, patch, easy
files: + httplib.patch
messages: + msg77309
2008-12-05 15:06:39kristjan.jonssonsetkeywords: patch, patch, easy
messages: + msg77024
2008-12-01 15:26:20jhyltonsetkeywords: patch, patch, easy
messages: + msg76697
2008-12-01 10:32:02hodgestarsetnosy: + hodgestar
2008-12-01 10:13:03kristjan.jonssonsetkeywords: patch, patch, easy
files: + xmlrpc_1.patch
messages: + msg76682
2008-11-29 01:52:25jhyltonsetmessages: + msg76578
2008-11-28 10:53:26kristjan.jonssonsetkeywords: patch, patch, easy
files: + xmlrpc.patch
messages: + msg76523
2008-11-27 16:06:56kristjan.jonssonsetkeywords: patch, patch, easy
files: + xmlrpc.patch
messages: + msg76502
2008-11-26 21:53:58jhyltonsetmessages: + msg76489
2008-11-25 11:15:33kristjan.jonssonsetkeywords: patch, patch, easy
messages: + msg76407
2008-11-25 11:13:39kristjan.jonssonsetkeywords: patch, patch, easy
files: + xmlrpc.patch
messages: + msg76406
2008-11-24 18:02:56hayposetkeywords: patch, patch, easy
nosy: + haypo
2008-11-24 17:59:11jhyltonsetmessages: + msg76349
2008-11-24 17:47:28jhyltonsetkeywords: patch, patch, easy
messages: + msg76346
2008-11-24 17:32:14jhyltonsetstatus: open -> pending
assignee: jhylton
messages: + msg76343
keywords: patch, patch, easy
nosy: + jhylton
2008-11-21 10:52:53kristjan.jonssonsetkeywords: patch, patch, easy
messages: + msg76174
2008-11-17 14:33:40kristjan.jonssoncreate