classification
Title: httplib reads one byte per system call
Type: Stage:
Components: Extension Modules Versions: Python 2.4
process
Status: closed Resolution: duplicate
Dependencies: Superseder: httplib read() very slow due to lack of socket buffer (issue 2576)
Assigned To: Nosy List: akuchling, georg.brandl, josiahcarlson, zoyd2k
Priority: normal Keywords:

Created on 2006-08-18 04:33 by zoyd2k, last changed 2008-04-28 20:05 by georg.brandl. This issue is now closed.

Messages (4)
msg60975 - Author: Zoyd Wheeler (zoyd2k) Date: 2006-08-18 04:33
The HTTPResponse class in httplib.py contains the
following line in its __init__ method:

self.fp = sock.makefile('rb', 0)

The zero in that second (bufsize) argument overrides
the default behavior of the socket file object's
readline() method, which is to read a buffer's worth
of data from the socket at a time and hit the socket
again only when the buffer runs dry. When bufsize is
zero, the file object sets its internal read buffer
size to one. As a result, readline() makes one recv()
system call for every byte it consumes. Since httplib
uses readline() to read the HTTP header, that's an
awful lot of system calls. We noticed this when
trying to build a fairly aggressive application on top
of xmlrpclib (which relies on httplib); we saw tons of
system call activity.
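A minimal sketch of the effect, written against Python 3's `io` module rather than the Python 2.4 `socket._fileobject` the report describes. `CountingRaw`, `HEADER` and `count_reads` are hypothetical names: `CountingRaw` is an in-memory stand-in for a socket that counts low-level reads (the analogue of `recv()` calls). Reading the header with an unbuffered `readline()` costs one read per byte, while wrapping the same stream in `io.BufferedReader` needs a single read for this small header.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical socket stand-in: serves bytes from memory and counts
    how many low-level reads (the analogue of recv() calls) are made."""

    def __init__(self, data):
        super().__init__()
        self._data = data
        self._pos = 0
        self.calls = 0

    def readable(self):
        return True

    def readinto(self, b):
        self.calls += 1
        chunk = self._data[self._pos:self._pos + len(b)]
        n = len(chunk)
        b[:n] = chunk
        self._pos += n
        return n


# Example response header block, the part httplib reads with readline().
HEADER = (b"HTTP/1.0 200 OK\r\n"
          b"Content-Type: text/xml\r\n"
          b"Content-Length: 5\r\n"
          b"\r\n")


def count_reads(buffered):
    """Read the header lines and return how many raw reads that took."""
    raw = CountingRaw(HEADER + b"hello")
    fp = io.BufferedReader(raw) if buffered else raw
    # readline() until the blank line that terminates the header block.
    while fp.readline() not in (b"\r\n", b""):
        pass
    return raw.calls


print(count_reads(buffered=False))  # one raw read per header byte
print(count_reads(buffered=True))   # a single buffered raw read
```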

There is no comment near this line of code to indicate
whether this behavior is intended or not. If it is not
intended, the patch is to simply remove the second
argument and rely on the default (or allow the caller
to specify a buffer size).
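In Python 3 terms, the proposed patch is the difference between `sock.makefile('rb', buffering=0)` and `sock.makefile('rb')`. This sketch uses `socket.socketpair()` as an assumed stand-in for a real HTTP connection: the first call returns a raw `socket.SocketIO` object, while the second wraps it in an `io.BufferedReader`, so `readline()` pulls data from the socket in chunks instead of byte by byte.

```python
import io
import socket

a, b = socket.socketpair()
unbuffered = a.makefile('rb', buffering=0)  # like httplib's makefile('rb', 0)
buffered = a.makefile('rb')                 # the proposed fix: default buffering

print(type(unbuffered).__name__)  # SocketIO
print(type(buffered).__name__)    # BufferedReader

b.sendall(b"HTTP/1.0 200 OK\r\n")
line = buffered.readline()        # one recv_into() fetches the whole line
print(line)

for obj in (unbuffered, buffered, a, b):
    obj.close()
```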

In case reading a byte at a time is actually intended,
we have a simple work-around for those who care to use
it. In the python code that uses httplib, add the
following:

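import httplib

# ... application code ...

class HTTPResponse(httplib.HTTPResponse):
    # Accept *args as well: HTTPConnection.getresponse() may pass
    # debuglevel positionally when debugging is enabled.
    def __init__(self, sock, *args, **kw):
        httplib.HTTPResponse.__init__(self, sock, *args, **kw)
        # Replace the unbuffered file object with a default-buffered one.
        self.fp = sock.makefile('rb')

# Make HTTPConnection create the buffered response class.
httplib.HTTPConnection.response_class = HTTPResponse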
msg60976 - Author: Josiah Carlson (josiahcarlson) * (Python triager) Date: 2006-08-27 17:44

Because the socket is in blocking mode, performing
self.fp.read(x) with an x > 1 will generally block until it
has read x bytes or the other end disconnects.  As such, I
believe it is intended behavior.
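The blocking behavior described above can be seen with a buffered file object in Python 3: `read(10)` keeps waiting until ten bytes have arrived or the peer closes the connection. The `socketpair()` setup, the 0.2-second delay and the `read_ten` name are assumptions for illustration only.

```python
import socket
import threading
import time


def read_ten():
    reader, writer = socket.socketpair()

    def send_in_two_pieces():
        writer.sendall(b"hello")
        time.sleep(0.2)           # the second piece arrives later
        writer.sendall(b"world")
        writer.close()            # then the peer disconnects

    sender = threading.Thread(target=send_in_two_pieces)
    sender.start()
    fp = reader.makefile('rb')    # buffered, blocking
    first = fp.read(10)           # blocks until all ten bytes are in
    second = fp.read(10)          # peer closed: returns b"" (EOF)
    sender.join()
    fp.close()
    reader.close()
    return first, second


print(read_ten())  # (b'helloworld', b'')
```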

For your own application, you could perhaps write a response
class that uses non-blocking sockets to handle readline,
switching back to blocking sockets after header reading is over.
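A rough Python 3 sketch of that suggestion, assuming a plain connected socket; `read_headers_nonblocking` is a hypothetical helper, not part of httplib. It gathers the header block with non-blocking `recv()` calls guarded by `select()`, restores blocking mode, and returns any bytes received past the blank line so the body reader does not lose them.

```python
import select
import socket


def read_headers_nonblocking(sock, timeout=5.0, max_size=65536):
    """Hypothetical helper: collect an HTTP header block using non-blocking
    recv() calls, then put the socket back into blocking mode.

    Returns (header_bytes, leftover_bytes); leftover_bytes is whatever was
    received past the blank line and must be passed on to the body reader.
    """
    sock.setblocking(False)
    buf = b""
    try:
        while b"\r\n\r\n" not in buf:
            if len(buf) > max_size:
                raise ValueError("header block too large")
            # Wait (up to timeout seconds) until the socket is readable.
            ready, _, _ = select.select([sock], [], [], timeout)
            if not ready:
                raise socket.timeout("timed out waiting for headers")
            try:
                chunk = sock.recv(4096)  # returns whatever is available
            except BlockingIOError:
                continue                 # spurious wakeup; wait again
            if not chunk:
                raise ConnectionError("connection closed before end of headers")
            buf += chunk
    finally:
        sock.setblocking(True)           # back to blocking for the body
    head, _, leftover = buf.partition(b"\r\n\r\n")
    return head + b"\r\n\r\n", leftover


if __name__ == "__main__":
    reader, writer = socket.socketpair()
    writer.sendall(b"HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nhello")
    head, leftover = read_headers_nonblocking(reader)
    print(head)
    print(leftover)
    reader.close()
    writer.close()
```

Returning the leftover bytes is the important design choice: anything read past the header block belongs to the body and must be handed to whatever reads next.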
msg60977 - Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2006-09-07 13:41

Also, I have a vague memory that httplib is used with
pipelined HTTP requests.  Buffering on the makefile() then
causes problems because the stdio buffer can slurp up too
much data, including portions of the next pipelined response.
I think making buffering workable would require serious
restructuring of the httplib code.
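The over-read problem can be reproduced with a small Python 3 sketch (not httplib itself). Two responses are written back-to-back on a `socketpair()`; the first is read through one file object and the second through a fresh one, mirroring how each `HTTPResponse` calls `sock.makefile()`. `read_response`, `second_status_line`, `RESP1` and `RESP2` are hypothetical, and the parser only understands Content-Length. With buffering, the first file object's buffer swallows the start of the second response.

```python
import socket

RESP1 = b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nhi"
RESP2 = b"HTTP/1.1 404 Not Found\r\nContent-Length: 3\r\n\r\nbye"


def read_response(fp):
    """Read one response (status line, headers, body) from a file object."""
    status = fp.readline()
    length = 0
    while True:
        line = fp.readline()
        if line in (b"\r\n", b""):
            break
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    body = b""
    while len(body) < length:          # raw reads may return short counts
        chunk = fp.read(length - len(body))
        if not chunk:
            break
        body += chunk
    return status, body


def second_status_line(buffered):
    """Read RESP1 through one file object, then RESP2's status line
    through a fresh one, as each response gets its own makefile()."""
    client, server = socket.socketpair()
    server.sendall(RESP1 + RESP2)      # both responses arrive back-to-back
    server.close()
    mode = {} if buffered else {"buffering": 0}
    first = client.makefile('rb', **mode)
    read_response(first)
    second = client.makefile('rb', **mode)
    status = second.readline()
    first.close()
    second.close()
    client.close()
    return status


print(second_status_line(buffered=False))  # b'HTTP/1.1 404 Not Found\r\n'
print(second_status_line(buffered=True))   # RESP2 already sits in first's buffer
```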
msg65932 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2008-04-28 20:05
Dupe of #2576.
History
Date                 User          Action  Args
2008-04-28 20:05:30  georg.brandl  set     status: open -> closed
                                           resolution: duplicate
                                           superseder: httplib read() very slow due to lack of socket buffer
                                           messages: + msg65932
                                           nosy: + georg.brandl
2006-08-18 04:33:28  zoyd2k        create