Message 60975 - Python tracker

Author	zoyd2k
Date	2006-08-18.04:33:28

Content
The HTTPResponse class in httplib.py contains the
following line in its __init__ method:

self.fp = sock.makefile('rb', 0)

The zero passed as the second (bufsize) argument
overrides the default behavior of the socket file
object's readline() method, which is to read a buffer's
worth of data from the socket at a time and go back to
the socket only when the buffer runs dry. When bufsize
is zero, the file object sets its internal buffer size
to one, so readline() makes a system call for every
byte of data consumed. Since httplib uses readline() to
read the HTTP headers, that is a lot of system calls.
We noticed this while building a fairly aggressive
application on top of xmlrpclib (which relies on
httplib) and saw a very high rate of system calls.
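
The effect described above can be reproduced in miniature.
The following is only a sketch, written for Python 3's io
module rather than the 2006 httplib code: CountingRaw is a
hypothetical stand-in for a socket that counts low-level
reads, and HEADER is made-up sample data. It compares an
unbuffered readline() with one that goes through
io.BufferedReader.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical stand-in for a socket: counts each low-level read."""

    def __init__(self, data):
        super().__init__()
        self._data = io.BytesIO(data)
        self.reads = 0

    def readable(self):
        return True

    def readinto(self, b):
        # Each call here models one recv() system call on a real socket.
        self.reads += 1
        chunk = self._data.read(len(b))
        b[:len(chunk)] = chunk
        return len(chunk)


HEADER = b"HTTP/1.0 200 OK\r\n"  # sample status line (assumption)

# Unbuffered: readline() on the raw object pulls one byte per read.
unbuffered = CountingRaw(HEADER)
unbuffered_line = unbuffered.readline()
unbuffered_reads = unbuffered.reads

# Buffered: the reader fills its buffer in bulk and scans it in memory.
raw = CountingRaw(HEADER)
buffered_line = io.BufferedReader(raw).readline()
buffered_reads = raw.reads

print("unbuffered reads:", unbuffered_reads)
print("buffered reads:  ", buffered_reads)
```

Both calls return the same line; only the number of
low-level reads differs, which is the cost this report is
about.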

There is no comment near this line of code to indicate
whether this behavior is intended or not. If it is not
intended, the patch is to simply remove the second
argument and rely on the default (or allow the caller
to specify a buffer size).
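
As a rough illustration of letting the caller choose the
buffer size, here is a sketch against Python 3's
socket.makefile(), not the httplib code under discussion.
make_response_file and its bufsize parameter are
hypothetical names introduced only for this example; a
negative bufsize is assumed to mean "use the default
buffering".

```python
import io
import socket


def make_response_file(sock, bufsize=-1):
    """Hypothetical helper: open the read side of sock.

    bufsize=-1 requests default buffering; bufsize=0 requests an
    unbuffered (raw) file object, as in the line quoted above.
    """
    return sock.makefile("rb", bufsize)


client, server = socket.socketpair()
try:
    server.sendall(b"HTTP/1.0 200 OK\r\n\r\n")

    # Default: a buffered reader that fetches data in large chunks.
    default_fp = make_response_file(client)
    status_line = default_fp.readline()
    is_buffered = isinstance(default_fp, io.BufferedReader)

    # Explicit zero: a raw, unbuffered file object.
    raw_fp = make_response_file(client, 0)
    is_raw = isinstance(raw_fp, io.RawIOBase)

    default_fp.close()
    raw_fp.close()
finally:
    client.close()
    server.close()
```

With this shape, callers that need byte-at-a-time reads can
still ask for them, while everyone else gets the buffered
default.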

In case reading a byte at a time is actually intended,
we have a simple workaround for those who care to use
it. In the Python code that uses httplib, add the
following:

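import httplib
...
class HTTPResponse(httplib.HTTPResponse):
    def __init__(self, sock, *args, **kw):
        # Forward all arguments unchanged, including a positional debuglevel.
        httplib.HTTPResponse.__init__(self, sock, *args, **kw)
        # Replace the unbuffered file object with a default-buffered one.
        self.fp = sock.makefile('rb')

# Make every HTTPConnection use the buffered response class.
httplib.HTTPConnection.response_class = HTTPResponse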

History
Date	User	Action	Args
2008-01-20 09:58:55	admin	link	issue1542407 messages
2008-01-20 09:58:55	admin	create