Issue 1542407: httplib reads one byte per system call

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/43849

classification

Title:	httplib reads one byte per system call
Type:		Stage:
Components:	Extension Modules	Versions:	Python 2.4

process

Status:	closed	Resolution:	duplicate
Dependencies:		Superseder:	httplib read() very slow due to lack of socket buffer View: 2576
Assigned To:		Nosy List:	akuchling, georg.brandl, josiahcarlson, zoyd2k
Priority:	normal	Keywords:

Created on 2006-08-18 04:33 by zoyd2k, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (4)
msg60975 - (view)	Author: Zoyd Wheeler (zoyd2k)	Date: 2006-08-18 04:33
The HTTPResponse class in httplib.py contains the following line in its __init__ method:

    self.fp = sock.makefile('rb', 0)

The zero in the second (bufsize) argument overrides the default behavior of the socket file object's readline() method, which is to read a buffer's worth of data from the socket at a time and only hit the socket again when the buffer runs dry. When bufsize is set to zero, the file object sets its internal buffer size to one. As a result, readline() makes a system call for every byte of data consumed. Since httplib uses readline() to obtain the HTTP header, that's an awful lot of system calls. We noticed this when trying to build a fairly aggressive application on top of xmlrpclib (which relies on httplib); we saw tons of system call activity.

There is no comment near this line of code to indicate whether this behavior is intended or not. If it is not intended, the patch is simply to remove the second argument and rely on the default (or to allow the caller to specify a buffer size).

In case reading a byte at a time is actually intended, we have a simple workaround for those who care to use it. In the Python code that uses httplib, add the following:

    import httplib
    ...
    class HTTPResponse(httplib.HTTPResponse):
        def __init__(self, sock, *args, **kw):
            # Forward everything HTTPConnection passes (debuglevel, strict, method).
            httplib.HTTPResponse.__init__(self, sock, *args, **kw)
            # Replace the unbuffered file object with a default-buffered one.
            self.fp = sock.makefile('rb')

    httplib.HTTPConnection.response_class = HTTPResponse
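The cost described in this report can be sketched without a real socket. The following is only an illustration under stated assumptions: it uses Python 3's io module rather than the Python 2.4 socket file object the report is about, and CountingRaw, count_reads and HEADER are hypothetical names invented for the example. Each readinto() call on the stand-in stream plays the role of one recv() system call.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical stand-in for a socket: counts low-level read calls."""

    def __init__(self, payload):
        super().__init__()
        self._data = io.BytesIO(payload)
        self.calls = 0

    def readable(self):
        return True

    def readinto(self, b):
        # Each call here stands in for one recv() system call.
        self.calls += 1
        chunk = self._data.read(len(b))
        b[:len(chunk)] = chunk
        return len(chunk)


def count_reads(payload, buffered):
    """Read header lines up to the blank line; return the low-level call count."""
    raw = CountingRaw(payload)
    stream = io.BufferedReader(raw) if buffered else raw
    while stream.readline() not in (b"\r\n", b""):
        pass
    return raw.calls


# Assumed sample header, not taken from the report.
HEADER = b"HTTP/1.1 200 OK\r\nContent-Type: text/xml\r\nContent-Length: 0\r\n\r\n"

unbuffered_calls = count_reads(HEADER, buffered=False)
buffered_calls = count_reads(HEADER, buffered=True)
print("unbuffered:", unbuffered_calls, "buffered:", buffered_calls)
```

In this sketch the unbuffered stream needs one low-level read per header byte, while the buffered one needs far fewer, which is the same pattern the report observed as system call activity.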
msg60976 - (view)	Author: Josiah Carlson (josiahcarlson) *	Date: 2006-08-27 17:44
Because the socket is in blocking mode, performing self.fp.read(x) with an x > 1 will generally block until it has read x bytes or the other end disconnects. As such, I believe it is intended behavior. For your own application, you could perhaps write a response class that uses non-blocking sockets to handle readline, switching back to blocking sockets after header reading is over.
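The blocking behavior mentioned in this comment can be sketched with a local socket pair. This is an illustration only, assuming Python 3 and a socketpair standing in for the HTTP connection; the variable names are hypothetical. It contrasts a single recv() with a large size, which returns whatever has already arrived, with a file object's read(x), which keeps reading until it has x bytes or the other end disconnects.

```python
import socket

# A connected pair of local sockets stands in for client and server (assumption).
a, b = socket.socketpair()
b.settimeout(5.0)  # safety net so the sketch cannot hang indefinitely

a.sendall(b"hello")
# A single recv() asks for up to 4096 bytes but returns what is available.
first = b.recv(4096)

a.sendall(b"world")
a.close()  # the "other end disconnects"
with b.makefile("rb") as fp:
    # read(10) wants 10 bytes; only 5 exist, so it returns once EOF is seen.
    second = fp.read(10)
b.close()

print(first, second)
```

Had the sender stayed open without sending more data, the read(10) call would have kept waiting, which is the situation the comment describes.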
msg60977 - (view)	Author: A.M. Kuchling (akuchling) *	Date: 2006-09-07 13:41
Also, I have a vague memory that httplib is used with pipelined HTTP requests. Buffering on the makefile() then causes problems because the stdio buffer can slurp up too much data, including portions of the next pipelined request. I think making buffering workable would require serious restructuring of the httplib code.
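The over-read concern can be sketched as follows. This is a minimal illustration, assuming Python 3's io module in place of the stdio-backed file object of Python 2.4, with an in-memory byte stream standing in for the socket; FIRST, SECOND and read_header are hypothetical names. Two back-to-back responses sit on the same "wire", and only the first header is read.

```python
import io

# Two assumed pipelined responses arriving back to back on one connection.
FIRST = b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n"
SECOND = b"HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n"


def read_header(stream):
    """Consume lines up to and including the blank line that ends a header."""
    while stream.readline() not in (b"\r\n", b""):
        pass


# Buffered: the reader pulls a whole buffer from the underlying stream.
wire = io.BytesIO(FIRST + SECOND)
reader = io.BufferedReader(wire)
read_header(reader)
consumed_buffered = wire.tell()  # bytes taken off the underlying stream

# Unbuffered: only the bytes of the first header are taken.
wire2 = io.BytesIO(FIRST + SECOND)
read_header(wire2)
consumed_unbuffered = wire2.tell()

print(consumed_buffered, consumed_unbuffered, len(FIRST))
```

In the buffered case the bytes of the second response have already left the underlying stream and live only in that reader's private buffer, so a fresh file object created for the next response would not see them.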
msg65932 - (view)	Author: Georg Brandl (georg.brandl) *	Date: 2008-04-28 20:05
Dupe of #2576.

History
Date	User	Action	Args
2022-04-11 14:56:19	admin	set	github: 43849
2008-04-28 20:05:30	georg.brandl	set	status: open -> closed; resolution: duplicate; superseder: httplib read() very slow due to lack of socket buffer; messages: + msg65932; nosy: + georg.brandl
2006-08-18 04:33:28	zoyd2k	create