Issue 4448: should socket readline() use default_bufsize instead of _rbufsize?

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/48698

classification

Title:	should socket readline() use default_bufsize instead of _rbufsize?
Type:	performance	Stage:
Components:		Versions:	Python 3.1, Python 2.6

process

Status:	closed	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	ggenellina, gregory.p.smith, gvanrossum, kristjan.jonsson
Priority:	normal	Keywords:

Created on 2008-11-27 20:57 by gregory.p.smith, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (8)
msg76516	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2008-11-27 20:57
From Kristján Valur Jónsson (kristjan at ccpgames.com) on python-dev: http://mail.python.org/pipermail/python-dev/2008-November/083724.html

I came across this in socket.c:

    # _rbufsize is the suggested recv buffer size.  It is strictly
    # obeyed within readline() for recv calls.  If it is larger than
    # default_bufsize it will be used for recv calls within read().

What I worry about is the readline() case. Is there a reason why we want to strictly obey it for that function? Note that in the documentation for _fileobject.read() it says:

    # Use max, disallow tiny reads in a loop as they are very inefficient.

The same argument surely applies to readline().

The reason I am fretting about this is that httplib.py (and therefore xmlrpclib.py) specifies bufsize=0 when creating its socket file objects, presumably to make sure that write() operations are not buffered but flushed immediately. But this has the side effect of setting _rbufsize to 1, so readline() calls become very slow.

I suggest that readline() be made to use at least default_bufsize, like read(). Any thoughts?
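The cost described above can be shown with a simplified model of a recv()-based readline(). This is a Python 3 sketch, not the actual socket.py code. FakeSocket and readline_via_recv are hypothetical names, and FakeSocket just hands out slices of a fixed payload.

With bufsize=1, every byte of the line costs one recv() call. With a large buffer, one call is enough, but the bytes after the newline end up in the reader's own buffer.

```python
# Illustrative sketch: FakeSocket and readline_via_recv are hypothetical, not socket.py code.
class FakeSocket:
    """Stand-in for a socket: recv(n) returns up to n bytes of a fixed payload."""

    def __init__(self, payload):
        self._data = payload
        self.recv_calls = 0  # how many recv() calls the reader issued

    def recv(self, n):
        self.recv_calls += 1
        chunk, self._data = self._data[:n], self._data[n:]
        return chunk


def readline_via_recv(sock, bufsize):
    """Read one line with recv(bufsize) calls; return (line, bytes read past the newline)."""
    buf = b""
    while b"\n" not in buf:
        chunk = sock.recv(bufsize)
        if not chunk:  # EOF before a newline: return what we have
            return buf, b""
        buf += chunk
    end = buf.index(b"\n") + 1
    return buf[:end], buf[end:]


payload = b"HTTP/1.0 200 OK\r\n" + b"x" * 100

tiny = FakeSocket(payload)
print(readline_via_recv(tiny, 1)[1], tiny.recv_calls)  # b'' 17

big = FakeSocket(payload)
line, extra = readline_via_recv(big, 8192)
print(len(extra), big.recv_calls)  # 100 1
```

In the second case, the 100 over-read bytes are held by the reader. A later recv() on the raw socket would never see them.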
msg76520	Author: Guido van Rossum (gvanrossum) *	Date: 2008-11-28 04:34
You meant socket.py. This is an extremely subtle area. I would be very wary of changing this -- there is a use case where headers are read from the socket using readline() but the rest of the data is read directly from the socket, and this would break if there was buffered data in the file objects. This is exactly why httplib sets the buffer size to 0. Fortunately things are completely different in Python 3.0 and I believe the same problem doesn't exist -- in 3.0 it makes more sense to always read from the (binary) buffered file object representing the socket.
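The mixing hazard can be reproduced with Python 3's socket.makefile(), whose buffering argument plays the role of bufsize here. This is an illustrative sketch: header_then_raw_recv is a hypothetical helper, and socket.socketpair() stands in for a real HTTP connection.

With buffering=0, the file object reads the header line byte by byte, so the body is still queued in the kernel for recv(). With default buffering, the file object has already pulled the body into its own buffer, and recv() sees only EOF.

```python
import socket


def header_then_raw_recv(buffering):
    """Hypothetical helper: read one line via makefile(), then drain the rest with recv()."""
    server, client = socket.socketpair()
    with server, client:
        client.sendall(b"Content-Length: 4\r\nBODY")
        client.shutdown(socket.SHUT_WR)  # signal EOF so the recv() loop below terminates
        with server.makefile("rb", buffering=buffering) as f:
            header = f.readline()
            rest = b""
            while True:
                chunk = server.recv(1024)  # bypasses the file object's buffer
                if not chunk:
                    break
                rest += chunk
    return header, rest


print(header_then_raw_recv(0))   # unbuffered: (b'Content-Length: 4\r\n', b'BODY')
print(header_then_raw_recv(-1))  # buffered:   (b'Content-Length: 4\r\n', b'')
```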
msg76522	Author: Kristján Valur Jónsson (kristjan.jonsson) *	Date: 2008-11-28 09:35
If you look at http://bugs.python.org/issue4336, half of the proposed patch is an attempt to deal with this performance issue. In the patch, we laboriously ensure that bufsize=-1 is passed in for the xmlrpc client.

Seeing your comment, I realize that xmlrpclib.py also uses direct access to h._conn.sock (if present) and calls recv() on it. In fact, that is the only place in the standard library where I can find this pattern. Was that a performance improvement? It is hard to see how bypassing buffered reads with a manual recv() could significantly alter performance.

In all the cases in test_xmlrpc.py, h._conn.sock is actually None, because h._conn has been closed in HTTPConnection.getresponse(). Therefore, my patch continues to work. However, I will fix that patch to cater to this strange special case.

Please also observe that since _fileobject.read() calls are always buffered, in general there is no way to safely mix read() and recv() calls, although recv() and readline() have been fudged to work together. Isn't this just a wart in the standard lib that we ought to remove?

Here is a suggestion:
1) document why readline() observes 0 buffering (to enable it to be used as a readline() utility on top of vanilla socket recv());
2) stop doing that in xmlrpclib.py and use default buffering.
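Point 2 of the suggestion amounts to never touching the raw socket once a file object exists. Below is a minimal sketch of that pattern, again in Python 3 with socket.socketpair() standing in for the HTTP connection. read_response_via_file is a hypothetical helper, and the body is delimited by EOF purely for simplicity.

Because every read goes through the same buffered object, the bytes that readline() over-reads are not lost.

```python
import socket


def read_response_via_file(payload):
    """Hypothetical helper: read a status line and body through one buffered file object."""
    server, client = socket.socketpair()
    with server, client:
        client.sendall(payload)
        client.shutdown(socket.SHUT_WR)  # in this sketch, EOF marks the end of the body
        with server.makefile("rb") as f:  # default buffering; no raw recv() afterwards
            status = f.readline()  # may pull the body into f's buffer as well
            body = f.read()        # served from that buffer, then reads to EOF
    return status, body


print(read_response_via_file(b"HTTP/1.0 200 OK\r\nhello"))
# (b'HTTP/1.0 200 OK\r\n', b'hello')
```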
msg76538	Author: Guido van Rossum (gvanrossum) *	Date: 2008-11-28 15:50
I'm fine with disabling this feature in xmlrpclib.py, and possibly even in httplib.py. I'm not fine with "fixing" this behavior in socket.py -- the unittest coverage is unfortunately small and we have had plenty of trouble in this area in the past. It is there for a reason, even if that reason is hard to fathom and poorly documented. Fortunately in 3.0 it's gone (or, more likely, replaced with a different set of issues :-).
msg80160	Author: Kristján Valur Jónsson (kristjan.jonsson) *	Date: 2009-01-19 11:58
Hi, I'm reawakening this because http://bugs.python.org/issue4879 needs to be ported to py3k. In py3k, the socket file object is still created with bufsize 0, although now the reasoning is different:

    def __init__(self, sock, debuglevel=0, strict=0, method=None):
        # XXX If the response includes a content-length header, we
        # need to make sure that the client doesn't read more than the
        # specified number of bytes.  If it does, it will block until
        # the server times out and closes the connection.  (The only
        # applies to HTTP/1.1 connections.)  Since some clients access
        # self.fp directly rather than calling read(), this is a little
        # tricky.
        self.fp = sock.makefile("rb", 0)

I think that this is just a translation of the old comment, i.e. a warning that some people may choose to call .recv() on the underlying socket. That should be far more difficult now with the newfangled IO library, since sock.makefile() now returns a SocketIO object, which inherits from RawIOBase. It's tricky to extract the socket to call .recv() on it. So I don't think we need to fear buffering for readline() anymore.

Or is the comment about someone calling HTTPResponse.fp.read() instead of HTTPResponse.read()? In that case, I don't see the problem. Of course, anyone reading N characters from a socket stream may cause blocking.

My proposal is to remove the comment above and use default buffering for the file object. Any thoughts?
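The description of the py3k objects can be checked directly. The sketch below (modern Python 3; makefile_types is a hypothetical helper name) shows that makefile("rb", 0) returns a raw socket.SocketIO, while the default call wraps it in an io.BufferedReader.

It says nothing about HTTP itself. It only confirms which layer each call hands back.

```python
import io
import socket


def makefile_types():
    """Hypothetical helper: report what sock.makefile() returns with and without buffering."""
    a, b = socket.socketpair()
    with a, b:
        raw = a.makefile("rb", 0)    # the same call HTTPResponse makes above
        buffered = a.makefile("rb")  # default buffering
        try:
            return type(raw), type(buffered), isinstance(raw, io.RawIOBase)
        finally:
            raw.close()
            buffered.close()


raw_type, buffered_type, is_raw = makefile_types()
print(raw_type.__name__, buffered_type.__name__, is_raw)  # SocketIO BufferedReader True
```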
msg80895	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2009-02-01 00:33
Unassigning; I don't have time to look at this one right now.
msg80945	Author: Kristján Valur Jónsson (kristjan.jonsson) *	Date: 2009-02-02 15:58
I have looked at this for py3k. The behaviour of HTTPResponse.fp.read() is the same whether fp is buffered or not: for HTTP/1.1, an unsized read() reads to EOF, which means blocking indefinitely. So read() without a size is forbidden for HTTP/1.1. For fp.read(n), if n bytes are available, buffered IO won't attempt to read more than is on the stream (SocketIO.read(N) returns fewer than N bytes rather than block), so there is no reason not to use buffering.
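The two read(n) claims can be illustrated with Python 3's socket.makefile(). In this sketch, short_reads is a hypothetical helper and socket.socketpair() stands in for the HTTP connection. The timeout is only there so that a wrong assumption fails fast instead of hanging.

The sketch shows two things:
- A raw SocketIO.read(20) returns the 10 bytes that are available.
- A buffered read(4) is satisfied from data already queued, without waiting for more.

```python
import socket


def short_reads():
    """Hypothetical helper: neither raw nor buffered reads block when enough data is queued."""
    server, client = socket.socketpair()
    with server, client:
        server.settimeout(2.0)  # safety net: raise instead of hanging if a read did block
        client.sendall(b"0123456789")  # 10 bytes queued; the peer stays open (no EOF)

        with server.makefile("rb", buffering=0) as raw:
            raw_chunk = raw.read(20)  # one recv_into(): a short read of the 10 queued bytes

        client.sendall(b"abcdefghij")
        with server.makefile("rb") as buffered:
            first = buffered.read(4)  # buffer filled from queued data; 4 bytes returned
    return raw_chunk, first


print(short_reads())  # (b'0123456789', b'abcd')
```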
msg81566	Author: Kristján Valur Jónsson (kristjan.jonsson) *	Date: 2009-02-10 17:09
Issue 4879 has been resolved so that HTTPResponse invokes socket.socket.makefile() with default buffering; see r69209. Since the problem stated in this defect has no bearing on 3.0 (there is no special hack for readline() in 3.0), I am closing this again.

History
Date	User	Action	Args
2022-04-11 14:56:41	admin	set	github: 48698
2009-02-10 17:09:24	kristjan.jonsson	set	status: open -> closed messages: + msg81566
2009-02-02 15:58:11	kristjan.jonsson	set	messages: + msg80945
2009-02-01 00:33:36	gregory.p.smith	set	assignee: gregory.p.smith -> messages: + msg80895
2009-01-19 21:56:45	ggenellina	set	nosy: + ggenellina
2009-01-19 11:58:13	kristjan.jonsson	set	messages: + msg80160 versions: + Python 3.1
2008-11-28 15:50:54	gvanrossum	set	messages: + msg76538
2008-11-28 09:35:43	kristjan.jonsson	set	nosy: + kristjan.jonsson messages: + msg76522
2008-11-28 04:34:16	gvanrossum	set	nosy: + gvanrossum messages: + msg76520
2008-11-27 20:57:10	gregory.p.smith	create