classification
Title: performance problem in socket._fileobject.read
Type: performance Stage:
Components: Library (Lib) Versions: Python 2.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: gregory.p.smith Nosy List: CurtHagenlocher, akuchling, barry, benjamin.peterson, doko, draghuram, forest, gregory.p.smith, pitrou, schmir
Priority: release blocker Keywords: patch

Created on 2008-04-14 18:30 by CurtHagenlocher, last changed 2008-07-06 04:04 by gregory.p.smith. This issue is now closed.

Files
File name Uploaded Description Edit
socket.py.diff CurtHagenlocher, 2008-04-15 05:24 Revised fix
socket-strio-gps01.patch.txt gregory.p.smith, 2008-04-21 18:00 patch to avoid long lifetime of large strings by using StringIO [against trunk r62440]
Messages (26)
msg65472 - (view) Author: Curt Hagenlocher (CurtHagenlocher) Date: 2008-04-14 18:30
First reported by Ralf Schmitt.
I haven't checked 2.6 or 3.0.
msg65473 - (view) Author: Raghuram Devarakonda (draghuram) (Python triager) Date: 2008-04-14 18:33
The relevant python-dev thread is
http://mail.python.org/pipermail/python-dev/2008-April/078613.html
msg65485 - (view) Author: Ralf Schmitt (schmir) Date: 2008-04-14 21:49
One more time: the change is wrong. It should try to recv the maximum
not the minimum of size, buffer_size. If you using a buffering of 16
bytes it will otherwise call recv 256 times when you want to read 1024
bytes. this is wrong.
However there should be an upper limit, it doesn't make sense to read
10MB from socket in one recv call (imap bug).
msg65486 - (view) Author: Ralf Schmitt (schmir) Date: 2008-04-14 21:51
akuchling, I added you to the nosy list. hope that's ok.
msg65487 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-04-14 22:10
This is apparently the same issue as #2601.
msg65489 - (view) Author: Ralf Schmitt (schmir) Date: 2008-04-14 22:59
ahh. yes, same issue. should have taken a look at the bugtracker before,
would have saved me some time...
msg65499 - (view) Author: Curt Hagenlocher (CurtHagenlocher) Date: 2008-04-15 05:24
I've attached a much better patch as suggested by Guido
msg65500 - (view) Author: Ralf Schmitt (schmir) Date: 2008-04-15 05:42
This patch handles the case where the caller has specified the size
argument. When size is unspecified, it should be handled as if size was
infinite. By the formula from your patch, this should be

 recv_size = min(self.max_readsize, max(self._rbufsize, left))
  (== min(self.max_readsize, inf) == self.max_readsize)

and not the current code:
            if self._rbufsize <= 1:
                recv_size = self.default_bufsize
            else:
                recv_size = self._rbufsize
            while True:
                data = self._sock.recv(recv_size)

BTW,  I still think this max_readsize limit should be handled somewhere
deeper in the code. at least maybe in the _socketobject class.
msg65528 - (view) Author: Curt Hagenlocher (CurtHagenlocher) Date: 2008-04-15 20:45
At least in 2.5.2, there appear to be only two places in the standard 
library where an arbitrary and unchecked size is passed directly to 
_socketobject.recv.  _fileobject.read is one such place and the other 
is in logging/config.py.  In both cases, the pattern is something like

while len(data) < size:
    data = data + sock.recv(size - len(data))

Of course, the same pattern may exist in any number of instances of 
user code that performs a socket.recv, and all of these would be 
subject to the same memory thrashing problem that originally beset 
imaplib.

As such, I now agree with Ralf that the fix ultimately belongs in 
_socketobject.  However, deploying a fix for 2.5.2 is probably easier 
if it's in socket.py than if it requires a recompile.
msg65561 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-04-16 18:27
As A.M. Kuchling said in the other bug :), there is no need to fix 2.5.2
as the offending change is posterior to the release.
By the way, somewhat really ought to close either this bug or #2601,
discussing things in parallel in two different bug entries is too confusing.
msg65607 - (view) Author: Ralf Schmitt (schmir) Date: 2008-04-18 09:26
However it should be fixed in release25-maint. Can anyone install curt's
patch? It does what the original fix intended to do.
msg65664 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-04-21 18:00
Twisted fixed their problem for issue 1092502 by making recv()ed data
short lived by putting it into a StringIO for buffering.

I've attached a patch that does that for the socket module -and- gets
rid of any possibility of doing tiny size recvs in read() which is the
cause of the poor performance that this issue was filed about.

My patch also documents what _rbufsize is used for (readline) and makes
sane use of it with read.

When viewing the change, it is best to view it side by side next to the
old version rather than trying to comprehend it as a diff.

I believe this really fixes both issue 1092502 while at the same time
prevents introduction of a speed problem (this issue).
msg65669 - (view) Author: Ralf Schmitt (schmir) Date: 2008-04-22 06:02
(I think the twisted issue your talking about is:
http://twistedmatrix.com/trac/ticket/1079)

Your patch still contains this code:

                 recv_size = min(rbufsize, left)
                 data = self._sock.recv(recv_size)

This is IMHO wrong.  See my comments above.
msg65756 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-04-25 00:41
available for an easy side by side review here:

 http://codereview.appspot.com/212

Also, yes I think you're right Ralf.  With these changes I should be
able to return that to a max() within the while True: for sized reads
and things will work as desired.
msg66098 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-05-02 16:40
I fixed the min vs max use that Ralf noted and have submitted this as
r62627.  all tests pass for me but I believe it deserves a wider
audience and testing outside of just the test suite.

could those who reported the original problems (both the memory use
issue and the speed issue) run their tests with this applied?

... watching the buildbots ...

 http://codereview.appspot.com/621/diff/61/81
msg66267 - (view) Author: Ralf Schmitt (schmir) Date: 2008-05-05 12:47
I have a case which triggers the assert in readline:

  File "build/bdist.linux-x86_64/egg/setuptools/command/develop.py",
line 27, in run
  File "build/bdist.linux-x86_64/egg/setuptools/command/develop.py",
line 102, in install_for_development
  File
"build/bdist.linux-x86_64/egg/setuptools/command/easy_install.py", line
519, in process_distribution
  File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 522, in resolve
  File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 758, in
best_match
  File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 770, in obtain
  File
"build/bdist.linux-x86_64/egg/setuptools/command/easy_install.py", line
433, in easy_install
  File "build/bdist.linux-x86_64/egg/setuptools/package_index.py", line
462, in fetch_distribution
  File "build/bdist.linux-x86_64/egg/setuptools/package_index.py", line
303, in find_packages
  File "build/bdist.linux-x86_64/egg/setuptools/package_index.py", line
617, in scan_url
  File "build/bdist.linux-x86_64/egg/setuptools/package_index.py", line
198, in process_url
  File "/home/ralf/py26/lib/python2.6/socket.py", line 326, in read
data = self._sock.recv(rbufsize)
  File "/home/ralf/py26/lib/python2.6/httplib.py", line 512, in read
return self._read_chunked(amt)
  File "/home/ralf/py26/lib/python2.6/httplib.py", line 548, in
_read_chunked
line = self.fp.readline()
  File "/home/ralf/py26/lib/python2.6/socket.py", line 391, in readline
assert buf.tell() == 0
AssertionError

this happens when I run :

easy_install 'mwlib<0.6'
msg66269 - (view) Author: Raghuram Devarakonda (draghuram) (Python triager) Date: 2008-05-05 13:59
Similar problem is reported in #2760.
msg66278 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-05-05 19:20
I'm glad i put that assert in...  The problem occurs due to a mixture of
fixed size reads followed by unbounded readlines on an unbuffered
_fileobject.  A case that the code currently doesn't handle.  I'm fixing it.
msg66294 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-05-05 21:54
The bug introduced in r62627 has been fixed in r62744.
msg66306 - (view) Author: Ralf Schmitt (schmir) Date: 2008-05-06 08:23
the above easy_install now works for me. thanks.
msg66400 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2008-05-08 03:46
Bumping this down to critical for the alpha release.  It should be
release blocker for the first beta.
msg66984 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-05-17 07:14
I'm going to let the committed change bake in Python trunk for a beta
release or two before backporting it to release25-maint.
msg67945 - (view) Author: Ralf Schmitt (schmir) Date: 2008-06-11 09:36
Can we get the fix for release25-maint? It will not get worse than the
current state is.
msg68191 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-06-13 23:46
If it's fixed in 2.6 and 3.0, it shouldn't be release blocker anymore,
now should it?
msg68192 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2008-06-13 23:46
Well, it's a release blocker for 2.5.3.
msg69315 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-07-06 04:04
committed to release25-maint in r64754.
History
Date User Action Args
2008-07-06 04:04:51gregory.p.smithsetstatus: open -> closed
resolution: fixed
messages: + msg69315
2008-06-13 23:46:55benjamin.petersonsetnosy: + benjamin.peterson
messages: + msg68192
2008-06-13 23:46:03pitrousetmessages: + msg68191
2008-06-11 09:36:27schmirsetmessages: + msg67945
2008-06-11 05:44:29gregory.p.smithsetversions: - Python 2.6
2008-05-17 07:14:54gregory.p.smithsetmessages: + msg66984
2008-05-14 22:42:11georg.brandlsetpriority: critical -> release blocker
2008-05-08 03:46:36barrysetpriority: release blocker -> critical
nosy: + barry
messages: + msg66400
2008-05-06 08:23:31schmirsetmessages: + msg66306
2008-05-05 21:54:34gregory.p.smithsetmessages: + msg66294
2008-05-05 19:20:59gregory.p.smithsetmessages: + msg66278
2008-05-05 18:35:43gregory.p.smithlinkissue2760 superseder
2008-05-05 18:35:43gregory.p.smithlinkissue2760 dependencies
2008-05-05 13:59:50draghuramsetmessages: + msg66269
2008-05-05 12:47:27schmirsetmessages: + msg66267
2008-05-02 17:08:13gregory.p.smithsetpriority: critical -> release blocker
2008-05-02 16:40:05gregory.p.smithsetmessages: + msg66098
2008-04-30 08:56:36dokosetnosy: + doko
2008-04-30 05:56:00mhammondlinkissue2601 superseder
2008-04-25 00:41:13gregory.p.smithsetmessages: + msg65756
2008-04-22 06:02:36schmirsetmessages: + msg65669
2008-04-21 22:32:55forestsetnosy: + forest
2008-04-21 18:00:53gregory.p.smithsetfiles: + socket-strio-gps01.patch.txt
assignee: gregory.p.smith
messages: + msg65664
nosy: + gregory.p.smith
2008-04-18 09:26:23schmirsetmessages: + msg65607
2008-04-16 18:27:57pitrousetmessages: + msg65561
2008-04-15 20:45:15CurtHagenlochersetmessages: + msg65528
title: socket._fileobject.read(n) should ignore _rbufsize when 1 -> performance problem in socket._fileobject.read
2008-04-15 05:42:16schmirsetmessages: + msg65500
2008-04-15 05:24:32CurtHagenlochersetfiles: - socket.py.diff
2008-04-15 05:24:23CurtHagenlochersetfiles: + socket.py.diff
messages: + msg65499
2008-04-14 22:59:39schmirsetmessages: + msg65489
2008-04-14 22:10:24pitrousetnosy: + pitrou
messages: + msg65487
2008-04-14 21:52:18schmirsetnosy: + akuchling
2008-04-14 21:51:30schmirsetmessages: + msg65486
2008-04-14 21:51:03gvanrossumsetpriority: critical
2008-04-14 21:50:00schmirsetversions: + Python 2.6
2008-04-14 21:49:44schmirsetmessages: + msg65485
2008-04-14 21:42:00schmirsetnosy: + schmir
2008-04-14 18:33:25draghuramsetnosy: + draghuram
messages: + msg65473
2008-04-14 18:30:16CurtHagenlochercreate