classification
Title: socket.socket.recv broken (unbearably slow)
Type: Stage:
Components: Library (Lib) Versions: Python 2.4, Python 2.5
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: Thorben, ggenellina, giampaolo.rodola, kostja, pitrou, samslists, thorben
Priority: normal Keywords:

Created on 2008-09-03 21:58 by thorben, last changed 2010-12-31 11:58 by kostja. This issue is now closed.

Files
File name Uploaded Description Edit
socket_bug.tar.bz2 thorben, 2008-09-03 21:58 file containing all relevant files to (hopefully) reproduce this bug
Messages (13)
msg72424 - (view) Author: Thorben Krueger (thorben) Date: 2008-09-03 21:58
Under Linux 2.6.24-1-amd64 as well as 2.6.26 (x86-32), Python versions
2.5.2 and 2.4.4 (on both machines), there is a huge discrepancy between
socket read latencies, depending on "code context".

Some detail:

For a university project, I wrote a python client for a query-server. A
reference implementation existed in Perl, so the job was pretty straight
forward. However, for reasons unknown to me, the Python implementation
was by several orders of magnitude slower than the reference
implementation. To track down the issue, I wrote a dummy version of the
query-server in Python, where the problem persisted. I then stripped
down both client and server to the minimal functionality and still the
problem persisted. 
I wrote a demo inside a single file using socketpair() to post here, but
the issue was not reproducible.

Finally, I stripped down the original client/server to a postable level
and ran a profiler on a testcase.

Here is the gist of it:

Sending 500 packets @ 2 tokens each (500 very short lists) takes 40
seconds on average.

In detail:

14508 function calls in 39.980 CPU seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1500   39.834    0.027   39.834    0.027 {method 'recv' of
'_socket.socket' objects}
     1500    0.024    0.000   39.877    0.027 Client.py:16(read_int)
     1500    0.020    0.000    0.020    0.000 <string>:1(sendall)
     1500    0.018    0.000    0.048    0.000 Client.py:8(send_int)
      500    0.016    0.000   39.903    0.080 Client.py:19(read_int_list)
     1500    0.015    0.000    0.019    0.000 struct.py:77(unpack)
      500    0.010    0.000    0.060    0.000 Client.py:11(send_int_list)
     1500    0.010    0.000    0.010    0.000 struct.py:54(pack)
        1    0.009    0.009   39.980   39.980 dummytest.py:12(bench)
     1000    0.007    0.000    0.007    0.000 {method 'insert' of 'list'
objects}
     1001    0.006    0.000    0.006    0.000 {range}
      500    0.005    0.000   39.968    0.080 Client.py:28(spam)
     1500    0.005    0.000    0.005    0.000 {method 'unpack' of
'Struct' objects}
      501    0.002    0.000    0.002    0.000 {len}


Here is the same for 1 packet @ 50000 tokens (1 very long list), taking
below 10 seconds on average.

8.51872587204
         400018 function calls in 8.519 CPU seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    50001    2.980    0.000    2.980    0.000 {method 'recv' of
'_socket.socket' objects}
    50000    2.980    0.000    2.980    0.000 {method 'insert' of 'list'
objects}
    50001    0.993    0.000    0.993    0.000 <string>:1(sendall)
    50001    0.410    0.000    1.558    0.000 Client.py:8(send_int)
    50001    0.334    0.000    3.581    0.000 Client.py:16(read_int)
        1    0.247    0.247    6.809    6.809 Client.py:19(read_int_list)
    50001    0.191    0.000    0.266    0.000 struct.py:77(unpack)
    50001    0.154    0.000    0.154    0.000 struct.py:54(pack)
        1    0.146    0.146    1.703    1.703 Client.py:11(send_int_list)
    50001    0.075    0.000    0.075    0.000 {method 'unpack' of
'Struct' objects}

I don't get the reason for the huge speed discrepancy. I include all
source code files for reproducing the issue.

Notably, the original software (non stripped-down version) runs without
these issues using a OS X Python version. Details may follow, I don't
own a Mac but know people who do.

Also note that I can't get the server to shut down properly (the thread
does not terminate). You have to kill the process manually and wait for
the port to be freed by the kernel. Maybe this is easily fixable but at
least I don't know how.

The attached archive contains all source code plus README and the
socketpair() version.
msg72947 - (view) Author: Thorben Krueger (thorben) Date: 2008-09-10 07:25
As promised, here is the profiler output for a Mac (thanks Stefan). The
problem does NOT occur here, which should help greatly in tracking down
the cause in the socket library. Anyone into this?

$ python runme.py
500 packets @ 2 tokens each (500 very short lists)
0.0814669132233
        14508 function calls in 0.082 CPU seconds

  Ordered by: internal time

  ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    1500    0.048    0.000    0.048    0.000 {method 'recv' of
'_socket.socket' objects}
    1500    0.015    0.000    0.015    0.000 <string>:1(sendall)
    1500    0.003    0.000    0.021    0.000 Client.py:8(send_int)
    1500    0.003    0.000    0.053    0.000 Client.py:16(read_int)
     500    0.003    0.000    0.057    0.000 Client.py:19(read_int_list)
    1500    0.002    0.000    0.002    0.000 struct.py:77(unpack)
    1500    0.002    0.000    0.002    0.000 struct.py:54(pack)
     500    0.001    0.000    0.022    0.000 Client.py:11(send_int_list)
       1    0.001    0.001    0.082    0.082 runme.py:12(bench)
    1000    0.001    0.000    0.001    0.000 {method 'insert' of 'list'
objects}
    1001    0.001    0.000    0.001    0.000 {range}
     500    0.001    0.000    0.080    0.000 Client.py:28(spam)
    1500    0.000    0.000    0.000    0.000 {method 'unpack' of
'Struct' objects}
     501    0.000    0.000    0.000    0.000 {len}


1 packet @ 50000 tokens (1 very long list)
5.20953297615
        400018 function calls in 5.210 CPU seconds

  Ordered by: internal time

  ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   50001    2.267    0.000    2.267    0.000 <string>:1(sendall)
   50001    1.263    0.000    1.263    0.000 {method 'recv' of
'_socket.socket' objects}
   50000    1.112    0.000    1.112    0.000 {method 'insert' of 'list'
objects}
   50001    0.137    0.000    1.490    0.000 Client.py:16(read_int)
   50001    0.134    0.000    2.478    0.000 Client.py:8(send_int)
       1    0.082    0.082    2.685    2.685 Client.py:19(read_int_list)
   50001    0.077    0.000    0.077    0.000 struct.py:54(pack)
   50001    0.073    0.000    0.089    0.000 struct.py:77(unpack)
       1    0.044    0.044    2.522    2.522 Client.py:11(send_int_list)
   50001    0.016    0.000    0.016    0.000 {method 'unpack' of
'Struct' objects}
msg73035 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-09-11 16:02
Thorben, is the problem still there if you use fork() rather than
launching a separate thread for the server?

The implementation of the recv() method is straightforward and I don't
see anything that could cause a huge latency there, it's just a simple
wrapper over the C library's recv() function. The waiting certainly
occurs inside the OS rather than inside the Python interpreter.
msg73038 - (view) Author: Thorben (Thorben) Date: 2008-09-11 16:13
The problem exists even if the server part is replaced by a server
written in C. I only wrote up the dummy server in python to be able to
provide a testcase.

The C server works with reasonable speed when connected to a client
written in perl by the way. My employer is quite disappointed with
Python's performance... (He provided the profiler data for the Mac by
the way) I almost wish that I did something wrong, but this does not
seem to be the case.

Nevertheless, I will try out your suggestion.

Thanks for replying,

Thorben

2008/9/11 Antoine Pitrou <report@bugs.python.org>:
>
> Antoine Pitrou <pitrou@free.fr> added the comment:
>
> Thorben, is the problem still there if you use fork() rather than
> launching a separate thread for the server?
>
> The implementation of the recv() method is straightforward and I don't
> see anything that could cause a huge latency there, it's just a simple
> wrapper over the C library's recv() function. The waiting certainly
> occurs inside the OS rather than inside the Python interpreter.
>
> ----------
> nosy: +pitrou
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue3766>
> _______________________________________
>
msg73040 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-09-11 16:25
You can use setsockopt() to set the TCP_NODELAY flag and see if that
improves things; sending thousands of 4-bytes packets isn't exactly the
expected utilization of TCP.

As I said, socket.recv() is just a thin wrapper above the same-named C
library function. Your code expects the kernel's TCP stack to send
things as soon as you ask it to, but TCP is a high-level stream protocol
and implementations usually have sophisticated algorithms in order to
minimize the number of individual packets sent over the wire.

By the way, if you want to build network applications (clients and
servers), I suggest you take a look at the Twisted framework. It will
free you from many low-level issues, although it won't prevent you from
shooting yourself in the foot :-)
msg73043 - (view) Author: Thorben (Thorben) Date: 2008-09-11 17:50
2008/9/11 Antoine Pitrou <report@bugs.python.org>:
>
> Antoine Pitrou <pitrou@free.fr> added the comment:
>
> You can use setsockopt() to set the TCP_NODELAY flag and see if that
> improves things; sending thousands of 4-bytes packets isn't exactly the
> expected utilization of TCP.

You are obviously right, but IIRC, I mentioned that this is simply a
dummy testcase...

now thats interesting:
adding the line "sock.setsockopt(SOL_TCP, TCP_NODELAY, 1) " decreased
the delay by half. It still is extremely high but it's a start.

Do you think I could get that value down much further? I don't know
much about TCP...

> As I said, socket.recv() is just a thin wrapper above the same-named C
> library function. Your code expects the kernel's TCP stack to send
> things as soon as you ask it to, but TCP is a high-level stream protocol
> and implementations usually have sophisticated algorithms in order to
> minimize the number of individual packets sent over the wire.

Would be interesting to examine the differences between the Perl
wrapper and the Python wrapper to figure out why Perl "does the right
thing" in this case and Python doesn't.

> By the way, if you want to build network applications (clients and
> servers), I suggest you take a look at the Twisted framework. It will
> free you from many low-level issues, although it won't prevent you from
> shooting yourself in the foot :-)

Thanks, I'll make sure to try it out.

> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue3766>
> _______________________________________
>
msg73048 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-09-11 18:32
> now thats interesting:
> adding the line "sock.setsockopt(SOL_TCP, TCP_NODELAY, 1) " decreased
> the delay by half. It still is extremely high but it's a start.

Did you do it on both the client and server sockets?

> Would be interesting to examine the differences between the Perl
> wrapper and the Python wrapper to figure out why Perl "does the right
> thing" in this case and Python doesn't.

Perhaps the Perl wrapper is less thin as the Python one. In any case,
it's by design if the Python socket wrapper doesn't try to be "smart":
the intent is to provide an access to the C API and let people do what
they want with it. Smart things are relegated to higher-level modules or
libraries.
msg73050 - (view) Author: Thorben (Thorben) Date: 2008-09-11 18:52
2008/9/11 Antoine Pitrou <report@bugs.python.org>:
>
> Antoine Pitrou <pitrou@free.fr> added the comment:
>
>> now thats interesting:
>> adding the line "sock.setsockopt(SOL_TCP, TCP_NODELAY, 1) " decreased
>> the delay by half. It still is extremely high but it's a start.
>
> Did you do it on both the client and server sockets?

Yes, obviously. Although adding it to the client socket did make no
difference after I had already done so for the server. Still
communication is too slow by orders of magnitude. (Sorry for pointing
this out again)

>> Would be interesting to examine the differences between the Perl
>> wrapper and the Python wrapper to figure out why Perl "does the right
>> thing" in this case and Python doesn't.
>
> Perhaps the Perl wrapper is less thin as the Python one. In any case,
> it's by design if the Python socket wrapper doesn't try to be "smart":
> the intent is to provide an access to the C API and let people do what
> they want with it. Smart things are relegated to higher-level modules or
> libraries.

I would greatly appreciate any help on the subject. How do *BSD
sockets differ from Linux sockets and what do I do to make things
faster. I think this might be where the real issue is. Low level
networking voodoo. Do you think twisted might help me there?

> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue3766>
> _______________________________________
>
msg73076 - (view) Author: Gabriel Genellina (ggenellina) Date: 2008-09-12 03:11
I've tested it on Windows XP. MSG_WAITALL is not supported, but I 
replaced it using a while loop. I didn't notice any extraneous delay.

500 packets @ 2 tokens each (500 very short lists)
0.140999794006
         16008 function calls in 0.146 CPU seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1500    0.036    0.000    0.036    0.000 {method 'recv' of 
'_socket.socket'
 objects}
     1500    0.033    0.000    0.033    0.000 <string>:1(sendall)
     1500    0.016    0.000    0.065    0.000 Client.py:15(read_int)
     1500    0.015    0.000    0.053    0.000 Client.py:7(send_int)
      500    0.009    0.000    0.077    0.000 
Client.py:22(read_int_list)
      500    0.007    0.000    0.060    0.000 
Client.py:10(send_int_list)
     1500    0.007    0.000    0.010    0.000 struct.py:77(unpack)
     1500    0.005    0.000    0.005    0.000 struct.py:54(pack)
      500    0.004    0.000    0.141    0.000 Client.py:31(spam)
     2001    0.004    0.000    0.004    0.000 {len}
        1    0.003    0.003    0.146    0.146 runme.py:11(bench)
     1500    0.003    0.000    0.003    0.000 {method 'unpack' of 
'Struct' objec
ts}
     1001    0.003    0.000    0.003    0.000 {range}
     1000    0.002    0.000    0.002    0.000 {method 'append' of 
'list' objects
}
        1    0.000    0.000    0.000    0.000 struct.py:35(_compile)
        2    0.000    0.000    0.000    0.000 {time.time}
        1    0.000    0.000    0.146    0.146 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of 
'_lsprof.Prof
iler' objects}


None
================================================================================


1 packet @ 50000 tokens (1 very long list)
4.89100003242
         450019 function calls in 4.893 CPU seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    50001    1.212    0.000    1.212    0.000 <string>:1(sendall)
    50001    1.062    0.000    1.062    0.000 {method 'recv' of 
'_socket.socket'
 objects}
    50001    0.594    0.000    2.282    0.000 Client.py:15(read_int)
    50001    0.517    0.000    1.982    0.000 Client.py:7(send_int)
        1    0.354    0.354    2.732    2.732 
Client.py:22(read_int_list)
    50001    0.335    0.000    0.524    0.000 struct.py:77(unpack)
    50001    0.253    0.000    0.253    0.000 struct.py:54(pack)
    50001    0.189    0.000    0.189    0.000 {method 'unpack' of 
'Struct' objec
ts}
        1    0.176    0.176    2.158    2.158 
Client.py:10(send_int_list)
    50002    0.102    0.000    0.102    0.000 {len}
    50000    0.097    0.000    0.097    0.000 {method 'append' of 
'list' objects
}
        2    0.002    0.001    0.002    0.001 {range}
        1    0.002    0.002    4.893    4.893 runme.py:19(bench2)
        1    0.000    0.000    4.890    4.890 Client.py:31(spam)
        2    0.000    0.000    0.000    0.000 {time.time}
        1    0.000    0.000    4.893    4.893 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of 
'_lsprof.Prof
iler' objects}
msg73078 - (view) Author: Thorben (Thorben) Date: 2008-09-12 08:01
That's more like it. Now I'd like to see the same behavior on Linux...

2008/9/12 Gabriel Genellina <report@bugs.python.org>:
>
> Gabriel Genellina <gagsl-py2@yahoo.com.ar> added the comment:
>
> I've tested it on Windows XP. MSG_WAITALL is not supported, but I
> replaced it using a while loop. I didn't notice any extraneous delay.
>
> 500 packets @ 2 tokens each (500 very short lists)
> 0.140999794006
>         16008 function calls in 0.146 CPU seconds
>
>   Ordered by: internal time
>
>   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>     1500    0.036    0.000    0.036    0.000 {method 'recv' of
> '_socket.socket'
>  objects}
>     1500    0.033    0.000    0.033    0.000 <string>:1(sendall)
>     1500    0.016    0.000    0.065    0.000 Client.py:15(read_int)
>     1500    0.015    0.000    0.053    0.000 Client.py:7(send_int)
>      500    0.009    0.000    0.077    0.000
> Client.py:22(read_int_list)
>      500    0.007    0.000    0.060    0.000
> Client.py:10(send_int_list)
>     1500    0.007    0.000    0.010    0.000 struct.py:77(unpack)
>     1500    0.005    0.000    0.005    0.000 struct.py:54(pack)
>      500    0.004    0.000    0.141    0.000 Client.py:31(spam)
>     2001    0.004    0.000    0.004    0.000 {len}
>        1    0.003    0.003    0.146    0.146 runme.py:11(bench)
>     1500    0.003    0.000    0.003    0.000 {method 'unpack' of
> 'Struct' objec
> ts}
>     1001    0.003    0.000    0.003    0.000 {range}
>     1000    0.002    0.000    0.002    0.000 {method 'append' of
> 'list' objects
> }
>        1    0.000    0.000    0.000    0.000 struct.py:35(_compile)
>        2    0.000    0.000    0.000    0.000 {time.time}
>        1    0.000    0.000    0.146    0.146 <string>:1(<module>)
>        1    0.000    0.000    0.000    0.000 {method 'disable' of
> '_lsprof.Prof
> iler' objects}
>
>
> None
> ================================================================================
>
>
> 1 packet @ 50000 tokens (1 very long list)
> 4.89100003242
>         450019 function calls in 4.893 CPU seconds
>
>   Ordered by: internal time
>
>   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>    50001    1.212    0.000    1.212    0.000 <string>:1(sendall)
>    50001    1.062    0.000    1.062    0.000 {method 'recv' of
> '_socket.socket'
>  objects}
>    50001    0.594    0.000    2.282    0.000 Client.py:15(read_int)
>    50001    0.517    0.000    1.982    0.000 Client.py:7(send_int)
>        1    0.354    0.354    2.732    2.732
> Client.py:22(read_int_list)
>    50001    0.335    0.000    0.524    0.000 struct.py:77(unpack)
>    50001    0.253    0.000    0.253    0.000 struct.py:54(pack)
>    50001    0.189    0.000    0.189    0.000 {method 'unpack' of
> 'Struct' objec
> ts}
>        1    0.176    0.176    2.158    2.158
> Client.py:10(send_int_list)
>    50002    0.102    0.000    0.102    0.000 {len}
>    50000    0.097    0.000    0.097    0.000 {method 'append' of
> 'list' objects
> }
>        2    0.002    0.001    0.002    0.001 {range}
>        1    0.002    0.002    4.893    4.893 runme.py:19(bench2)
>        1    0.000    0.000    4.890    4.890 Client.py:31(spam)
>        2    0.000    0.000    0.000    0.000 {time.time}
>        1    0.000    0.000    4.893    4.893 <string>:1(<module>)
>        1    0.000    0.000    0.000    0.000 {method 'disable' of
> '_lsprof.Prof
> iler' objects}
>
> ----------
> nosy: +gagenellina
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue3766>
> _______________________________________
>
msg73176 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-09-13 11:31
> Yes, obviously. Although adding it to the client socket did make no
> difference after I had already done so for the server. Still
> communication is too slow by orders of magnitude. (Sorry for pointing
> this out again)

Well, if this precise use case is really important for you, I suggest
using Wireshark (or any other packet analyzer) to see what happens in
terms of latency between packets over the wire.

Something else: try replacing "localhost" with "127.0.0.1", perhaps your
DNS resolution eats a lot of time.

> I would greatly appreciate any help on the subject. How do *BSD
> sockets differ from Linux sockets and what do I do to make things
> faster.

I don't know, but I suspect the difference is more in the TCP stack
implementation than in the sockets layer.

In any case, I'm gonna close this bug, as this is very likely not a
Python problem. Please ask further questions on comp.lang.python, lots
of people there may help you :)
msg81181 - (view) Author: Sam (samslists) Date: 2009-02-05 01:39
I know this bug is closed, but I too am experiencing it under Linux 
2.6.24-22 and Python 2.5.2.

I'm using urllib2, and it's just spending an obscene amount of cpu time 
in {method 'recv' of '_socket.socket' objects}....

Anyone have any ideas?

Would switching to httplib2 help?
msg124963 - (view) Author: Konstantin Osipov (kostja) Date: 2010-12-31 11:58
I was able to observe the same issue:

I'm using Python 2.6.5 on Ubuntu 10.0.4 LTS. My system is 64 bit Intel Core I7, Quad core, Linux kernel 2.6.32-generic x86_64, Ubuntu EGLIBC 2.11.1-0ubuntu7.5.

A simple client TCP socket, which sends 35 bytes over to a server on localhost and receives 20 bytes in response, produces only 22 RPS. An identical application written in C gives me 7000 RPS. TCP_NODELAY is on on the server side. Turning on TCP_NODELAY on the client gives me ~500 RPS in Python (which I'm satisfied with, 'cause I think I then hit other bottlenecks). 
Still, such low performance on by default can be surprising and hard to debug.
History
Date User Action Args
2010-12-31 11:58:09kostjasetnosy: + kostja
messages: + msg124963
2009-02-05 01:39:06samslistssetnosy: + samslists
messages: + msg81181
2008-09-13 11:31:45pitrousetstatus: open -> closed
resolution: not a bug
2008-09-13 11:31:18pitrousetmessages: + msg73176
2008-09-12 08:01:19Thorbensetmessages: + msg73078
2008-09-12 03:11:19ggenellinasetnosy: + ggenellina
messages: + msg73076
2008-09-11 18:52:17Thorbensetmessages: + msg73050
2008-09-11 18:32:22pitrousetmessages: + msg73048
2008-09-11 17:50:38Thorbensetmessages: + msg73043
2008-09-11 16:25:08pitrousetmessages: + msg73040
2008-09-11 16:13:09Thorbensetnosy: + Thorben
messages: + msg73038
2008-09-11 16:02:16pitrousetnosy: + pitrou
messages: + msg73035
2008-09-11 15:14:56giampaolo.rodolasetnosy: + giampaolo.rodola
2008-09-10 07:26:01thorbensetmessages: + msg72947
2008-09-03 21:58:47thorbencreate