Issue 3766
This issue tracker has been migrated to GitHub and is currently read-only. For more information, see the GitHub FAQs in the Python Developer's Guide.
Created on 2008-09-03 21:58 by thorben, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Files

File name | Uploaded | Description
---|---|---
socket_bug.tar.bz2 | thorben, 2008-09-03 21:58 | file containing all relevant files to (hopefully) reproduce this bug
Messages (13)
msg72424 - Author: Thorben Krueger (thorben) | Date: 2008-09-03 21:58
Under Linux 2.6.24-1-amd64 as well as 2.6.26 (x86-32), with Python 2.5.2 and 2.4.4 (on both machines), there is a huge discrepancy between socket read latencies, depending on "code context".

Some detail: for a university project, I wrote a Python client for a query server. A reference implementation existed in Perl, so the job was pretty straightforward. However, for reasons unknown to me, the Python implementation was slower than the reference implementation by several orders of magnitude. To track down the issue, I wrote a dummy version of the query server in Python, where the problem persisted. I then stripped down both client and server to the minimal functionality, and still the problem persisted. I wrote a demo inside a single file using socketpair() to post here, but the issue was not reproducible there. Finally, I stripped down the original client/server to a postable level and ran a profiler on a testcase. Here is the gist of it: sending 500 packets @ 2 tokens each (500 very short lists) takes 40 seconds on average.
In detail:

    14508 function calls in 39.980 CPU seconds

    Ordered by: internal time

    ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
      1500   39.834    0.027   39.834    0.027  {method 'recv' of '_socket.socket' objects}
      1500    0.024    0.000   39.877    0.027  Client.py:16(read_int)
      1500    0.020    0.000    0.020    0.000  <string>:1(sendall)
      1500    0.018    0.000    0.048    0.000  Client.py:8(send_int)
       500    0.016    0.000   39.903    0.080  Client.py:19(read_int_list)
      1500    0.015    0.000    0.019    0.000  struct.py:77(unpack)
       500    0.010    0.000    0.060    0.000  Client.py:11(send_int_list)
      1500    0.010    0.000    0.010    0.000  struct.py:54(pack)
         1    0.009    0.009   39.980   39.980  dummytest.py:12(bench)
      1000    0.007    0.000    0.007    0.000  {method 'insert' of 'list' objects}
      1001    0.006    0.000    0.006    0.000  {range}
       500    0.005    0.000   39.968    0.080  Client.py:28(spam)
      1500    0.005    0.000    0.005    0.000  {method 'unpack' of 'Struct' objects}
       501    0.002    0.000    0.002    0.000  {len}

Here is the same for 1 packet @ 50000 tokens (1 very long list), taking below 10 seconds on average.

    8.51872587204
    400018 function calls in 8.519 CPU seconds

    Ordered by: internal time

    ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     50001    2.980    0.000    2.980    0.000  {method 'recv' of '_socket.socket' objects}
     50000    2.980    0.000    2.980    0.000  {method 'insert' of 'list' objects}
     50001    0.993    0.000    0.993    0.000  <string>:1(sendall)
     50001    0.410    0.000    1.558    0.000  Client.py:8(send_int)
     50001    0.334    0.000    3.581    0.000  Client.py:16(read_int)
         1    0.247    0.247    6.809    6.809  Client.py:19(read_int_list)
     50001    0.191    0.000    0.266    0.000  struct.py:77(unpack)
     50001    0.154    0.000    0.154    0.000  struct.py:54(pack)
         1    0.146    0.146    1.703    1.703  Client.py:11(send_int_list)
     50001    0.075    0.000    0.075    0.000  {method 'unpack' of 'Struct' objects}

I don't get the reason for the huge speed discrepancy. I include all source code files for reproducing the issue. Notably, the original software (non-stripped-down version) runs without these issues under an OS X Python build. Details may follow; I don't own a Mac but know people who do.
Also note that I can't get the server to shut down properly (the thread does not terminate). You have to kill the process manually and wait for the port to be freed by the kernel. Maybe this is easily fixable, but at least I don't know how. The attached archive contains all source code plus a README and the socketpair() version.
msg72947 - Author: Thorben Krueger (thorben) | Date: 2008-09-10 07:25
As promised, here is the profiler output for a Mac (thanks, Stefan). The problem does NOT occur here, which should help greatly in tracking down the cause in the socket library. Anyone into this?

    $ python runme.py
    500 packets @ 2 tokens each (500 very short lists)
    0.0814669132233
    14508 function calls in 0.082 CPU seconds

    Ordered by: internal time

    ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
      1500    0.048    0.000    0.048    0.000  {method 'recv' of '_socket.socket' objects}
      1500    0.015    0.000    0.015    0.000  <string>:1(sendall)
      1500    0.003    0.000    0.021    0.000  Client.py:8(send_int)
      1500    0.003    0.000    0.053    0.000  Client.py:16(read_int)
       500    0.003    0.000    0.057    0.000  Client.py:19(read_int_list)
      1500    0.002    0.000    0.002    0.000  struct.py:77(unpack)
      1500    0.002    0.000    0.002    0.000  struct.py:54(pack)
       500    0.001    0.000    0.022    0.000  Client.py:11(send_int_list)
         1    0.001    0.001    0.082    0.082  runme.py:12(bench)
      1000    0.001    0.000    0.001    0.000  {method 'insert' of 'list' objects}
      1001    0.001    0.000    0.001    0.000  {range}
       500    0.001    0.000    0.080    0.000  Client.py:28(spam)
      1500    0.000    0.000    0.000    0.000  {method 'unpack' of 'Struct' objects}
       501    0.000    0.000    0.000    0.000  {len}

    1 packet @ 50000 tokens (1 very long list)
    5.20953297615
    400018 function calls in 5.210 CPU seconds

    Ordered by: internal time

    ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     50001    2.267    0.000    2.267    0.000  <string>:1(sendall)
     50001    1.263    0.000    1.263    0.000  {method 'recv' of '_socket.socket' objects}
     50000    1.112    0.000    1.112    0.000  {method 'insert' of 'list' objects}
     50001    0.137    0.000    1.490    0.000  Client.py:16(read_int)
     50001    0.134    0.000    2.478    0.000  Client.py:8(send_int)
         1    0.082    0.082    2.685    2.685  Client.py:19(read_int_list)
     50001    0.077    0.000    0.077    0.000  struct.py:54(pack)
     50001    0.073    0.000    0.089    0.000  struct.py:77(unpack)
         1    0.044    0.044    2.522    2.522  Client.py:11(send_int_list)
     50001    0.016    0.000    0.016    0.000  {method 'unpack' of 'Struct' objects}
msg73035 - Author: Antoine Pitrou (pitrou) * | Date: 2008-09-11 16:02
Thorben, is the problem still there if you use fork() rather than launching a separate thread for the server? The implementation of the recv() method is straightforward and I don't see anything that could cause a huge latency there; it's just a simple wrapper over the C library's recv() function. The waiting certainly occurs inside the OS rather than inside the Python interpreter.
msg73038 - Author: Thorben (Thorben) | Date: 2008-09-11 16:13
The problem exists even if the server part is replaced by a server written in C. I only wrote the dummy server in Python to be able to provide a testcase. The C server works with reasonable speed when connected to a client written in Perl, by the way. My employer is quite disappointed with Python's performance... (He provided the profiler data for the Mac, by the way.) I almost wish that I had done something wrong, but this does not seem to be the case. Nevertheless, I will try out your suggestion. Thanks for replying, Thorben
msg73040 - Author: Antoine Pitrou (pitrou) * | Date: 2008-09-11 16:25
You can use setsockopt() to set the TCP_NODELAY flag and see if that improves things; sending thousands of 4-byte packets isn't exactly the expected utilization of TCP. As I said, socket.recv() is just a thin wrapper above the same-named C library function. Your code expects the kernel's TCP stack to send things as soon as you ask it to, but TCP is a high-level stream protocol, and implementations usually have sophisticated algorithms to minimize the number of individual packets sent over the wire. By the way, if you want to build network applications (clients and servers), I suggest you take a look at the Twisted framework. It will free you from many low-level issues, although it won't prevent you from shooting yourself in the foot :-)
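The setsockopt() call suggested above can be sketched as follows. This is a minimal illustration, not code from the attached testcase: `connect_nodelay` is a hypothetical helper name, and `IPPROTO_TCP` is the portable spelling of the `SOL_TCP` level used later in this thread.

```python
import socket

def connect_nodelay(host, port):
    # Disable Nagle's algorithm so each small send() goes out immediately
    # instead of being buffered while the stack waits for pending ACKs.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    s.connect((host, port))
    return s
```

Note that the option has to be set before the latency-sensitive traffic starts, and (as discussed below) on both ends of the connection to help fully.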
msg73043 - Author: Thorben (Thorben) | Date: 2008-09-11 17:50
> You can use setsockopt() to set the TCP_NODELAY flag and see if that
> improves things; sending thousands of 4-byte packets isn't exactly the
> expected utilization of TCP.

You are obviously right, but IIRC I mentioned that this is simply a dummy testcase... Now that's interesting: adding the line "sock.setsockopt(SOL_TCP, TCP_NODELAY, 1)" decreased the delay by half. It is still extremely high, but it's a start. Do you think I could get that value down much further? I don't know much about TCP...

> As I said, socket.recv() is just a thin wrapper above the same-named C
> library function. [...]

It would be interesting to examine the differences between the Perl wrapper and the Python wrapper, to figure out why Perl "does the right thing" in this case and Python doesn't.

> By the way, if you want to build network applications (clients and
> servers), I suggest you take a look at the Twisted framework. [...]

Thanks, I'll make sure to try it out.
msg73048 - Author: Antoine Pitrou (pitrou) * | Date: 2008-09-11 18:32
> Now that's interesting: adding the line "sock.setsockopt(SOL_TCP,
> TCP_NODELAY, 1)" decreased the delay by half. It is still extremely
> high, but it's a start.

Did you do it on both the client and server sockets?

> It would be interesting to examine the differences between the Perl
> wrapper and the Python wrapper, to figure out why Perl "does the right
> thing" in this case and Python doesn't.

Perhaps the Perl wrapper is less thin than the Python one. In any case, it's by design that the Python socket wrapper doesn't try to be "smart": the intent is to provide access to the C API and let people do what they want with it. Smart things are relegated to higher-level modules or libraries.
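One "smart thing" a higher layer can do here is avoid sending each integer as its own tiny packet: pack the whole list into one buffer and push it out with a single sendall(). The sketch below assumes a 4-byte network-order integer framing with a leading count, which is what the testcase appears to use; `pack_int_list` and `unpack_int_list` are illustrative names, not functions from the attached code.

```python
import struct

def pack_int_list(values):
    # One length-prefixed buffer: a 4-byte count followed by each value
    # as a 4-byte network-order integer. The whole thing can then be
    # written with a single sendall() instead of one send per integer.
    return struct.pack("!%di" % (len(values) + 1), len(values), *values)

def unpack_int_list(buf):
    # Inverse of pack_int_list: read the count, then that many integers.
    count = struct.unpack("!i", buf[:4])[0]
    return list(struct.unpack("!%di" % count, buf[4:4 + 4 * count]))
```

With this framing, the number of syscalls and wire packets no longer grows with the list length, so neither Nagle's algorithm nor delayed ACKs get a chance to add per-integer latency.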
msg73050 - Author: Thorben (Thorben) | Date: 2008-09-11 18:52
> Did you do it on both the client and server sockets?

Yes, obviously. Although adding it to the client socket made no difference after I had already done so for the server. Communication is still too slow by orders of magnitude. (Sorry for pointing this out again.)

> Perhaps the Perl wrapper is less thin than the Python one. In any case,
> it's by design that the Python socket wrapper doesn't try to be "smart" [...]

I would greatly appreciate any help on the subject. How do *BSD sockets differ from Linux sockets, and what do I do to make things faster? I think this might be where the real issue is: low-level networking voodoo. Do you think Twisted might help me there?
msg73076 - Author: Gabriel Genellina (ggenellina) | Date: 2008-09-12 03:11
I've tested it on Windows XP. MSG_WAITALL is not supported, but I replaced it with a while loop. I didn't notice any extraneous delay.

    500 packets @ 2 tokens each (500 very short lists)
    0.140999794006
    16008 function calls in 0.146 CPU seconds

    Ordered by: internal time

    ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
      1500    0.036    0.000    0.036    0.000  {method 'recv' of '_socket.socket' objects}
      1500    0.033    0.000    0.033    0.000  <string>:1(sendall)
      1500    0.016    0.000    0.065    0.000  Client.py:15(read_int)
      1500    0.015    0.000    0.053    0.000  Client.py:7(send_int)
       500    0.009    0.000    0.077    0.000  Client.py:22(read_int_list)
       500    0.007    0.000    0.060    0.000  Client.py:10(send_int_list)
      1500    0.007    0.000    0.010    0.000  struct.py:77(unpack)
      1500    0.005    0.000    0.005    0.000  struct.py:54(pack)
       500    0.004    0.000    0.141    0.000  Client.py:31(spam)
      2001    0.004    0.000    0.004    0.000  {len}
         1    0.003    0.003    0.146    0.146  runme.py:11(bench)
      1500    0.003    0.000    0.003    0.000  {method 'unpack' of 'Struct' objects}
      1001    0.003    0.000    0.003    0.000  {range}
      1000    0.002    0.000    0.002    0.000  {method 'append' of 'list' objects}
         1    0.000    0.000    0.000    0.000  struct.py:35(_compile)
         2    0.000    0.000    0.000    0.000  {time.time}
         1    0.000    0.000    0.146    0.146  <string>:1(<module>)
         1    0.000    0.000    0.000    0.000  {method 'disable' of '_lsprof.Profiler' objects}

    1 packet @ 50000 tokens (1 very long list)
    4.89100003242
    450019 function calls in 4.893 CPU seconds

    Ordered by: internal time

    ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     50001    1.212    0.000    1.212    0.000  <string>:1(sendall)
     50001    1.062    0.000    1.062    0.000  {method 'recv' of '_socket.socket' objects}
     50001    0.594    0.000    2.282    0.000  Client.py:15(read_int)
     50001    0.517    0.000    1.982    0.000  Client.py:7(send_int)
         1    0.354    0.354    2.732    2.732  Client.py:22(read_int_list)
     50001    0.335    0.000    0.524    0.000  struct.py:77(unpack)
     50001    0.253    0.000    0.253    0.000  struct.py:54(pack)
     50001    0.189    0.000    0.189    0.000  {method 'unpack' of 'Struct' objects}
         1    0.176    0.176    2.158    2.158  Client.py:10(send_int_list)
     50002    0.102    0.000    0.102    0.000  {len}
     50000    0.097    0.000    0.097    0.000  {method 'append' of 'list' objects}
         2    0.002    0.001    0.002    0.001  {range}
         1    0.002    0.002    4.893    4.893  runme.py:19(bench2)
         1    0.000    0.000    4.890    4.890  Client.py:31(spam)
         2    0.000    0.000    0.000    0.000  {time.time}
         1    0.000    0.000    4.893    4.893  <string>:1(<module>)
         1    0.000    0.000    0.000    0.000  {method 'disable' of '_lsprof.Profiler' objects}
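The while loop Gabriel describes as a MSG_WAITALL replacement is not shown in his message, but it might look like the following sketch; `recv_exactly` is an illustrative name, not a function from the attached archive.

```python
import socket

def recv_exactly(sock, n):
    # Portable stand-in for sock.recv(n, MSG_WAITALL): keep calling
    # recv() until exactly n bytes have arrived, since a stream socket
    # may legally return fewer bytes than requested per call.
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise EOFError("connection closed with %d bytes outstanding" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

A loop like this is needed on platforms without MSG_WAITALL, and it is also the safer idiom in general, since even with MSG_WAITALL a recv() can return short on signals or errors.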
msg73078 - Author: Thorben (Thorben) | Date: 2008-09-12 08:01
That's more like it. Now I'd like to see the same behavior on Linux...
msg73176 - Author: Antoine Pitrou (pitrou) * | Date: 2008-09-13 11:31
> Yes, obviously. Although adding it to the client socket made no
> difference after I had already done so for the server. Communication
> is still too slow by orders of magnitude. (Sorry for pointing this
> out again.)

Well, if this precise use case is really important for you, I suggest using Wireshark (or any other packet analyzer) to see what happens in terms of latency between packets over the wire. Something else: try replacing "localhost" with "127.0.0.1"; perhaps your DNS resolution eats a lot of time.

> I would greatly appreciate any help on the subject. How do *BSD
> sockets differ from Linux sockets, and what do I do to make things
> faster?

I don't know, but I suspect the difference is more in the TCP stack implementation than in the sockets layer. In any case, I'm gonna close this bug, as this is very likely not a Python problem. Please ask further questions on comp.lang.python; lots of people there may help you :)
msg81181 - Author: Sam (samslists) | Date: 2009-02-05 01:39
I know this bug is closed, but I too am experiencing it under Linux 2.6.24-22 and Python 2.5.2. I'm using urllib2, and it's spending an obscene amount of CPU time in {method 'recv' of '_socket.socket' objects}. Anyone have any ideas? Would switching to httplib2 help?
msg124963 - Author: Konstantin Osipov (kostja) | Date: 2010-12-31 11:58
I was able to observe the same issue. I'm using Python 2.6.5 on Ubuntu 10.04 LTS. My system is a 64-bit Intel Core i7, quad core, Linux kernel 2.6.32-generic x86_64, Ubuntu EGLIBC 2.11.1-0ubuntu7.5. A simple client TCP socket which sends 35 bytes over to a server on localhost and receives 20 bytes in response produces only 22 RPS. An identical application written in C gives me 7000 RPS. TCP_NODELAY is on on the server side. Turning on TCP_NODELAY on the client gives me ~500 RPS in Python (which I'm satisfied with, because I think I then hit other bottlenecks). Still, such low performance by default can be surprising and hard to debug.
History

Date | User | Action | Args
---|---|---|---
2022-04-11 14:56:38 | admin | set | github: 48016
2010-12-31 11:58:09 | kostja | set | nosy: + kostja; messages: + msg124963
2009-02-05 01:39:06 | samslists | set | nosy: + samslists; messages: + msg81181
2008-09-13 11:31:45 | pitrou | set | status: open -> closed; resolution: not a bug
2008-09-13 11:31:18 | pitrou | set | messages: + msg73176
2008-09-12 08:01:19 | Thorben | set | messages: + msg73078
2008-09-12 03:11:19 | ggenellina | set | nosy: + ggenellina; messages: + msg73076
2008-09-11 18:52:17 | Thorben | set | messages: + msg73050
2008-09-11 18:32:22 | pitrou | set | messages: + msg73048
2008-09-11 17:50:38 | Thorben | set | messages: + msg73043
2008-09-11 16:25:08 | pitrou | set | messages: + msg73040
2008-09-11 16:13:09 | Thorben | set | nosy: + Thorben; messages: + msg73038
2008-09-11 16:02:16 | pitrou | set | nosy: + pitrou; messages: + msg73035
2008-09-11 15:14:56 | giampaolo.rodola | set | nosy: + giampaolo.rodola
2008-09-10 07:26:01 | thorben | set | messages: + msg72947
2008-09-03 21:58:47 | thorben | create |