
socket.socket.recv broken (unbearably slow) #48016

Closed

thorben mannequin opened this issue Sep 3, 2008 · 13 comments
Labels
stdlib Python modules in the Lib dir

Comments

thorben mannequin commented Sep 3, 2008

BPO 3766
Nosy @pitrou, @giampaolo
Files
  • socket_bug.tar.bz2: file containing all relevant files to (hopefully) reproduce this bug
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2008-09-13.11:31:45.313>
    created_at = <Date 2008-09-03.21:58:47.560>
    labels = ['invalid', 'library']
    title = 'socket.socket.recv broken (unbearably slow)'
    updated_at = <Date 2010-12-31.11:58:09.493>
    user = 'https://bugs.python.org/thorben'

    bugs.python.org fields:

    activity = <Date 2010-12-31.11:58:09.493>
    actor = 'kostja'
    assignee = 'none'
    closed = True
    closed_date = <Date 2008-09-13.11:31:45.313>
    closer = 'pitrou'
    components = ['Library (Lib)']
    creation = <Date 2008-09-03.21:58:47.560>
    creator = 'thorben'
    dependencies = []
    files = ['11363']
    hgrepos = []
    issue_num = 3766
    keywords = []
    message_count = 13.0
    messages = ['72424', '72947', '73035', '73038', '73040', '73043', '73048', '73050', '73076', '73078', '73176', '81181', '124963']
    nosy_count = 7.0
    nosy_names = ['ggenellina', 'pitrou', 'giampaolo.rodola', 'Thorben', 'thorben', 'samslists', 'kostja']
    pr_nums = []
    priority = 'normal'
    resolution = 'not a bug'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue3766'
    versions = ['Python 2.5', 'Python 2.4']


    Under Linux 2.6.24-1-amd64 as well as 2.6.26 (x86-32), with Python
    versions 2.5.2 and 2.4.4 (on both machines), there is a huge discrepancy
    in socket read latencies, depending on "code context".

    Some detail:

    For a university project, I wrote a Python client for a query server. A
    reference implementation existed in Perl, so the job was pretty
    straightforward. However, for reasons unknown to me, the Python
    implementation was several orders of magnitude slower than the reference
    implementation. To track down the issue, I wrote a dummy version of the
    query server in Python, where the problem persisted. I then stripped
    down both client and server to the minimal functionality and still the
    problem persisted.
    I wrote a demo inside a single file using socketpair() to post here, but
    the issue was not reproducible that way.

    Finally, I stripped down the original client/server to a postable level
    and ran a profiler on a testcase.

    Here is the gist of it:

    Sending 500 packets @ 2 tokens each (500 very short lists) takes 40
    seconds on average.

    In detail:

    14508 function calls in 39.980 CPU seconds

    Ordered by: internal time

    ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
      1500   39.834    0.027   39.834    0.027  {method 'recv' of '_socket.socket' objects}
      1500    0.024    0.000   39.877    0.027  Client.py:16(read_int)
      1500    0.020    0.000    0.020    0.000  <string>:1(sendall)
      1500    0.018    0.000    0.048    0.000  Client.py:8(send_int)
       500    0.016    0.000   39.903    0.080  Client.py:19(read_int_list)
      1500    0.015    0.000    0.019    0.000  struct.py:77(unpack)
       500    0.010    0.000    0.060    0.000  Client.py:11(send_int_list)
      1500    0.010    0.000    0.010    0.000  struct.py:54(pack)
         1    0.009    0.009   39.980   39.980  dummytest.py:12(bench)
      1000    0.007    0.000    0.007    0.000  {method 'insert' of 'list' objects}
      1001    0.006    0.000    0.006    0.000  {range}
       500    0.005    0.000   39.968    0.080  Client.py:28(spam)
      1500    0.005    0.000    0.005    0.000  {method 'unpack' of 'Struct' objects}
       501    0.002    0.000    0.002    0.000  {len}

    Here is the same for 1 packet @ 50000 tokens (1 very long list), taking
    below 10 seconds on average.

    8.51872587204
    400018 function calls in 8.519 CPU seconds

    Ordered by: internal time

    ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     50001    2.980    0.000    2.980    0.000  {method 'recv' of '_socket.socket' objects}
     50000    2.980    0.000    2.980    0.000  {method 'insert' of 'list' objects}
     50001    0.993    0.000    0.993    0.000  <string>:1(sendall)
     50001    0.410    0.000    1.558    0.000  Client.py:8(send_int)
     50001    0.334    0.000    3.581    0.000  Client.py:16(read_int)
         1    0.247    0.247    6.809    6.809  Client.py:19(read_int_list)
     50001    0.191    0.000    0.266    0.000  struct.py:77(unpack)
     50001    0.154    0.000    0.154    0.000  struct.py:54(pack)
         1    0.146    0.146    1.703    1.703  Client.py:11(send_int_list)
     50001    0.075    0.000    0.075    0.000  {method 'unpack' of 'Struct' objects}

    I don't get the reason for the huge speed discrepancy. I have included
    all source code files for reproducing the issue.

    Notably, the original software (non-stripped-down version) runs without
    these issues using an OS X Python build. Details may follow; I don't own
    a Mac, but I know people who do.

    Also note that I can't get the server to shut down properly (the thread
    does not terminate). You have to kill the process manually and wait for
    the port to be freed by the kernel. Maybe this is easily fixable, but I
    don't know how.

    The attached archive contains all source code plus a README and the
    socketpair() version.
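
    For reference, a minimal sketch of the framing helpers that show up in
    the profile above, assuming a 4-byte integer wire format and MSG_WAITALL
    (both are my assumptions; the real Client.py is in the attached archive):

    import socket
    import struct

    def send_int(sock, value):
        # Pack one integer into 4 bytes and hand it to the kernel in one call.
        sock.sendall(struct.pack("!i", value))

    def read_int(sock):
        # MSG_WAITALL asks recv() to block until all 4 bytes have arrived
        # (available on Linux; a later comment notes it is missing on Windows).
        data = sock.recv(4, socket.MSG_WAITALL)
        return struct.unpack("!i", data)[0]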

    @thorben thorben mannequin added the stdlib Python modules in the Lib dir label Sep 3, 2008
    thorben mannequin commented Sep 10, 2008

    As promised, here is the profiler output for a Mac (thanks Stefan). The
    problem does NOT occur here, which should help greatly in tracking down
    the cause in the socket library. Anyone into this?

    $ python runme.py
    500 packets @ 2 tokens each (500 very short lists)
    0.0814669132233
            14508 function calls in 0.082 CPU seconds

    Ordered by: internal time

    ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
      1500    0.048    0.000    0.048    0.000  {method 'recv' of '_socket.socket' objects}
      1500    0.015    0.000    0.015    0.000  <string>:1(sendall)
      1500    0.003    0.000    0.021    0.000  Client.py:8(send_int)
      1500    0.003    0.000    0.053    0.000  Client.py:16(read_int)
       500    0.003    0.000    0.057    0.000  Client.py:19(read_int_list)
      1500    0.002    0.000    0.002    0.000  struct.py:77(unpack)
      1500    0.002    0.000    0.002    0.000  struct.py:54(pack)
       500    0.001    0.000    0.022    0.000  Client.py:11(send_int_list)
         1    0.001    0.001    0.082    0.082  runme.py:12(bench)
      1000    0.001    0.000    0.001    0.000  {method 'insert' of 'list' objects}
      1001    0.001    0.000    0.001    0.000  {range}
       500    0.001    0.000    0.080    0.000  Client.py:28(spam)
      1500    0.000    0.000    0.000    0.000  {method 'unpack' of 'Struct' objects}
       501    0.000    0.000    0.000    0.000  {len}

    1 packet @ 50000 tokens (1 very long list)
    5.20953297615
    400018 function calls in 5.210 CPU seconds

    Ordered by: internal time

    ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     50001    2.267    0.000    2.267    0.000  <string>:1(sendall)
     50001    1.263    0.000    1.263    0.000  {method 'recv' of '_socket.socket' objects}
     50000    1.112    0.000    1.112    0.000  {method 'insert' of 'list' objects}
     50001    0.137    0.000    1.490    0.000  Client.py:16(read_int)
     50001    0.134    0.000    2.478    0.000  Client.py:8(send_int)
         1    0.082    0.082    2.685    2.685  Client.py:19(read_int_list)
     50001    0.077    0.000    0.077    0.000  struct.py:54(pack)
     50001    0.073    0.000    0.089    0.000  struct.py:77(unpack)
         1    0.044    0.044    2.522    2.522  Client.py:11(send_int_list)
     50001    0.016    0.000    0.016    0.000  {method 'unpack' of 'Struct' objects}

    pitrou commented Sep 11, 2008

    Thorben, is the problem still there if you use fork() rather than
    launching a separate thread for the server?

    The implementation of the recv() method is straightforward, and I don't
    see anything that could cause a huge latency there; it's just a simple
    wrapper over the C library's recv() function. The waiting certainly
    occurs inside the OS rather than inside the Python interpreter.
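
    A minimal way to try the fork() variant, sketched here with an assumed
    echo-style dummy server and test address (not taken from the attached
    archive; Unix-only because of os.fork()):

    import os
    import socket

    HOST, PORT = "127.0.0.1", 50007  # assumed test address

    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # SO_REUSEADDR also avoids waiting for the kernel to release the port
    # between runs, which the report mentions as a separate annoyance.
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((HOST, PORT))
    listener.listen(1)

    pid = os.fork()
    if pid == 0:
        # Child process: act as the dummy server, echoing until EOF.
        conn, _ = listener.accept()
        while True:
            data = conn.recv(4096)
            if not data:
                break
            conn.sendall(data)
        conn.close()
        os._exit(0)
    else:
        # Parent process: act as the client, then reap the child.
        listener.close()
        client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        client.connect((HOST, PORT))
        client.sendall(b"ping")
        print(client.recv(4))
        client.close()
        os.waitpid(pid, 0)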

    thorben mannequin commented Sep 11, 2008

    The problem exists even if the server part is replaced by a server
    written in C. I only wrote the dummy server in Python to be able to
    provide a testcase.

    The C server works at a reasonable speed when connected to a client
    written in Perl, by the way. My employer is quite disappointed with
    Python's performance... (He provided the profiler data for the Mac, by
    the way.) I almost wish I had done something wrong, but this does not
    seem to be the case.

    Nevertheless, I will try out your suggestion.

    Thanks for replying,

    Thorben


    pitrou commented Sep 11, 2008

    You can use setsockopt() to set the TCP_NODELAY flag and see if that
    improves things; sending thousands of 4-byte packets isn't exactly the
    expected utilization of TCP.

    As I said, socket.recv() is just a thin wrapper above the same-named C
    library function. Your code expects the kernel's TCP stack to send
    things as soon as you ask it to, but TCP is a high-level stream
    protocol, and implementations usually have sophisticated algorithms in
    order to minimize the number of individual packets sent over the wire.

    By the way, if you want to build network applications (clients and
    servers), I suggest you take a look at the Twisted framework. It will
    free you from many low-level issues, although it won't prevent you from
    shooting yourself in the foot :-)
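
    A sketch of that suggestion on the client side (address assumed; the
    same option would normally also be set on the connection socket that
    accept() returns on the server):

    import socket

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Disable Nagle's algorithm so small writes go out immediately instead
    # of being coalesced while the stack waits for outstanding ACKs.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    sock.connect(("127.0.0.1", 50007))  # assumed test address

    IPPROTO_TCP is the portable constant; SOL_TCP, used later in the thread,
    names the same level on Linux.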

    thorben mannequin commented Sep 11, 2008


    > You can use setsockopt() to set the TCP_NODELAY flag and see if that
    > improves things; sending thousands of 4-byte packets isn't exactly the
    > expected utilization of TCP.

    You are obviously right, but IIRC, I mentioned that this is simply a
    dummy testcase...

    Now that's interesting: adding the line
    "sock.setsockopt(SOL_TCP, TCP_NODELAY, 1)" decreased the delay by half.
    It is still extremely high, but it's a start.

    Do you think I could get that value down much further? I don't know
    much about TCP...

    > As I said, socket.recv() is just a thin wrapper above the same-named C
    > library function. Your code expects the kernel's TCP stack to send
    > things as soon as you ask it to, but TCP is a high-level stream
    > protocol, and implementations usually have sophisticated algorithms in
    > order to minimize the number of individual packets sent over the wire.

    It would be interesting to examine the differences between the Perl
    wrapper and the Python wrapper to figure out why Perl "does the right
    thing" in this case and Python doesn't.

    > By the way, if you want to build network applications (clients and
    > servers), I suggest you take a look at the Twisted framework. It will
    > free you from many low-level issues, although it won't prevent you
    > from shooting yourself in the foot :-)

    Thanks, I'll make sure to try it out.




    pitrou commented Sep 11, 2008

    > Now that's interesting: adding the line
    > "sock.setsockopt(SOL_TCP, TCP_NODELAY, 1)" decreased the delay by
    > half. It is still extremely high, but it's a start.

    Did you do it on both the client and server sockets?

    > It would be interesting to examine the differences between the Perl
    > wrapper and the Python wrapper to figure out why Perl "does the right
    > thing" in this case and Python doesn't.

    Perhaps the Perl wrapper is less thin than the Python one. In any case,
    it's by design that the Python socket wrapper doesn't try to be "smart":
    the intent is to provide access to the C API and let people do what
    they want with it. Smart things are relegated to higher-level modules or
    libraries.
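
    As an illustration of that division of labour, a higher-level client
    could batch each request into a single buffer before writing, so the TCP
    stack never sees a stream of tiny writes. This is a sketch; the
    length-prefixed format is an assumption, not the protocol from the
    attached archive:

    import struct

    def send_int_list(sock, values):
        # Pack the length prefix and every integer into one buffer, then
        # push it to the kernel with a single sendall() instead of one
        # small write per integer.
        payload = struct.pack("!i", len(values))
        payload += struct.pack("!%di" % len(values), *values)
        sock.sendall(payload)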

    thorben mannequin commented Sep 11, 2008


    > > Now that's interesting: adding the line
    > > "sock.setsockopt(SOL_TCP, TCP_NODELAY, 1)" decreased the delay by
    > > half. It is still extremely high, but it's a start.
    >
    > Did you do it on both the client and server sockets?

    Yes, obviously. Although adding it to the client socket made no
    difference after I had already done so for the server, communication is
    still too slow by orders of magnitude. (Sorry for pointing this out
    again.)

    > > It would be interesting to examine the differences between the Perl
    > > wrapper and the Python wrapper to figure out why Perl "does the
    > > right thing" in this case and Python doesn't.
    >
    > Perhaps the Perl wrapper is less thin than the Python one. In any
    > case, it's by design that the Python socket wrapper doesn't try to be
    > "smart": the intent is to provide access to the C API and let people
    > do what they want with it. Smart things are relegated to higher-level
    > modules or libraries.

    I would greatly appreciate any help on the subject. How do *BSD sockets
    differ from Linux sockets, and what do I do to make things faster? I
    think this might be where the real issue is: low-level networking
    voodoo. Do you think Twisted might help me there?




    ggenellina mannequin commented Sep 12, 2008

    I've tested it on Windows XP. MSG_WAITALL is not supported, but I
    replaced it using a while loop. I didn't notice any extraneous delay.
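
    The replacement loop itself is not shown in the comment; a sketch of the
    usual pattern (the helper name is made up) would be:

    def recv_exactly(sock, nbytes):
        # Portable substitute for sock.recv(nbytes, MSG_WAITALL): keep
        # calling recv() until the requested number of bytes has arrived.
        chunks = []
        remaining = nbytes
        while remaining > 0:
            chunk = sock.recv(remaining)
            if not chunk:
                raise EOFError("connection closed with %d bytes outstanding"
                               % remaining)
            chunks.append(chunk)
            remaining -= len(chunk)
        return b"".join(chunks)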

    500 packets @ 2 tokens each (500 very short lists)
    0.140999794006
    16008 function calls in 0.146 CPU seconds

    Ordered by: internal time

    ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
      1500    0.036    0.000    0.036    0.000  {method 'recv' of '_socket.socket' objects}
      1500    0.033    0.000    0.033    0.000  <string>:1(sendall)
      1500    0.016    0.000    0.065    0.000  Client.py:15(read_int)
      1500    0.015    0.000    0.053    0.000  Client.py:7(send_int)
       500    0.009    0.000    0.077    0.000  Client.py:22(read_int_list)
       500    0.007    0.000    0.060    0.000  Client.py:10(send_int_list)
      1500    0.007    0.000    0.010    0.000  struct.py:77(unpack)
      1500    0.005    0.000    0.005    0.000  struct.py:54(pack)
       500    0.004    0.000    0.141    0.000  Client.py:31(spam)
      2001    0.004    0.000    0.004    0.000  {len}
         1    0.003    0.003    0.146    0.146  runme.py:11(bench)
      1500    0.003    0.000    0.003    0.000  {method 'unpack' of 'Struct' objects}
      1001    0.003    0.000    0.003    0.000  {range}
      1000    0.002    0.000    0.002    0.000  {method 'append' of 'list' objects}
         1    0.000    0.000    0.000    0.000  struct.py:35(_compile)
         2    0.000    0.000    0.000    0.000  {time.time}
         1    0.000    0.000    0.146    0.146  <string>:1(<module>)
         1    0.000    0.000    0.000    0.000  {method 'disable' of '_lsprof.Profiler' objects}

    None
    ================================================================================

    1 packet @ 50000 tokens (1 very long list)
    4.89100003242
    450019 function calls in 4.893 CPU seconds

    Ordered by: internal time

    ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     50001    1.212    0.000    1.212    0.000  <string>:1(sendall)
     50001    1.062    0.000    1.062    0.000  {method 'recv' of '_socket.socket' objects}
     50001    0.594    0.000    2.282    0.000  Client.py:15(read_int)
     50001    0.517    0.000    1.982    0.000  Client.py:7(send_int)
         1    0.354    0.354    2.732    2.732  Client.py:22(read_int_list)
     50001    0.335    0.000    0.524    0.000  struct.py:77(unpack)
     50001    0.253    0.000    0.253    0.000  struct.py:54(pack)
     50001    0.189    0.000    0.189    0.000  {method 'unpack' of 'Struct' objects}
         1    0.176    0.176    2.158    2.158  Client.py:10(send_int_list)
     50002    0.102    0.000    0.102    0.000  {len}
     50000    0.097    0.000    0.097    0.000  {method 'append' of 'list' objects}
         2    0.002    0.001    0.002    0.001  {range}
         1    0.002    0.002    4.893    4.893  runme.py:19(bench2)
         1    0.000    0.000    4.890    4.890  Client.py:31(spam)
         2    0.000    0.000    0.000    0.000  {time.time}
         1    0.000    0.000    4.893    4.893  <string>:1(<module>)
         1    0.000    0.000    0.000    0.000  {method 'disable' of '_lsprof.Profiler' objects}

    thorben mannequin commented Sep 12, 2008

    That's more like it. Now I'd like to see the same behavior on Linux...



    pitrou commented Sep 13, 2008

    > Yes, obviously. Although adding it to the client socket made no
    > difference after I had already done so for the server, communication
    > is still too slow by orders of magnitude. (Sorry for pointing this out
    > again.)

    Well, if this precise use case is really important for you, I suggest
    using Wireshark (or any other packet analyzer) to see what happens in
    terms of latency between packets over the wire.

    Something else: try replacing "localhost" with "127.0.0.1"; perhaps your
    DNS resolution eats a lot of time.

    > I would greatly appreciate any help on the subject. How do *BSD
    > sockets differ from Linux sockets, and what do I do to make things
    > faster?

    I don't know, but I suspect the difference is more in the TCP stack
    implementation than in the sockets layer.

    In any case, I'm going to close this bug, as this is very likely not a
    Python problem. Please ask further questions on comp.lang.python; lots
    of people there may help you :)

    @pitrou pitrou closed this as completed Sep 13, 2008
    @pitrou pitrou added the invalid label Sep 13, 2008
    samslists mannequin commented Feb 5, 2009

    I know this bug is closed, but I too am experiencing it under Linux
    2.6.24-22 and Python 2.5.2.

    I'm using urllib2, and it's just spending an obscene amount of CPU time
    in {method 'recv' of '_socket.socket' objects}...

    Anyone have any ideas?

    Would switching to httplib2 help?

    kostja mannequin commented Dec 31, 2010

    I was able to observe the same issue.

    I'm using Python 2.6.5 on Ubuntu 10.04 LTS. My system is a 64-bit Intel
    Core i7, quad core, Linux kernel 2.6.32-generic x86_64, Ubuntu EGLIBC
    2.11.1-0ubuntu7.5.

    A simple client TCP socket, which sends 35 bytes over to a server on
    localhost and receives 20 bytes in response, produces only 22 RPS. An
    identical application written in C gives me 7000 RPS. TCP_NODELAY is on
    on the server side. Turning on TCP_NODELAY on the client as well gives
    me ~500 RPS in Python (which I'm satisfied with, because I think I then
    hit other bottlenecks).
    Still, such low performance by default can be surprising and hard to
    debug.
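
    A sketch of the kind of measurement described, using the request and
    response sizes from the comment and an assumed server address; each
    iteration pays a full round trip, which is why Nagle's algorithm plus
    delayed ACKs hurts so much here:

    import socket
    import time

    REQUEST = b"x" * 35          # 35-byte request, as in the comment
    N = 1000                     # number of round trips to time

    sock = socket.create_connection(("127.0.0.1", 50007))  # assumed address
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

    start = time.time()
    for _ in range(N):
        sock.sendall(REQUEST)
        reply = sock.recv(20)    # 20-byte reply; a real client would loop
                                 # until the full reply has arrived
    elapsed = time.time() - start
    print("%.0f requests per second" % (N / elapsed))
    sock.close()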

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022