classification
Title: Socket freezing under load issue on Mac.
Type: behavior
Stage:
Components: Library (Lib)
Versions: Python 3.2, Python 2.7

process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: amcgregor, exarkun, neologix, r.david.murray
Priority: normal
Keywords:

Created on 2010-05-19 23:37 by amcgregor, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (8)
msg106117 - Author: Alice Bevan-McGregor (amcgregor) Date: 2010-05-19 23:37
Running the wsgiref simple HTTP server, or any other server capable of more than 2000 requests/sec, demonstrates an issue with sockets on Mac OS X.

Mac OS X Version: 10.6.3 (Build 10D573)
Python Version: 2.6.1 (32-bit)

The minimal application needed to demonstrate the problem is:

    import sys, cStringIO
    from wsgiref.simple_server import make_server
    
    sys.stderr = cStringIO.StringIO() # disable request logging
    
    def app(environ, start_response):
        start_response("200 OK", [('Content-Type', 'text/plain')])
        return ['Hello world!\n']
    
    httpd = make_server('', 8080, app)
    httpd.serve_forever()

Then hammer the server using Apache Bench:

    ab -n 20000 -c 5 http://127.0.0.1:8080/

At almost exactly the 16000 request mark socket connections begin to time out.  Sockets are then freed up at the rate of about 40/second (on my box).  Killing the ab run when it freezes then immediately re-trying (and cancelling after a few seconds) will show this rate.  Time must pass for some connection 'pool' to free the connections before you can do another 16000 requests.

This problem does not appear on the following setup:

Operating System: Gentoo Linux
Python Version: 2.6.4 (32-bit)
msg106118 - Author: Alice Bevan-McGregor (amcgregor) Date: 2010-05-19 23:41
I can confirm this issue also affects 2.5.4 on my Mac.
msg106119 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2010-05-19 23:59
Why do you think this is a bug in Python?  I'm not yet saying it isn't, but we'll need additional information to show that it is.  Given that it works on Linux, it seems like the most likely case would be that it is an issue with the OS (perhaps requiring adjusting OS tuning parameters), since the Python socket library is a fairly thin wrapper around the OS socket library.
msg106120 - Author: Alice Bevan-McGregor (amcgregor) Date: 2010-05-20 00:03
Unfortunately, unless I can get instructions on how to properly diagnose socket libraries, I've exhausted my ability to debug this.  I used to be a C programmer, but that was 12 years ago.

I'm hoping to a) confirm the problem exists on Mac (not just my computer), and b) get someone familiar with Python's socket implementation and socket programming in general to figure out what's actually going on here.

I also don't have access to Apple's radar bug tracker to check there.  :(
msg106160 - Author: Jean-Paul Calderone (exarkun) (Python committer) Date: 2010-05-20 15:59
Have you looked at the number of TIME_WAIT sockets you have on the system when your benchmark gets to the 16000 request mark?

This looks exactly like a regular TCP limitation to me.  You'll find the limit on any platform, not just OS X.
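
One way to check the TIME_WAIT count suggested above is to tally connection states from `netstat -an` while the benchmark is running. The following is a minimal sketch, not part of the original report: the function names and the sample text are hypothetical, and it assumes `netstat -an` prints one TCP connection per row with the state in the last column, as it does on OS X and Linux.

```python
import subprocess
from collections import Counter


def count_tcp_states(netstat_output):
    """Tally TCP connection states from `netstat -an` style text."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows start with a protocol name such as "tcp", "tcp4" or
        # "tcp6" and carry the connection state in the last column.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[-1]] += 1
    return states


def current_tcp_states():
    """Run netstat locally (assumes a `netstat` binary is on the PATH)."""
    raw = subprocess.check_output(["netstat", "-an"])
    return count_tcp_states(raw.decode("utf-8", "replace"))


# Hypothetical sample in the OS X column layout, for illustration only.
SAMPLE = """\
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp4       0      0  127.0.0.1.8080         127.0.0.1.49152        TIME_WAIT
tcp4       0      0  127.0.0.1.8080         127.0.0.1.49153        TIME_WAIT
tcp4       0      0  *.8080                 *.*                    LISTEN
udp4       0      0  *.5353                 *.*
"""

sample_states = count_tcp_states(SAMPLE)
```

Calling `current_tcp_states()["TIME_WAIT"]` at the moment the `ab` run stalls would show whether the count is close to the number of completed requests.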
msg112901 - Author: Terry J. Reedy (terry.reedy) (Python committer) Date: 2010-08-04 21:34
This needs to be re-verified on a current version.
msg126135 - Author: Charles-François Natali (neologix) (Python committer) Date: 2011-01-12 20:38
As explained by Jean-Paul, this happens because closed TCP sockets spend some time in the TIME-WAIT state before being deallocated.
On Linux, this issue can be more or less worked around using sysctl (net.ipv4.tcp_tw_{reuse,recycle}). There might be something similar on OS X.
It's definitely an OS-level tuning issue.
Suggesting to close.
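
As a rough illustration of why the stall could show up near 16000 requests: if every new connection consumes one client-side ephemeral port, and a closed connection keeps its address/port pair tied up for the whole TIME-WAIT interval, then the size of the port range caps how many connections can be opened within one interval. The sketch below assumes the IANA dynamic port range 49152-65535 and TIME-WAIT intervals of 30 and 60 seconds. Both are assumptions chosen for illustration, not values measured on the reporter's machine, and the function name is hypothetical.

```python
def connection_budget(first_port, last_port, time_wait_seconds):
    """Return (usable ports, sustainable new connections per second).

    Models a single client address talking to a single server
    address and port, so the client port is the only part of the
    connection tuple that can vary.
    """
    usable_ports = last_port - first_port + 1
    return usable_ports, usable_ports / float(time_wait_seconds)


# Assumed values: IANA dynamic range, hypothetical TIME-WAIT lengths.
ports, rate_30s = connection_budget(49152, 65535, 30)
_, rate_60s = connection_budget(49152, 65535, 60)

print("ports available per TIME-WAIT interval:", ports)
print("sustainable rate with 30 s TIME-WAIT: %.0f conn/s" % rate_30s)
print("sustainable rate with 60 s TIME-WAIT: %.0f conn/s" % rate_60s)
```

Under those assumptions the budget is 16384 connections per interval, which is close to the point where the benchmark stalls. A load well above the sustainable rate would exhaust the range and then proceed only as old entries expire.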
msg126136 - Author: Alice Bevan-McGregor (amcgregor) Date: 2011-01-12 20:43
Agreed; I'm now certain it's a local tuning issue.  My first attempt to alter the file descriptor limits for local testing resulted in catastrophic system failure, though, so I have no clue as to the correct method to alter the TIME_WAIT time.

I will continue to investigate, thank you for the lead.
History
Date                 User            Action  Args
2022-04-11 14:57:01  admin           set     github: 53017
2011-01-12 22:03:00  terry.reedy     set     nosy: - terry.reedy
2011-01-12 20:43:50  amcgregor       set     status: open -> closed; messages: + msg126136; resolution: not a bug; nosy: terry.reedy, exarkun, r.david.murray, neologix, amcgregor
2011-01-12 20:38:55  neologix        set     nosy: + neologix; messages: + msg126135
2010-08-04 21:34:12  terry.reedy     set     nosy: + terry.reedy; messages: + msg112901; versions: + Python 2.7, Python 3.2, - Python 2.6, Python 2.5
2010-05-20 15:59:46  exarkun         set     nosy: + exarkun; messages: + msg106160
2010-05-20 00:03:28  amcgregor       set     messages: + msg106120
2010-05-19 23:59:23  r.david.murray  set     nosy: + r.david.murray; messages: + msg106119
2010-05-19 23:41:49  amcgregor       set     messages: + msg106118; versions: + Python 2.5
2010-05-19 23:37:25  amcgregor       create