Issue 2550: SO_REUSEADDR doesn't have the same semantics on Windows as on Unix


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/46802

classification

Title:	SO_REUSEADDR doesn't have the same semantics on Windows as on Unix
Type:	behavior	Stage:	resolved
Components:	Library (Lib), Windows	Versions:	Python 3.1, Python 3.2, Python 2.7, Python 2.6, Python 2.5

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	trent	Nosy List:	amak, exarkun, forest, gvanrossum, nnorwitz, pitrou, trent
Priority:	high	Keywords:	26backport, patch

Created on 2008-04-04 15:57 by trent, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description
test_socket.py.patch	trent, 2008-04-04 15:57	Patch to trunk/Lib/test/test_socket.py
trunk.2550.patch	trent, 2008-04-06 21:24
trunk.2550-2.patch	trent, 2008-04-08 11:49

Messages (12)
msg64933	Author: Trent Nelson (trent) *	Date: 2008-04-04 15:57
Background: I came across this issue while trying to track down why test_asynchat would periodically wedge Python processes on the Windows buildbots, to the point that they wouldn't even respond to SIGKILL (or Ctrl-C on the console). What I found after a bit of digging is that Windows doesn't raise EADDRINUSE socket.errors when you bind() two sockets to identical host/ports IFF SO_REUSEADDR has been set as a socket option.

Decided to brighten up my tube journey into work this morning by reading the Gospel's take on the situation. As per the 'SO_REUSEADDR and SO_REUSEPORT Socket Options' section in section 7.5 of Stevens' UNIX Network Programming Volume 1 (2nd Ed):

    "With TCP, we are never able to start multiple servers that bind the same IP address and same port: a completely duplicate binding. That is, we cannot start one server that binds 198.69.10.2 port 80 and start another that also binds 198.69.10.2 port 80, even if we set the SO_REUSEADDR socket option for the second server."

So it seems Windows isn't adhering to this, at least on XP and Server 2008 with Python 2.5-2.6. I've patched test_socket.py to explicitly test for this situation; as expected, it passes on Unix (tested on FreeBSD in particular) and fails on Windows. I'd like to commit this to trunk to see whether any of the buildbots for other platforms match the behaviour of Windows.
msg65050	Author: Trent Nelson (trent) *	Date: 2008-04-06 21:20
[Updating the issue with relevant mailing list conversation]

Interesting results! I committed the patch to test_socket.py in r62152. I was expecting all platforms other than Windows to behave consistently (i.e. pass). That is, given the following:

    import socket
    host = '127.0.0.1'
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))
    port = sock.getsockname()[1]
    sock.close()
    del sock

    sock1 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock1.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock1.bind((host, port))
    sock2 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock2.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock2.bind((host, port))
          ^^^^

...the second bind should fail with EADDRINUSE, at least according to the 'SO_REUSEADDR and SO_REUSEPORT Socket Options' section in section 7.5 of Stevens' UNIX Network Programming Volume 1 (2nd Ed):

    "With TCP, we are never able to start multiple servers that bind the same IP address and same port: a completely duplicate binding. That is, we cannot start one server that binds 198.69.10.2 port 80 and start another that also binds 198.69.10.2 port 80, even if we set the SO_REUSEADDR socket option for the second server."

The results: both Windows and Linux fail the patched test; none of the buildbots for either platform encountered an EADDRINUSE socket.error after the second bind(). FreeBSD, OS X, Solaris and Tru64 pass the test: EADDRINUSE is raised on the second bind. (Interesting that all the ones that passed have a BSD lineage.) I've just reverted the test in r62156 as planned.
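The experiment can also be wrapped in a function that reports the outcome instead of raising, which makes it easier to compare platforms side by side. This is a minimal Python 3 sketch (where socket.error is an alias of OSError); the function name second_bind_fails and its reuse_addr flag are illustrative and do not come from the test_socket.py patch.

```python
import errno
import socket


def second_bind_fails(reuse_addr, host="127.0.0.1"):
    """Bind two TCP sockets to the same (host, port) and report whether the
    second bind() raised EADDRINUSE.  Neither socket calls listen()."""
    # Ask the OS for an ephemeral port, then release it (same trick as above).
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.bind((host, 0))
        port = probe.getsockname()[1]

    sock1 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock2 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if reuse_addr:
            # Set the option on both sockets, as in the example above.
            sock1.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            sock2.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock1.bind((host, port))
        try:
            sock2.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # some other failure: don't hide it
        return False
    finally:
        sock1.close()
        sock2.close()


if __name__ == "__main__":
    print("without SO_REUSEADDR:", second_bind_fails(False))
    print("with SO_REUSEADDR:   ", second_bind_fails(True))
```

Without SO_REUSEADDR the duplicate bind is refused everywhere; the interesting case is the second call, whose result is what differed between the buildbots.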
The real issue now is that there are tests that call test_support.bind_port() on the assumption that the port returned by this function is 'unbound', when in fact the current implementation can't guarantee this:

    def bind_port(sock, host='', preferred_port=54321):
        for port in [preferred_port, 9907, 10243, 32999, 0]:
            try:
                sock.bind((host, port))
                if port == 0:
                    port = sock.getsockname()[1]
                return port
            except socket.error, (err, msg):
                if err != errno.EADDRINUSE:
                    raise
                print >>sys.__stderr__, \
                    '  WARNING: failed to listen on port %d, trying another' % port

This logic is only correct for platforms other than Windows and Linux. I haven't looked into all the networking test cases that rely on bind_port(), but I would think an implementation such as this would be much more reliable than what we've got for returning an unused port:

    def bind_port(sock, host='127.0.0.1', *args):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.bind((host, 0))
        port = s.getsockname()[1]
        s.close()
        del s

        sock.bind((host, port))
        return port

Actually, FWIW, I just ran a full regrtest.py against trunk on Win32 with this change in place and all the tests still pass.

Thoughts?

    Trent.
msg65051	Author: Trent Nelson (trent) *	Date: 2008-04-06 21:21
[Updating issue with mailing list discussion; Jean-Paul's reply]

On Fri, 4 Apr 2008 13:24:49 -0700, Trent Nelson <tnelson@onresolve.com> wrote:
>Interesting results! I committed the patch to test_socket.py in r62152. I was expecting all other platforms except for Windows to behave consistently (i.e. pass). That is, given the following:
>
>    import socket
>    host = '127.0.0.1'
>    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>    sock.bind((host, 0))
>    port = sock.getsockname()[1]
>    sock.close()
>    del sock
>
>    sock1 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>    sock1.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
>    sock1.bind((host, port))
>    sock2 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>    sock2.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
>    sock2.bind((host, port))
>          ^^^^
>
>....the second bind should fail with EADDRINUSE, at least according to the 'SO_REUSEADDR and SO_REUSEPORT Socket Options' section in chapter 7.5 of Stevens' UNIX Network Programming Volume 1 (2nd Ed):
>
>"With TCP, we are never able to start multiple servers that bind
> the same IP address and same port: a completely duplicate binding.
> That is, we cannot start one server that binds 198.69.10.2 port 80
> and start another that also binds 198.69.10.2 port 80, even if we
> set the SO_REUSEADDR socket option for the second server."
>
>The results: both Windows and Linux fail the patched test; none of the buildbots for either platform encountered an EADDRINUSE socket.error after the second bind(). FreeBSD, OS X, Solaris and Tru64 pass the test -- EADDRINUSE is raised on the second bind. (Interesting that all the ones that passed have a BSD lineage.)

Notice that the quoted text explains that you cannot start multiple servers that etc. Since you didn't call listen on either socket, it's arguable that you didn't start any servers, so there should be no surprise regarding the behavior. Try adding listen calls at various places in the example and you'll see something different happen.

FWIW, AIUI, SO_REUSEADDR behaves just as described in the above quote on Linux/BSD/UNIX/etc. On Windows, however, that option actually means something quite different. It means that the address should be stolen from any process which happens to be using it at the moment.

There is another option, SO_EXCLUSIVEADDRUSE, only on Windows I think, which, AIUI, makes it impossible for another process to steal the port using SO_REUSEADDR.

Hope this helps,

Jean-Paul
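Jean-Paul's suggestion can be tried by calling listen() on the first socket before the second bind(), so that a "server" in Stevens' sense actually exists. A minimal sketch with a hypothetical function name; both sockets set SO_REUSEADDR, as in Trent's example.

```python
import errno
import socket


def second_bind_fails_after_listen(host="127.0.0.1"):
    """Duplicate-bind experiment where the first socket is already listening.

    Both sockets set SO_REUSEADDR.  Returns True if the second bind() raised
    EADDRINUSE, False if it succeeded."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        second.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        first.bind((host, 0))            # let the OS choose a free port
        port = first.getsockname()[1]
        first.listen(1)                  # now a server is really running
        try:
            second.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise
        return False
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    print("second bind refused:", second_bind_fails_after_listen())
```

On Linux and the BSD-derived systems one would expect this to return True, in line with the Stevens quote; given Jean-Paul's description of SO_REUSEADDR on Windows, the second bind() may still succeed there. Treat these as expectations to check rather than guarantees.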
msg65052	Author: Trent Nelson (trent) *	Date: 2008-04-06 21:21
[Updating issue with mailing list discussion; my reply to Jean-Paul]

> >"With TCP, we are never able to start multiple servers that bind
> > the same IP address and same port: a completely duplicate binding.
> > That is, we cannot start one server that binds 198.69.10.2 port 80
> > and start another that also binds 198.69.10.2 port 80, even if we
> > set the SO_REUSEADDR socket option for the second server."

> Notice that the quoted text explains that you cannot start multiple
> servers that etc. Since you didn't call listen on either socket, it's
> arguable that you didn't start any servers, so there should be no
> surprise regarding the behavior. Try adding listen calls at various
> places in the example and you'll see something different happen.

I agree in principle; Stevens says nothing about what happens if you try to bind two sockets to identical host/port addresses. Even so, test_support.bind_port() assumes that bind() will raise EADDRINUSE if the port is not available, which, as has been demonstrated, won't be the case on Windows or Linux.

> FWIW, AIUI, SO_REUSEADDR behaves just as described in the above quote
> on Linux/BSD/UNIX/etc. On Windows, however, that option actually means
> something quite different. It means that the address should be stolen
> from any process which happens to be using it at the moment.

That probably explains why the Python process wedges when this happens on Windows...

> There is another option, SO_EXCLUSIVEADDRUSE, only on Windows I think,
> which, AIUI, makes it impossible for another process to steal the port
> using SO_REUSEADDR.

Nod: if SO_EXCLUSIVEADDRUSE is used instead in the code I posted, Windows raises EADDRINUSE on the second bind(). I don't have access to any Linux boxes at the moment, so I can't test what sort of error is raised with the example I posted if listen() and accept() are called on the two sockets bound to identical addresses. Can anyone else shed some light on this? I'd be interested in knowing whether the process wedges on Linux as badly as it does on Windows (to the point where it doesn't respond to Ctrl-C or SIGKILL).

    Trent.
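Python only exposes socket.SO_EXCLUSIVEADDRUSE on Windows, so portable code typically probes for it. A sketch of one way to choose between the two options before binding a listening socket; set_address_option is a hypothetical helper, and preferring SO_EXCLUSIVEADDRUSE whenever it exists is an assumption about what the caller wants, not a rule from this thread.

```python
import socket


def set_address_option(sock):
    """Configure address reuse on a TCP socket before bind().

    Where SO_EXCLUSIVEADDRUSE exists (Windows), request exclusive use so that
    another process cannot take over the port with SO_REUSEADDR.  Otherwise
    set SO_REUSEADDR, which on the Unix-like systems discussed here still
    refuses a second listener on an identical address.
    Returns the name of the option that was set."""
    exclusive = getattr(socket, "SO_EXCLUSIVEADDRUSE", None)
    if exclusive is not None:
        sock.setsockopt(socket.SOL_SOCKET, exclusive, 1)
        return "SO_EXCLUSIVEADDRUSE"
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    return "SO_REUSEADDR"


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print("option set:", set_address_option(s))
        s.bind(("127.0.0.1", 0))
        s.listen(1)
        print("listening on", s.getsockname())
```

Returning the option name keeps the platform-dependent choice visible in logs or assertions.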
msg65054	Author: Trent Nelson (trent) *	Date: 2008-04-06 21:24
I've attached another patch that fixes test_support.bind_port() as well as a number of files that used that function. The new implementation always binds to an ephemeral port in order to obtain an unused port for subsequent binding. Tested on Windows 32-bit & x64 and FreeBSD 6.2. I'd like to apply this sooner rather than later unless anyone objects, as it will stop my two Windows buildbots, which run on the same machine, from hanging when they both test asynchat at the same time (which happens more often than you'd think).
msg65055	Author: Neal Norwitz (nnorwitz) *	Date: 2008-04-06 22:04
Trent, go ahead and try this out. We should definitely be moving in this direction. So I'd rather fix the problem than keep suffering with the current problems of not being able to run the test suite concurrently. I think bind_port might be documented, so you should update the docs if so. Also, please add a Misc/NEWS entry.
msg65075	Author: Guido van Rossum (gvanrossum) *	Date: 2008-04-07 15:10
I don't like that the patch changes the API of a function in test_support (in particular, changing the return type; adding optional arguments is not a problem). This could trip up third-party users of this API. I recommend creating a new API, bind_host_and_port() (or whatever you'd like to name it), and implementing the original API in terms of the new one. (You can even add a warning if you think the original API is always unsafe.)
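Guido's suggestion, a new function plus the original API reimplemented on top of it, might look roughly like this. bind_host_and_port is the placeholder name from his message; returning a (host, port) tuple, ignoring preferred_port, emitting a DeprecationWarning, and binding straight to port 0 (rather than probing with a temporary socket) are all assumptions made for the sketch.

```python
import socket
import warnings


def bind_host_and_port(sock, host="127.0.0.1"):
    """New API (hypothetical): bind sock to an OS-assigned port and
    return the (host, port) actually bound."""
    sock.bind((host, 0))
    return sock.getsockname()[:2]


def bind_port(sock, host="", preferred_port=54321):
    """Old API kept for third-party callers: same signature and same
    return type (the port number).  preferred_port is accepted but ignored."""
    warnings.warn(
        "bind_port(preferred_port=...) is unsafe for concurrent test runs; "
        "use bind_host_and_port() instead",
        DeprecationWarning,
        stacklevel=2,
    )
    _, port = bind_host_and_port(sock, host)
    return port
```

Existing callers keep receiving an integer port, so only the warning changes from their point of view.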
msg65077	Author: Trent Nelson (trent) *	Date: 2008-04-07 16:03
To be honest, I wasn't really happy with having to return HOST either; it's somewhat redundant given that all these tests should be binding against localhost. What about something like this for bind_port():

    def bind_port(sock, host=''):
        """Bind the socket to a free port and return the port number.
        Relies on ephemeral ports in order to ensure we are using an
        unbound port.  This is important as many tests may be running
        simultaneously, especially in a buildbot environment."""

        # Use a temporary socket object to ensure we're not
        # affected by any socket options that have already
        # been set on the 'sock' object we're passed.
        tempsock = socket.socket(sock.family, sock.type)
        tempsock.bind((host, 0))
        port = tempsock.getsockname()[1]
        tempsock.close()
        del tempsock

        sock.bind((host, port))
        return port

The tests would then look something like:

    HOST = 'localhost'
    PORT = None

    class Foo(TestCase):
        def setUp(self):
            sock = socket.socket()
            global PORT
            PORT = test_support.bind_port(sock, HOST)

So the return value is still the port that was bound, no change there, but we're abolishing preferred_port as an optional argument. That's important, IMO, as none of these tests should be stipulating which port they want to listen on; that's actually the root of this entire problem.
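A self-contained Python 3 rendering of the helper and test layout proposed in this message, as a sketch. Assumptions beyond the message: bind_port is defined locally instead of being imported from test_support, the test class EchoTest and its one-message echo exchange are hypothetical, and the port is stored on the test instance rather than in a global.

```python
import socket
import threading
import unittest

HOST = "localhost"


def bind_port(sock, host=""):
    """Bind the socket to a free port and return the port number.

    A temporary socket of the same family and type asks the OS for an
    ephemeral port, so options already set on `sock` don't affect the lookup."""
    with socket.socket(sock.family, sock.type) as tempsock:
        tempsock.bind((host, 0))
        port = tempsock.getsockname()[1]
    sock.bind((host, port))
    return port


class EchoTest(unittest.TestCase):
    def setUp(self):
        self.server = socket.socket()
        self.port = bind_port(self.server, HOST)
        self.server.listen(1)

    def tearDown(self):
        self.server.close()

    def test_round_trip(self):
        def serve_one():
            conn, _ = self.server.accept()
            with conn:
                conn.sendall(conn.recv(1024))  # echo one message back

        t = threading.Thread(target=serve_one, daemon=True)
        t.start()
        with socket.create_connection((HOST, self.port), timeout=5) as client:
            client.sendall(b"ping")
            self.assertEqual(client.recv(1024), b"ping")
        t.join(timeout=5)


if __name__ == "__main__":
    unittest.main()
```

Because each test asks for its own ephemeral port, several copies of this suite can run at once without colliding on a fixed port number.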
msg65078	Author: Guido van Rossum (gvanrossum) *	Date: 2008-04-07 17:20
Thanks, that's much better (though I'm not the authority on all details of this patch).
msg65155	Author: Trent Nelson (trent) *	Date: 2008-04-08 11:49
Invested quite a few cycles on this issue last night. The more time I spent on it, the more I became convinced that every single test working with sockets should be changed in one fell swoop in order to facilitate (virtually unlimited) parallel test execution without fear of port conflicts. I've attached a second patch, trunk.2550-2.patch, which is my progress so far on doing just this. The main changes can be summarized in the following two points:

a) do whatever it takes in network-oriented tests to ensure unique ports are obtained (relying on the bind_port() and find_unused_port() functions exposed by test_support);

b) never, ever, ever set SO_REUSEADDR on a socket from a test. Because we're putting so much effort into obtaining a unique port, this should never be necessary. In the rare cases where our attempts to obtain a unique port fail, we absolutely should fail with EADDRINUSE, as the ability to obtain a unique port for the duration of a client/server test is an invariant that we must be able to depend upon. If the invariant is broken, fail immediately (don't mask the problem with SO_REUSEADDR).

With this patch applied, I can spawn a handful of Python processes and run the entire test suite (without -r, ensuring all tests are run in the same order, which should encourage port conflicts if there were any) without any errors. Doing that now is completely and utterly impossible.

Well, almost without error: all the I/O-related tests that try to open @test fail. I believe there's still outstanding work to do with this patch regarding how the intricacies of SO_REUSEADDR and SO_EXCLUSIVEADDRUSE should be handled in the rest of the stdlib. I'm still thinking about the best approach for this. However, the patch as it currently stands is already quite substantial, so I wanted to get it out sooner rather than later for review. (I'll forward this to python-dev@ to try and encourage more eyes from people with far more network-fu than I.)
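Points a) and b) can be illustrated with two small helpers. This is a sketch under assumptions, not the code in trunk.2550-2.patch: the signatures, the use of RuntimeError for a broken invariant, and binding directly to port 0 inside bind_port are choices made for the example.

```python
import socket


def find_unused_port(socktype=socket.SOCK_STREAM):
    """Return an IPv4 port number that was free a moment ago.

    The OS picks an ephemeral port for a temporary socket, which is then
    closed; the caller binds its own socket to the returned port later."""
    with socket.socket(socket.AF_INET, socktype) as tempsock:
        tempsock.bind(("127.0.0.1", 0))
        return tempsock.getsockname()[1]


def bind_port(sock, host="127.0.0.1"):
    """Bind a test socket to an OS-assigned port and return the port.

    Refuses sockets that already have SO_REUSEADDR set: per point b), a
    test should never need it, and a port conflict must surface as
    EADDRINUSE instead of being masked."""
    if sock.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR):
        raise RuntimeError("tests should never set SO_REUSEADDR")
    sock.bind((host, 0))
    return sock.getsockname()[1]
```

find_unused_port() is only a hint, since another process can take the port between the temporary socket closing and the caller's bind(); binding the real socket to port 0, as bind_port() does here, avoids that window.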
msg65224	Author: Trent Nelson (trent) *	Date: 2008-04-08 23:48
Committed updates to relevant network-oriented tests, as well as test_support changes discussed, in r62234.
msg104365	Author: Antoine Pitrou (pitrou) *	Date: 2010-04-27 21:26
This is now fixed, right? Personal experience as well as buildbot behaviour seems to show that parallel test execution (either through -j, or by running several test suites at the same time) works ok.

History
Date	User	Action	Args
2022-04-11 14:56:33	admin	set	github: 46802
2010-04-27 21:26:06	pitrou	set	status: open -> closed nosy: + pitrou, exarkun messages: + msg104365 resolution: accepted -> fixed stage: test needed -> resolved
2010-03-20 17:44:28	r.david.murray	set	stage: test needed versions: + Python 3.1, Python 2.7, Python 3.2, - Python 3.0
2008-09-18 22:05:37	forest	set	nosy: + forest
2008-05-13 18:23:06	amak	set	nosy: + amak
2008-04-08 23:48:17	trent	set	messages: + msg65224
2008-04-08 11:49:32	trent	set	files: + trunk.2550-2.patch messages: + msg65155
2008-04-07 17:20:41	gvanrossum	set	messages: + msg65078
2008-04-07 16:03:51	trent	set	messages: + msg65077
2008-04-07 15:10:09	gvanrossum	set	nosy: + gvanrossum messages: + msg65075
2008-04-06 22:04:54	nnorwitz	set	resolution: accepted messages: + msg65055 nosy: + nnorwitz
2008-04-06 21:25:02	trent	set	files: + trunk.2550.patch messages: + msg65054
2008-04-06 21:21:34	trent	set	messages: + msg65052
2008-04-06 21:21:02	trent	set	messages: + msg65051
2008-04-06 21:20:26	trent	set	messages: + msg65050
2008-04-04 15:57:33	trent	create