classification
Title: Low FD_SETSIZE limit on Windows
Type: resource usage Stage: patch review
Components: Build, Windows Versions: Python 3.7, Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: David Hirschfeld, andzn, asvetlov, desbma, eryksun, njs, paul.moore, pitrou, steve.dower, tim.golden, vstinner, zach.ware
Priority: normal Keywords: patch

Created on 2016-11-15 22:52 by David Hirschfeld, last changed 2019-06-15 04:17 by andzn.

Pull Requests
URL Status Linked
PR 13842 open andzn, 2019-06-05 12:41
PR 13845 closed vstinner, 2019-06-05 16:07
Messages (20)
msg280900 - (view) Author: David Hirschfeld (David Hirschfeld) Date: 2016-11-15 22:52
Back in 1999 the compile-time constant FD_SETSIZE was raised from the Windows default of 64 to 512 open sockets to support *serious async servers*:

http://bugs.python.org/issue210843
https://github.com/python/cpython/blame/master/Modules/selectmodule.c#L29-L36

The world has moved on and serious async servers require far more than 512 sockets available. This is one of the key reasons why tornado explicitly states:

> Tornado will also run on Windows, although this configuration is not officially supported and is recommended only for development use.

Yes, using select on Windows is the wrong thing to do, but being able to use a library that relies on select, putting up with the poor performance, is far preferable to being told to use Linux, which often isn't an option.

Since there's no alternative other than recompiling Python, it would be good if this constant could be increased to a value more useful these days, say 16384.

As a data point, ZeroMQ has recently increased the value of this constant to 16k themselves:

https://github.com/zeromq/libzmq/pull/2035
msg280901 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-11-15 22:57
To implement an efficient event loop, IOCP is the way to go: the asyncio module supports it using ProactorEventLoop.
msg280902 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-11-15 23:42
It looks like Modules/selectmodule.c already has the handling for when this gets large (and we really want to stop allocating the static array on the stack), which is a plus, but I'd want to see the performance cost to small selects if suddenly we're allocating 100s of KB on the heap rather than small and cheap stack allocations.

For the amount of work going on here for each select() call, Victor is right - you're going to get severely diminishing returns as this increases in size.

However, I see no reason why this couldn't be totally dynamic, at least on Windows. Redefining that value just changes a static array definition, but the array size never gets passed in to the OS AFAICT, so there's no reason we couldn't stack allocate for small select()s and heap allocate for big select()s. That's considerably more work than anyone had in mind, I'd wager.
msg281058 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2016-11-17 17:25
> To implement an efficient event loop, IOCP is the way to go: the asyncio module supports it using ProactorEventLoop.

Not everyone uses asyncio, though, so it would still be nice to see the select() limit raised anyway.

Implementing your own event loop using IOCP is a hard exercise, much harder than getting something to work with select().
msg305672 - (view) Author: desbma (desbma) * Date: 2017-11-06 19:48
I just want to say that I strongly support either bumping the value of FD_SETSIZE to something a lot higher than 512, or making it configurable from Python code.

I am the author of a program that makes heavy use of asyncio. Some Windows users have reported errors when using big directory trees, that I could not reproduce on Linux. Then I found the note about the SelectorEventLoop limitation in the doc https://docs.python.org/3/library/asyncio-eventloops.html#windows

I can't use ProactorEventLoop because I support Python 3.4 which does not have it. I had to work around this limitation by calling run_until_complete periodically to limit the number of pending tasks. 

I sincerely think this is the kind of thing that can hurt the global usage of asyncio.
Hell, I can do 600 IO tasks in parallel if I want to with concurrent.futures.ThreadPoolExecutor, so why can't I do the same with asyncio?
msg344713 - (view) Author: Andrei Zene (andzn) * Date: 2019-06-05 12:42
We would also need this limit to be raised. We are using a server that was implemented using tornado, and tornado uses select.select. 

> It looks like Modules/selectmodule.c already has the handling for when this gets large (and we really want to stop allocating the static array on the stack), which is a plus, but I'd want to see the performance cost to small selects if suddenly we're allocating 100s of KB on the heap rather than small and cheap stack allocations.

I gave this a try. The code I used for the server and the client was taken from this page: https://steelkiwi.com/blog/working-tcp-sockets/. I just added some code to measure the time spent in the select call, using time.process_time(). I then connected/disconnected a single client 10,000 times in a row, discarding the measurements of the first 1,000 iterations (warm-up).

Here are the results:
FD_SETSIZE = 16384
Time elapsed (avg ns):  20029.566224207087
Time elapsed (max ns):  15625000
Time elapsed (min ns):  0

FD_SETSIZE = 512
Time elapsed (avg ns):  14671.361502347418
Time elapsed (max ns):  15625000
Time elapsed (min ns):  0

> However, I see no reason why this couldn't be totally dynamic, at least on Windows. Redefining that value just changes a static array definition, but the array size never gets passed in to the OS AFAICT, so there's no reason we couldn't stack allocate for small select()s and heap allocate for big select()s. That's considerably more work than anyone had in mind, I'd wager.
I'm not sure this can be totally dynamic on Windows. The problem is the fd_set struct, which is defined in Winsock2.h:

typedef struct fd_set {
        u_int fd_count;               /* how many are SET? */
        SOCKET  fd_array[FD_SETSIZE];   /* an array of SOCKETs */
} fd_set;

The heap-allocated Python lists are converted into fd_set structs, which are, by the way, always allocated on the stack. The problem is there's no way to create an fd_set of a desired FD_SETSIZE at runtime.

I have also submitted a PR with the change for convenience: https://github.com/python/cpython/pull/13842.
msg344715 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-05 12:44
Would it be possible to make FD_SETSIZE configurable at runtime, at least on Windows? IMHO it would be a better approach.
msg344716 - (view) Author: Andrei Zene (andzn) * Date: 2019-06-05 12:53
> Would it be possible to make FD_SETSIZE configurable at runtime, at least on Windows? IMHO it would be a better approach.

That would be awesome, but it doesn't look like it would be possible. As I have already pointed out in my previous comment:

> I'm not sure this can be totally dynamic on Windows. The problem is the fd_set struct, which is defined in Winsock2.h:
>
> typedef struct fd_set {
>         u_int fd_count;               /* how many are SET? */
>         SOCKET  fd_array[FD_SETSIZE];   /* an array of SOCKETs */
> } fd_set;
>
> The heap-allocated Python lists are converted into fd_set structs, which are, by the way, always allocated on the stack. The problem is there's no way to create an fd_set of a desired FD_SETSIZE at runtime.
msg344720 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2019-06-05 13:32
I believe the number can still be configurable, but it requires more work.

The idea is:
- The FD_* macros should be replaced with custom fd_set manipulation functions.
- On each select.select() call we can calculate the number of the highest used socket and use that number plus one as the fd_set structure size.

Did not try to prototype it though.
msg344742 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-05 16:07
I wrote a proof-of-concept: PR 13845.

... but now I'm confused. How does select() know the FD_SETSIZE? Is it hardcoded in select() ABI? Does this PR work? Or does it make no sense?
msg344744 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-06-05 16:21
> How does select() know the FD_SETSIZE?

It doesn't have to. The actual number of descriptors is passed in, and the API presumably assumes that FD_SETSIZE is bigger than that (or else the caller would have crashed writing to invalid memory).

I don't see why we'd make it a parameter though. If we have all the descriptors, just allocate a big enough array and pass it in?
msg344756 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2019-06-05 17:58
From my understanding, Steve is right.
msg344757 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2019-06-05 18:03
> It doesn't have to. The actual number of descriptors is passed in, and 

To clarify, this is the fd_count in each fd_set, i.e. Winsock uses a counted SOCKET array instead of the bitmask that's found on other platforms. Winsock select() ignores nfds [1].

If Python's select module used a dynamic fd_set on Windows, this would entail directly initializing fd_count and fd_array. The FD_SET macro can't be used, since it's intended for the static definition and is limited to FD_SETSIZE. FD_ZERO, FD_ISSET, and FD_CLR should still work.

[1] https://docs.microsoft.com/en-us/windows/desktop/api/winsock2/nf-winsock2-select
msg344758 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2019-06-05 18:04
> The actual number of descriptors is passed in

Yes, but:

"""The nfds parameter is included only for compatibility with Berkeley sockets."""

... according to https://docs.microsoft.com/en-us/windows/desktop/api/Winsock2/nf-winsock2-select

So it's really FD_SETSIZE that matters, and it's a compile-time constant.

Note that Anaconda has been doing this for a long time:
https://github.com/AnacondaRecipes/python-feedstock/blob/master/recipe/0004-Win32-Change-FD_SETSIZE-from-512-to-2048.patch
msg344760 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-06-05 18:56
> So it's really FD_SETSIZE that matters, and it's a compile-time constant.

The only way this constant can matter is if the array has to end with an empty slot, which doesn't appear to be the case. The pre-compiled function can't tell how you compiled the code calling it, and the only place where FD_SETSIZE is used is in the macros to set them.

Dynamic allocation ought to be just fine, and likely faster for big arrays.
msg344813 - (view) Author: Andrei Zene (andzn) * Date: 2019-06-06 13:09
Awesome work, Victor! Thanks! I'll give it a try. One thing I see missing though (if I understand correctly) is the allocation of the three fd_sets (I posted a comment on GitHub regarding that).

Those fd_sets are currently allocated on the stack, and the size of the array they contain is currently determined by FD_SETSIZE.

We'll basically need a similar structure that we can allocate dynamically, and cast it to fd_set when passing it to the select call.

I can go ahead and try to implement the feedback and test it together with the changes in Victor's PR.
msg344815 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2019-06-06 13:59
> Those fd_sets are currently allocated on the stack, and the size of
> the array they contain is currently determined by FD_SETSIZE. We'll
> basically need a similar structure that we can allocate dynamically,
> and cast it to fd_set when passing it to the select call.

This is possible in Windows. Other platforms are limited to a fixed FD_SETSIZE, in which case the old error should be retained:

        if (index >= (unsigned int)FD_SETSIZE) {
            PyErr_SetString(PyExc_ValueError,
                          "too many file descriptors in select()");
            goto finally;
        }
msg345127 - (view) Author: Andrei Zene (andzn) * Date: 2019-06-10 12:30
I have updated PR 13842 (https://github.com/python/cpython/pull/13842) to dynamically allocate the fd_sets on Windows.

I did some testing on Windows with the following two patterns:
1. 10000 transient clients: (open connection, send message, close connection)
2. 1024 permanent clients: (open connection, then in an infinite loop: (read message, send message back))

Everything seemed to be fine.

I didn't do any testing on Linux.

On Windows I also had to remove the check that vstinner added for whether the greatest file descriptor exceeds the setsize; that condition always failed for me on Windows. Apparently, according to https://docs.microsoft.com/en-us/windows/desktop/api/winsock2/nf-winsock2-select, Windows ignores nfds, which "is included only for compatibility with Berkeley sockets." The documentation also states that the FD_ "macros are compatible with those used in the Berkeley software, but the underlying representation is completely different", and: "Internally, socket handles in an fd_set structure are not represented as bit flags as in Berkeley Unix. Their data representation is opaque."

This is why I chose to determine setsize based on the inputs. If you think that's not going to work, please say why :)
msg345401 - (view) Author: Nathaniel Smith (njs) * (Python committer) Date: 2019-06-12 18:38
Traditionally on Unix, sockets are represented by file descriptors. File descriptors are small integers. (POSIX actually mandates the "small" part: whenever a new fd is created, the OS has to assign it the smallest integer value that's not already being used.)

So when select was designed, they decided to be "clever" and use a bitmask to represent the FD sets. If all your fds have values less than 64, then you can use a single 64-bit integer to represent any arbitrary subset of them, sweet, it's super efficient. It's also extremely weird. No other API cares about the actual integer value of an fd, they're just opaque tokens. Select is almost unique in being O(highest fd value).

Of course this microoptimization stopped making sense decades ago, so poll() was added. The big innovation with poll() is that it takes an array of descriptors like a normal function, instead of this wacky bitmask thing. So it's O(number of fds), and it doesn't matter whether you're checking fd #1 or fd #1000.

EXCEPT Windows has a totally different history. On Windows, sockets are represented as handles. And handles are just like fds, EXCEPT that handles are allowed to have arbitrary values; they didn't copy POSIX's weird (and expensive) rule about always using the smallest possible integer.

So when Windows went to implement select(), the bitmask optimization never made any sense at all – even if you only have 1 socket, its handle might be, like, 0x9f57be3a or something. So you'd need a bitmask with 2**32 entries, which is silly.

So on Windows, select() is secretly poll(). They copied the FD_* macros for compatibility, but fd_set is really just an array of opaque values + an explicit length, and you can pass in as many or as few as you want.

I know this is mostly rehashing conclusions that are in the thread already but I thought it might make more sense to have it laid out all together.
msg345656 - (view) Author: Andrei Zene (andzn) * Date: 2019-06-15 04:17
That's actually a great explanation Nathaniel. Thanks for putting that all together.
History
Date                 User              Action  Args
2019-06-15 04:17:42  andzn             set     messages: + msg345656
2019-06-12 18:38:28  njs               set     nosy: + njs; messages: + msg345401
2019-06-10 12:30:26  andzn             set     messages: + msg345127
2019-06-06 13:59:34  eryksun           set     messages: + msg344815
2019-06-06 13:09:35  andzn             set     messages: + msg344813
2019-06-05 18:56:16  steve.dower       set     messages: + msg344760
2019-06-05 18:04:26  pitrou            set     nosy: + eryksun
2019-06-05 18:04:19  pitrou            set     nosy: - eryksun; messages: + msg344758
2019-06-05 18:03:10  eryksun           set     nosy: + eryksun; messages: + msg344757
2019-06-05 17:58:31  asvetlov          set     messages: + msg344756
2019-06-05 16:21:33  steve.dower       set     messages: + msg344744
2019-06-05 16:07:35  vstinner          set     messages: + msg344742
2019-06-05 16:07:03  vstinner          set     pull_requests: + pull_request13722
2019-06-05 13:32:12  asvetlov          set     nosy: + asvetlov; messages: + msg344720
2019-06-05 12:53:53  andzn             set     messages: + msg344716
2019-06-05 12:44:38  vstinner          set     messages: + msg344715
2019-06-05 12:42:00  andzn             set     nosy: + andzn; messages: + msg344713
2019-06-05 12:41:40  andzn             set     keywords: + patch; stage: patch review; pull_requests: + pull_request13719
2017-11-06 19:48:06  desbma            set     nosy: + desbma; messages: + msg305672
2016-11-17 17:25:07  pitrou            set     nosy: + pitrou; messages: + msg281058
2016-11-15 23:42:24  steve.dower       set     messages: + msg280902
2016-11-15 22:57:12  vstinner          set     nosy: + vstinner; messages: + msg280901
2016-11-15 22:55:41  zach.ware         set     nosy: + paul.moore, tim.golden, zach.ware, steve.dower; components: + Build, Windows
2016-11-15 22:52:38  David Hirschfeld  create