
Author njs
Recipients David Hirschfeld, andzn, asvetlov, desbma, eryksun, njs, paul.moore, pitrou, steve.dower, tim.golden, vstinner, zach.ware
Date 2019-06-12.18:38:27
Message-id <1560364708.29.0.91285569918.issue28708@roundup.psfhosted.org>
Content
Traditionally on Unix, sockets are represented by file descriptors. File descriptors are small integers. (POSIX actually mandates the "small" part: whenever a new fd is created, the OS has to assign it the smallest integer value that's not already being used.)
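The "smallest free integer" rule is easy to observe from Python on a Unix system. This is a minimal sketch, assuming a single-threaded process (another thread opening a file in between could grab the freed slot first). The helper name `lowest_fd_demo` is hypothetical.

```python
import os

def lowest_fd_demo():
    """Close an fd, then create a new one: POSIX says it must reuse the lowest free number."""
    r, w = os.pipe()   # two fresh fds; r was the lowest free number when it was created
    os.close(r)        # free it again
    dup = os.dup(w)    # os.dup must return the lowest free fd...
    try:
        return r, dup  # ...which is the one we just closed
    finally:
        os.close(w)
        os.close(dup)

freed, reused = lowest_fd_demo()
print(freed, reused)  # the two numbers should be equal
```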

So when select was designed, they decided to be "clever" and use a bitmask to represent the FD sets. If all your fds have values less than 64, then you can use a single 64-bit integer to represent any arbitrary subset of them, sweet, it's super efficient. It's also extremely weird. No other API cares about the actual integer value of an fd, they're just opaque tokens. Select is almost unique in being O(highest fd value).
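To make the bitmask idea concrete, here is a pure-Python model of it. This is a sketch of the concept, not how the C `fd_set` is literally laid out (the real one is a fixed-size array of machine words sized by `FD_SETSIZE`). The helper names `fd_set_bitmask` and `bits_to_scan` are hypothetical.

```python
def fd_set_bitmask(fds):
    """Pack fds into one integer, select()-style: bit n is set iff fd n is in the set."""
    mask = 0
    for fd in fds:
        mask |= 1 << fd
    return mask

def bits_to_scan(fds):
    """select() is called with nfds = highest fd + 1, so conceptually the work
    scales with the highest fd value, not with how many fds are actually set."""
    return max(fds) + 1

print(bin(fd_set_bitmask([0, 3, 5])))  # 0b101001
print(bits_to_scan([0, 3, 5]))         # 6
print(bits_to_scan([1000]))            # 1001: a single fd, but a thousand bits to look at
```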

Of course this microoptimization stopped making sense decades ago, so poll() was added. The big innovation with poll() is that it takes an array of descriptors like a normal function, instead of this wacky bitmask thing. So it's O(number of fds), and it doesn't matter whether you're checking fd #1 or fd #1000.
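For comparison, Python's `select.poll()` exposes the array-of-descriptors interface directly: you register each fd with the events you care about, and the fd value is just a key. A minimal sketch, assuming a Unix platform (`select.poll` is not available on Windows). The helper name `readable_fds` is hypothetical.

```python
import select
import socket

def readable_fds():
    """Register two sockets with poll() and report which ones are readable."""
    a, b = socket.socketpair()
    try:
        poller = select.poll()
        # Each register() call adds one (fd, events) entry to poll()'s array;
        # the fd's numeric value does not affect the cost.
        poller.register(a, select.POLLIN)
        poller.register(b, select.POLLIN)
        b.send(b"x")                # make `a` readable; nothing is sent to `b`
        ready = poller.poll(1000)   # timeout in milliseconds
        return [fd for fd, ev in ready if ev & select.POLLIN], a.fileno()
    finally:
        a.close()
        b.close()

readable, a_fd = readable_fds()
print(readable == [a_fd])  # only `a` should be reported
```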

EXCEPT Windows has a totally different history. On Windows, sockets are represented as handles. And handles are just like fds, EXCEPT that handles are allowed to have arbitrary values; they didn't copy POSIX's weird (and expensive) rule about always using the smallest possible integer.

So when Windows went to implement select(), the bitmask optimization never made any sense at all – even if you only have 1 socket, its handle might be, like, 0x9f57be3a or something. So you'd need a bitmask with 2**32 entries, which is silly.
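The arithmetic behind "silly" is quick to check. This sketch computes how many bytes a select()-style bitmask would need just to have a bit for one handle, using the example value above and assuming 32-bit handle values. The helper name `bitmask_bytes` is hypothetical.

```python
def bitmask_bytes(highest_handle):
    """Bytes a select()-style bitmask needs in order to include bit `highest_handle`."""
    bits = highest_handle + 1   # bit positions 0..highest_handle
    return (bits + 7) // 8      # round up to whole bytes

print(bitmask_bytes(0x9f57be3a))  # 334165960 bytes, roughly 319 MiB, for a single socket
print(2**32 // 8 // 2**20)        # 512 MiB to cover every possible 32-bit value
```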

So on Windows, select() is secretly poll(). They copied the FD_* macros for compatibility, but fd_set is really just an array of opaque values + an explicit length, and you can pass in as many or as few as you want.
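A pure-Python mirror of the Winsock `fd_set` layout shows the "array plus explicit length" shape. This is a sketch that does not call Winsock: it assumes the default `FD_SETSIZE` of 64, models `SOCKET` as a pointer-sized unsigned integer, and the helpers `fd_set_add` / `fd_set_contains` only roughly imitate `FD_SET` / `FD_ISSET`.

```python
import ctypes

FD_SETSIZE = 64  # assumed Winsock default; C code can #define a larger value

class WinFdSet(ctypes.Structure):
    """Mirror of Winsock's fd_set: a count plus an array of opaque handle values."""
    _fields_ = [
        ("fd_count", ctypes.c_uint),
        ("fd_array", ctypes.c_size_t * FD_SETSIZE),  # SOCKET modeled as pointer-sized
    ]

def fd_set_add(s, handle):
    """Roughly FD_SET: append the handle unless it is already present or the set is full."""
    if handle in s.fd_array[:s.fd_count]:
        return
    if s.fd_count < FD_SETSIZE:
        s.fd_array[s.fd_count] = handle
        s.fd_count += 1

def fd_set_contains(s, handle):
    """Roughly FD_ISSET: a linear scan over the first fd_count entries."""
    return handle in s.fd_array[:s.fd_count]

fds = WinFdSet()
fd_set_add(fds, 0x9f57be3a)  # a huge handle value costs one slot, not 2**32 bits
fd_set_add(fds, 0x9f57be3a)  # duplicate, ignored
print(fds.fd_count)          # 1
```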

I know this is mostly rehashing conclusions that are already in the thread, but I thought it might make more sense to have it all laid out together.
History
Date | User | Action | Args
2019-06-12 18:38:28 | njs | set | recipients: + njs, paul.moore, pitrou, vstinner, tim.golden, asvetlov, zach.ware, desbma, eryksun, steve.dower, David Hirschfeld, andzn
2019-06-12 18:38:28 | njs | set | messageid: <1560364708.29.0.91285569918.issue28708@roundup.psfhosted.org>
2019-06-12 18:38:28 | njs | link | issue28708 messages
2019-06-12 18:38:27 | njs | create |