Issue 16507: Patch selectmodule.c to support WSAPoll on Windows

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/60711

classification

Title:	Patch selectmodule.c to support WSAPoll on Windows
Type:	enhancement	Stage:	resolved
Components:		Versions:	Python 3.4

process

Status:	closed	Resolution:	rejected
Dependencies:		Superseder:
Assigned To:		Nosy List:	giampaolo.rodola, gvanrossum, jcea, neologix, pitrou, sbt, trent
Priority:	normal	Keywords:	gsoc

Created on 2012-11-18 22:32 by trent, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
wsapoll.patch	trent, 2012-11-18 22:32		review
miminal-wsapoll.patch	sbt, 2012-12-03 20:03		review
runtime_wsapoll.patch	sbt, 2012-12-04 15:01		review
runtime_wsapoll.patch	sbt, 2012-12-16 22:41		review
runtime_wsapoll.patch	gvanrossum, 2013-01-20 21:09		review

Messages (32)
msg175927 - (view)	Author: Trent Nelson (trent) *	Date: 2012-11-18 22:32
Attached patch adds select.poll() support on Windows via WSAPoll. It's hacky; I was curious to see whether or not it could be done, and whether or not tulip's pollster would work with it. It compiles and works, but doesn't play very nicely with tulip. Also, just about every lick of code that tests poll() does so in a UNIX-specific way, so it's hard to test. As with select, WSAPoll() will barf if you feed it anything other than SOCKETs (i.e. it doesn't work against non-socket file descriptors).
msg175929 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2012-11-18 23:19
Related post: http://daniel.haxx.se/blog/2012/10/10/wsapoll-is-broken/
msg175948 - (view)	Author: Trent Nelson (trent) *	Date: 2012-11-19 08:12
On Sun, Nov 18, 2012 at 03:19:19PM -0800, Antoine Pitrou wrote: > > Antoine Pitrou added the comment: > > Related post: > http://daniel.haxx.se/blog/2012/10/10/wsapoll-is-broken/ Yeah, came across that yesterday. Few other relevant links, for the records: http://social.msdn.microsoft.com/Forums/en/wsk/thread/18769abd-fca0-4d3c-9884-1a38ce27ae90 (has a code example of what doesn't work) http://www.codeproject.com/Articles/140533/The-Differences-Between-Network-Calls-in-Windows-a http://blogs.msdn.com/b/wndp/archive/2006/10/26/wsapoll.aspx http://curl.haxx.se/mail/lib-2012-10/0038.html
msg176864 - (view)	Author: Richard Oudkerk (sbt) *	Date: 2012-12-03 20:03
Attached is an alternative patch which only touches selectmodule.c. It still does not support WinXP. Note that in this version register() and modify() do not ignore the POLLPRI flag if it was explicitly passed. But I am not sure how best to deal with POLLPRI.
msg176917 - (view)	Author: Richard Oudkerk (sbt) *	Date: 2012-12-04 15:01
Here is a version which loads WSAPoll at runtime. Still no tests or docs.
msg177109 - (view)	Author: Richard Oudkerk (sbt) *	Date: 2012-12-07 18:50
It seems that the return code of WSAPoll() does not include the count of array items with revents == POLLNVAL. In the case where all of them are POLLNVAL, instead of returning 0 (which usually indicates a timeout) it returns -1 and WSAGetLastError() == WSAENOTSOCK. This does not match the MSDN documentation which claims that the return code is the number of descriptors for which revents is non-zero. But it arguably does agree with the FreeBSD and MacOSX man pages which say that it returns the number of descriptors that are "ready for I/O". BTW, the implementation of select_poll() assumes that the return code of poll() (if non-negative) is equal to the number of non-zero revents fields. But select_have_broken_poll() considers a MacOSX poll() implementation to be good even in cases where this assumption is not true: static int select_have_broken_poll(void) { int poll_test; int filedes[2]; struct pollfd poll_struct = { 0, POLLIN\|POLLPRI\|POLLOUT, 0 }; if (pipe(filedes) < 0) { return 1; } poll_struct.fd = filedes[0]; close(filedes[0]); close(filedes[1]); poll_test = poll(&poll_struct, 1, 0); if (poll_test < 0) { return 1; } else if (poll_test == 0 && poll_struct.revents != POLLNVAL) { return 1; } return 0; } Note that select_have_broken_poll() == FALSE if poll_test == 0 and poll_struct.revents == POLLNVAL.
msg177634 - (view)	Author: Richard Oudkerk (sbt) *	Date: 2012-12-16 22:41
Here is a new version with tests and docs. Note that the docs do not mention the bug mentioned in http://daniel.haxx.se/blog/2012/10/10/wsapoll-is-broken/ Maybe they should? Note that that bug makes it a bit difficult to use poll with tulip on Windows. (But one could restrict timeouts to one second and always check outstanding connect attempts using select() when poll() returns.)
msg180256 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2013-01-19 19:33
This works well enough (tested in old version of Tulip), right? What's holding it up?
msg180309 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2013-01-20 19:41
Oh, it needs a new patch -- the patch fails to apply in the 3.4 (default) branch.
msg180315 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2013-01-20 21:09
Here's a new version of the patch. (Will test on Windows next.)
msg180317 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2013-01-20 21:35
That compiles (after hacking the line endings). One Tulip test fails, PollEventLooptests.testSockClientFail. But that's probably because the PollSelector class hasn't been adjusted for Windows yet (need to dig this out of the Pollster code that was deleted when switching to neologix's Selector).
msg180318 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2013-01-20 21:35
That compiles (after hacking the line endings). One Tulip test fails, PollEventLooptests.testSockClientFail. But that's probably because the PollSelector class hasn't been adjusted for Windows yet (need to dig this out of the Pollster code that was deleted when switching to neologix's Selector).
msg180322 - (view)	Author: Richard Oudkerk (sbt) *	Date: 2013-01-20 21:54
> That compiles (after hacking the line endings). One Tulip test fails, > PollEventLooptests.testSockClientFail. But that's probably because the > PollSelector class hasn't been adjusted for Windows yet (need to dig this > out of the Pollster code that was deleted when switching to neologix's > Selector). Sorry I did not deal with this earlier. I can make the modifications to PollSelector tommorrow. Just to describe the horrible hack: every time poll() needs to be called we first check if there are any registered async connects. If so then I first use select([], [], connectors) to detect any failed connections, and then use poll() normally. This does mean that to detect failed connections we must never use too large a timeout with poll() when there are outstanding connects. Of course one must decide what is an acceptable maximum timeout -- too short and you might damage battery life, too long and you will not get prompt notification of failures.
msg180325 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2013-01-20 22:05
Ow. How painful. I'll leave this for you to do. Note that this also requires separating EVENT_WRITE from EVENT_CONNECT -- I am looking into this now, but I am not sure how far I will get with this.
msg180327 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2013-01-20 22:42
(FWIW, I've got the EVENT_CONNECT separation done.)
msg180328 - (view)	Author: Charles-François Natali (neologix) *	Date: 2013-01-20 22:45
Time for a stupid question from someone who doesn't know anything about Windows: if WSAPoll() is really terminally broken, is it really worth the hassle exposing it and warping the API? AFAICT, FD_SETSIZE is already bumped to 512 on Windows, and Windows select() is limited by the fd_set size, not the maximum descriptor: so what exactly does WSAPoll() bring over select() on Windows? (Especially if there are plans to support IOCP, wouldn't that make WSAPoll() obsolete?)
msg180345 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2013-01-21 17:38
This is a very good question to which I have no good answer. If it weren't for this, we could probably do away with the distinction between add_writer and add_connector, and a lot of code could be simpler. (Or is that distinction also needed for IOCP?)
msg180349 - (view)	Author: Richard Oudkerk (sbt) *	Date: 2013-01-21 18:40
On 21/01/2013 5:38pm, Guido van Rossum wrote: > This is a very good question to which I have no good answer. > If it weren't for this, we could probably do away with the distinction > between add_writer and add_connector, and a lot of code could be > simpler. (Or is that distinction also needed for IOCP?) The distinction is not needed by IOCP. I am also not too sure that running tulip on WSAPoll() is a good idea, even if the select module provides it. OFF-TOPIC: Although it is not the optimal way of running tulip with IOCP, I have managed to implement IocpSelector and IocpSocket classes well enough to pass tulip's unittests (except for the ssl one). I did have to make some changes to the tests: selectors have a wrap_socket() method which prepares a socket for use with the selector. On Unix it just returns the socket unchanged, whereas for IocpSelector it returns an IocpSocket wrapper. I also had to make the unittests behave gracefully if there is a "spurious wakeup", i.e. the socket is reported as readable, but trying to read fails with BlockingIOError. (Spurious wakeups are possible but very rare with select() etc.) It would be possible to make IocpSelector deal with pipe handles too.
msg180350 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2013-01-21 19:00
Thanks -- I am now close to rejecting the WSAPoll() patch, and even closer to rejecting its use for Tulip on Windows. That would in turn mean that we should kill add/remove_connector() and also the EVENT_CONNECT flag in selector.py. Anyone not in favor please speak up! Regarding your IOCP changes, that sounds pretty exciting. Richard, could you check those into the Tulip as a branch? (Maybe a new branch named 'iocp'.)
msg180353 - (view)	Author: Richard Oudkerk (sbt) *	Date: 2013-01-21 19:51
On 21/01/2013 7:00pm, Guido van Rossum wrote: > Regarding your IOCP changes, that sounds pretty exciting. Richard, > could you check those into the Tulip as a branch? (Maybe a new branch > named 'iocp'.) OK. It may take me a while to rebase them.
msg180358 - (view)	Author: Richard Oudkerk (sbt) *	Date: 2013-01-21 20:41
I have created an iocp branch.
msg180360 - (view)	Author: Charles-François Natali (neologix) *	Date: 2013-01-21 20:55
> I have created an iocp branch. You could probably report the fixes for spurious notifications in the default branch.
msg180386 - (view)	Author: Richard Oudkerk (sbt) *	Date: 2013-01-22 13:31
It appears that Linux's "spurious readiness notifications" are a deliberate deviation from the POSIX standard. (They are mentioned in the BUGS section of the man page for select.) Should I just apply the following patch to the default branch? diff -r 3ef7f1fe286c tulip/events_test.py --- a/tulip/events_test.py Mon Jan 21 18:55:29 2013 -0800 +++ b/tulip/events_test.py Tue Jan 22 12:09:21 2013 +0000 @@ -200,7 +200,12 @@ r, w = unix_events.socketpair() bytes_read = [] def reader(): - data = r.recv(1024) + try: + data = r.recv(1024) + except BlockingIOError: + # Spurious readiness notifications are possible + # at least on Linux -- see man select. + return if data: bytes_read.append(data) else: @@ -218,7 +223,12 @@ r, w = unix_events.socketpair() bytes_read = [] def reader(): - data = r.recv(1024) + try: + data = r.recv(1024) + except BlockingIOError: + # Spurious readiness notifications are possible + # at least on Linux -- see man select. + return if data: bytes_read.append(data) else:
msg180393 - (view)	Author: Charles-François Natali (neologix) *	Date: 2013-01-22 14:36
> It appears that Linux's "spurious readiness notifications" are a deliberate deviation from the POSIX standard. (They are mentioned in the BUGS section of the man page for select.) I don't think it's a deliberate deviation, but really bugs/limitations (I can remember at least one occurrence case where a UDP segment would be received, which triggered a notification, but the segment was subsequently discarded because of an invalid checksum). AFAICT kernel developers tried to fix those spurious notifications, but some of them were quite tricky (see e.g. http://lwn.net/Articles/318264/ for epoll() patches, and http://lists.schmorp.de/pipermail/libev/2009q1/000627.html for an example spurious epoll() notification scenario). That's something we have to live with (like pthread condition spurious wakeups), select()/poll()/epoll() are mere hints that the FD is readable/writable... Also, in real code you have to be prepared to catch EAGAIN regardless of spurious notifications: when a FD is reported as read ready, it just means that there are some data to read. Depending on the watermark, it could mean that only one byte is available. So if you want to receive e.g. a large amount of data and the FD is non-blocking, you can do something like: """ buffer = [] while True: try: data = s.recv(8096) except BlockingIOError: break if data is None: break buffer += data """ Otherwise, you'd have to read() only one byte at a time, and go back to the select()/poll() syscall. (For write ready, you can obviously have "spurious" notifications if you try to write more than what is available in the output socket buffer). > Should I just apply the following patch to the default branch? LGTM.
msg180396 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2013-01-22 14:56
> Also, in real code you have to be prepared to catch EAGAIN regardless > of spurious notifications: when a FD is reported as read ready, it > just means that there are some data to read. Depending on the > watermark, it could mean that only one byte is available. If only one byte is available, recv(4096) should simply return a partial result.
msg180397 - (view)	Author: Richard Oudkerk (sbt) *	Date: 2013-01-22 15:00
According to Alan Cox It's a design decision and a huge performance win. It's one of the areas where POSIX read in its strictest form cripples your performance. See https://lkml.org/lkml/2011/6/18/103 > (For write ready, you can obviously have "spurious" notifications if > you try to write more than what is available in the output socket > buffer). Wouldn't you just get a partial write (assuming an AF_INET, SOCK_STREAM socket)?
msg180406 - (view)	Author: Charles-François Natali (neologix) *	Date: 2013-01-22 16:01
> If only one byte is available, recv(4096) should simply return a partial result. Of course, but how do you know if there's data left to read without calling select() again? It's much better to call read() until you get EAGAIN than calling select() between each read()/write() call. > Wouldn't you just get a partial write (assuming an AF_INET, SOCK_STREAM socket)? For SOCK_STREAM, yes, not for SOCK_DGRAM (or for a pipe when trying to write more than PIPE_BUF, although I guess any sensible implementation doesn't report the pipe write ready if there's less than PIPE_BUF space left). > It's a design decision and a huge performance win. It's one of the areas > where POSIX read in its strictest form cripples your performance. Yes, he's referring to the fact that there are cases where you could avoid some spurious notifications, but that would incur a performance hit: that's exactly the same rationale behind condition variables spurious wakups: since the user-code must be prepared to handle spurious notifications, let's take advantage of it. But there are been various fixes in the past years to avoid spurious notifications in epoll() for example, because while they allow certain optimizations in the kernel, spurious wakeups can cost to user-level applications... I'm 99% sure that Linux isn't the only OS allowing spurious wakeups, since it's essentially an unsolvable issue (temporary shortage of buffer, or the example given by Alan Cox of a pipe with two readers...).
msg180407 - (view)	Author: Charles-François Natali (neologix) *	Date: 2013-01-22 16:06
> For SOCK_STREAM, yes, not for SOCK_DGRAM (or for a pipe when trying to > write more than PIPE_BUF, although I guess any sensible implementation > doesn't report the pipe write ready if there's less than PIPE_BUF > space left). That should be of course "when trying to write LESS than PIPE_BUF", since it's required to be atomic.
msg180410 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2013-01-22 16:27
Short reads/writes are orthogonal to EAGAIN. All the mainline code treats readiness as a hint only, so tests should too. --Guido van Rossum (sent from Android phone)
msg180412 - (view)	Author: Richard Oudkerk (sbt) *	Date: 2013-01-22 17:09
> For SOCK_STREAM, yes, not for SOCK_DGRAM I thought SOCK_DGRAM messages just got truncated at the receiving end.
msg180422 - (view)	Author: Charles-François Natali (neologix) *	Date: 2013-01-22 19:04
> I thought SOCK_DGRAM messages just got truncated at the receiving end. You were referring to partial writes: for a datagram-oriented protocol, if the datagram can't be sent atomically (in one send()/write() call), the kernel will return EAGAIN. On the receiving side, it will get truncated is the buffer is too small. Going back to the subject: so what do we say, let's just forget about supporting WSAPoll at all (both in CPython and tulip)? If we ever choose to export it, I think the least we should do would be to not export it as select.poll(): since it has - not so subtle - semantic differences with poll(), code using previously select() on Windows may silently break when poll() is suddenly available: e.g. asyncore with use_poll=True would probably deadlock in case of unreachable host, if WSAPoll doesn't report connect() failures. When I see the hoops Richard had to go through to make WSAPoll usable in tulip, my gut feeling is that exposing it wouldn't be making a favor to poor unsuspecting Windows programmers :-\
msg180424 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2013-01-22 19:13
Agreed, it does not sound very useful to support WSAPoll(), neither in selector.py (which is intended to eventually be turned into stdlib/select.py) nor in PEP 3156. And then, what other use is there for it, really?

History
Date	User	Action	Args
2022-04-11 14:57:38	admin	set	github: 60711
2013-06-20 13:53:30	sbt	set	keywords: + gsoc, - patch status: open -> closed resolution: rejected stage: resolved
2013-01-22 19:13:38	gvanrossum	set	messages: + msg180424
2013-01-22 19:04:08	neologix	set	messages: + msg180422
2013-01-22 17:09:02	sbt	set	messages: + msg180412
2013-01-22 16:27:37	gvanrossum	set	messages: + msg180410
2013-01-22 16:06:31	neologix	set	messages: + msg180407
2013-01-22 16:01:54	neologix	set	messages: + msg180406
2013-01-22 15:00:56	sbt	set	messages: + msg180397
2013-01-22 14:56:23	pitrou	set	messages: + msg180396
2013-01-22 14:36:41	neologix	set	messages: + msg180393
2013-01-22 13:31:10	sbt	set	messages: + msg180386
2013-01-21 20:55:57	neologix	set	messages: + msg180360
2013-01-21 20:41:56	sbt	set	messages: + msg180358
2013-01-21 19:51:22	sbt	set	messages: + msg180353
2013-01-21 19:00:18	gvanrossum	set	messages: + msg180350
2013-01-21 18:40:31	sbt	set	messages: + msg180349
2013-01-21 17:38:50	gvanrossum	set	messages: + msg180345
2013-01-20 22:45:57	neologix	set	nosy: + neologix messages: + msg180328
2013-01-20 22:42:35	gvanrossum	set	messages: + msg180327
2013-01-20 22:05:01	gvanrossum	set	messages: + msg180325
2013-01-20 21:54:51	sbt	set	messages: + msg180322
2013-01-20 21:35:09	gvanrossum	set	messages: + msg180318
2013-01-20 21:35:06	gvanrossum	set	messages: + msg180317
2013-01-20 21:09:14	gvanrossum	set	files: + runtime_wsapoll.patch messages: + msg180315
2013-01-20 19:41:08	gvanrossum	set	messages: + msg180309
2013-01-19 19:33:43	gvanrossum	set	nosy: + gvanrossum messages: + msg180256
2012-12-16 22:41:35	sbt	set	files: + runtime_wsapoll.patch type: enhancement messages: + msg177634 versions: + Python 3.4
2012-12-07 18:50:51	sbt	set	messages: + msg177109
2012-12-04 15:01:38	sbt	set	files: + runtime_wsapoll.patch messages: + msg176917
2012-12-03 20:03:37	sbt	set	files: + miminal-wsapoll.patch nosy: + sbt messages: + msg176864
2012-11-27 16:57:34	giampaolo.rodola	set	nosy: + giampaolo.rodola
2012-11-22 01:31:21	jcea	set	nosy: + jcea
2012-11-19 08:12:40	trent	set	messages: + msg175948
2012-11-18 23:19:19	pitrou	set	nosy: + pitrou messages: + msg175929
2012-11-18 22:32:54	trent	create