classification
Title: Patch selectmodule.c to support WSAPoll on Windows
Type: enhancement Stage: resolved
Components: Versions: Python 3.4
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: giampaolo.rodola, gvanrossum, jcea, neologix, pitrou, sbt, trent
Priority: normal Keywords: gsoc

Created on 2012-11-18 22:32 by trent, last changed 2013-06-20 13:53 by sbt. This issue is now closed.

Files
File name Uploaded Description Edit
wsapoll.patch trent, 2012-11-18 22:32 review
miminal-wsapoll.patch sbt, 2012-12-03 20:03 review
runtime_wsapoll.patch sbt, 2012-12-04 15:01 review
runtime_wsapoll.patch sbt, 2012-12-16 22:41 review
runtime_wsapoll.patch gvanrossum, 2013-01-20 21:09 review
Messages (32)
msg175927 - (view) Author: Trent Nelson (trent) * (Python committer) Date: 2012-11-18 22:32
Attached patch adds select.poll() support on Windows via WSAPoll.

It's hacky; I was curious to see whether or not it could be done, and whether or not tulip's pollster would work with it.

It compiles and works, but doesn't play very nicely with tulip.  Also, just about every lick of code that tests poll() does so in a UNIX-specific way, so it's hard to test.

As with select, WSAPoll() will barf if you feed it anything other than SOCKETs (i.e. it doesn't work against non-socket file descriptors).
msg175929 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-11-18 23:19
Related post:
http://daniel.haxx.se/blog/2012/10/10/wsapoll-is-broken/
msg175948 - (view) Author: Trent Nelson (trent) * (Python committer) Date: 2012-11-19 08:12
On Sun, Nov 18, 2012 at 03:19:19PM -0800, Antoine Pitrou wrote:
> 
> Antoine Pitrou added the comment:
> 
> Related post:
> http://daniel.haxx.se/blog/2012/10/10/wsapoll-is-broken/

    Yeah, came across that yesterday.  Few other relevant links, for the
    records:

        http://social.msdn.microsoft.com/Forums/en/wsk/thread/18769abd-fca0-4d3c-9884-1a38ce27ae90 (has a code example of what doesn't work)

        http://www.codeproject.com/Articles/140533/The-Differences-Between-Network-Calls-in-Windows-a

        http://blogs.msdn.com/b/wndp/archive/2006/10/26/wsapoll.aspx

        http://curl.haxx.se/mail/lib-2012-10/0038.html
msg176864 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2012-12-03 20:03
Attached is an alternative patch which only touches selectmodule.c.  It still does not support WinXP.

Note that in this version register() and modify() do not ignore the POLLPRI flag if it was *explicitly* passed.  But I am not sure how best to deal with POLLPRI.
msg176917 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2012-12-04 15:01
Here is a version which loads WSAPoll at runtime.  Still no tests or docs.
msg177109 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2012-12-07 18:50
It seems that the return code of WSAPoll() does not include the count of array items with revents == POLLNVAL.  In the case where all of them are POLLNVAL, instead of returning 0 (which usually indicates a timeout) it returns -1 and WSAGetLastError() == WSAENOTSOCK.

This does not match the MSDN documentation which claims that the return code is the number of descriptors for which revents is non-zero.  But it arguably does agree with the FreeBSD and MacOSX man pages which say that it returns the number of descriptors that are "ready for I/O".


BTW, the implementation of select_poll() assumes that the return code of poll() (if non-negative) is equal to the number of non-zero revents fields.  But select_have_broken_poll() considers a MacOSX poll() implementation to be good even in cases where this assumption is not true:

static int select_have_broken_poll(void)
{
    int poll_test;
    int filedes[2];
    struct pollfd poll_struct = { 0, POLLIN|POLLPRI|POLLOUT, 0 };
    if (pipe(filedes) < 0) {
        return 1;
    }
    poll_struct.fd = filedes[0];
    close(filedes[0]);
    close(filedes[1]);
    poll_test = poll(&poll_struct, 1, 0);
    if (poll_test < 0) {
        return 1;
    } else if (poll_test == 0 && poll_struct.revents != POLLNVAL) {
        return 1;
    }
    return 0;
}

Note that select_have_broken_poll() == FALSE if poll_test == 0 and poll_struct.revents == POLLNVAL.
msg177634 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2012-12-16 22:41
Here is a new version with tests and docs.

Note that the docs do not mention the bug mentioned in

  http://daniel.haxx.se/blog/2012/10/10/wsapoll-is-broken/

Maybe they should?

Note that that bug makes it a bit difficult to use poll with tulip on Windows.  (But one could restrict timeouts to one second and always check outstanding connect attempts using select() when poll() returns.)
msg180256 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-01-19 19:33
This works well enough (tested in old version of Tulip), right? What's holding it up?
msg180309 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-01-20 19:41
Oh, it needs a new patch -- the patch fails to apply in the 3.4
(default) branch.
msg180315 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-01-20 21:09
Here's a new version of the patch.  (Will test on Windows next.)
msg180317 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-01-20 21:35
That compiles (after hacking the line endings).  One Tulip test fails, PollEventLooptests.testSockClientFail.  But that's probably because the PollSelector class hasn't been adjusted for Windows yet (need to dig this out of the Pollster code that was deleted when switching to neologix's Selector).
msg180318 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-01-20 21:35
That compiles (after hacking the line endings).  One Tulip test fails, PollEventLooptests.testSockClientFail.  But that's probably because the PollSelector class hasn't been adjusted for Windows yet (need to dig this out of the Pollster code that was deleted when switching to neologix's Selector).
msg180322 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-01-20 21:54
> That compiles (after hacking the line endings).  One Tulip test fails, 
> PollEventLooptests.testSockClientFail.  But that's probably because the 
> PollSelector class hasn't been adjusted for Windows yet (need to dig this 
> out of the Pollster code that was deleted when switching to neologix's 
> Selector).

Sorry I did not deal with this earlier.  I can make the modifications to PollSelector tommorrow.

Just to describe the horrible hack: every time poll() needs to be called we first check if there are any registered async connects.  If so then I first use select([], [], connectors) to detect any failed connections, and then use poll() normally.

This does mean that to detect failed connections we must never use too large a timeout with poll() when there are outstanding connects.  Of course one must decide what is an acceptable maximum timeout -- too short and you might damage battery life, too long and you will not get prompt notification of failures.
msg180325 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-01-20 22:05
Ow. How painful. I'll leave this for you to do. Note that this also
requires separating EVENT_WRITE from EVENT_CONNECT -- I am looking
into this now, but I am not sure how far I will get with this.
msg180327 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-01-20 22:42
(FWIW, I've got the EVENT_CONNECT separation done.)
msg180328 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-01-20 22:45
Time for a stupid question from someone who doesn't know anything about Windows: if WSAPoll() is really terminally broken, is it really worth the hassle exposing it and warping the API?
AFAICT, FD_SETSIZE is already bumped to 512 on Windows, and Windows select() is limited by the fd_set size, not the maximum descriptor: so what exactly does WSAPoll() bring over select() on Windows?
(Especially if there are plans to support IOCP, wouldn't that make WSAPoll() obsolete?)
msg180345 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-01-21 17:38
This is a very good question to which I have no good answer.  If it weren't for this, we could probably do away with the distinction between add_writer and add_connector, and a lot of code could be simpler.  (Or is that distinction also needed for IOCP?)
msg180349 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-01-21 18:40
On 21/01/2013 5:38pm, Guido van Rossum wrote:
> This is a very good question to which I have no good answer.
> If it weren't for this, we could probably do away with the distinction
> between add_writer and add_connector, and a lot of code could be
> simpler.  (Or is that distinction also needed for IOCP?)

The distinction is not needed by IOCP.  I am also not too sure that 
running tulip on WSAPoll() is a good idea, even if the select module 
provides it.

OFF-TOPIC: Although it is not the optimal way of running tulip with 
IOCP, I have managed to implement IocpSelector and IocpSocket classes 
well enough to pass tulip's unittests (except for the ssl one).

I did have to make some changes to the tests: selectors have a 
wrap_socket() method which prepares a socket for use with the selector. 
  On Unix it just returns the socket unchanged, whereas for IocpSelector 
it returns an IocpSocket wrapper.

I also had to make the unittests behave gracefully if there is a 
"spurious wakeup", i.e. the socket is reported as readable, but trying 
to read fails with BlockingIOError.  (Spurious wakeups are possible but 
very rare with select() etc.)

It would be possible to make IocpSelector deal with pipe handles too.
msg180350 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-01-21 19:00
Thanks -- I am now close to rejecting the WSAPoll() patch, and even
closer to rejecting its use for Tulip on Windows. That would in turn
mean that we should kill add/remove_connector() and also the
EVENT_CONNECT flag in selector.py. Anyone not in favor please speak
up!

Regarding your IOCP changes, that sounds pretty exciting. Richard,
could you check those into the Tulip as a branch? (Maybe a new branch
named 'iocp'.)
msg180353 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-01-21 19:51
On 21/01/2013 7:00pm, Guido van Rossum wrote:
> Regarding your IOCP changes, that sounds pretty exciting. Richard,
> could you check those into the Tulip as a branch? (Maybe a new branch
> named 'iocp'.)

OK.  It may take me a while to rebase them.
msg180358 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-01-21 20:41
I have created an iocp branch.
msg180360 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-01-21 20:55
> I have created an iocp branch.

You could probably report the fixes for spurious notifications in the
default branch.
msg180386 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-01-22 13:31
It appears that Linux's "spurious readiness notifications" are a deliberate deviation from the POSIX standard.  (They are mentioned in the BUGS section of the man page for select.)

Should I just apply the following patch to the default branch?

diff -r 3ef7f1fe286c tulip/events_test.py
--- a/tulip/events_test.py      Mon Jan 21 18:55:29 2013 -0800
+++ b/tulip/events_test.py      Tue Jan 22 12:09:21 2013 +0000
@@ -200,7 +200,12 @@
         r, w = unix_events.socketpair()
         bytes_read = []
         def reader():
-            data = r.recv(1024)
+            try:
+                data = r.recv(1024)
+            except BlockingIOError:
+                # Spurious readiness notifications are possible
+                # at least on Linux -- see man select.
+                return
             if data:
                 bytes_read.append(data)
             else:
@@ -218,7 +223,12 @@
         r, w = unix_events.socketpair()
         bytes_read = []
         def reader():
-            data = r.recv(1024)
+            try:
+                data = r.recv(1024)
+            except BlockingIOError:
+                # Spurious readiness notifications are possible
+                # at least on Linux -- see man select.
+                return
             if data:
                 bytes_read.append(data)
             else:
msg180393 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-01-22 14:36
> It appears that Linux's "spurious readiness notifications" are a deliberate deviation from the POSIX standard.  (They are mentioned in the BUGS section of the man page for select.)

I don't think it's a deliberate deviation, but really bugs/limitations
(I can remember at least one occurrence case where a UDP segment would
be received, which triggered a notification, but the segment was
subsequently discarded because of an invalid checksum). AFAICT kernel
developers tried to fix those spurious notifications, but some of them
were quite tricky (see e.g. http://lwn.net/Articles/318264/ for
epoll() patches, and
http://lists.schmorp.de/pipermail/libev/2009q1/000627.html for an
example spurious epoll() notification scenario).

That's something we have to live with (like pthread condition spurious
wakeups), select()/poll()/epoll() are mere hints that the FD is
readable/writable...

Also, in real code you have to be prepared to catch EAGAIN regardless
of spurious notifications: when a FD is reported as read ready, it
just means that there are some data to read. Depending on the
watermark, it could mean that only one byte is available.

So if you want to receive e.g. a large amount of data and the FD is
non-blocking, you can do something like:

"""
    buffer = []
    while True:
        try:
            data = s.recv(8096)
        except BlockingIOError:
            break

        if data is None:
            break
        buffer += data
"""

Otherwise, you'd have to read() only one byte at a time, and go back
to the select()/poll() syscall.

(For write ready, you can obviously have "spurious" notifications if
you try to write more than what is available in the output socket
buffer).

> Should I just apply the following patch to the default branch?

LGTM.
msg180396 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-01-22 14:56
> Also, in real code you have to be prepared to catch EAGAIN regardless
> of spurious notifications: when a FD is reported as read ready, it
> just means that there are some data to read. Depending on the
> watermark, it could mean that only one byte is available.

If only one byte is available, recv(4096) should simply return a partial result.
msg180397 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-01-22 15:00
According to Alan Cox

    It's a design decision and a huge performance win. It's one of the areas
    where POSIX read in its strictest form cripples your performance.

See https://lkml.org/lkml/2011/6/18/103

> (For write ready, you can obviously have "spurious" notifications if
> you try to write more than what is available in the output socket
> buffer).

Wouldn't you just get a partial write (assuming an AF_INET, SOCK_STREAM socket)?
msg180406 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-01-22 16:01
> If only one byte is available, recv(4096) should simply return a partial result.

Of course, but how do you know if there's data left to read without
calling select() again? It's much better to call read() until you get
EAGAIN than calling select() between each read()/write() call.

> Wouldn't you just get a partial write (assuming an AF_INET, SOCK_STREAM socket)?

For SOCK_STREAM, yes, not for SOCK_DGRAM (or for a pipe when trying to
write more than PIPE_BUF, although I guess any sensible implementation
doesn't report the pipe write ready if there's less than PIPE_BUF
space left).

> It's a design decision and a huge performance win. It's one of the areas
> where POSIX read in its strictest form cripples your performance.

Yes, he's referring to the fact that there are cases where you could
avoid some spurious notifications, but that would incur a performance
hit: that's exactly the same rationale behind condition variables
spurious wakups: since the user-code must be prepared to handle
spurious notifications, let's take advantage of it.

But there are been various fixes in the past years to avoid spurious
notifications in epoll() for example, because while they allow certain
optimizations in the kernel, spurious wakeups can cost to user-level
applications...

I'm 99% sure that Linux isn't the only OS allowing spurious wakeups,
since it's essentially an unsolvable issue (temporary shortage of
buffer, or the example given by Alan Cox of a pipe with two
readers...).
msg180407 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-01-22 16:06
> For SOCK_STREAM, yes, not for SOCK_DGRAM (or for a pipe when trying to
> write more than PIPE_BUF, although I guess any sensible implementation
> doesn't report the pipe write ready if there's less than PIPE_BUF
> space left).

That should be of course "when trying to write LESS than PIPE_BUF",
since it's required to be atomic.
msg180410 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-01-22 16:27
Short reads/writes are orthogonal to EAGAIN. All the mainline code treats
readiness as a hint only, so tests should too.

--Guido van Rossum (sent from Android phone)
msg180412 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-01-22 17:09
> For SOCK_STREAM, yes, not for SOCK_DGRAM

I thought SOCK_DGRAM messages just got truncated at the receiving end.
msg180422 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-01-22 19:04
> I thought SOCK_DGRAM messages just got truncated at the receiving end.

You were referring to partial writes: for a datagram-oriented
protocol, if the datagram can't be sent atomically (in one
send()/write() call), the kernel will return EAGAIN. On the receiving
side, it will get truncated is the buffer is too small.

Going back to the subject: so what do we say, let's just forget about
supporting WSAPoll at all (both in CPython and tulip)?

If we ever choose to export it, I think the least we should do would
be to not export it as select.poll(): since it has - not so subtle -
semantic differences with poll(), code using previously select() on
Windows may silently break when poll() is suddenly available: e.g.
asyncore with use_poll=True would probably deadlock in case of
unreachable host, if WSAPoll doesn't report connect() failures.

When I see the hoops Richard had to go through to make WSAPoll usable
in tulip, my gut feeling is that exposing it wouldn't be making a
favor to poor unsuspecting Windows programmers :-\
msg180424 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-01-22 19:13
Agreed, it does not sound very useful to support WSAPoll(), neither in
selector.py (which is intended to eventually be turned into
stdlib/select.py) nor in PEP 3156. And then, what other use is there
for it, really?
History
Date User Action Args
2013-06-20 13:53:30sbtsetkeywords: + gsoc, - patch
status: open -> closed
resolution: rejected
stage: resolved
2013-01-22 19:13:38gvanrossumsetmessages: + msg180424
2013-01-22 19:04:08neologixsetmessages: + msg180422
2013-01-22 17:09:02sbtsetmessages: + msg180412
2013-01-22 16:27:37gvanrossumsetmessages: + msg180410
2013-01-22 16:06:31neologixsetmessages: + msg180407
2013-01-22 16:01:54neologixsetmessages: + msg180406
2013-01-22 15:00:56sbtsetmessages: + msg180397
2013-01-22 14:56:23pitrousetmessages: + msg180396
2013-01-22 14:36:41neologixsetmessages: + msg180393
2013-01-22 13:31:10sbtsetmessages: + msg180386
2013-01-21 20:55:57neologixsetmessages: + msg180360
2013-01-21 20:41:56sbtsetmessages: + msg180358
2013-01-21 19:51:22sbtsetmessages: + msg180353
2013-01-21 19:00:18gvanrossumsetmessages: + msg180350
2013-01-21 18:40:31sbtsetmessages: + msg180349
2013-01-21 17:38:50gvanrossumsetmessages: + msg180345
2013-01-20 22:45:57neologixsetnosy: + neologix
messages: + msg180328
2013-01-20 22:42:35gvanrossumsetmessages: + msg180327
2013-01-20 22:05:01gvanrossumsetmessages: + msg180325
2013-01-20 21:54:51sbtsetmessages: + msg180322
2013-01-20 21:35:09gvanrossumsetmessages: + msg180318
2013-01-20 21:35:06gvanrossumsetmessages: + msg180317
2013-01-20 21:09:14gvanrossumsetfiles: + runtime_wsapoll.patch

messages: + msg180315
2013-01-20 19:41:08gvanrossumsetmessages: + msg180309
2013-01-19 19:33:43gvanrossumsetnosy: + gvanrossum
messages: + msg180256
2012-12-16 22:41:35sbtsetfiles: + runtime_wsapoll.patch
type: enhancement
messages: + msg177634

versions: + Python 3.4
2012-12-07 18:50:51sbtsetmessages: + msg177109
2012-12-04 15:01:38sbtsetfiles: + runtime_wsapoll.patch

messages: + msg176917
2012-12-03 20:03:37sbtsetfiles: + miminal-wsapoll.patch
nosy: + sbt
messages: + msg176864

2012-11-27 16:57:34giampaolo.rodolasetnosy: + giampaolo.rodola
2012-11-22 01:31:21jceasetnosy: + jcea
2012-11-19 08:12:40trentsetmessages: + msg175948
2012-11-18 23:19:19pitrousetnosy: + pitrou
messages: + msg175929
2012-11-18 22:32:54trentcreate