Issue12328
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2011-06-13 18:53 by sbt, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
test_pipe_poll.py | sbt, 2011-06-13 18:53 | |||
pipe_poll.patch | sbt, 2011-06-13 19:10 | review | ||
pipe_poll_2.patch | sbt, 2011-06-14 22:40 | review | ||
pipe_interruptible.patch | sbt, 2011-06-15 13:17 | |||
sigint_event.patch | sbt, 2011-06-26 20:27 | review | ||
pipe_poll_fix.patch | sbt, 2011-11-20 18:41 | |||
PipeIO.zip | sbt, 2012-01-27 13:58 | |||
pipe_poll_fix.patch | sbt, 2012-02-01 14:50 | |||
pipe_poll_fix.patch | sbt, 2012-03-04 17:43 | review | ||
pipe_poll_fix.patch | sbt, 2012-03-05 14:48 | review |
Messages (29) | |||
---|---|---|---|
msg138268 - (view) | Author: Richard Oudkerk (sbt) * ![]() |
Date: 2011-06-13 18:53 | |
There are some problems with the new Windows overlapped implementation of PipeConnection in the default branch. 1) poll(0) can return False when an empty string is in the pipe: if the next message in the pipe is b"" then PeekNamedPipe() returns (0, 0) which is indistiguishable from the case where the pipe is empty. This affects versions 2.6-3.2 of Python on Windows as well. For old versions I would just document the fact that poll() will fail to see that there is data in the pipe if the next message is an empty string. In practice this is not a big deal since pipes are primarily used for pickled objects which will never be the empty string. 2) _poll() forgets to check self._buffered, so it can return False even if there is buffered data. 3) Even if (2) is fixed, _poll() with a non-zero timeout can mess up message boundaries if the pipe contains messages of length zero or one. To fix this overlapped_GetOverlappedResult() needs to be changed to not swallow and forget ERROR_MORE_DATA errors. 4) In _poll() there is a race condition: if the read operation completes after WaitForMultipleObjects() timesout, but before the operation is cancelled, then the data read will be discarded. 5) The code assumes that if CancelIo()/CancelIoEx() returns successfully then the OVERLAPPED structure and associated buffer may be deallocated immediately. But the documentation says that if CancelIo()/CancelIoEx() succeeds then cancellation has only been *requested*: http://msdn.microsoft.com/en-us/library/aa363792%28v=vs.85%29.aspx If [CancelIoEx()] succeeds, the return value is nonzero. The cancel operation for all pending I/O operations issued by the calling thread for the specified file handle was successfully *requested*. The application must not free or reuse the OVERLAPPED structure associated with the canceled I/O operations until they have *completed*. The thread can use the GetOverlappedResult function to determine when the I/O operations themselves have been completed. We should wait for cancellation to *complete* by using GetOverlappedResult(..., TRUE), otherwise we risk a crash. If the operation was cancelled then FALSE is returned with GetLastError() == ERROR_OPERATION_ABORTED. I would suggest making it a programming error for the overlapped object to be deallocated while the operation is still pending, and to print a RuntimeError if that happens. (This does not prevent us from still trying to prevent a crash using CancelIoEx().) 6) Not a bug exactly, but poll(timeout) is no longer interruptible with Ctrl-C. This also means that Queue.get() is no longer interruptible. -- The unit tests attached pass on Unix. On Windows versions up to 3.2, only test_empty_string fails. On Windows version 3.3 all three fail. |
|||
msg138270 - (view) | Author: Richard Oudkerk (sbt) * ![]() |
Date: 2011-06-13 19:10 | |
The attached patch hopefully fixes problems (1)-(5), but I have never used overlapped I/O before. test_pipe_poll.py passes with these changes. |
|||
msg138285 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2011-06-13 22:46 | |
Hello, Thanks for the patch. > I would suggest making it a programming error for the overlapped > object to be deallocated while the operation is still pending, and > to print a RuntimeError if that happens. (This does not prevent us > from still trying to prevent a crash using CancelIoEx().) Sounds ok to me. > 6) Not a bug exactly, but poll(timeout) is no longer interruptible > with Ctrl-C. This also means that Queue.get() is no longer > interruptible. There is infrastructure in _multiprocessing to handle Ctrl-C. Look for "sigint_event" in Modules/_multiprocessing/*. This could be the topic of a separate issue and patch. > The unit tests attached pass on Unix. On Windows versions up to 3.2, > only test_empty_string fails. On Windows version 3.3 all three fail. Could you make the tests part of Lib/test/test_multiprocessing.py? Also, what is the rationale for the following change: - elif timeout == 0.0: + elif timeout == 0.0 and nleft != 0: return False |
|||
msg138355 - (view) | Author: Richard Oudkerk (sbt) * ![]() |
Date: 2011-06-14 22:40 | |
> Also, what is the rationale for the following change: > > - elif timeout == 0.0: > + elif timeout == 0.0 and nleft != 0: > return False If PeekNamedPipe() returns (navail, nleft) there are 3 cases: 1) navail > 0: just return True 2) navail = nleft = 0: either pipe is empty or next message is empty so fall through to slow path to find out which. 3) navail = 0, nleft > 0: a non-empty message is coming, but nothing is available yet. The check is a shortcut for case 3, although it probably never occurs. I've removed that check in the new patch, and added the unit tests to test_multiprocessing. > There is infrastructure in _multiprocessing to handle Ctrl-C. Look > for "sigint_event" in Modules/_multiprocessing/*. This could be > the topic of a separate issue and patch. I will look in to it. |
|||
msg138363 - (view) | Author: Richard Oudkerk (sbt) * ![]() |
Date: 2011-06-15 13:17 | |
pipe_interruptible.patch is a patch to support to making poll() interruptible. It applies on top of pipe_poll_2.patch. I am not sure what the guarantees are for when KeyboardInterrupt will be raised. I would have done it a bit differently if I knew a good way to test whether the current thread is the main one. Maybe there should be something like PyThread_is_main() and/or threading.ismain(). |
|||
msg138403 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2011-06-15 22:29 | |
> pipe_interruptible.patch is a patch to support to making poll() > interruptible. It applies on top of pipe_poll_2.patch. Hmm, it seems to me that it should be done in _poll() instead. Otherwise, recv() will not be interruptible, will it? Also, after looking at this again, it seems sigint_event would be better provided (at the C level) by the Python core, e.g. as _PyOS_SigintEvent. The time module already uses a similar mechanism. By setting the event in the sigint handler itself, rather than in an auxiliary HandlerRoutine callback function, it should also ensure that, when the event is set, the "tripped" bit in signalmodule.c has been set, meaning we wouldn't need to Sleep() anymore. (I also wonder why the event isn't auto-reset. Otherwise, there's a race condition - which is probably harmless in most cases, but still.) |
|||
msg138441 - (view) | Author: Richard Oudkerk (sbt) * ![]() |
Date: 2011-06-16 14:34 | |
> Hmm, it seems to me that it should be done in _poll() instead. > Otherwise, recv() will not be interruptible, will it? Or maybe WaitForMultipleObjects() should be changed to also wait on sigint_event if called by the main thread. > Also, after looking at this again, it seems sigint_event would be > better provided (at the C level) by the Python core, e.g. as > _PyOS_SigintEvent. The time module already uses a similar mechanism. > By setting the event in the sigint handler itself, rather than in an > auxiliary HandlerRoutine callback function, it should also ensure > that, when the event is set, the "tripped" bit in signalmodule.c > has been set, meaning we wouldn't need to Sleep() anymore. > (I also wonder why the event isn't auto-reset. Otherwise, there's a > race condition - which is probably harmless in most cases, but still.) If it is an auto-reset event then we must worry about a non-main thread stealing the event. Maybe _PyOS_SigintEvent() should return NULL if called by a non-main thread, and be documented with a loud usage warning. |
|||
msg138455 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2011-06-16 15:46 | |
> > Hmm, it seems to me that it should be done in _poll() instead. > > Otherwise, recv() will not be interruptible, will it? > > Or maybe WaitForMultipleObjects() should be changed to also wait on > sigint_event if called by the main thread. Indeed, this may be even better. > If it is an auto-reset event then we must worry about a non-main > thread stealing the event. Maybe _PyOS_SigintEvent() should return > NULL if called by a non-main thread, and be documented with a loud > usage warning. You are right, we need a manual reset *or* we must ensure that every user of _PyOS_SigintEvent only does so from the main thread. |
|||
msg138524 - (view) | Author: Richard Oudkerk (sbt) * ![]() |
Date: 2011-06-17 17:00 | |
> You are right, we need a manual reset *or* we must ensure that every > user of _PyOS_SigintEvent only does so from the main thread. On second thoughts, even using an auto-reset event, resetting the event before waiting is unavoidable. Otherwise you are liable to catch an event that was set a long time ago. |
|||
msg139213 - (view) | Author: Richard Oudkerk (sbt) * ![]() |
Date: 2011-06-26 20:27 | |
sigint_event.patch is a patch to make _multiprocessing.win32.WaitForMultipleObjects interruptible. It applies directly on to default. The patch also adds functions _PyOS_SigintEvent and _PyOS_IsMainThread which are implemented in signalmodule.c and declared in intrcheck.c. _PyOS_SigintEvent returns a manual reset event (cast to void*) which is set whenever SIGINT is received. It is Windows only. _PyOS_IsMainThread returns 0 or 1 according to whether the current thread is the main thread. The time and _multiprocessing modules have been updated to use these functions. Note that WaitForMultipleObjects has a bWaitAll parameter. When this is true, all handles in the array are waited for, and WaitForMultipleObjects is not interruptible. |
|||
msg139215 - (view) | Author: Richard Oudkerk (sbt) * ![]() |
Date: 2011-06-26 20:43 | |
I have noticed a few more problems. * Because poll() isn't thread safe on Windows, neither is Queue.empty(). Since a queue's pipe will never contain empty messages, this can be fixed easily by using (a wrapper for) win32.PeekNamedPipe(). * PipeListener/PipeClient have not been updated to use overlapped I/O. * If more than one process is to read from a pipe connection then (even with proper synchronisation) we cannot safely use poll() since it can leave a partial message in the pipe. |
|||
msg148004 - (view) | Author: Richard Oudkerk (sbt) * ![]() |
Date: 2011-11-20 18:41 | |
Here is an updated patch (pipe_poll_fix.patch) which should be applied on top of sigint_event.patch. It fixes the problems with PipeConnection.poll() and Queue.empty() and makes PipeListener.accept() use overlapped I/O. This should make all the pipe releated blocking functions/methods interruptible on Windows. |
|||
msg148085 - (view) | Author: Roundup Robot (python-dev) ![]() |
Date: 2011-11-21 20:36 | |
New changeset 08953a04b2e6 by Antoine Pitrou in branch 'default': Issue #12328: Under Windows, refactor handling of Ctrl-C events and http://hg.python.org/cpython/rev/08953a04b2e6 |
|||
msg148086 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2011-11-21 20:38 | |
I've re-added the fast uncontended path in semaphore.c and committed sigint_event.patch. Now I'm gonna take a look at pipe_poll_fix.patch... |
|||
msg148158 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2011-11-23 00:03 | |
> Here is an updated patch (pipe_poll_fix.patch) which should be applied > on top of sigint_event.patch. > > It fixes the problems with PipeConnection.poll() and Queue.empty() and > makes PipeListener.accept() use overlapped I/O. This should make all > the pipe releated blocking functions/methods interruptible on Windows. I have the feeling that if we have to call GetLastError() at the Python level, then there's something wrong with the APIs we're exposing from the C extension. I see you check for ERROR_OPERATION_ABORTED. Is there any situation where this can happen? Also, it seems strange to call ov.cancel() and then ov.GetOverlappedResult(). AFAICT, those two operations should be mutually exclusive: you call the former if e.g. WaitForMultipleObjects raised an exception, the latter otherwise. |
|||
msg148180 - (view) | Author: Richard Oudkerk (sbt) * ![]() |
Date: 2011-11-23 11:44 | |
> I have the feeling that if we have to call GetLastError() at the > Python level, then there's something wrong with the APIs we're > exposing from the C extension. > I see you check for ERROR_OPERATION_ABORTED. Is there any situation > where this can happen? There are three "expected" error conditions: ERROR_OPERATION_ABORTED: operation stopped by CancelIo(Ex)() ERROR_MORE_DATA: operation complete, but only got part of the message ERROR_IO_INCOMPLETE: operation still has not finished In the win32 api you need GetLastError() to distinguish between these cases, but maybe we can expose something better. > Also, it seems strange to call ov.cancel() and then > ov.GetOverlappedResult(). AFAICT, those two operations should be > mutually exclusive: you call the former if e.g. WaitForMultipleObjects > raised an exception, the latter otherwise. If WaitForMultipleObjects() returns WAIT_OBJECT_0 + 0 then, as you say, there is no need to call ov.cancel(), but it is harmless to call it if the operation has already completed. The cases aren't really mutually exclusive: if you call ov.cancel() you *must* still do ov.GetOverlappedResult(True) to check for the race where the operation completes after the wait times out, but before ov.cancel() can stop it. (Your original implementation had that bug -- see point (5) in my original bug report.) |
|||
msg148187 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2011-11-23 16:01 | |
> There are three "expected" error conditions: > > ERROR_OPERATION_ABORTED: operation stopped by CancelIo(Ex)() > ERROR_MORE_DATA: operation complete, but only got part of the message > ERROR_IO_INCOMPLETE: operation still has not finished > > In the win32 api you need GetLastError() to distinguish between these > cases, but maybe we can expose something better. It seems to me that ERROR_OPERATION_ABORTED is a "true" error, and so should raise an exception. > The cases aren't really mutually exclusive: if you call ov.cancel() > you *must* still do ov.GetOverlappedResult(True) to check for the race > where the operation completes after the wait times out, but before > ov.cancel() can stop it. (Your original implementation had that bug > -- see point (5) in my original bug report.) Ah, right. Thanks for the explanation. |
|||
msg148215 - (view) | Author: Richard Oudkerk (sbt) * ![]() |
Date: 2011-11-23 23:38 | |
> It seems to me that ERROR_OPERATION_ABORTED is a "true" error, and so > should raise an exception. I guess so, although we do expect it whenever poll() times out. What exception would be appropriate? BlockingIOError? TimeoutError? |
|||
msg148218 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2011-11-24 00:27 | |
> > It seems to me that ERROR_OPERATION_ABORTED is a "true" error, and so > > should raise an exception. > > I guess so, although we do expect it whenever poll() times out. What > exception would be appropriate? BlockingIOError? TimeoutError? I would say either an OSError with the appropriate winerror, or a specific exception (_multiprocessing.win32.OperationAbortedError?). Or perhaps a ValueError. |
|||
msg152095 - (view) | Author: Richard Oudkerk (sbt) * ![]() |
Date: 2012-01-27 13:58 | |
Quite honestly I don't like the way that polling a pipe reads a partial message from the pipe. If at all possible, polling should not modify the pipe. I think the cleanest thing would be to switch to byte oriented pipes on Windows and create PipeIO which subclasses RawIOBase. (Since socket handles are really just overlapped file handles, PipeIO works for them too as long as closesocket() is used instead of CloseHandle().) On Unix FileIO would be used instead. Then Connection can just be a thin wrapper around a file object. Polling on Windows can then be done by creating a wrapper for an overlapped object which represents a zero length read, and can be used with WaitForMultipleObjects(). This lets us implement a select-like wait() function in Python which works with both sockets and Connection objects. Attached is an extension implementing PipeIO (and the overlapped wrapper), a Python module implementing Connection and wait(), and a test. |
|||
msg152433 - (view) | Author: Richard Oudkerk (sbt) * ![]() |
Date: 2012-02-01 14:50 | |
I have done an updated patch. (It does *not* switch to using bytes oriented pipes as I suggested in the previous message.) The patch also adds a wait() function with signature wait(object_list, timeout=None) for polling multiple objects at once. On Unix it is just a wrapper for select.select(object_list, [], [], timeout) except that it retries when it gets EINTR. wait() works with "connected" sockets too, although on Windows it does not work for "listening" sockets. The patch removes SentinelReady and changes concurrent.futures to use wait() instead. Polling is now done by issuing zero length overlapped reads. This means that the pipe is not modified except possibly if a zero length message is removed. I changed ReadFile(), WriteFile() and GetOverlappedResult() to return pairs, the second entry of which is zero or an "expected" error code. (Unexpected errors still raise an exception.) This avoids the need to ever use GetLastError(). |
|||
msg154849 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2012-03-03 20:00 | |
Hmm, I tried to apply the latest patch to the default branch and it failed. It also seems the patch was done against a changeset (508bc675af63) which doesn't exist in the repo... |
|||
msg154894 - (view) | Author: Richard Oudkerk (sbt) * ![]() |
Date: 2012-03-04 13:49 | |
> Hmm, I tried to apply the latest patch to the default branch and it > failed. It also seems the patch was done against a changeset > (508bc675af63) which doesn't exist in the repo... I will do an updated patch against a "public" changeset. (I usually use a patch for the project files to disable those extensions which my windows setup cannot compile.) |
|||
msg154903 - (view) | Author: Richard Oudkerk (sbt) * ![]() |
Date: 2012-03-04 17:43 | |
Updated patch against 2822765e48a7. |
|||
msg154950 - (view) | Author: Richard Oudkerk (sbt) * ![]() |
Date: 2012-03-05 14:48 | |
Updated patch addressing Antoine's comments. |
|||
msg154968 - (view) | Author: Roundup Robot (python-dev) ![]() |
Date: 2012-03-05 18:32 | |
New changeset 75c27daa592e by Antoine Pitrou in branch 'default': Issue #12328: Fix multiprocessing's use of overlapped I/O on Windows. http://hg.python.org/cpython/rev/75c27daa592e |
|||
msg154970 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2012-03-05 18:33 | |
Committed, thanks! |
|||
msg155265 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2012-03-09 21:23 | |
One of our buildbots seems to have recurring issues with the timeout issues. I don't know if the machine is very loaded: ====================================================================== FAIL: test_wait_integer (test.test_multiprocessing.TestWait) ---------------------------------------------------------------------- Traceback (most recent call last): File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\test\test_multiprocessing.py", line 2617, in test_wait_integer self.assertLess(delta, expected + 2) AssertionError: 8.592355012893677 not less than 7 ====================================================================== FAIL: test_wait_timeout (test.test_multiprocessing.TestWait) ---------------------------------------------------------------------- Traceback (most recent call last): File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\test\test_multiprocessing.py", line 2590, in test_wait_timeout self.assertLess(delta, expected + 0.5) AssertionError: 1.532202959060669 not less than 1.5 http://www.python.org/dev/buildbot/all/builders/x86%20XP-4%203.x |
|||
msg214187 - (view) | Author: Roundup Robot (python-dev) ![]() |
Date: 2014-03-20 08:28 | |
New changeset bdad874195d6 by Victor Stinner in branch '3.4': Isuse #12328, #20978: Add _winapi.WAIT_ABANDONED_0 symbol, needed by http://hg.python.org/cpython/rev/bdad874195d6 New changeset 2e4692a762d5 by Victor Stinner in branch 'default': (Merge 3.4) Issue #12328, #20978: Add _winapi.WAIT_ABANDONED_0 symbol, needed http://hg.python.org/cpython/rev/2e4692a762d5 |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:18 | admin | set | github: 56537 |
2014-03-20 08:28:43 | python-dev | set | messages: + msg214187 |
2012-03-09 21:23:56 | pitrou | set | messages: + msg155265 |
2012-03-05 18:33:14 | pitrou | set | status: open -> closed resolution: fixed messages: + msg154970 stage: patch review -> resolved |
2012-03-05 18:32:42 | python-dev | set | messages: + msg154968 |
2012-03-05 14:48:47 | sbt | set | files:
+ pipe_poll_fix.patch messages: + msg154950 |
2012-03-04 17:43:28 | sbt | set | files:
+ pipe_poll_fix.patch messages: + msg154903 |
2012-03-04 13:49:36 | sbt | set | messages: + msg154894 |
2012-03-03 20:00:06 | pitrou | set | messages: + msg154849 |
2012-02-01 14:50:31 | sbt | set | files:
+ pipe_poll_fix.patch messages: + msg152433 |
2012-01-27 13:58:50 | sbt | set | files:
+ PipeIO.zip messages: + msg152095 |
2011-11-24 00:27:28 | pitrou | set | messages: + msg148218 |
2011-11-23 23:38:23 | sbt | set | messages: + msg148215 |
2011-11-23 16:01:25 | pitrou | set | messages: + msg148187 |
2011-11-23 11:44:22 | sbt | set | messages: + msg148180 |
2011-11-23 00:03:25 | pitrou | set | messages: + msg148158 |
2011-11-21 20:38:27 | pitrou | set | stage: patch review messages: + msg148086 versions: - Python 2.6, Python 3.1, Python 2.7, Python 3.2 |
2011-11-21 20:36:32 | python-dev | set | nosy:
+ python-dev messages: + msg148085 |
2011-11-20 18:41:04 | sbt | set | files:
+ pipe_poll_fix.patch messages: + msg148004 |
2011-06-26 20:43:35 | sbt | set | messages: + msg139215 |
2011-06-26 20:27:48 | sbt | set | files:
+ sigint_event.patch messages: + msg139213 |
2011-06-17 17:00:37 | sbt | set | messages: + msg138524 |
2011-06-16 15:46:45 | pitrou | set | messages: + msg138455 |
2011-06-16 14:34:21 | sbt | set | messages: + msg138441 |
2011-06-15 22:29:02 | pitrou | set | nosy:
+ tim.golden, brian.curtin messages: + msg138403 |
2011-06-15 13:17:18 | sbt | set | files:
+ pipe_interruptible.patch messages: + msg138363 |
2011-06-14 22:40:26 | sbt | set | files:
+ pipe_poll_2.patch messages: + msg138355 |
2011-06-13 22:46:09 | pitrou | set | messages: + msg138285 |
2011-06-13 19:10:36 | sbt | set | files:
+ pipe_poll.patch keywords: + patch messages: + msg138270 |
2011-06-13 18:53:16 | sbt | create |