classification
Title: Lock.acquire() not interruptible on Windows
Type: enhancement Stage:
Components: Library (Lib), Windows Versions: Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: eryksun, josh.r, kristjan.jonsson, paul.moore, pitrou, steve.dower, tim.golden, tim.peters, vstinner, zach.ware
Priority: normal Keywords:

Created on 2017-04-03 14:43 by pitrou, last changed 2017-04-04 09:12 by vstinner.

Messages (4)
msg291072 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-04-03 14:43
On Windows, Lock.acquire() (and other synchronization primitives derived from it, such as queue.Queue) cannot be interrupted with Ctrl-C, which makes it difficult to interrupt a process waiting on such a primitive.

Judging by the code in Python/_thread_nt.h, it should be relatively easy to add such support for the "legacy" semaphore-based implementation (by using WaitForMultipleObjects instead of WaitForSingleObject), but it would be much hairier for the new condition variable-based implementation.

Of course, many other library calls are prone to this limitation (not being interruptible with Ctrl-C on Windows).

See https://github.com/dask/dask/pull/2144#issuecomment-290556996 for original report.
msg291089 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2017-04-03 18:19
From the linked issue:

> proc.send_signal(CTRL_C_EVENT) raises the KeyboardInterrupt
> in both the original and subprocess on Windows

GenerateConsoleCtrlEvent takes process group IDs, so it should have been used to implement os.killpg, not os.kill. If you call it on a process ID that's attached to the console that isn't also a group ID (i.e. the group leader), then it defaults to group 0, which includes every process that's attached to the console. 

Popen.send_signal should only send CTRL_C_EVENT or CTRL_BREAK_EVENT when the child process was created with CREATE_NEW_PROCESS_GROUP in its creationflags. Also, since kill() falls back on TerminateProcess, for added safety send_signal also needs to call poll() instead of using returncode because Windows reuses process IDs. Also, to avoid confusion, it should be noted that CTRL_C_EVENT will be ignored by a process in a new group unless said process manually enables Ctrl+C handling by calling SetConsoleCtrlHandler(NULL, FALSE). This setting will be inherited by child processes.

In the long run I think it would be better to clarify this by implementing os.killpg on Windows using GenerateConsoleCtrlEvent, with support for CTRL_C_EVENT, CTRL_BREAK_EVENT, SIGINT, and SIGBREAK. Deprecate using os.kill to send console control events. There is no Windows API to send a console control event to a particular process ID.
msg291094 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2017-04-03 22:11
Alternatively we could use the SleepEx and WaitFor*Ex functions with alertable waits (i.e. using APCs) instead of the SIGINT Event. This avoids having to replace all single-object waits with multiple-object waits, and would even allow calling SleepEx in pysleep. 

That said, issue 29871 proposes to switch to the condition variable and SRW lock implementation, so first it needs to be decided whether to continue to use kernel waits or switch to conditional variables. Or maybe refactor to use condition variables in performance-critical code and otherwise use kernel waits, if that makes sense.

An orthogonal improvement is to have the signal handler call CancelSynchronousIo. This would entail handling ERROR_OPERATION_ABORTED in _winapi Readfile, WriteFile, and WaitNamedPipe by calling PyErr_CheckSignals. Also in _Py_Read and _Py_Write, if errno is EINVAL and the last Windows error is ERROR_OPERATION_ABORTED, it could manually set errno to EINTR.

Winsock waits (e.g. select) will remain a problem in any case. Winsock uses alertable waits, so a queued user-mode APC will be executed while it's waiting. But then it just resumes its original wait instead of failing with WSAEINTR.
msg291113 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-04-04 08:27
I am not competent enough to pronounce on the technical detail of what you are proposing, but:

> Or maybe refactor to use condition variables in performance-critical code and otherwise use kernel waits, if that makes sense.

That can make sense IMHO.  Lock and RLock are Python-facing objects, so I'm not sure using high-performance userspace primitives is really important there (after all, people will primarily suffer the evaluation cost of pure Python code so, unless you do something silly such as acquire and release a Python lock in a loop, the acquisition cost doesn't really matter).  OTOH, the GIL may be more performance-critical (and needn't be interrupted), so can use userspace CV primitives.

That will however entail a complication of the internal locking API, since we basically need two separate PyThread lock APIs: an "interruptible lock" API and a "fast lock" API.
History
Date User Action Args
2017-04-04 09:12:10vstinnersetnosy: + vstinner
2017-04-04 08:27:28pitrousetnosy: + tim.peters
messages: + msg291113
2017-04-03 22:11:52eryksunsetmessages: + msg291094
2017-04-03 18:19:01eryksunsetnosy: + eryksun
messages: + msg291089
2017-04-03 17:03:54josh.rsetnosy: + josh.r
2017-04-03 14:43:38pitroucreate