https://bugs.python.org/issue7946 describes an issue with how the current GIL interacts with mixed IO- and CPU-bound work. Quoting that issue:
> when an I/O bound thread executes an I/O call,
> it always releases the GIL. Since the GIL is released, a CPU bound
> thread is now free to acquire the GIL and run. However, if the I/O
> call completes immediately (which is common), the I/O bound thread
> immediately stalls upon return from the system call. To get the GIL
> back, it now has to go through the timeout process to force the
> CPU-bound thread to release the GIL again.
This issue can come up in any application that mixes IO- and CPU-bound work (we've found it to be a cause of performance issues in https://dask.org, for example). Fixing the general problem is tricky and likely requires changes to the GIL's internals, but in the specific case of asyncio running in one thread with CPU work happening in background threads, there may be a simpler fix: don't release the GIL if we don't have to.
Asyncio relies on nonblocking socket operations, which by definition shouldn't block. As such, releasing the GIL shouldn't be needed for many operations (`send`, `recv`, ...) on `socket.socket` objects provided they're in nonblocking mode (as suggested in https://bugs.python.org/issue7946#msg99477). Likewise, dropping the GIL can be avoided when calling `select` on `selectors.BaseSelector` objects with a timeout of 0 (making it a non-blocking call).
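For concreteness, here's a hedged sketch (not the patch itself) of the call pattern the change targets: nonblocking `send`/`recv` on a `socket.socket` and a zero-timeout `selectors` poll, which are the operations asyncio's event loop issues on its hot path. With the patch, none of these would drop the GIL, since none of them can block.
```python
import selectors
import socket

# asyncio puts its sockets in nonblocking mode; emulate that with a socketpair.
a, b = socket.socketpair()
a.setblocking(False)
b.setblocking(False)

sel = selectors.DefaultSelector()
sel.register(b, selectors.EVENT_READ)

a.send(b"ping")                    # nonblocking send: returns immediately
events = sel.select(timeout=0)     # zero-timeout poll: also nonblocking
for key, _ in events:
    print(key.fileobj.recv(1024))  # nonblocking recv: data is already buffered

sel.close()
a.close()
b.close()
```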
I've made a patch (https://github.com/jcrist/cpython/tree/keep-gil-for-fast-syscalls) with these two changes, and run a benchmark (attached) to evaluate the effect of background threads with and without the patch. The benchmark starts an asyncio server in one process and a number of clients in a separate process. A configurable number of background threads that just spin (the `-t` flag, default 0) are started in the server process, and the server is then loaded to measure requests per second (RPS).
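This isn't the attached script, but a minimal sketch of the server side of such a benchmark (the echo protocol, port, and spinner implementation are assumptions): an asyncio server plus pure-Python threads that compete for the GIL.
```python
import argparse
import asyncio
import threading

def spinner():
    # Pure-Python busy loop: only gives up the GIL when the interpreter
    # forces a switch, every sys.getswitchinterval() seconds.
    x = 0
    while True:
        x += 1

async def handle(reader, writer):
    # Echo each message back to the client.
    while data := await reader.read(1024):
        writer.write(data)
        await writer.drain()
    writer.close()

async def main(threads):
    for _ in range(threads):
        threading.Thread(target=spinner, daemon=True).start()
    server = await asyncio.start_server(handle, "127.0.0.1", 8888)
    async with server:
        await server.serve_forever()

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-t", "--threads", type=int, default=0)
    asyncio.run(main(parser.parse_args().threads))
```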
Here are the results:
```
# Main branch
$ python bench.py -c1 -t0
Benchmark: clients = 1, msg-size = 100, background-threads = 0
16324.2 RPS
$ python bench.py -c1 -t1
Benchmark: clients = 1, msg-size = 100, background-threads = 1
Spinner spun 1.52e+07 cycles/second
97.6 RPS
$ python bench.py -c2 -t0
Benchmark: clients = 2, msg-size = 100, background-threads = 0
31308.0 RPS
$ python bench.py -c2 -t1
Benchmark: clients = 2, msg-size = 100, background-threads = 1
Spinner spun 1.52e+07 cycles/second
96.2 RPS
$ python bench.py -c10 -t0
Benchmark: clients = 10, msg-size = 100, background-threads = 0
47169.6 RPS
$ python bench.py -c10 -t1
Benchmark: clients = 10, msg-size = 100, background-threads = 1
Spinner spun 1.54e+07 cycles/second
95.4 RPS
# With this patch
$ ./python bench.py -c1 -t0
Benchmark: clients = 1, msg-size = 100, background-threads = 0
18201.8 RPS
$ ./python bench.py -c1 -t1
Benchmark: clients = 1, msg-size = 100, background-threads = 1
Spinner spun 9.03e+06 cycles/second
194.6 RPS
$ ./python bench.py -c2 -t0
Benchmark: clients = 2, msg-size = 100, background-threads = 0
34151.8 RPS
$ ./python bench.py -c2 -t1
Benchmark: clients = 2, msg-size = 100, background-threads = 1
Spinner spun 8.72e+06 cycles/second
729.6 RPS
$ ./python bench.py -c10 -t0
Benchmark: clients = 10, msg-size = 100, background-threads = 0
53666.6 RPS
$ ./python bench.py -c10 -t1
Benchmark: clients = 10, msg-size = 100, background-threads = 1
Spinner spun 5e+06 cycles/second
21838.2 RPS
```
A few comments on the results:
- On the main branch, any GIL contention sharply decreases the RPS an asyncio server can handle, regardless of the number of clients. This makes sense: every socket operation releases the GIL, and the server thread then has to wait (up to the switch interval) to reacquire it, rinse and repeat. So if every request requires one `recv` and one `send`, a server with background GIL contention is capped at roughly `1 / (2 * switchinterval)`, or about 100 RPS with the default configuration (see the short calculation after these comments). This effectively prioritizes the background thread over the IO thread, since the IO thread releases the GIL very frequently and the background thread never does.
- With the patch, we still see a performance degradation, but it is less severe and improves with the number of clients. This is because, with these changes, the asyncio thread only releases the GIL when doing a blocking poll for new IO events (or when the switch interval is hit). Under low load (1 client), the IO thread becomes idle more frequently and releases the GIL. Under higher load, though, the event loop frequently still has work to do at the end of a cycle and issues a `selector.select` call with a 0 timeout (nonblocking), avoiding releasing the GIL at all during that iteration (note the nonlinear effect of adding more clients). Since the IO thread still releases the GIL sometimes, the background thread still holds the GIL a larger percentage of the time than the IO thread, but the imbalance is less severe than without this patch.
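For reference, the ceiling mentioned in the first comment can be read straight off the default switch interval:
```python
import sys

# Each request pays for two GIL reacquisitions (one per recv, one per send),
# each of which can wait up to the switch interval (0.005 s by default).
print(1 / (2 * sys.getswitchinterval()))  # 100.0, in line with the ~96 RPS above
```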
I have also tested this patch on a Dask cluster running some real-world problems and found that it did improve performance where IO was throttled due to GIL contention.