Issue 30064

This issue tracker has been migrated to GitHub, and is currently read-only. For more information, see the GitHub FAQs in the Python Developer Guide.

Created on 2017-04-13 08:55 by abacabadabacaba, last changed 2022-04-11 14:58 by admin.
Files

File name | Uploaded by | Date
---|---|---
cancelled-sock-recv-race.patch | fantix | 2020-05-09 22:52
Pull Requests

URL | Status | Author | Date
---|---|---|---
PR 10419 | merged | asvetlov | 2018-11-08 22:41
PR 20369 | merged | fantix | 2020-05-24 22:58
PR 20460 | merged | miss-islington | 2020-05-27 19:47
PR 20485 | merged | vstinner | 2020-05-28 13:47
PR 20487 | merged | miss-islington | 2020-05-28 14:09
PR 20494 | merged | fantix | 2020-05-28 16:08
PR 20503 | merged | miss-islington | 2020-05-28 21:56
PR 20868 | merged | fantix | 2020-06-14 07:05
PR 20869 | merged | miss-islington | 2020-06-14 07:49
Messages (41)
msg291593 — Evgeny Kapun (abacabadabacaba), 2017-04-13 08:55

Code:

```python
import asyncio as a
import socket as s

@a.coroutine
def coro():
    s1, s2 = s.socketpair()
    s1.setblocking(False)
    s2.setblocking(False)
    try:
        yield from a.wait_for(loop.sock_recv(s2, 1), 1)
    except a.TimeoutError:
        pass
    yield from loop.sock_sendall(s1, b'\x00')
    yield
    s1.close()
    s2.close()

loop = a.get_event_loop()
loop.run_until_complete(coro())
```

Result:

```
Exception in callback BaseSelectorEventLoop._sock_recv(<Future cancelled>, True, <socket.socke...2049, proto=0>, 1)
handle: <Handle BaseSelectorEventLoop._sock_recv(<Future cancelled>, True, <socket.socke...2049, proto=0>, 1)>
Traceback (most recent call last):
  File "/usr/lib/python3.6/asyncio/events.py", line 127, in _run
    self._callback(*self._args)
  File "/usr/lib/python3.6/asyncio/selector_events.py", line 378, in _sock_recv
    self.remove_reader(fd)
  File "/usr/lib/python3.6/asyncio/selector_events.py", line 342, in remove_reader
    return self._remove_reader(fd)
  File "/usr/lib/python3.6/asyncio/selector_events.py", line 279, in _remove_reader
    key = self._selector.get_key(fd)
  File "/usr/lib/python3.6/selectors.py", line 189, in get_key
    return mapping[fileobj]
  File "/usr/lib/python3.6/selectors.py", line 70, in __getitem__
    fd = self._selector._fileobj_lookup(fileobj)
  File "/usr/lib/python3.6/selectors.py", line 224, in _fileobj_lookup
    return _fileobj_to_fd(fileobj)
  File "/usr/lib/python3.6/selectors.py", line 41, in _fileobj_to_fd
    raise ValueError("Invalid file descriptor: {}".format(fd))
ValueError: Invalid file descriptor: -1
```
msg326294 — Yury Selivanov (yselivanov), 2018-09-24 21:57

See also https://bugs.python.org/issue34795
msg329746 — Andrew Svetlov (asvetlov), 2018-11-12 17:00

New changeset 74387926072abf338a4c1cec1bf0501fc65bbee5 by Andrew Svetlov in branch 'master': bpo-30064: Refactor sock_* asyncio API (#10419)
https://github.com/python/cpython/commit/74387926072abf338a4c1cec1bf0501fc65bbee5
msg355272 — Yury Selivanov (yselivanov), 2019-10-23 22:42

I think this broke asyncio a bit. These two uvloop sock_* API tests no longer pass on asyncio 3.8:

* https://github.com/MagicStack/uvloop/blob/master/tests/test_sockets.py#L192
* https://github.com/MagicStack/uvloop/blob/master/tests/test_sockets.py#L238

Andrew, since this was your patch, please take a look.
msg355273 — Yury Selivanov (yselivanov), 2019-10-23 22:58

I'll elevate the priority so that we don't forget about this before 3.8.1 gets too close.
msg357989 — Ned Deily (ned.deily), 2019-12-07 20:48

> I'll elevate the priority so that we don't forget about this before 3.8.1 gets too close.

Andrew, Yury: ping. The 3.8.1 cutoff is approaching.
msg358011 — Andrew Svetlov (asvetlov), 2019-12-08 16:37

I'm not sure the failing uvloop tests are correct. The scenario is the following:

1. Suppose we have a non-blocking socket connected to a peer.
2. Create a task for reading data: `task = asyncio.create_task(loop.sock_read(sock, 1))`. Note: the task has not started yet.
3. Immediately after, cancel the task without waiting for the actual cancellation: `task.cancel()`.
4. Finally, read from the socket again: `data = await loop.sock_read(sock, 1)`.

If I put a context switch (`await asyncio.sleep(0)`) between 3) and 4), *or* replace the direct sock_read() call in 4) with creating a task (`data = await asyncio.create_task(loop.sock_read(sock, 1))`), the cancellation of the former read is performed and the test passes.

I very much doubt that any sane code is organized like this test: start a delayed read, cancel it, and read again. Worse, neither the previous nor the current sock_read() implementation prevents concurrent reading, which basically delivers data in an unpredictable order.

Honestly, I'm not sure we need to provide a concurrency guarantee for such low-level functions. The code built on top of these low-level primitives should handle the concurrent-access problem, IMHO.
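The four-step scenario can be written as a standalone script. A sketch, with one assumption flagged: the message says sock_read(), but the public low-level API is `loop.sock_recv()`, so that is what is used here; on an interpreter that includes the eventual fix for this issue, the script completes and prints the received byte.

```python
import asyncio
import socket

async def repro():
    loop = asyncio.get_running_loop()
    s1, s2 = socket.socketpair()
    s1.setblocking(False)
    s2.setblocking(False)

    # Steps 2-3: create the read task, then cancel it before it ever runs.
    task = asyncio.create_task(loop.sock_recv(s2, 1))
    task.cancel()

    # No context switch here: the cancellation has not been delivered yet
    # when the next read on the same socket starts (step 4).
    s1.send(b'x')
    data = await loop.sock_recv(s2, 1)

    s1.close()
    s2.close()
    return data

print(asyncio.run(repro()))  # b'x' on a fixed interpreter
```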
msg358098 — Łukasz Langa (lukasz.langa), 2019-12-09 14:19

Note: this is going to miss Python 3.8.1, as I'm releasing 3.8.1rc1 right now. Please address this before 3.8.2 (due in February).
msg358108 — Andrew Svetlov (asvetlov), 2019-12-09 14:59

As I wrote, I don't think we should fix the code; rather, the failing uvloop test is slightly weird. Maybe I'm wrong; I need to discuss it with Yury anyway.
msg362637 — Łukasz Langa (lukasz.langa), 2020-02-25 12:10

Downgrading priority on this one.
msg362671 — Yury Selivanov (yselivanov), 2020-02-25 23:38

> I very much doubt that any sane code is organized like this test: start a delayed read, cancel it, and read again.

Hm, cancellation should work correctly no matter how "sane" or "insane" the user code is.

> Worse, neither the previous nor the current sock_read() implementation prevents concurrent reading, which basically delivers data in an unpredictable order.

But we're not discussing using a socket concurrently; asyncio explicitly does not support that for the sock_* API. AFAICT this issue is about a subsequent cancel operation not working as expected in asyncio, no?
msg362690 — Andrew Svetlov (asvetlov), 2020-02-26 12:39

This is a very interesting question. In asyncio, task cancellation is not executed immediately on the `task.cancel()` call; instead, the task is *marked* for cancellation. The actual cancellation requires a context switch (`await asyncio.sleep(0)` or similar) to propagate `asyncio.CancelledError` into the task code and unwind the exception. As I said, the test needs a context switch between the `cancel()` call and checking the state after the task cancellation.

Futures are cancelled immediately; that's why the previous implementation passed uvloop's test suite untouched. Unfortunately, it had its own flaws.

Also please note that sock_connect()/sock_accept() were implemented this way about two years before my changes; they suffer from the same cancel-without-actual-cancellation problem.
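The marked-but-not-yet-cancelled behavior described above is easy to observe directly; a minimal sketch:

```python
import asyncio

async def cancel_demo():
    # A task that would sleep for a long time if it ever ran.
    task = asyncio.create_task(asyncio.sleep(10))

    task.cancel()
    marked = task.cancelled()   # still False: only *marked* for cancellation

    await asyncio.sleep(0)      # one context switch lets the loop deliver it
    done = task.cancelled()     # now True: CancelledError was propagated

    return marked, done

print(asyncio.run(cancel_demo()))  # (False, True)
```

This is exactly why the uvloop test needs a context switch between `cancel()` and the state check.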
msg368553 — Fantix King (fantix), 2020-05-09 22:52

Following the discussion: it looks like `add_reader()` will cancel the previous `handle` on the same `fd` if one is found, so I think it is reasonable to solve this cancellation race by checking whether the `handle` is cancelled, as shown in the attached patch (the uvloop tests pass with this patch). WDYT?
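The `add_reader()` replacement behavior Fantix relies on can be observed from user code; a sketch (the callback names are illustrative):

```python
import asyncio
import socket

async def reader_demo():
    loop = asyncio.get_running_loop()
    r, w = socket.socketpair()
    r.setblocking(False)
    hits = []

    def make_cb(tag):
        def cb():
            hits.append((tag, r.recv(1)))
            loop.remove_reader(r.fileno())
        return cb

    # Registering a second reader callback for the same fd replaces the
    # first one (the loop cancels the previous handle internally).
    loop.add_reader(r.fileno(), make_cb('first'))
    loop.add_reader(r.fileno(), make_cb('second'))

    w.send(b'x')
    await asyncio.sleep(0.1)  # give the loop a chance to dispatch the event
    r.close()
    w.close()
    return hits

print(asyncio.run(reader_demo()))  # [('second', b'x')]
```

Only the 'second' callback fires, which is why checking the old handle's cancelled state is enough to detect the race.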
msg368701 — Joongi Kim (achimnol), 2020-05-12 05:13

I just encountered this issue on Python 3.8.2 when doing "sys.exit(1)" in a Click-based CLI program that internally uses an asyncio event loop wrapped in a context manager. Using uvloop, or adding "time.sleep(0.1)" before "sys.exit(1)", removes the error.
msg368702 — Joongi Kim (achimnol), 2020-05-12 05:17

And I suspect that this issue is similar to what I did in a recent janus PR: https://github.com/aio-libs/janus/blob/ec8592b91254971473b508313fb91b01623f13d7/janus/__init__.py#L84, which gives specific callbacks a chance to execute via an extra context switch.
msg370123 — Yury Selivanov (yselivanov), 2020-05-27 19:47

New changeset 210a137396979d747c2602eeef46c34fc4955448 by Fantix King in branch 'master': bpo-30064: Fix asyncio loop.sock_* race condition issue (#20369)
https://github.com/python/cpython/commit/210a137396979d747c2602eeef46c34fc4955448
msg370131 — miss-islington, 2020-05-27 20:39

New changeset 3a2667d91e33493ccde113ddf5092afefc3c89fa by Miss Islington (bot) in branch '3.9': bpo-30064: Fix asyncio loop.sock_* race condition issue (GH-20369)
https://github.com/python/cpython/commit/3a2667d91e33493ccde113ddf5092afefc3c89fa
msg370143 — STINNER Victor (vstinner), 2020-05-27 22:17

The new test_sock_client_racing() test seems to have... a race condition...

AMD64 FreeBSD Non-Debug 3.9: https://buildbot.python.org/all/#/builders/750/builds/11

```
ERROR: test_sock_client_racing (test.test_asyncio.test_sock_lowlevel.KqueueEventLoopTests)
ERROR: test_sock_client_racing (test.test_asyncio.test_sock_lowlevel.PollEventLoopTests)
ERROR: test_sock_client_racing (test.test_asyncio.test_sock_lowlevel.SelectEventLoopTests)
```

Example:

```
ERROR: test_sock_client_racing (test.test_asyncio.test_sock_lowlevel.PollEventLoopTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/home/buildbot/python/3.9.koobs-freebsd-9e36.nondebug/build/Lib/test/test_asyncio/test_sock_lowlevel.py", line 216, in _basetest_sock_connect_racing
    await self.loop.sock_connect(sock, addr)
  File "/usr/home/buildbot/python/3.9.koobs-freebsd-9e36.nondebug/build/Lib/asyncio/selector_events.py", line 500, in sock_connect
    return await fut
  File "/usr/home/buildbot/python/3.9.koobs-freebsd-9e36.nondebug/build/Lib/asyncio/selector_events.py", line 505, in _sock_connect
    sock.connect(address)
ConnectionRefusedError: [Errno 61] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/home/buildbot/python/3.9.koobs-freebsd-9e36.nondebug/build/Lib/test/test_asyncio/test_sock_lowlevel.py", line 253, in test_sock_client_racing
    self.loop.run_until_complete(asyncio.wait_for(
  File "/usr/home/buildbot/python/3.9.koobs-freebsd-9e36.nondebug/build/Lib/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/usr/home/buildbot/python/3.9.koobs-freebsd-9e36.nondebug/build/Lib/asyncio/tasks.py", line 496, in wait_for
    return fut.result()
  File "/usr/home/buildbot/python/3.9.koobs-freebsd-9e36.nondebug/build/Lib/test/test_asyncio/test_sock_lowlevel.py", line 219, in _basetest_sock_connect_racing
    await self.loop.sock_connect(sock, addr)
  File "/usr/home/buildbot/python/3.9.koobs-freebsd-9e36.nondebug/build/Lib/asyncio/selector_events.py", line 500, in sock_connect
    return await fut
  File "/usr/home/buildbot/python/3.9.koobs-freebsd-9e36.nondebug/build/Lib/asyncio/selector_events.py", line 505, in _sock_connect
    sock.connect(address)
ConnectionRefusedError: [Errno 61] Connection refused
```
msg370147 — Fantix King (fantix), 2020-05-27 22:51

Ouch... looks like FreeBSD needs more than a single retry. I'll test on a FreeBSD machine and create a PR for that.
msg370159 — Fantix King (fantix), 2020-05-28 00:50

Not a simple one: FreeBSD returns ECONNREFUSED immediately when trying to connect to a non-listening port on the loopback address: https://lists.freebsd.org/pipermail/freebsd-current/2005-May/049876.html

Then all following connect attempts on the same socket return ECONNREFUSED, regardless of whether the port is listening or not. I think I'll have to skip this test on FreeBSD.
msg370161 — Kyle Stanley (aeros), 2020-05-28 02:21

> The new test_sock_client_racing() test seems to have... a race condition...

I'm also a bit skeptical about relying on `time.sleep(0.01)` at the end of the loop in `_basetest_sock_recv_racing()`, but I don't have an idea for a better approach at the moment.
msg370205 — STINNER Victor (vstinner), 2020-05-28 13:48

I created https://github.com/python/cpython/pull/20485 to skip the unstable tests until someone fixes them. IMO it's better to skip the test than to revert the fix, since the fix does fix a race condition. The test is still failing on buildbots, which makes analysis of buildbot failures more difficult. For example, a SQLite change hit 4 different bugs on buildbots: https://github.com/python/cpython/pull/20448#issuecomment-635350054

See the "revert on fail" policy for buildbots: https://pythondev.readthedocs.io/ci.html#revert-on-fail

> I'm also a bit skeptical about relying on `time.sleep(0.01)` at the end of the loop in `_basetest_sock_recv_racing()`

Yeah, please avoid sleep as a synchronization primitive: https://pythondev.readthedocs.io/unstable_tests.html#don-t-use-sleep-as-synchronization
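The usual alternative to a fixed sleep in a test is polling with a generous deadline: the deadline only guards against hangs, so it can be large without slowing down the passing case. A minimal sketch (the helper name `wait_until` is hypothetical, not part of the test suite):

```python
import time

def wait_until(predicate, timeout=30.0, interval=0.01):
    """Poll predicate() until it returns true, failing only after a
    generous deadline instead of relying on one fixed sleep duration."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met before deadline")
        time.sleep(interval)

# Usage sketch: a condition that is already true returns immediately,
# so the 30-second ceiling costs nothing on a fast machine.
state = {'ready': True}
wait_until(lambda: state['ready'])
print("condition met")
```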
msg370215 — STINNER Victor (vstinner), 2020-05-28 14:08

New changeset 84ee7e1573d166fe7a9be676813e12523b62ab24 by Victor Stinner in branch 'master': bpo-30064: Fix unstable asyncio "racing" socket tests (GH-20485)
https://github.com/python/cpython/commit/84ee7e1573d166fe7a9be676813e12523b62ab24
msg370218 — miss-islington, 2020-05-28 14:28

New changeset 1d82f003678816ff8dd822452ec91669844d2d09 by Miss Islington (bot) in branch '3.9': bpo-30064: Fix unstable asyncio "racing" socket tests (GH-20485)
https://github.com/python/cpython/commit/1d82f003678816ff8dd822452ec91669844d2d09
msg370219 — STINNER Victor (vstinner), 2020-05-28 14:29

> bpo-30064: Fix unstable asyncio "racing" socket tests (GH-20485)

Sorry, that should read "Skip" tests, not "Fix" tests :-p
msg370234 — Fantix King (fantix), 2020-05-28 16:11

Thanks for the comments! Added PR 20494 to properly fix/skip the test on (hopefully) all platforms.
msg370268 — Yury Selivanov (yselivanov), 2020-05-28 21:56

New changeset dc4eee9e266267498a6b783a0abccc23c06f2b87 by Fantix King in branch 'master': bpo-30064: Properly skip unstable loop.sock_connect() racing test (GH-20494)
https://github.com/python/cpython/commit/dc4eee9e266267498a6b783a0abccc23c06f2b87
msg370269 — miss-islington, 2020-05-28 22:17

New changeset 6637bd45163024f2187e40d2cc12c473f78bf5da by Miss Islington (bot) in branch '3.9': bpo-30064: Properly skip unstable loop.sock_connect() racing test (GH-20494)
https://github.com/python/cpython/commit/6637bd45163024f2187e40d2cc12c473f78bf5da
msg371240 — STINNER Victor (vstinner), 2020-06-10 20:14

Another kind of failure on x86 Gentoo Installed with X 3.x: https://buildbot.python.org/all/#/builders/128/builds/1030

```
ERROR: test_sock_client_racing (test.test_asyncio.test_sock_lowlevel.EPollEventLoopTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.installed/build/target/lib/python3.10/test/test_asyncio/test_sock_lowlevel.py", line 200, in _basetest_sock_send_racing
    await self.loop.sock_sendall(sock, b'world')
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.installed/build/target/lib/python3.10/asyncio/selector_events.py", line 460, in sock_sendall
    return await fut
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.installed/build/target/lib/python3.10/asyncio/tasks.py", line 507, in wait_for
    fut.result()
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.installed/build/target/lib/python3.10/test/test_asyncio/test_sock_lowlevel.py", line 256, in test_sock_client_racing
    self.loop.run_until_complete(asyncio.wait_for(
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.installed/build/target/lib/python3.10/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.installed/build/target/lib/python3.10/asyncio/tasks.py", line 509, in wait_for
    raise exceptions.TimeoutError() from exc
asyncio.exceptions.TimeoutError
```
msg371247 — Fantix King (fantix), 2020-06-10 20:40

I'm checking now.
msg371467 — Fantix King (fantix), 2020-06-13 19:38

I've been testing on local VMs with Gentoo and Ubuntu, but I cannot reproduce the issue. The error log doesn't really reveal anything useful, other than that the send was blocked when it was supposed to succeed. As it is not consistently failing on the buildbot (https://buildbot.python.org/all/#/builders/128), I don't know if it is worth adding more debug code to this test in master and waiting for the next hang. Please kindly advise.
msg371478 — Kyle Stanley (aeros), 2020-06-14 00:17

I was able to replicate the failure locally by running a large number of jobs in parallel. We typically use this to test for tricky race conditions, to simulate maximum load:

```
[aeros:~/repos/cpython]$ ./python -m test test_asyncio.test_sock_lowlevel --match test_sock_client_racing -j100 -F -v
# [snip]
======================================================================
ERROR: test_sock_client_racing (test.test_asyncio.test_sock_lowlevel.EPollEventLoopTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/aeros/repos/cpython/Lib/test/test_asyncio/test_sock_lowlevel.py", line 200, in _basetest_sock_send_racing
    await self.loop.sock_sendall(sock, b'world')
  File "/home/aeros/repos/cpython/Lib/asyncio/selector_events.py", line 460, in sock_sendall
    return await fut
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/aeros/repos/cpython/Lib/asyncio/tasks.py", line 507, in wait_for
    fut.result()
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/aeros/repos/cpython/Lib/test/test_asyncio/test_sock_lowlevel.py", line 256, in test_sock_client_racing
    self.loop.run_until_complete(asyncio.wait_for(
  File "/home/aeros/repos/cpython/Lib/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/home/aeros/repos/cpython/Lib/asyncio/tasks.py", line 509, in wait_for
    raise exceptions.TimeoutError() from exc
asyncio.exceptions.TimeoutError

======================================================================
ERROR: test_sock_client_racing (test.test_asyncio.test_sock_lowlevel.PollEventLoopTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/aeros/repos/cpython/Lib/test/test_asyncio/test_sock_lowlevel.py", line 200, in _basetest_sock_send_racing
    await self.loop.sock_sendall(sock, b'world')
  File "/home/aeros/repos/cpython/Lib/asyncio/selector_events.py", line 460, in sock_sendall
    return await fut
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/aeros/repos/cpython/Lib/asyncio/tasks.py", line 507, in wait_for
    fut.result()
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/aeros/repos/cpython/Lib/test/test_asyncio/test_sock_lowlevel.py", line 256, in test_sock_client_racing
    self.loop.run_until_complete(asyncio.wait_for(
  File "/home/aeros/repos/cpython/Lib/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/home/aeros/repos/cpython/Lib/asyncio/tasks.py", line 509, in wait_for
    raise exceptions.TimeoutError() from exc
asyncio.exceptions.TimeoutError

======================================================================
ERROR: test_sock_client_racing (test.test_asyncio.test_sock_lowlevel.SelectEventLoopTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/aeros/repos/cpython/Lib/test/test_asyncio/test_sock_lowlevel.py", line 256, in test_sock_client_racing
    self.loop.run_until_complete(asyncio.wait_for(
  File "/home/aeros/repos/cpython/Lib/asyncio/base_events.py", line 629, in run_until_complete
    self.run_forever()
  File "/home/aeros/repos/cpython/Lib/asyncio/base_events.py", line 596, in run_forever
    self._run_once()
  File "/home/aeros/repos/cpython/Lib/asyncio/base_events.py", line 1854, in _run_once
    event_list = self._selector.select(timeout)
  File "/home/aeros/repos/cpython/Lib/selectors.py", line 323, in select
    r, w, _ = self._select(self._readers, self._writers, [], timeout)
OSError: [Errno 9] Bad file descriptor
```

I'll try to add some debug logs to find the source of the failure(s). This one might be a bit tricky to assess though.

@Fantix I'd recommend trying to run the local test with the above command, increasing the parallel job count until the failure occurs. If it takes more than a minute or two for the tests to run, slowly lower the number. At the moment, the failure has only occurred once in the buildbots, but we might need to temporarily disable the test if it occurs repeatedly, or find some form of workaround (as it will potentially mask other bugs).
msg371479 — Kyle Stanley (aeros), 2020-06-14 01:14

I believe I might have identified a simple fix for the issue, but first I'd like to clarify something: what exactly is the reason we need to use asyncio.wait_for() with a timeout of 10 seconds in these tests? We typically avoid short timeouts on individual tests, largely because we don't want them to fail based on the performance of the device they're run on. This can be a concern for slower devices running the test suite with parallel jobs.

If asyncio.wait_for() is necessary in this case, I'd advise increasing the timeout substantially. I didn't experiment with a wide variety of numbers, but increasing the duration from 10 to 60 seconds seems to have fixed the failure locally, so that might be a decent starting point. If it's not necessary, I'd prefer to simply remove the asyncio.wait_for() calls in test_sock_client_racing.
msg371485 — Fantix King (fantix), 2020-06-14 04:06

Oh, thank you for the clue, that's super helpful! Let me try locally.

The fix PR was originally done to solve a hanging call in a race condition, so the test would hang forever if the fix were not working as expected. The 10-second wait_for() is there to make sure we don't end up with a test hanging forever.

If enlarging the timeout works in some of your cases, that suggests a potential cause of the issue in the test. To simulate the race condition specifically for loop.sock_sendall(), I needed a connected socket with a full send buffer so that the next sock_sendall() call would block. To achieve this, I wrote something like this:

```
with self.assertRaises(BlockingIOError):
    while True:
        sock.send(b' ' * 5)
```

And another loop in a subtask to consume the data:

```
while not data:
    data = await self.loop.sock_recv(server, 1024)
data = data.strip()
```

It might be that these loops cost too much time (especially the second one, because the logs showed that the test was blocked on calling sock_sendall()). But that's just a guess so far; let me add some more debug logs and test locally.
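The buffer-filling trick can be reproduced standalone; this sketch additionally shrinks SO_SNDBUF first so the buffer fills after only a few sends (the 4096 value is an arbitrary choice for the demo, not from the test):

```python
import socket

a, b = socket.socketpair()
a.setblocking(False)
# Shrink the send buffer so it fills quickly.
a.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 4096)

sent = 0
try:
    while True:
        sent += a.send(b' ' * 5)
except BlockingIOError:
    # The kernel send buffer is now full: the next sock_sendall()
    # on this socket would block, which is the state the racing
    # test needs before it can exercise the cancellation path.
    pass

print(sent > 0)  # True
a.close()
b.close()
```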
msg371489 — Kyle Stanley (aeros), 2020-06-14 06:20

No problem at all, and thanks for clarifying.

> If enlarging the timeout works in some of your cases, that suggests a potential cause of the issue in the test. To simulate the race condition specifically for loop.sock_sendall(), I needed a connected socket with a full send buffer so that the next sock_sendall() call would block.

> It might be that these loops cost too much time (especially the second one, because the logs showed that the test was blocked on calling sock_sendall()). But that's just a guess so far; let me add some more debug logs and test locally.

That seems reasonable to me and would make the most sense at a glance. If this is confirmed to be the source of the issue, we should be able to just reduce the size of the socket's send buffer via SO_SNDBUF before sending, so that it fills up much more quickly.
msg371490 — Fantix King (fantix), 2020-06-14 06:31

OK, I think I solved the problem by improving the two loops with smarter buffer sizes:

```
# fill the buffer until sending 5 chars would block
size = 8192
while size > 5:
    with self.assertRaises(BlockingIOError):
        while True:
            sock.send(b' ' * size)
    size = int(size / 2)
```

and

```
# receive everything that is not a space
async def recv_all():
    rv = b''
    while True:
        buf = await self.loop.sock_recv(server, 8192)
        if not buf:
            return rv
        rv += buf.strip()
```

The test now runs ~25x faster, and the load test crashed my laptop a few times before it hit a timeout. Last question before the PR, please: given that this test covers a fixed case where loop.sock_*() was hanging forever, should I keep the wait_for(..., timeout=10)?
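The receive side of this scheme can be exercised end to end with a socketpair. A standalone sketch (the names `recv_demo`, `client`, `server` are illustrative): space padding left over from filling the send buffer is stripped per chunk, and the peer closing the socket provides the EOF that ends the loop.

```python
import asyncio
import socket

async def recv_demo():
    loop = asyncio.get_running_loop()
    client, server = socket.socketpair()
    client.setblocking(False)
    server.setblocking(False)

    # Space padding (as the buffer-filling step would leave) plus payload.
    client.send(b'   hello')
    client.close()  # EOF lets the receive loop terminate

    # Receive everything, stripping the space padding from each chunk.
    rv = b''
    while True:
        buf = await loop.sock_recv(server, 8192)
        if not buf:
            break
        rv += buf.strip()
    server.close()
    return rv

print(asyncio.run(recv_demo()))  # b'hello'
```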
msg371491 — Fantix King (fantix), 2020-06-14 06:41

And yes, I updated SO_SNDBUF too.
msg371492 — Kyle Stanley (aeros), 2020-06-14 06:45

> Last question before the PR, please: given that this test covers a fixed case where loop.sock_*() was hanging forever, should I keep the wait_for(..., timeout=10)?

For now, I think we can keep the asyncio.wait_for() with the 10-second timeout. I'll have to run the tests locally with the patch to double-check (once the PR is opened), and the timeout may still need to be increased if the test fails again in the future. It should be okay though.
msg371495 — miss-islington, 2020-06-14 07:44

New changeset 8f04a84755babe516ebb5304904ea7c15b865c80 by Fantix King in branch 'master': bpo-30064: Fix slow asyncio sock test (GH-20868)
https://github.com/python/cpython/commit/8f04a84755babe516ebb5304904ea7c15b865c80
msg371498 — miss-islington, 2020-06-14 08:08

New changeset 83cd968f5cc59ad60b665080087f50fefac17a07 by Miss Islington (bot) in branch '3.9': bpo-30064: Fix slow asyncio sock test (GH-20868)
https://github.com/python/cpython/commit/83cd968f5cc59ad60b665080087f50fefac17a07
msg371499 — Kyle Stanley (aeros), 2020-06-14 08:13

Thanks again for working on this @Fantix, and for the continued vigilance after the test failures occurred in the buildbots. I think this recent PR 20868 will do the trick. At the very least, it will make the failure much rarer than before, which is definitely an improvement.
History

Date | User | Action | Args
---|---|---|---
2022-04-11 14:58:45 | admin | set | github: 74250
2020-06-15 16:37:16 | vstinner | set | nosy: - vstinner
2020-06-14 08:13:38 | aeros | set | stage: patch review -> commit review; messages: + msg371499; versions: + Python 3.9, Python 3.10, - Python 3.8
2020-06-14 08:08:29 | miss-islington | set | messages: + msg371498
2020-06-14 07:49:04 | miss-islington | set | pull_requests: + pull_request20058
2020-06-14 07:44:10 | miss-islington | set | messages: + msg371495
2020-06-14 07:05:37 | fantix | set | pull_requests: + pull_request20057
2020-06-14 06:45:04 | aeros | set | messages: + msg371492
2020-06-14 06:41:45 | fantix | set | messages: + msg371491
2020-06-14 06:31:21 | fantix | set | messages: + msg371490
2020-06-14 06:20:47 | aeros | set | messages: + msg371489
2020-06-14 04:06:58 | fantix | set | messages: + msg371485
2020-06-14 01:14:40 | aeros | set | messages: + msg371479
2020-06-14 00:17:45 | aeros | set | messages: + msg371478
2020-06-13 19:38:46 | fantix | set | messages: + msg371467
2020-06-10 20:40:42 | fantix | set | messages: + msg371247
2020-06-10 20:14:49 | vstinner | set | messages: + msg371240
2020-05-28 22:17:40 | miss-islington | set | messages: + msg370269
2020-05-28 21:56:57 | miss-islington | set | pull_requests: + pull_request19749
2020-05-28 21:56:48 | yselivanov | set | messages: + msg370268
2020-05-28 16:11:10 | fantix | set | messages: + msg370234
2020-05-28 16:08:06 | fantix | set | pull_requests: + pull_request19743
2020-05-28 14:29:04 | vstinner | set | messages: + msg370219
2020-05-28 14:28:23 | miss-islington | set | messages: + msg370218
2020-05-28 14:09:09 | miss-islington | set | pull_requests: + pull_request19736
2020-05-28 14:08:57 | vstinner | set | messages: + msg370215
2020-05-28 13:48:52 | vstinner | set | messages: + msg370205
2020-05-28 13:47:27 | vstinner | set | pull_requests: + pull_request19734
2020-05-28 02:21:40 | aeros | set | messages: + msg370161
2020-05-28 00:50:44 | fantix | set | messages: + msg370159
2020-05-27 22:51:03 | fantix | set | messages: + msg370147
2020-05-27 22:17:25 | vstinner | set | nosy: + vstinner; messages: + msg370143
2020-05-27 20:39:07 | miss-islington | set | messages: + msg370131
2020-05-27 19:47:47 | miss-islington | set | nosy: + miss-islington; pull_requests: + pull_request19713
2020-05-27 19:47:37 | yselivanov | set | messages: + msg370123
2020-05-24 22:58:32 | fantix | set | stage: patch review; pull_requests: + pull_request19633
2020-05-12 05:28:07 | corona10 | set | nosy: + aeros
2020-05-12 05:17:25 | achimnol | set | messages: + msg368702
2020-05-12 05:13:01 | achimnol | set | nosy: + achimnol; messages: + msg368701
2020-05-09 22:52:06 | fantix | set | files: + cancelled-sock-recv-race.patch; nosy: + fantix; messages: + msg368553
2020-02-26 12:39:03 | asvetlov | set | messages: + msg362690
2020-02-25 23:38:54 | yselivanov | set | messages: + msg362671
2020-02-25 12:10:38 | lukasz.langa | set | priority: deferred blocker -> high; messages: + msg362637
2019-12-09 14:59:33 | asvetlov | set | messages: + msg358108
2019-12-09 14:19:33 | lukasz.langa | set | nosy: + lukasz.langa; messages: + msg358098
2019-12-08 16:37:21 | asvetlov | set | messages: + msg358011
2019-12-07 20:48:28 | ned.deily | set | nosy: + ned.deily; messages: + msg357989
2019-10-23 22:58:45 | yselivanov | set | priority: normal -> deferred blocker; messages: + msg355273
2019-10-23 22:42:58 | yselivanov | set | status: closed -> open; resolution: fixed -> (no value); messages: + msg355272; stage: resolved -> (no value)
2018-11-12 17:02:04 | asvetlov | set | status: open -> closed; resolution: fixed; stage: patch review -> resolved
2018-11-12 17:00:25 | asvetlov | set | messages: + msg329746
2018-11-08 22:41:33 | asvetlov | set | keywords: + patch; stage: patch review; pull_requests: + pull_request9699
2018-10-09 08:57:06 | asvetlov | set | assignee: asvetlov
2018-09-24 21:57:11 | yselivanov | set | nosy: + asvetlov; messages: + msg326294; versions: + Python 3.8, - Python 3.5, Python 3.6
2017-04-13 08:55:09 | abacabadabacaba | create |