classification
Title: test_asyncio: test_run_coroutine_threadsafe_with_timeout() has a race condition
Type: Stage:
Components: Tests Versions: Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: aeros, vstinner, yselivanov
Priority: normal Keywords:

Created on 2019-10-23 14:44 by vstinner, last changed 2019-10-25 12:53 by aeros.

Messages (6)
msg355223 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-23 14:44
AMD64 FreeBSD Shared 3.x:
https://buildbot.python.org/all/#/builders/371/builds/7

This buildbot worker is known to be very slow.

test_run_coroutine_threadsafe_with_timeout (test.test_asyncio.test_tasks.RunCoroutineThreadsafeTests)
Test coroutine submission from a thread to an event loop ... FAIL

FAIL: test_run_coroutine_threadsafe_with_timeout (test.test_asyncio.test_tasks.RunCoroutineThreadsafeTests)
Test coroutine submission from a thread to an event loop
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/home/buildbot/python/3.x.koobs-freebsd-564d/build/Lib/test/test_asyncio/test_tasks.py", line 3210, in test_run_coroutine_threadsafe_with_timeout
    self.loop.run_until_complete(future)
AssertionError: TimeoutError not raised
msg355227 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-23 15:01
test_run_coroutine_threadsafe_with_timeout() has a race condition. It relies on "await asyncio.sleep(0.05)" synchronization primitive. That's not reliable.

    async def add(self, a, b, fail=False, cancel=False):
        """Wait 0.05 second and return a + b."""
        await asyncio.sleep(0.05)
        if fail:
            raise RuntimeError("Fail!")
        if cancel:
            asyncio.current_task(self.loop).cancel()
            await asyncio.sleep(0)
        return a + b
msg355228 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-23 15:03
I can reproduce the issue with this patch:

diff --git a/Lib/test/test_asyncio/test_tasks.py b/Lib/test/test_asyncio/test_tasks.py
index dde84b84b1..c94113712a 100644
--- a/Lib/test/test_asyncio/test_tasks.py
+++ b/Lib/test/test_asyncio/test_tasks.py
@@ -3160,7 +3160,7 @@ class RunCoroutineThreadsafeTests(test_utils.TestCase):
 
     async def add(self, a, b, fail=False, cancel=False):
         """Wait 0.05 second and return a + b."""
-        await asyncio.sleep(0.05)
+        await asyncio.sleep(1e-9)
         if fail:
             raise RuntimeError("Fail!")
         if cancel:

and the command:

./python -m test test_asyncio -m test_run_coroutine_threadsafe_with_timeout -v -F
msg355360 - (view) Author: Kyle Stanley (aeros) * (Python committer) Date: 2019-10-25 11:53
I can confirm Victor's method of reproducing the failure consistently, by using asyncio.sleep(1e-9) within `RunCoroutineThreadsafeTests.add()` instead of the current asyncio.sleep(0.05).

I also experimented with adjusting the sleep time, to figure out the "breaking point" where I could no longer run `./python -m test test_asyncio -m test_run_coroutine_threadsafe_with_timeout -v -F` without failures (within ~10,000 tests). From my results, the lowest reliable value was 0.001. At 1e-4, I was able to consistently reproduce the failure reported above:

FAIL: test_run_coroutine_threadsafe_with_timeout (test.test_asyncio.test_tasks.RunCoroutineThreadsafeTests)
Test coroutine submission from a thread to an event loop
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/aeros/repos/cpython/Lib/test/test_asyncio/test_tasks.py", line 3210, in test_run_coroutine_threadsafe_with_timeout
    self.loop.run_until_complete(future)
AssertionError: TimeoutError not raised

The failure becomes increasingly consistent with lowered time as expected, but at 1e-5 I was able to repeatedly reproduce the failure within 10 iterations of the test. At 1e-4 it took around 3000 iterations before failing (across multiple runs).
msg355361 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-25 12:15
IHMO a test must not depend on time.
msg355365 - (view) Author: Kyle Stanley (aeros) * (Python committer) Date: 2019-10-25 12:53
> IHMO a test must not depend on time.

I would agree, the tests should optimally not depend on time. My above comment was primarily to confirm that the failure was able to be consistently reproduced, along with the minimum conditions to do so.

Yury had been the one to change it to "asyncio.sleep(0.05)" from the previous "asyncio.sleep(1)" in the following commit: https://github.com/python/cpython/commit/abe9625eeb71e40f042ccfccfe6a4489a6dcdf35 (Nov 13, 2015).

Perhaps he might have some insight or ideas as to how we could improve the test to use a more reliable means of synchronization that has been implemented since that change was made. I'll add him to the nosy list.
History
Date User Action Args
2019-10-25 12:53:20aerossetnosy: + yselivanov
messages: + msg355365
2019-10-25 12:15:57vstinnersetmessages: + msg355361
2019-10-25 11:53:45aerossetnosy: + aeros
messages: + msg355360
2019-10-23 15:12:43vstinnersettitle: test_asyncio: test_run_coroutine_threadsafe_with_timeout() -> test_asyncio: test_run_coroutine_threadsafe_with_timeout() has a race condition
2019-10-23 15:03:54vstinnersetmessages: + msg355228
2019-10-23 15:01:22vstinnersetmessages: + msg355227
2019-10-23 14:44:20vstinnercreate