classification
Title: test_asyncio: test_start_tls_server_1() fails randomly
Type: behavior Stage: resolved
Components: asyncio Versions: Python 3.8, Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: yselivanov Nosy List: asvetlov, christian.heimes, miss-islington, ned.deily, terry.reedy, vstinner, yselivanov
Priority: Keywords: patch

Created on 2017-12-30 19:20 by pitrou, last changed 2018-05-29 20:02 by vstinner. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 5054 merged yselivanov, 2017-12-30 20:19
PR 7157 closed vstinner, 2018-05-28 15:47
PR 7166 merged yselivanov, 2018-05-28 19:22
PR 7170 merged miss-islington, 2018-05-28 19:50
Messages (34)
msg309244 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-12-30 19:20
It seems test_asyncio fails sporadically on AppVeyor:
https://ci.appveyor.com/project/python/cpython/build/3.7.0a0.10081#L2650

======================================================================
ERROR: test_start_tls_server_1 (test.test_asyncio.test_sslproto.ProactorStartTLS)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\projects\cpython\lib\test\test_asyncio\test_sslproto.py", line 284, in test_start_tls_server_1
    asyncio.wait_for(main(), loop=self.loop, timeout=10))
  File "C:\projects\cpython\lib\asyncio\base_events.py", line 440, in run_until_complete
    return future.result()
  File "C:\projects\cpython\lib\asyncio\tasks.py", line 398, in wait_for
    raise futures.TimeoutError()
concurrent.futures._base.TimeoutError
======================================================================
FAIL: test_start_tls_server_1 (test.test_asyncio.test_sslproto.ProactorStartTLS)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\projects\cpython\lib\test\test_asyncio\functional.py", line 43, in tearDown
    self.fail('unexpected calls to loop.call_exception_handler()')
AssertionError: unexpected calls to loop.call_exception_handler()
msg309254 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2017-12-30 20:21
I'm leaving on a two-weeks vacation today.  To avoid risking breaking the workflow, I'll mask this tests on AppVeyor.  I'll investigate this when I get back.
msg309255 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2017-12-30 20:40
New changeset 0c36bed1c46d07ef91d3e02e69e974e4f3ecd31a by Yury Selivanov in branch 'master':
bpo-32458: Temporarily mask start-tls proactor test on Windows (#5054)
https://github.com/python/cpython/commit/0c36bed1c46d07ef91d3e02e69e974e4f3ecd31a
msg309256 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2017-12-30 20:42
Please keep this issue open.
msg309521 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2018-01-05 20:41
After fresh update and debug rebuild on master(3.7), with Win10 patched last night, I got
======================================================================
FAIL: test_call_later (test.test_asyncio.test_events.ProactorEventLoopTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "F:\dev\3x\lib\test\test_asyncio\test_events.py", line 290, in test_call_later
    self.assertTrue(0.08 <= t1-t0 <= 0.8, t1-t0)
AssertionError: False is not true : 0.07800000000861473

I have not seen this before.  Four subsequent runs passed.
msg310405 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2018-01-22 09:08
Hmm, the scheduled call was executed sooner than expected.
Actually we had 0.1 sec delay, checked for at least 0.08 but actual call time was 0.078

BTW I recall an issue with Windows when call_later() was executed sooner already.
msg313507 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2018-03-09 21:38
Note, I think this failure is still happening occasionally on the windows7 buildbot; for example:

http://buildbot.python.org/all/#/builders/111/builds/68/steps/3/logs/stdio
http://buildbot.python.org/all/#/builders/111/builds/85/steps/3/logs/stdio
msg317468 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-05-23 22:36
test_start_tls_server_1() just failed on my Linux. It likely depends on the system load.
msg317638 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-05-24 23:09
It would be nice to fix this bug before Python 3.7.0 final: either skip the test, or fix it. Flaky tests can be very annoying. For example, the full test suite is run to build a Red Hat package. If a single test fails, the package build fails and should be retried whereas it's slow.

I understood that fixing the root cause might require to rewrite the test, so I don't expect a quick fix on this test. The issue is open since last December... Maybe we can skip the test but fix it in Python 3.7.1?
msg317639 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-05-24 23:10
I'm OK to skip it for now. Writing functional tests is super hard because some buildbots are super slow and unpredictable.
msg317640 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-05-24 23:12
> I'm OK to skip it for now. Writing functional tests is super hard because some buildbots are super slow and unpredictable.

Would you mind to write a PR to skip the PR? So you will feel guilty and will try to remind to fix it!
msg317643 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-05-24 23:23
Not at my computer right now, can do it tomorrow.
msg317751 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2018-05-26 18:16
Yury, are you still planning to address this?
msg317755 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-05-26 18:45
Last time I looked into this I couldn't reproduce the failures on my Windows 10 VM, so it seems like an AppVeyor-specific problem. I'll take another look on Monday.

Andrew, if you have a Windows environment, could you please try to run this test?
msg317758 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2018-05-26 18:49
> Last time I looked into this I couldn't reproduce the failures on my Windows 10 VM, so it seems like an AppVeyor-specific problem. I'll take another look on Monday.

But in msg317468 Victor asserts that it sometimes fails on Linux, too?
msg317759 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-05-26 18:54
> But in msg317468 Victor asserts that it sometimes fails on Linux, too?

Hm, I missed that. I'll definitely take a look.
msg317760 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2018-05-26 19:22
It's failing reproducible with OpenSSL 1.1.1 and TLS 1.3 enabled. I haven't seen it failing with TLS 1.2 yet.
msg317761 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2018-05-26 19:40
I have not seen test_asyncio fail on AppVeyor since about Sunday.  
https://ci.appveyor.com/project/python/cpython/history
shows only a couple of failures, which could very well be real.
I have seen test_asyncio fail on Travis since Sunday.

With my local repository 32-bit debug build of master branch:

If I run python -m test.test_asyncio -v,
test_sock_sendfile_not_regular_file (test.test_asyncio.test_proactor_events.ProactorEventLoopUnixSockSendfileTests) ... (and the test process) aborts with a failed debug assertion:
minkernel\crts\ucrt\src\appcrt\lowio\osfinfo.cpp, line 257
  fh >=  0 && (unsigned)fh < (unsigned)_nhandle

If I run python -m test test_asyncio, I get
F:\dev\3x\lib\asyncio\sslproto.py:320: ResourceWarning: unclosed transport <asyncio.sslproto._SSLProtocolTransport object at 0x052DA4D0>
  source=self)
F:\dev\3x\lib\asyncio\sslproto.py:320: ResourceWarning: unclosed transport <asyncio.sslproto._SSLProtocolTransport object at 0x0476A118>
  source=self)
Exception in callback _ProactorReadPipeTransport._loop_reading__data_received(<_OverlappedFuture cancelled>)
handle: <Handle _ProactorReadPipeTransport._loop_reading__data_received(<_OverlappedFuture cancelled>)>
Traceback (most recent call last):
  File "F:\dev\3x\lib\asyncio\events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "F:\dev\3x\lib\asyncio\proactor_events.py", line 222, in _loop_reading__data_received
    self._closing)
AssertionError
test_asyncio passed in 35 sec
1 test OK.

With -v, above is scattered in verbose output, but test_start_tls_server_1 passed both times.

Specifically, I ran the following 10 times with no failures.
f:\dev\3x>python -m test -v -m test_start_tls_server_1 test_asyncio
...
test_start_tls_server_1 (test.test_asyncio.test_sslproto.ProactorStartTLSTests) ... ok
test_start_tls_server_1 (test.test_asyncio.test_sslproto.SelectorStartTLSTests) ... ok
msg317762 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2018-05-26 19:52
I presume the above is using the month-old openssl-bin-1.1.0h.
msg317807 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-05-27 19:40
Terry, you somehow deleted Christian from the nosy list.

Christian,

> It's failing reproducible with OpenSSL 1.1.1 and TLS 1.3 enabled. I haven't seen it failing with TLS 1.2 yet.

On Linux or Windows?
msg317808 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2018-05-27 19:41
Linux
msg317809 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2018-05-27 20:14
'deleted somehow': Christian added himself while I had this page open, so when I submitted without refreshing, the nosy list did not include him.  When I saw the red warning, I did not notice the nosy list change; I should have.
msg317817 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-05-27 23:47
Christian, I believe https://github.com/python/cpython/pull/7130/commits/bd17a5593f03e3f2d39a64f5afd3ffb77c97e914 fixes the issue with OpenSSL 1.1.1 (socket.shutdown was missing and asyncio's protocol didn't receive EOF, instead the connection was terminated (as expected though))
msg317833 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-05-28 09:14
I just reproduced the issue on Linux. Open 3 terminals and run the commands in parallel:

(1) ./python -m test test_asyncio -m test_start_tls_server_1 -F
(2) ./python -m test -j16 -r
(3) ./python -m test -j16 -r

It's a race condition which doesn't depend on the OS, but on the system load.
msg317862 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-05-28 15:08
> It's a race condition which doesn't depend on the OS, but on the system load.

Yeah, I agree. The current timeout for the test is 5 seconds and it tries to transfer 2mb of data, which isn't that much actually.  I'll bump the timeout and reduce the amount of data transferred in https://github.com/python/cpython/pull/7130

Thing is, these are the very first *functional* tests for asyncio. Before it was mostly tested with mocks. So we'll need some adjustment period to learn how slow/loaded the CI servers are and what kind of timeouts are safe to use.
msg317871 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-05-28 15:39
> Yeah, I agree. The current timeout for the test is 5 seconds and it tries to transfer 2mb of data, which isn't that much actually.  I'll bump the timeout and reduce the amount of data transferred in https://github.com/python/cpython/pull/7130

What is the purpose of a timeout of 5 seconds? Some buildbots are really slow.

If the purpose is to detect when the test hangs: use a timeout of 5 minutes, or at least 1 minute (60 seconds).
msg317872 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-05-28 15:40
> If the purpose is to detect when the test hangs: use a timeout of 5 minutes, or at least 1 minute (60 seconds).

OK, sounds good, will do that.
msg317873 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-05-28 15:50
> OK, sounds good, will do that.

Race condition! I created https://github.com/python/cpython/pull/7157 to increase the timeout from 5 seconds to 60 seconds.
msg317898 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-05-28 18:33
https://github.com/python/cpython/pull/7130 is in.  It includes many tests for TLS tests as well as bugfixes to asyncio code.  Hopefully this all will make tests more stable.
msg317907 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-05-28 19:49
New changeset 8267ea2e84d355f00654dec3ad782fc7b1f680f1 by Yury Selivanov in branch 'master':
bpo-32458: Further tune start_tls tests (#7166)
https://github.com/python/cpython/commit/8267ea2e84d355f00654dec3ad782fc7b1f680f1
msg317918 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-05-28 20:41
While Yury seems unable to reproduce the bug, it's easy for me to reproduce it on Linux. After 2 hours of debugging, I found the root issue: a race condition not in the test, but in asyncio itself! => bpo-33674 "asyncio: race condition in SSLProtocol".
msg317920 - (view) Author: miss-islington (miss-islington) Date: 2018-05-28 20:44
New changeset 4b828467a3fcec0c1947e8326f67b8db12a4f303 by Miss Islington (bot) in branch '3.7':
bpo-32458: Further tune start_tls tests (GH-7166)
https://github.com/python/cpython/commit/4b828467a3fcec0c1947e8326f67b8db12a4f303
msg318055 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-05-29 16:10
Should we close this issue now?
msg318080 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-05-29 20:02
I'm not 100% sure that all issues are fixed, but the tests seem much more reliable yet. I close the issue. If the bug reoccurs, I will reopen the issue or open a new one.
History
Date User Action Args
2018-05-29 20:02:20vstinnersetstatus: open -> closed
priority: deferred blocker ->
versions: + Python 3.8
messages: + msg318080

resolution: fixed
stage: patch review -> resolved
2018-05-29 16:10:17yselivanovsetmessages: + msg318055
2018-05-28 20:44:16miss-islingtonsetnosy: + miss-islington
messages: + msg317920
2018-05-28 20:41:58vstinnersetmessages: + msg317918
2018-05-28 19:50:13miss-islingtonsetpull_requests: + pull_request6805
2018-05-28 19:49:01yselivanovsetmessages: + msg317907
2018-05-28 19:22:06yselivanovsetpull_requests: + pull_request6801
2018-05-28 18:33:54yselivanovsetmessages: + msg317898
2018-05-28 15:50:31giampaolo.rodolasetnosy: - giampaolo.rodola
2018-05-28 15:50:06vstinnersetmessages: + msg317873
2018-05-28 15:47:48vstinnersetpull_requests: + pull_request6793
2018-05-28 15:40:43yselivanovsetmessages: + msg317872
2018-05-28 15:39:57vstinnersetmessages: + msg317871
2018-05-28 15:08:48yselivanovsetmessages: + msg317862
2018-05-28 09:14:56vstinnersetmessages: + msg317833
2018-05-27 23:47:46yselivanovsetmessages: + msg317817
2018-05-27 20:14:25terry.reedysetmessages: + msg317809
2018-05-27 19:41:12christian.heimessetmessages: + msg317808
2018-05-27 19:40:50yselivanovsetnosy: + christian.heimes
messages: + msg317807
2018-05-26 19:52:13terry.reedysetmessages: + msg317762
2018-05-26 19:40:41terry.reedysetnosy: - christian.heimes
messages: + msg317761
2018-05-26 19:22:06christian.heimessetnosy: + christian.heimes
messages: + msg317760
2018-05-26 18:54:49yselivanovsetmessages: + msg317759
2018-05-26 18:49:07ned.deilysetmessages: + msg317758
2018-05-26 18:45:30yselivanovsetmessages: + msg317755
2018-05-26 18:16:12ned.deilysetmessages: + msg317751
2018-05-24 23:23:28yselivanovsetmessages: + msg317643
2018-05-24 23:12:09vstinnersetmessages: + msg317640
2018-05-24 23:10:31yselivanovsetmessages: + msg317639
2018-05-24 23:09:33vstinnersetpriority: high -> deferred blocker

messages: + msg317638
2018-05-23 22:39:21pitrousetnosy: - pitrou
2018-05-23 22:36:38vstinnersetnosy: + vstinner

messages: + msg317468
title: test_asyncio failures on Windows -> test_asyncio: test_start_tls_server_1() fails randomly
2018-03-09 21:38:33ned.deilysetnosy: + ned.deily
messages: + msg313507
2018-01-22 09:08:46asvetlovsetmessages: + msg310405
2018-01-05 20:41:54terry.reedysetnosy: + terry.reedy
messages: + msg309521
2017-12-30 20:42:28yselivanovsetmessages: + msg309256
2017-12-30 20:40:22yselivanovsetmessages: + msg309255
2017-12-30 20:21:38yselivanovsetpriority: normal -> high
assignee: yselivanov
messages: + msg309254

components: + asyncio, - Library (Lib), Tests
2017-12-30 20:19:43yselivanovsetkeywords: + patch
stage: patch review
pull_requests: + pull_request4935
2017-12-30 19:20:50pitroucreate