classification
Title: test_epoll fails test_control_and_wait() randomly on aarch64 RHEL8 Refleaks 3.9
Type: Stage: resolved
Components: Tests Versions: Python 3.10
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: miss-islington, vstinner
Priority: normal Keywords: patch

Created on 2020-09-17 16:42 by vstinner, last changed 2020-12-17 11:14 by vstinner. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 23795 merged vstinner, 2020-12-16 09:17
PR 23813 merged miss-islington, 2020-12-17 07:43
PR 23814 merged miss-islington, 2020-12-17 07:43
Messages (9)
msg377059 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-09-17 16:42
test_epoll.test_control_and_wait() failed once on aarch64 RHEL8 Refleaks 3.9:
https://buildbot.python.org/all/#/builders/247/builds/32

This buildbot runs test_epoll 6 times in a row and the system load was quite high (10.39).

0:03:01 load avg: 10.39 [ 97/425/1] test_epoll failed -- running: test_multiprocessing_fork (3 min), test_gdb (46.3 sec), test_peg_generator (1 min 25 sec), test_capi (2 min 26 sec), test_weakref (2 min 1 sec), test_statistics (1 min 6 sec), test_multiprocessing_spawn (2 min 44 sec), test_io (1 min 53 sec)
beginning 6 repetitions
123456
...test test_epoll failed -- Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.9.cstratak-RHEL8-aarch64.refleak/build/Lib/test/test_epoll.py", line 199, in test_control_and_wait
    self.assertEqual(events, expected)
AssertionError: Lists differ: [(5, 5)] != [(4, 5), (5, 5)]

First differing element 0:
(5, 5)
(4, 5)

Second list contains 1 additional elements.
First extra element 1:
(5, 5)

- [(5, 5)]
+ [(4, 5), (5, 5)]
msg377060 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-09-17 16:43
Code of the test with a comment on the line which fails:

    def test_control_and_wait(self):
        client, server = self._connected_pair()

        ep = select.epoll(16)
        ep.register(server.fileno(),
                    select.EPOLLIN | select.EPOLLOUT | select.EPOLLET)
        ep.register(client.fileno(),
                    select.EPOLLIN | select.EPOLLOUT | select.EPOLLET)

        now = time.monotonic()
        events = ep.poll(1, 4)
        then = time.monotonic()
        self.assertFalse(then - now > 0.1, then - now)

        events.sort()
        expected = [(client.fileno(), select.EPOLLOUT),
                    (server.fileno(), select.EPOLLOUT)]
        expected.sort()

        self.assertEqual(events, expected)

        events = ep.poll(timeout=2.1, maxevents=4)
        self.assertFalse(events)

        client.send(b"Hello!")
        server.send(b"world!!!")

        now = time.monotonic()
        events = ep.poll(1, 4)
        then = time.monotonic()
        self.assertFalse(then - now > 0.01)

        events.sort()
        expected = [(client.fileno(), select.EPOLLIN | select.EPOLLOUT),
                    (server.fileno(), select.EPOLLIN | select.EPOLLOUT)]
        expected.sort()

        self.assertEqual(events, expected) # <====== FAIL HERE

        ep.unregister(client.fileno())
        ep.modify(server.fileno(), select.EPOLLOUT)
        now = time.monotonic()
        events = ep.poll(1, 4)
        then = time.monotonic()
        self.assertFalse(then - now > 0.01)

        expected = [(server.fileno(), select.EPOLLOUT)]
        self.assertEqual(events, expected)
msg377061 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-09-17 16:44
I managed to reproduce the failure in 50 seconds using the command:

$ ./python -m test test_epoll -F -v -j20
(...)
0:00:49 load avg: 4.81 [257/1] test_epoll failed
test_add (test.test_epoll.TestEPoll) ... ok
test_badcreate (test.test_epoll.TestEPoll) ... ok
test_close (test.test_epoll.TestEPoll) ... ok
test_context_manager (test.test_epoll.TestEPoll) ... ok
test_control_and_wait (test.test_epoll.TestEPoll) ... FAIL
test_create (test.test_epoll.TestEPoll) ... ok
test_errors (test.test_epoll.TestEPoll) ... ok
test_fd_non_inheritable (test.test_epoll.TestEPoll) ... ok
test_fromfd (test.test_epoll.TestEPoll) ... ok
test_unregister_closed (test.test_epoll.TestEPoll) ... ok

======================================================================
FAIL: test_control_and_wait (test.test_epoll.TestEPoll)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/vstinner/python/master/Lib/test/test_epoll.py", line 199, in test_control_and_wait
    self.assertEqual(events, expected)
AssertionError: Lists differ: [(5, 5)] != [(4, 5), (5, 5)]

First differing element 0:
(5, 5)
(4, 5)

Second list contains 1 additional elements.
First extra element 1:
(5, 5)

- [(5, 5)]
+ [(4, 5), (5, 5)]

----------------------------------------------------------------------

Ran 10 tests in 2.120s

FAILED (failures=1)
test test_epoll failed
(...)
Total duration: 49.3 sec
Tests result: FAILURE
msg377748 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-10-01 12:34
The test also fails on aarch64 RHEL7 Refleaks 3.x:
https://buildbot.python.org/all/#/builders/131/builds/46

FAIL: test_control_and_wait (test.test_epoll.TestEPoll)
msg383136 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-12-16 10:16
New changeset 79782fe4f8cf73d7fdf8db02073bbadf7ff817b6 by Victor Stinner in branch 'master':
bpo-41804: Enhance test_epoll.test_control_and_wait() (GH-23795)
https://github.com/python/cpython/commit/79782fe4f8cf73d7fdf8db02073bbadf7ff817b6
msg383137 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-12-16 10:16
I failed to reproduce the issue. Let's hope that replacing send() with sendall() is enough to fix the issue.
msg383220 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-12-17 07:43
Oh, test_epoll failed on 3.9. So I backported my change to 3.8 and 3.9, to see if it helps or not.

s390x RHEL8 Refleaks 3.9:
https://buildbot.python.org/all/#builders/326/builds/159

Traceback (most recent call last):
  File "/home/dje/cpython-buildarea/3.9.edelsohn-rhel8-z.refleak/build/Lib/test/test_epoll.py", line 199, in test_control_and_wait
    self.assertEqual(events, expected)
AssertionError: Lists differ: [(5, 5)] != [(4, 5), (5, 5)]
msg383221 - (view) Author: miss-islington (miss-islington) Date: 2020-12-17 08:03
New changeset e962e3ad225a211b9f9689742db6e9771d07c505 by Miss Islington (bot) in branch '3.9':
bpo-41804: Enhance test_epoll.test_control_and_wait() (GH-23795)
https://github.com/python/cpython/commit/e962e3ad225a211b9f9689742db6e9771d07c505
msg383231 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-12-17 11:14
New changeset 718bf1a7a1a39ca6f2381a299d00d8318732104a by Miss Islington (bot) in branch '3.8':
bpo-41804: Enhance test_epoll.test_control_and_wait() (GH-23795) (GH-23814)
https://github.com/python/cpython/commit/718bf1a7a1a39ca6f2381a299d00d8318732104a
History
Date User Action Args
2020-12-17 11:14:55vstinnersetmessages: + msg383231
2020-12-17 08:03:58miss-islingtonsetmessages: + msg383221
2020-12-17 07:43:32vstinnersetmessages: + msg383220
2020-12-17 07:43:14miss-islingtonsetpull_requests: + pull_request22675
2020-12-17 07:43:06miss-islingtonsetnosy: + miss-islington

pull_requests: + pull_request22674
2020-12-16 10:16:59vstinnersetstatus: open -> closed
versions: + Python 3.10, - Python 3.9
messages: + msg383137

resolution: fixed
stage: patch review -> resolved
2020-12-16 10:16:34vstinnersetmessages: + msg383136
2020-12-16 09:17:42vstinnersetkeywords: + patch
stage: patch review
pull_requests: + pull_request22651
2020-10-01 12:34:14vstinnersetmessages: + msg377748
2020-09-17 16:44:32vstinnersetmessages: + msg377061
2020-09-17 16:43:47vstinnersetmessages: + msg377060
2020-09-17 16:42:20vstinnercreate