classification
Title: test_socket: testCongestion() hangs on my Fedora 28
Type:
Stage: resolved
Components: Tests
Versions: Python 3.8, Python 3.7, Python 3.6
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: miss-islington, ncoghlan, petr.viktorin, vstinner, xtreak, yan12125
Priority: normal
Keywords: patch

Created on 2018-09-05 15:01 by vstinner, last changed 2018-09-17 23:01 by vstinner. This issue is now closed.

Pull Requests
URL Status Linked
PR 9277 merged vstinner, 2018-09-13 18:59
PR 9368 merged miss-islington, 2018-09-17 21:01
PR 9369 merged miss-islington, 2018-09-17 21:01
Messages (9)
msg324643 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-09-05 15:01
Hi,

test_socket recently started to hang on my Fedora 28 laptop, and I have no idea why.

vstinner@apu$ ./python -m test -v test_socket -m testCongestion --timeout=5
== CPython 3.8.0a0 (heads/master-dirty:39487196c8, Sep 4 2018, 23:08:20) [GCC 8.1.1 20180712 (Red Hat 8.1.1-5)]
== Linux-4.17.19-200.fc28.x86_64-x86_64-with-glibc2.26 little-endian
== cwd: /home/vstinner/prog/python/master/build/test_python_29510
== CPU count: 8
== encodings: locale=UTF-8, FS=utf-8
Run tests sequentially
0:00:00 load avg: 1.34 [1/1] test_socket
testCongestion (test.test_socket.RDSTest) ... Timeout (0:00:05)!
Thread 0x00007fccf51b1700 (most recent call first):
  File "/home/vstinner/prog/python/master/Lib/test/test_socket.py", line 2074 in _testCongestion
  File "/home/vstinner/prog/python/master/Lib/test/test_socket.py", line 332 in clientRun

Thread 0x00007fcd082ee080 (most recent call first):
  File "/home/vstinner/prog/python/master/Lib/threading.py", line 296 in wait
  File "/home/vstinner/prog/python/master/Lib/threading.py", line 552 in wait
  File "/home/vstinner/prog/python/master/Lib/test/test_socket.py", line 2059 in testCongestion
  File "/home/vstinner/prog/python/master/Lib/unittest/case.py", line 610 in run
  File "/home/vstinner/prog/python/master/Lib/unittest/case.py", line 658 in __call__
  File "/home/vstinner/prog/python/master/Lib/unittest/suite.py", line 122 in run
  File "/home/vstinner/prog/python/master/Lib/unittest/suite.py", line 84 in __call__
  File "/home/vstinner/prog/python/master/Lib/unittest/suite.py", line 122 in run
  File "/home/vstinner/prog/python/master/Lib/unittest/suite.py", line 84 in __call__
  File "/home/vstinner/prog/python/master/Lib/unittest/runner.py", line 176 in run
  File "/home/vstinner/prog/python/master/Lib/test/support/__init__.py", line 1900 in _run_suite
  File "/home/vstinner/prog/python/master/Lib/test/support/__init__.py", line 1990 in run_unittest
  File "/home/vstinner/prog/python/master/Lib/test/test_socket.py", line 6032 in test_main
  File "/home/vstinner/prog/python/master/Lib/test/libregrtest/runtest.py", line 179 in runtest_inner
  File "/home/vstinner/prog/python/master/Lib/test/libregrtest/runtest.py", line 140 in runtest
  File "/home/vstinner/prog/python/master/Lib/test/libregrtest/main.py", line 384 in run_tests_sequential
  File "/home/vstinner/prog/python/master/Lib/test/libregrtest/main.py", line 488 in run_tests
  File "/home/vstinner/prog/python/master/Lib/test/libregrtest/main.py", line 566 in _main
  File "/home/vstinner/prog/python/master/Lib/test/libregrtest/main.py", line 531 in main
  File "/home/vstinner/prog/python/master/Lib/test/libregrtest/main.py", line 584 in main
  File "/home/vstinner/prog/python/master/Lib/test/__main__.py", line 2 in <module>
  File "/home/vstinner/prog/python/master/Lib/runpy.py", line 85 in _run_code
  File "/home/vstinner/prog/python/master/Lib/runpy.py", line 193 in _run_module_as_main
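
For readers who want to reproduce this kind of dump outside regrtest: the "Timeout (0:00:05)!" header followed by per-thread stacks is the output format of the standard `faulthandler` module, which regrtest arms when `--timeout` is given. The following is only a minimal sketch of that mechanism. `run_with_watchdog` and `demo` are hypothetical names, and the sketch writes the dump to a temporary file instead of stderr and does not exit the process.

```python
import faulthandler
import tempfile
import time


def run_with_watchdog(func, timeout, log_file):
    """Run func(); if it takes longer than `timeout` seconds, dump all thread stacks.

    Hypothetical helper for illustration only. The dump is written to
    `log_file`, which must be a real file with a file descriptor.
    """
    faulthandler.dump_traceback_later(timeout, file=log_file)
    try:
        return func()
    finally:
        # Disarm the watchdog once func() has returned or raised.
        faulthandler.cancel_dump_traceback_later()


def demo():
    """Simulate a slow call and return whatever the watchdog wrote."""
    with tempfile.TemporaryFile("w+") as log_file:
        # The call sleeps for 0.5 s while the watchdog fires after 0.1 s.
        run_with_watchdog(lambda: time.sleep(0.5), 0.1, log_file)
        log_file.seek(0)
        return log_file.read()


if __name__ == "__main__":
    print(demo())
```

Passing `exit=True` to `faulthandler.dump_traceback_later()` would additionally terminate the process after the dump, which is closer to what a test runner wants for a truly hung test.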
msg324668 - Author: Karthikeyan Singaravelan (xtreak) * (Python triager) Date: 2018-09-06 07:52
There seems to be a similar report, also on Fedora 28, pointing to the same line in the test. Ref: https://bugs.python.org/issue34354

Thanks
msg324674 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-09-06 08:48
Linux RDS manual page says:
https://linux.die.net/man/7/rds

"The receive queue size limits how much data RDS will put on the receive queue of a socket before marking the socket as congested. When a socket becomes congested, RDS will send a congestion map update to the other participating hosts, who are then expected to stop sending more messages to this port."

=> "other participating hosts (...) are (...) expected to stop sending"

By design, it seems like the Python unit test is going to fail, so I suggest removing the test.

I don't think that the role of Python is to check how the kernel handles congestion on local RDS sockets.
msg325248 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2018-09-13 14:34
Same problem here. However, checking the test code, it seems that even though the sending socket has been put into non-blocking mode, self.cli.sendto in the _testCongestion helper method (invoked by the ThreadableTest base class [1]) has *not* raised OSError. As a result, the finally clause that sets the event has *not* been triggered, and the test hangs.

Neither socket.py nor test_socket.py have changed recently though, so it seems to me that this is either a recent Fedora bug (where the socket is blocking when it shouldn't), or else a Fedora change that has uncovered a latent defect in the socket module code.

[1] https://github.com/python/cpython/blob/master/Lib/test/test_socket.py#L228
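
The hang described above can be illustrated with a small stand-alone sketch. This is not the code from test_socket.py: `fill_until_error`, `waiter_sees_done`, and the two fake senders are hypothetical names, the ENOBUFS errno is chosen only for illustration, and a `stop` event plus a wait timeout are added so the demonstration terminates. The point is that the event is set only when the send loop exits, so a sender that never raises leaves the waiting thread blocked.

```python
import errno
import threading
import time


def fill_until_error(send, done, stop):
    """Hypothetical stand-in for the client helper: send until OSError."""
    try:
        while not stop.is_set():  # `stop` exists only so this demo can end
            send(b"fill")
    except OSError:
        pass  # a real test would assert on this error instead
    finally:
        done.set()  # reached only once the loop above exits


def waiter_sees_done(send, timeout=0.2):
    """Return True if the sender signalled completion within `timeout`."""
    done, stop = threading.Event(), threading.Event()
    worker = threading.Thread(
        target=fill_until_error, args=(send, done, stop), daemon=True
    )
    worker.start()
    # A waiter without a deadline would block here forever if `done`
    # is never set; the timeout is what keeps this sketch from hanging.
    finished = done.wait(timeout)
    stop.set()
    worker.join()
    return finished


def make_congested_send(limit):
    """Fake sender that raises OSError after `limit` successful sends."""
    remaining = [limit]

    def send(_data):
        if remaining[0] == 0:
            raise OSError(errno.ENOBUFS, "receive queue full")
        remaining[0] -= 1

    return send


def never_congested_send(_data):
    """Fake sender that accepts everything and never raises."""
    time.sleep(0.001)


if __name__ == "__main__":
    print(waiter_sees_done(make_congested_send(3)))
    print(waiter_sees_done(never_congested_send))
```

With the sender that eventually raises, the event is set almost immediately; with the sender that never raises, the wait times out, which corresponds to the hang reported in this issue.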
msg325286 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-09-13 19:00
I proposed PR 9277 to remove the test: see the PR for the rationale.

> Neither socket.py nor test_socket.py have changed recently though, so it seems to me that this is either a recent Fedora bug (where the socket is blocking when it shouldn't), or else a Fedora change that has uncovered a latent defect in the socket module code.

IMHO it's a change in the implementation of the RDS protocol in Linux, likely in the kernel.
msg325576 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-09-17 21:01
New changeset 7484bdfd1e2e33fdd2c44dd4ffa044aacd495337 by Victor Stinner in branch 'master':
bpo-34587, test_socket: remove RDSTest.testCongestion() (GH-9277)
https://github.com/python/cpython/commit/7484bdfd1e2e33fdd2c44dd4ffa044aacd495337
msg325579 - Author: miss-islington (miss-islington) Date: 2018-09-17 21:28
New changeset b7f58d7f80f80f0e20cad84773f158a379a19280 by Miss Islington (bot) in branch '3.7':
bpo-34587, test_socket: remove RDSTest.testCongestion() (GH-9277)
https://github.com/python/cpython/commit/b7f58d7f80f80f0e20cad84773f158a379a19280
msg325581 - Author: miss-islington (miss-islington) Date: 2018-09-17 21:40
New changeset 68a8f041051e8387583c66b91c7a3bbda6cf7e63 by Miss Islington (bot) in branch '3.6':
bpo-34587, test_socket: remove RDSTest.testCongestion() (GH-9277)
https://github.com/python/cpython/commit/68a8f041051e8387583c66b91c7a3bbda6cf7e63
msg325594 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-09-17 23:01
I removed the test from Python 3.6, 3.7 and master.
History
Date User Action Args
2018-09-17 23:01:53 vstinner set status: open -> closed
    versions: + Python 3.6, Python 3.7
    messages: + msg325594
    resolution: fixed
    stage: patch review -> resolved
2018-09-17 21:40:25 miss-islington set messages: + msg325581
2018-09-17 21:28:03 miss-islington set nosy: + miss-islington
    messages: + msg325579
2018-09-17 21:01:54 miss-islington set pull_requests: + pull_request8792
2018-09-17 21:01:45 miss-islington set pull_requests: + pull_request8791
2018-09-17 21:01:25 vstinner set messages: + msg325576
2018-09-14 20:58:02 petr.viktorin link issue34354 superseder
2018-09-13 19:00:35 vstinner set messages: + msg325286
2018-09-13 18:59:47 vstinner set keywords: + patch
    stage: patch review
    pull_requests: + pull_request8708
2018-09-13 14:34:04 ncoghlan set nosy: + ncoghlan, petr.viktorin
    messages: + msg325248
2018-09-07 13:51:48 yan12125 set nosy: + yan12125
2018-09-06 08:48:04 vstinner set messages: + msg324674
2018-09-06 07:52:46 xtreak set nosy: + xtreak
    messages: + msg324668
2018-09-05 15:01:19 vstinner create