classification
Title: testConnectTimeout of test_timeout TCPTimeoutTestCasefailures fails intermittently
Type: behavior Stage: resolved
Components: Tests Versions: Python 3.3, Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: neologix Nosy List: koobs, ncoghlan, ned.deily, neologix, pitrou, python-dev, serhiy.storchaka, trent, vstinner
Priority: high Keywords: buildbot, patch

Created on 2012-11-12 21:53 by pitrou, last changed 2013-08-16 22:25 by neologix. This issue is now closed.

Files
File name Uploaded Description Edit
connect_timeout.diff neologix, 2013-08-03 11:25 review
connect_timeout-1.diff neologix, 2013-08-16 11:53 review
connect_timeout-2.diff neologix, 2013-08-16 18:00 review
Messages (12)
msg175476 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-11-12 21:53
The RHEL buildbot consistently fails on test_timeout:



======================================================================
FAIL: testConnectTimeout (test.test_timeout.TCPTimeoutTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.coghlan-redhat/build/Lib/test/test_timeout.py", line 222, in testConnectTimeout
    self._sock_operation(1, 0.001, 'connect', self.addr_remote)
  File "/home/buildbot/buildarea/3.x.coghlan-redhat/build/Lib/test/test_timeout.py", line 126, in _sock_operation
    self.assertLess(delta, timeout + self.fuzz)
AssertionError: 5.0571630001068115 not less than 2.001


According to the build history, the failure appeared in http://buildbot.python.org/all/builders/x86%20RHEL%206%203.3/builds/133 , but the changes look unlikely to be related. Did the setup change?
msg192257 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-07-03 21:03
The failure still occurs on different buildbots. Another example:

http://buildbot.python.org/all/builders/AMD64%20FreeBSD%209.0%20dtrace%203.3/builds/711/steps/test/logs/stdio

======================================================================
FAIL: testConnectTimeout (test.test_timeout.TCPTimeoutTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/home/buildbot/python/3.3.koobs-freebsd/build/Lib/test/test_timeout.py", line 222, in testConnectTimeout
    self._sock_operation(1, 0.001, 'connect', self.addr_remote)
  File "/usr/home/buildbot/python/3.3.koobs-freebsd/build/Lib/test/test_timeout.py", line 126, in _sock_operation
    self.assertLess(delta, timeout + self.fuzz)
AssertionError: 5.352813243865967 not less than 2.001
msg194126 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2013-08-01 22:31
Also fails occasionally on OS X 3.3 and 3.x with much smaller deltas:

======================================================================
FAIL: testConnectTimeout (test.test_timeout.TCPTimeoutTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/db3l/buildarea/3.3.bolen-tiger/build/Lib/test/test_timeout.py", line 222, in testConnectTimeout
    self._sock_operation(1, 0.001, 'connect', self.addr_remote)
  File "/Users/db3l/buildarea/3.3.bolen-tiger/build/Lib/test/test_timeout.py", line 126, in _sock_operation
    self.assertLess(delta, timeout + self.fuzz)
AssertionError: 2.0828001499176025 not less than 2.001

======================================================================
FAIL: testConnectTimeout (test.test_timeout.TCPTimeoutTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_timeout.py", line 222, in testConnectTimeout
    self._sock_operation(1, 0.001, 'connect', self.addr_remote)
  File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_timeout.py", line 126, in _sock_operation
    self.assertLess(delta, timeout + self.fuzz)
AssertionError: 2.0782082080841064 not less than 2.001
msg194245 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-08-03 10:44
The problem is that the test passes a DNS address to connect(), which means that it has to perform a name resolution first.
And since there's not timeout on gethostbyname()/getaddrinfo() you can end up well above the timeout.
The hostnames should be resolved beforehand.
msg194249 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-08-03 11:25
And here's a patch.
msg195143 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-14 12:27
Ah, good point. My only issue with the patch is that www.python.org will be resolved for every test method, even those that don't use remote_addr. This will make test_timeout unnecessarily slower.
msg195147 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-08-14 12:50
We can do this in the setUpClass() method.

Or we can wrap the resolve_address() method with the @functools.lru_cache() decorator.
msg195239 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-15 09:18
> Or we can wrap the resolve_address() method with the @functools.lru_cache() decorator.

Sounds ok to me.
msg195318 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-08-16 11:53
Here's an updated patch using the lru_cache decorator.
msg195421 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-08-16 21:14
LGTM.
msg195423 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-08-16 21:22
New changeset 5880c7d33e13 by Charles-François Natali in branch '3.3':
Issue #16463: Fix a transient test_timeout failure.
http://hg.python.org/cpython/rev/5880c7d33e13

New changeset 5d4fe1da2c90 by Charles-François Natali in branch 'default':
Issue #16463: Fix a transient test_timeout failure.
http://hg.python.org/cpython/rev/5d4fe1da2c90
msg195435 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-08-16 22:25
Should be fixed now!
History
Date User Action Args
2013-08-16 22:25:34neologixsetstatus: open -> closed
resolution: fixed
messages: + msg195435

stage: commit review -> resolved
2013-08-16 21:22:16python-devsetnosy: + python-dev
messages: + msg195423
2013-08-16 21:14:02serhiy.storchakasetmessages: + msg195421
stage: patch review -> commit review
2013-08-16 18:00:47neologixsetfiles: + connect_timeout-2.diff
2013-08-16 11:53:39neologixsetfiles: + connect_timeout-1.diff

messages: + msg195318
2013-08-15 09:18:43pitrousetmessages: + msg195239
2013-08-14 12:50:12serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg195147
2013-08-14 12:28:03pitrousetassignee: ncoghlan -> neologix
stage: needs patch -> patch review
2013-08-14 12:27:39pitrousetmessages: + msg195143
2013-08-03 11:25:35neologixsetfiles: + connect_timeout.diff
keywords: + patch
messages: + msg194249
2013-08-03 10:44:31neologixsetnosy: + neologix
messages: + msg194245
2013-08-03 09:21:57koobssetnosy: + koobs
2013-08-01 22:31:08ned.deilysetnosy: + ned.deily, trent
title: test_timeout failure on the RHEL buildbot -> testConnectTimeout of test_timeout TCPTimeoutTestCasefailures fails intermittently
messages: + msg194126

versions: - Python 3.2
stage: needs patch
2013-07-03 21:03:01vstinnersetnosy: + vstinner
messages: + msg192257
2012-11-12 21:53:37pitroucreate