classification
Title: test_smtplib: intermittent failures on FreeBSD
Type: behavior Stage: needs patch
Components: Tests Versions: Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: koobs, r.david.murray, skrah
Priority: normal Keywords: buildbot, patch

Created on 2011-05-29 20:29 by skrah, last changed 2015-06-24 14:43 by skrah.

Files
File name Uploaded Description Edit
issue12210.diff skrah, 2011-05-29 21:00 review
koobs-freebsd10.python.27.build978.log koobs, 2015-06-22 23:26
Messages (11)
msg137217 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-05-29 20:29
With Python 2.7, the FreeBSD AMD64 bot has sporadic failures in
test_smtplib. Since these failures don't occur in other branches,
I wonder if the timeout of 3 seconds is too low in 2.7. The timeout
is 15 seconds in 3.3.


test test_smtplib failed -- Traceback (most recent call last):
  File "/usr/home/buildbot/buildarea/2.7.krah-freebsd/build/Lib/test/test_smtplib.py", line 186, in testNOOP
    smtp = smtplib.SMTP(HOST, self.port, local_hostname='localhost', timeout=3)
  File "/usr/home/buildbot/buildarea/2.7.krah-freebsd/build/Lib/smtplib.py", line 242, in __init__
    (code, msg) = self.connect(host, port)
  File "/usr/home/buildbot/buildarea/2.7.krah-freebsd/build/Lib/smtplib.py", line 303, in connect
    (code, msg) = self.getreply()
  File "/usr/home/buildbot/buildarea/2.7.krah-freebsd/build/Lib/smtplib.py", line 352, in getreply
    raise SMTPServerDisconnected("Connection unexpectedly closed")
SMTPServerDisconnected: Connection unexpectedly closed

--------------------------------------------------------------------

test test_smtplib failed -- Traceback (most recent call last):
  File "/usr/home/buildbot/buildarea/2.7.krah-freebsd/build/Lib/test/test_smtplib.py", line 206, in testVRFY
    smtp = smtplib.SMTP(HOST, self.port, local_hostname='localhost', timeout=3)
  File "/usr/home/buildbot/buildarea/2.7.krah-freebsd/build/Lib/smtplib.py", line 242, in __init__
    (code, msg) = self.connect(host, port)
  File "/usr/home/buildbot/buildarea/2.7.krah-freebsd/build/Lib/smtplib.py", line 303, in connect
    (code, msg) = self.getreply()
  File "/usr/home/buildbot/buildarea/2.7.krah-freebsd/build/Lib/smtplib.py", line 352, in getreply
    raise SMTPServerDisconnected("Connection unexpectedly closed")
SMTPServerDisconnected: Connection unexpectedly closed
msg137219 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-05-29 21:00
Indeed, increasing the timeout to 15 solves the problem (I ran the
test with the -F option for about 100 times).

Since the timeout has been increased before to deal with flaky tests
(52682bd738a7), I guess it's ok to do the same for 2.7.
msg137221 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-05-29 21:17
Hmm, testVRFY occurs twice, once with timeout=15 and once with
timeout=3. So there might be another reason why the test passes on
3.3 but not on 2.7.

Anyway, increasing the timeout to 15 throughout does solve
(or suppress) the problem.
msg144841 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-10-03 21:25
I haven't seen this in a while, so let's assume it's fixed.
msg144986 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-10-06 09:57
Naturally, as soon as I declare it fixed, it occurs again:

http://www.python.org/dev/buildbot/all/builders/AMD64%20FreeBSD%208.2%202.7/builds/326
msg180137 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2013-01-17 15:04
Since this occurs only on 2.7 and *very* rarely, let's close it.
msg245660 - (view) Author: Kubilay Kocak (koobs) (Python triager) Date: 2015-06-22 23:26
Observed the following test_smtplib failure on koobs-freebsd10, noting that the timeout parameter is still 3 

test test_smtplib failed -- Traceback (most recent call last):
  File "/usr/home/buildbot/python/2.7.koobs-freebsd10/build/Lib/test/test_smtplib.py", line 222, in testHELP
    smtp = smtplib.SMTP(HOST, self.port, local_hostname='localhost', timeout=3)
  File "/usr/home/buildbot/python/2.7.koobs-freebsd10/build/Lib/smtplib.py", line 256, in __init__
    (code, msg) = self.connect(host, port)
  File "/usr/home/buildbot/python/2.7.koobs-freebsd10/build/Lib/smtplib.py", line 317, in connect
    (code, msg) = self.getreply()
  File "/usr/home/buildbot/python/2.7.koobs-freebsd10/build/Lib/smtplib.py", line 365, in getreply
    + str(e))
SMTPServerDisconnected: Connection unexpectedly closed: timed out[283/401] test_abstract_numbers

Complete log attached.
msg245676 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-06-23 10:53
Those are in the second (or rather the first, in the file) group of tests, which have a timeout of 3 on python3.  The comment in the test case says that DNS lookup delays can cause timeouts.  Is there any chance that that machine sometimes has problems looking up 'localhost'?

There is no record in this issue of any change being applied to the tests, and I don't see any changes by Stefan in the file.

I suppose there's no real reason not to make all the timeouts 15 on both 2.7 and python3.  It is weird that those tests should take so long to run, but more likely to be an environment problem than a python code problem.  On the other hand, you might want to look in to why this is happening on FreeBSD, it might reveal something interesting.  

The problem with this theory, of course, is that we haven't apparently seen the timeouts on python3.  But I'm not sure what to look at to try to diagnose a difference...the smtplib code isn't that different between the two, so presumably if there is a real difference it is at the level of the socket code.  Since this is as far as we know only a test problem, we should probably just go ahead and increase the timeouts.
msg245741 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-06-24 13:38
If it is a DNS failure, the timeout of 3s indeed looks too low for me.

On a misconfigured machine I'm easily getting timeouts of 20s, (not for
this particular test, just in general, e.g. when using ssh).
msg245743 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-06-24 14:05
It would have to be a misconfigured machine that doesn't have 'localhost' in /etc/hosts (so that a lookup of localhost goes through DNS).  Or, I suppose, one that prioritizes the resolver over /etc/hosts (is that even possible?)

I would think other tests would fail if DNS has a 20 second latency.  I think having localhost configured in /etc/hosts is a requirement for other tests, so if that really is the problem I'm not sure we should be fixing the test suite here.  On the third hand, it doesn't really matter in this case and we're already using 15 for other tests, so I have no objection to making the test consistent.
msg245745 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-06-24 14:43
I think it's also possible to misconfigure /etc/nsswitch.conf.

When I opened this issue, my buildbot had the FreeBSD default
configuration.  I no longer have any FreeBSD machines, so I
cannot check this now.
History
Date User Action Args
2015-06-24 14:43:48skrahsetmessages: + msg245745
2015-06-24 14:05:12r.david.murraysetmessages: + msg245743
2015-06-24 13:38:52skrahsetmessages: + msg245741
2015-06-23 10:53:38r.david.murraysetmessages: + msg245676
2015-06-22 23:26:44koobssetstatus: closed -> open
files: + koobs-freebsd10.python.27.build978.log


nosy: + koobs
messages: + msg245660
resolution: wont fix ->
stage: resolved -> needs patch
2013-01-17 15:04:07skrahsetstatus: open -> closed
resolution: out of date -> wont fix
messages: + msg180137
2011-10-06 09:57:34skrahsetstatus: closed -> open

messages: + msg144986
2011-10-03 21:25:43skrahsetstatus: open -> closed
resolution: out of date
messages: + msg144841

stage: resolved
2011-05-29 21:17:30skrahsetmessages: + msg137221
2011-05-29 21:00:59skrahsetfiles: + issue12210.diff
keywords: + patch
messages: + msg137219
2011-05-29 20:29:37skrahcreate