classification
Title: Sporadic failures in test_urllibnet
Type: behavior Stage:
Components: Tests Versions: Python 3.4, Python 3.5, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: orsenthil Nosy List: berker.peksag, georg.brandl, ixokai, martin.panter, orsenthil, pitrou
Priority: normal Keywords: patch

Created on 2010-10-15 13:44 by ixokai, last changed 2016-06-21 13:44 by BreamoreBoy.

Files
File name Uploaded Description Edit
issue10116-nowhitespace.patch ixokai, 2010-10-28 14:10 review
Messages (14)
msg118772 - (view) Author: Stephen Hansen (ixokai) (Python triager) Date: 2010-10-15 13:44
Ever since running the snow leopard buildslave, I've had sporadic failures in test_urllibnet. At first I thought it was just a net glitch on my machine or something, as immediately re-running the tests made it go away: but this most recent one:

http://www.python.org/dev//buildbot/builders/x86%20Snow%20Leopard%203.1/builds/20/steps/test/logs/stdio

happened while I was very much monitoring and using the network on the machine for other purposes, and everything was fine in general.

So, I went and looked into test_urllibnet to try to figure out why, and I notice that some of the tests use code to retry on IOErrors, and some don't-- and this test that failed in particular is one that doesn't. 

So: anyone have a better idea of what's going wrong, or is it just that hey, the active network tests are a bit flaky and all should use _open_with_retry instead of just some as is the case now?

[If the latter, I can do a patch]

FWIW, I've only seen this on the 3.1 and 3.x buildslaves, but have seen it on both of those more then once. But I don't know that its a 3.x specific issue: those builds get run more often then the 2.7 one, so have more chances to run into a sporadic issue.
msg118774 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-10-15 13:51
Rather than the hand-made _open_with_retry, I think it would be better to use support.transient_internet() (it's already used in other tests). Retrying is not very helpful if the other end is down.

Also, EBADF (bad file descriptor) looks like a bug. httplib should not use closed/non-existent file descriptors.
msg118854 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-10-16 03:52
Issue10119 is related too and both I guess, is boiling down to httplib either not properly using an open socket or closing it prematurely.
msg118934 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-10-17 10:56
ixokai, A change made as part of issue10119 should have resolved this issue too. Please let me know if this can be closed.
msg118940 - (view) Author: Stephen Hansen (ixokai) (Python triager) Date: 2010-10-17 13:10
I'll run the test in -F mode for a few hours to see if it comes up or not: but its hard for me to say one way or the other if anything has fixed or not fixed it, as the failure only came up every once in awhile. But I'll look.
msg118944 - (view) Author: Stephen Hansen (ixokai) (Python triager) Date: 2010-10-17 14:58
Okay, at -r85630 on branches/py3k, I ran:

./python.exe -m test.regrtest -uall -F test_urllibnet

And after 158 retries, got the same error I had before:

test test_urllibnet failed -- Traceback (most recent call last):
  File "/Users/pythonbuildbot/test/build/Lib/urllib/request.py", line 1504, in open
    return getattr(self, name)(url)
  File "/Users/pythonbuildbot/test/build/Lib/urllib/request.py", line 1676, in open_http
    return self._open_generic_http(http.client.HTTPConnection, url, data)
  File "/Users/pythonbuildbot/test/build/Lib/urllib/request.py", line 1659, in _open_generic_http
    response = http_conn.getresponse()
  File "/Users/pythonbuildbot/test/build/Lib/http/client.py", line 1027, in getresponse
    response.begin()
  File "/Users/pythonbuildbot/test/build/Lib/http/client.py", line 347, in begin
    version, status, reason = self._read_status()
  File "/Users/pythonbuildbot/test/build/Lib/http/client.py", line 303, in _read_status
    line = str(self.fp.readline(), "iso-8859-1")
  File "/Users/pythonbuildbot/test/build/Lib/socket.py", line 267, in readinto
    return self._sock.recv_into(b)
socket.error: [Errno 9] Bad file descriptor
msg119751 - (view) Author: Stephen Hansen (ixokai) (Python triager) Date: 2010-10-28 01:52
The attached patch wraps all the calls to the internet in support.transient_internet; I ran it against 3.x and it passed, and then I ran it for quite awhile with the -F option, and encountered one event that I believe would previously had resulted in one of these sporadic failures, and it resulted in a skipped 'resource not available' message. 

I left in the previous 'retry' code, just by virtue of changing as little as possible. I can adjust if its desired.

I believe that transient_internet won't capture EBADF: so if that particular sporadic failure happens again, I'll post up a new issue about it.
msg119786 - (view) Author: Stephen Hansen (ixokai) (Python triager) Date: 2010-10-28 14:10
New patch, sans trailing whitespace. Ahem.
msg119793 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2010-10-28 14:55
Let's try with the patch. r85893.

(I'm afraid the pre-commit hook still rejected the file after applying the patch.)
msg121135 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-11-13 13:17
There are still sporadic failures such as:

======================================================================
ERROR: testURLread (test.test_urllibnet.URLTimeoutTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/pythonbuildbot/buildarea/2.7.hansen-osx-x86/build/Lib/test/test_urllibnet.py", line 40, in testURLread
    x = f.read()
  File "/Users/pythonbuildbot/buildarea/2.7.hansen-osx-x86/build/Lib/socket.py", line 351, in read
    data = self._sock.recv(rbufsize)
timeout: timed out

which seems to show that not everything has been properly converted.

(http://www.python.org/dev/buildbot/all/builders/AMD64%20Snow%20Leopard%
202.7/builds/156/steps/test/logs/stdio )

Le samedi 13 novembre 2010 à 12:34 +0000, Senthil Kumaran a écrit :
> Changes by Senthil Kumaran <orsenthil@gmail.com>:
> 
> 
> ----------
> status: pending -> closed
> 
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue10116>
> _______________________________________
msg124006 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-12-15 04:04
On Sat, Nov 13, 2010 at 01:17:47PM +0000, Antoine Pitrou wrote:
> There are still sporadic failures such as:

I have made a small change for this in r87260.
msg125870 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-01-09 22:41
There are still sporadic failures:

http://www.python.org/dev/buildbot/all/builders/AMD64%20Leopard%203.x/builds/684/steps/test/logs/stdio

Of course, it seems to be because of stupid urllib wrapping the socket.error inside an IOError:

Traceback (most recent call last):
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86-2/build/Lib/test/test_urllibnet.py", line 97, in test_getcode
    open_url = urllib.request.FancyURLopener().open(URL)
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86-2/build/Lib/urllib/request.py", line 1516, in open
    raise IOError('socket error', msg).with_traceback(sys.exc_info()[2])
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86-2/build/Lib/urllib/request.py", line 1512, in open
    return getattr(self, name)(url)
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86-2/build/Lib/urllib/request.py", line 1684, in open_http
    return self._open_generic_http(http.client.HTTPConnection, url, data)
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86-2/build/Lib/urllib/request.py", line 1664, in _open_generic_http
    http_conn.request("GET", selector, headers=headers)
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86-2/build/Lib/http/client.py", line 962, in request
    self._send_request(method, url, body, headers)
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86-2/build/Lib/http/client.py", line 1000, in _send_request
    self.endheaders(body)
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86-2/build/Lib/http/client.py", line 958, in endheaders
    self._send_output(message_body)
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86-2/build/Lib/http/client.py", line 803, in _send_output
    self.send(msg)
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86-2/build/Lib/http/client.py", line 741, in send
    self.connect()
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86-2/build/Lib/http/client.py", line 719, in connect
    self.timeout, self.source_address)
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86-2/build/Lib/socket.py", line 398, in create_connection
    raise err
  File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86-2/build/Lib/socket.py", line 389, in create_connection
    sock.connect(sa)
IOError: [Errno socket error] [Errno 60] Operation timed out
msg221665 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2014-06-26 23:39
Is it safe to assume that these failures have been resolved?
msg268958 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-06-21 03:05
It looks like Antoine’s second failure (test_getcode) should be fixed now that transient_internet() is used (Issue 12138).

Looking at Antoine’s first failure, it will always be possible for it to fail depending on the conditions of the buildbot, internet, and the www.example.com server:

def setUp(self):
    socket.setdefaulttimeout(self.TIMEOUT)

def testURLread(self):
    f = _open_with_retry(urllib.urlopen, "http://www.example.com/")
    x = f.read()  # Timed out!

I would suggest changing this to connect to a server running on localhost in a background thread, and moving it into test_urllib.py because it would no longer need the “network” resource. That should eliminate enough causes of a timeout, the remaining one being a very slow buildbot, which I don’t think is worth worrying about.

Also see revision aa46bdbe641d in Python 3, where the read() call was shifted under the transient_internet() handler, and seems to defeat the purpose of the test.
History
Date User Action Args
2016-06-21 13:44:38BreamoreBoysetnosy: - BreamoreBoy
2016-06-21 03:05:51martin.pantersetnosy: + martin.panter

messages: + msg268958
versions: + Python 2.7
2014-10-25 03:02:54berker.peksagsetnosy: + berker.peksag
2014-06-26 23:39:06BreamoreBoysetnosy: + BreamoreBoy

messages: + msg221665
versions: + Python 3.4, Python 3.5, - Python 3.1, Python 3.2
2011-01-09 22:41:13pitrousetstatus: closed -> open
nosy: georg.brandl, ixokai, orsenthil, pitrou
messages: + msg125870

resolution: fixed ->
stage: patch review ->
2010-12-15 04:04:26orsenthilsetnosy: georg.brandl, ixokai, orsenthil, pitrou
messages: + msg124006
2010-11-13 13:17:45pitrousetmessages: + msg121135
2010-11-13 12:34:59orsenthilsetstatus: pending -> closed
2010-10-28 14:55:29georg.brandlsetstatus: open -> pending

nosy: + georg.brandl
messages: + msg119793

resolution: fixed
2010-10-28 14:18:13pitrousetstage: resolved -> patch review
2010-10-28 14:11:07ixokaisetfiles: - issue10116.patch
2010-10-28 14:11:00ixokaisetfiles: + issue10116-nowhitespace.patch

messages: + msg119786
2010-10-28 01:52:50ixokaisetfiles: + issue10116.patch
keywords: + patch
messages: + msg119751
2010-10-17 14:58:03ixokaisetresolution: fixed -> (no value)
messages: + msg118944
2010-10-17 13:10:11ixokaisetmessages: + msg118940
2010-10-17 10:56:45orsenthilsetresolution: fixed
messages: + msg118934

assignee: orsenthil
type: behavior
stage: resolved
2010-10-16 03:52:18orsenthilsetmessages: + msg118854
2010-10-15 13:51:25pitrousetnosy: + pitrou, orsenthil
messages: + msg118774
2010-10-15 13:44:39ixokaicreate