classification
Title: test_urllib2 fails - urlopen error file not on local host
Type: Stage:
Components: Library (Lib) Versions: Python 3.1, Python 3.2, Python 2.7, Python 2.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: orsenthil Nosy List: csernazs, dmorr, ned.deily, orsenthil
Priority: normal Keywords: patch

Created on 2009-03-31 15:48 by ned.deily, last changed 2010-12-16 10:48 by orsenthil. This issue is now closed.

Files
File name Uploaded Description
patch-nad0017-trunk-26.txt ned.deily, 2009-03-31 15:48
patch-nad0017-py3k-30.txt ned.deily, 2009-03-31 15:48
test_urllib2.py.diff csernazs, 2010-12-15 13:34
unnamed orsenthil, 2010-12-15 13:40
Messages (10)
msg84806 - Author: Ned Deily (ned.deily) (Python committer) Date: 2009-03-31 15:48
[NOTE: applies to 2.x urllib2 and similar code in merged 3.x urllib]

test_urllib2 can fail because urllib2.FileHandler assumes incorrectly
that the local host has only a single IP address.  It is not uncommon
to have host IP configurations where a host has more than one network
interface and the same IP host name is associated with each address.

Both the urllib module and test_urllib2 use
    socket.gethostbyname(socket.gethostname())
to find "the" host IP address.  But, as can be seen here, 
consecutive calls may produce different addresses depending on the
network configuration and underlying os implementation:

Python 2.6.1 (r261:67515, Dec 17 2008, 23:27:50) 
[GCC 4.0.1 (Apple Inc. build 5490)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> socket.gethostbyname(socket.gethostname())
'10.52.12.105'
>>> socket.gethostbyname(socket.gethostname())
'10.52.12.105'
>>> socket.gethostbyname(socket.gethostname())
'10.52.12.205'
>>>

This leads to predictable test failures when the calls in test_urllib2
and urllib2.FileHandler return different addresses:

test_urllib2
test test_urllib2 failed -- Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/test/test_urllib2.py", line 621, in test_file
    r = h.file_open(Request(url))
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1229, in file_open
    return self.open_local_file(req)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1266, in open_local_file
    raise URLError('file not on local host')
URLError: <urlopen error file not on local host>
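
The failure mode behind this traceback can be illustrated without a multi-homed host. The sketch below is only an illustration in Python 3: make_round_robin_resolver is a hypothetical stand-in for a resolver that rotates through a host's addresses, and the two addresses are the ones from the interactive session above.

```python
import itertools


def make_round_robin_resolver(addresses):
    """Hypothetical stand-in for a resolver that rotates through addresses."""
    cycle = itertools.cycle(addresses)

    def gethostbyname(_name):
        # Each call hands back the next address in the rotation.
        return next(cycle)

    return gethostbyname


resolver = make_round_robin_resolver(["10.52.12.105", "10.52.12.205"])

# The test and the handler each perform their own, independent lookup.
test_side = resolver("myhost.net")     # address the test would put in the URL
handler_side = resolver("myhost.net")  # address the handler would compare against

# A single-address equality check fails even though both addresses are local.
addresses_disagree = test_side != handler_side
```

Any check that compares the results of two separate single-address lookups is exposed to this, which is why the comparison has to be made against the full set of addresses.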

The simplest way to avoid the test failure is to modify
urllib2.FileHandler to use socket.gethostbyname_ex which returns all
of the IPv4 addresses associated with a hostname:
>>> socket.gethostbyname_ex(socket.gethostname())
('myhost.net', [], ['10.52.12.205', '10.52.12.105'])
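
A minimal sketch of that idea, written for Python 3 and not taken from the attached patches: the helper names all_ipv4_addresses and is_local_address are hypothetical, and the resolver is passed in as a parameter so the logic can be tried with a fake that mimics the output shown above.

```python
import socket


def all_ipv4_addresses(hostname, resolve_ex=socket.gethostbyname_ex):
    """Return every IPv4 address the resolver reports for hostname."""
    try:
        _canonical, _aliases, addresses = resolve_ex(hostname)
    except OSError:
        # socket.gaierror and socket.herror are both OSError subclasses.
        return ()
    return tuple(addresses)


def is_local_address(addr, hostname, resolve_ex=socket.gethostbyname_ex):
    """True if addr is any one of the addresses reported for hostname."""
    return addr in all_ipv4_addresses(hostname, resolve_ex)


def fake_resolve_ex(_name):
    # Mimics the gethostbyname_ex result shown above for a two-address host.
    return ("myhost.net", [], ["10.52.12.205", "10.52.12.105"])
```

With a membership test like this, either of the host's addresses is treated as local, regardless of which one a single gethostbyname call happens to return.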

Attached patches for 2.x urllib2 and 3.x urllib do that.  Note that 
there remain other issues in this area:
- when urllib2 is enhanced to support IPv6, code is needed to return
  all of the host's IPv6 addresses as well (-> adding a note to open
  Issue1675455)
- the merged 3.0 urllib has two nearly identical functions named
  open_local_file, one each from 2.x urllib.URLopener and
  urllib2.FileHandler, and both use similarly flawed
  socket.gethostbyname(socket.gethostname()) tests, but the tests for
  local vs remote file URLs are somewhat different in each.
  (The patches here do not attempt to address this other than to add
   a comment.)
msg94163 - Author: Ned Deily (ned.deily) (Python committer) Date: 2009-10-17 04:22
While you're poking around in urllib2, perhaps I can interest you in 
looking at these patches.
msg96900 - Author: Senthil Kumaran (orsenthil) (Python committer) Date: 2009-12-27 09:12
Thanks for the patch, Ned. Fixed in the trunk revision 77058.
msg96901 - Author: Senthil Kumaran (orsenthil) (Python committer) Date: 2009-12-27 10:17
Merged the fixes in r77059, r77060 and r77061.
I fixed the thishost function to return all IPs in py3k.
msg124008 - Author: Zsolt Cserna (csernazs) Date: 2010-12-15 09:09
Could you please add this change to test_urllib2.py as well?

It has the following line:
            localaddr = socket.gethostbyname(socket.gethostname())

But urllib2.py has the change related to this bug.
That makes test_urllib2 fail when gethostbyname reports a different IP than gethostbyname_ex:

(Pdb) socket.gethostbyname_ex(socket.gethostname())[2]
['172.31.92.26']
(Pdb) socket.gethostbyname(socket.gethostname())
'172.31.72.206'
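
The mismatch in this Pdb session can be checked with a small diagnostic. This is only a sketch for Python 3: lookup_is_consistent is a hypothetical helper name, and both resolver functions are parameters so the check can be run against fakes that return the values shown above.

```python
import socket


def lookup_is_consistent(hostname,
                         resolve=socket.gethostbyname,
                         resolve_ex=socket.gethostbyname_ex):
    """Report whether the single-address lookup falls inside the full list.

    Returns (consistent, single_address, all_addresses).
    """
    single = resolve(hostname)
    _canonical, _aliases, many = resolve_ex(hostname)
    return single in many, single, list(many)


# Fakes reproducing the values from the Pdb session above.
def fake_gethostbyname(_name):
    return "172.31.72.206"


def fake_gethostbyname_ex(_name):
    return ("host.example", [], ["172.31.92.26"])  # canonical name is made up


consistent, single, many = lookup_is_consistent(
    "host.example", fake_gethostbyname, fake_gethostbyname_ex)
```

Calling lookup_is_consistent(socket.gethostname()) with the default resolvers on an affected machine would show whether the two lookups disagree there.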
msg124014 - Author: Senthil Kumaran (orsenthil) (Python committer) Date: 2010-12-15 10:54
Zsolt,

The change in urllib2 was at a place where a tuple of all local IPs
was required.
In test_urllib2, which test case failed?
Also, can you make this change and see if it helps in your case?

-             localaddr = socket.gethostbyname(socket.gethostname())
+             localaddr = socket.gethostbyname('localhost')

If this is sufficient, this change can be made in the trunk.
msg124016 - Author: Zsolt Cserna (csernazs) Date: 2010-12-15 13:34
The test which failed was HandlerTests.test_file, and I'm using python 2.7.1.

socket.gethostbyname('localhost') returns "127.0.0.1" which is ok, but in the unittest it's already tested (line 671).
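
To illustrate why that substitution adds no coverage, here is a rough sketch in Python 3. It assumes the test builds its URL list roughly as below; build_file_urls and its localaddr_lookup parameter are hypothetical and are not copied from test_urllib2.py.

```python
def build_file_urls(urlpath, localaddr_lookup):
    """Build candidate file: URLs, assuming a layout like the test's."""
    urls = [
        "file://localhost%s" % urlpath,
        "file://%s" % urlpath,
        # Stands in for socket.gethostbyname('localhost'), reported as 127.0.0.1.
        "file://%s%s" % ("127.0.0.1", urlpath),
    ]
    localaddr = localaddr_lookup()
    if localaddr:
        urls.append("file://%s%s" % (localaddr, urlpath))
    return urls


# With the suggested change, the extra lookup also yields 127.0.0.1 ...
with_localhost = build_file_urls("/tmp/x", lambda: "127.0.0.1")
# ... whereas a host address produces a genuinely different URL.
with_host_addr = build_file_urls("/tmp/x", lambda: "172.31.92.26")
```

Under these assumptions, with_localhost contains the 127.0.0.1 URL twice, while with_host_addr contains four distinct URLs.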

The problem is that my /etc/hosts file contains a different IP than the DNS (I cannot change this behaviour as I'm not the administrator of the host) and that's the difference between gethostbyname and gethostbyname_ex.

The unittest creates a URL which is not local (from urllib2's point of view). I'm attaching a patch which has fixed my problem.
msg124017 - Author: Senthil Kumaran (orsenthil) (Python committer) Date: 2010-12-15 13:40
+            localaddr = socket.gethostbyname_ex(socket.gethostname())[2][0]

This may not be a generic solution, because on another system the other IP
could be first in the list.  Because the failure was in test_file,
which was basically exercising file://'localhost' in the URL, I
suggested that you replace it with 'localhost'. I think that solution is
okay, even though localhost has been exercised in another test.
msg124019 - Author: Zsolt Cserna (csernazs) Date: 2010-12-15 14:01
The order of the IP addresses doesn't matter as urllib2 is flexible enough to handle all local IP addresses as local (that was the original bug - it handled only one IP returned by gethostbyname which returned a random IP if there were more than one).

So picking up the first IP is ok I think as the order of the IP addresses doesn't matter - urllib2 will handle all of them as local.
See urllib2.FileHandler.get_names().

The problem is that gethostbyname doesn't guarantee that it returns one IP address from the set returned by gethostbyname_ex as gethostbyname looks up the name in /etc/hosts file first (or as configured in NSS).
msg124123 - Author: Senthil Kumaran (orsenthil) (Python committer) Date: 2010-12-16 10:48
Well, ignore my comment on order of ip addresses. It definitely does not matter in this case for test_urllib2.

However, readability does matter. As per my previous explanation, since http://localhost/ was being exercised in test_file, gethostbyname('localhost') is much better than taking the [2][0] element of that return value.

I overlooked one thing in your first message, namely gethostbyname and gethostbyname_ex()[2] returning completely different IPs that turn out to be mutually exclusive. This should not be the case. gethostbyname_ex()[2] should include the IP which was returned by gethostbyname. If that were the case, the test would not have failed either.

And btw, both of these are supposed to have similar behavior (the default action is to query named(8), followed by /etc/hosts); the only difference is that gethostbyname_ex uses the reentrant C function call and is thread-safe.

(You may want to identify the cause of the difference in output there.)

And for this bug report, I am still inclined toward using 'localhost' for readability purposes, or leaving it as such, because the problem seems to be elsewhere.
History
Date  User  Action  Args
2010-12-16 10:48:17  orsenthil  set  nosy: csernazs, orsenthil, dmorr, ned.deily; messages: + msg124123
2010-12-15 14:01:52  csernazs  set  nosy: csernazs, orsenthil, dmorr, ned.deily; messages: + msg124019
2010-12-15 13:40:25  orsenthil  set  files: + unnamed; messages: + msg124017; nosy: csernazs, orsenthil, dmorr, ned.deily
2010-12-15 13:34:27  csernazs  set  files: + test_urllib2.py.diff; messages: + msg124016; keywords: + patch; nosy: csernazs, orsenthil, dmorr, ned.deily
2010-12-15 10:54:26  orsenthil  set  nosy: csernazs, orsenthil, dmorr, ned.deily; messages: + msg124014
2010-12-15 09:09:16  csernazs  set  nosy: + csernazs; messages: + msg124008
2009-12-27 10:17:25  orsenthil  set  status: open -> closed; messages: + msg96901
2009-12-27 09:12:00  orsenthil  set  resolution: fixed; messages: + msg96900
2009-10-18 02:07:12  orsenthil  set  assignee: orsenthil
2009-10-17 04:22:54  ned.deily  set  nosy: + orsenthil; messages: + msg94163; versions: + Python 3.2, - Python 3.0
2009-05-15 15:25:53  dmorr  set  nosy: + dmorr
2009-03-31 15:48:54  ned.deily  set  files: + patch-nad0017-py3k-30.txt
2009-03-31 15:48:07  ned.deily  create