classification
Title: urllib2 fails to retrieve a url which is handled correctly by urllib
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 2.7, Python 2.6
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: orsenthil Nosy List: Albert.Weichselbraun, amaury.forgeotdarc, flox, orsenthil
Priority: normal Keywords:

Created on 2010-08-21 10:27 by Albert.Weichselbraun, last changed 2010-08-21 15:33 by orsenthil. This issue is now closed.

Messages (5)
msg114482 - Author: Albert Weichselbraun (Albert.Weichselbraun) Date: 2010-08-21 10:27
urllib2 fails to retrieve the content of http://www.mfsa.com.mt/insguide/english/glossarysearch.jsp?letter=all

>>> import urllib2
>>> urllib2.urlopen("http://www.mfsa.com.mt/insguide/english/glossarysearch.jsp?letter=all").read()
''

urllib handles the same link correctly:

>>> import urllib
>>> len(urllib.urlopen("http://www.mfsa.com.mt/insguide/english/glossarysearch.jsp?letter=all").read())
56105
msg114483 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-08-21 10:53
It's funny; I confirmed the problem on trunk.
msg114486 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-08-21 11:56
Hmm, it looks like a web server problem to me.

urllib2 uses the HTTP/1.1 protocol, and sends the "Connection: close" header. I hacked urllib2: when this header is not sent, the content is retrieved normally.

This page: 
http://www.mail-archive.com/users@tomcat.apache.org/msg28684.html
describes the same problem.
The website above does use Tomcat (as can be seen in the response headers); maybe they are running a buggy version?
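
For anyone who wants to check the header claim without patching urllib2, here is a minimal sketch. It assumes Python 3, where urllib2 lives on as urllib.request; the local recording server, RecordingHandler and capture_request are names made up for this illustration and are not part of the original report.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Filled in by the handler with what the client actually sent.
captured = {}


class RecordingHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that records the request line version and Connection header."""

    def do_GET(self):
        captured["version"] = self.request_version
        captured["connection"] = self.headers.get("Connection")
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Keep the example quiet.
        pass


def capture_request():
    """Start a throwaway local server, fetch from it, and return what was recorded."""
    server = HTTPServer(("127.0.0.1", 0), RecordingHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = "http://127.0.0.1:%d/" % server.server_port
        # An empty ProxyHandler keeps environment proxy settings out of the picture.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url, timeout=5) as resp:
            resp.read()
    finally:
        server.shutdown()
        server.server_close()
    return dict(captured)
```

Calling capture_request() returns a dict with the protocol version and the Connection header as seen by the server, which can be compared against the description above.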
msg114487 - Author: Florent Xicluna (flox) * (Python committer) Date: 2010-08-21 12:05
Confirmed with telnet sessions:

== Simulate "urllib2" ==

$ telnet www.mfsa.com.mt 80
GET /insguide/english/glossarysearch.jsp?letter=all HTTP/1.1
Accept-Encoding: identity
Host: www.mfsa.com.mt
Connection: close
User-Agent: Python-urllib/2.7

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Set-Cookie: JSESSIONID=D34D395A7654B6532F6F6DFF81FC91C3; Path=/insguide
Content-Type: text/html
Date: Sat, 21 Aug 2010 11:54:25 GMT
Connection: close

Connection closed by foreign host.
$ 

== Simulate "urllib" ==

GET /insguide/english/glossarysearch.jsp?letter=all HTTP/1.0
Host: www.mfsa.com.mt
User-Agent: Python-urllib/1.17

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Set-Cookie: JSESSIONID=84D9D8DF76546751908F388D8889BB47; Path=/insguide
Content-Type: text/html
Transfer-Encoding: chunked
Date: Sat, 21 Aug 2010 11:54:06 GMT

400
<html>...
$
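
The two telnet sessions above can also be reproduced from a script. A rough sketch, assuming Python 3; build_request and fetch_raw are hypothetical helpers, and the header sets are copied from the sessions above (the order differs slightly, which is not significant in HTTP).

```python
import socket


def build_request(host, path, version, extra_headers=()):
    """Build a raw GET request for the given HTTP version ("1.0" or "1.1")."""
    lines = ["GET %s HTTP/%s" % (path, version), "Host: %s" % host]
    lines.extend("%s: %s" % (name, value) for name, value in extra_headers)
    # A blank line terminates the header block.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def fetch_raw(host, request, port=80, timeout=10):
    """Send the raw request and read until the server closes the connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


HOST = "www.mfsa.com.mt"
PATH = "/insguide/english/glossarysearch.jsp?letter=all"

# What urllib2 sends, per the first session: HTTP/1.1 plus Connection: close.
URLLIB2_STYLE = build_request(HOST, PATH, "1.1", [
    ("Accept-Encoding", "identity"),
    ("Connection", "close"),
    ("User-Agent", "Python-urllib/2.7"),
])

# What urllib sends, per the second session: plain HTTP/1.0.
URLLIB_STYLE = build_request(HOST, PATH, "1.0", [
    ("User-Agent", "Python-urllib/1.17"),
])
```

Calling fetch_raw(HOST, URLLIB2_STYLE) and fetch_raw(HOST, URLLIB_STYLE) would then let you compare the raw responses, provided the server is still reachable.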
msg114502 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-08-21 15:33
Thanks Amaury, that was nice debugging.

The problem is with the Apache Tomcat server at the remote end, which misbehaves when urllib2 sends the Connection: close header. We can't do anything about it; the bug reporter can take it up with the server's administrators.

However, the urllib2 documentation could mention, if needed, that urllib2 sends Connection: close while using HTTP/1.1, whereas urllib uses HTTP/1.0.

Closing this bug as Invalid.
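
A possible workaround for anyone hitting the same server behaviour, based on Amaury's observation that the content comes back when Connection: close is not sent: http.client (httplib in Python 2) speaks HTTP/1.1 but does not add that header on its own. A minimal sketch, assuming Python 3; fetch_keep_alive and the User-Agent value are made up for illustration, and it has not been verified against the server in this report.

```python
import http.client


def fetch_keep_alive(host, path, port=80, timeout=10):
    """GET a path over HTTP/1.1 without sending a Connection: close header.

    Returns a (status, body) tuple.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        # http.client adds Host and Accept-Encoding itself, but no Connection header.
        conn.request("GET", path, headers={"User-Agent": "example-client/0.1"})
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()
```

For the URL in this report, the call would be fetch_keep_alive("www.mfsa.com.mt", "/insguide/english/glossarysearch.jsp?letter=all").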
History
Date User Action Args
2010-08-21 15:33:18 orsenthil set status: open -> closed; resolution: accepted -> not a bug; messages: + msg114502; stage: needs patch -> resolved
2010-08-21 12:05:14 flox set nosy: + flox; messages: + msg114487; versions: + Python 2.7
2010-08-21 11:56:36 amaury.forgeotdarc set nosy: + amaury.forgeotdarc; messages: + msg114486
2010-08-21 10:53:40 orsenthil set nosy: + orsenthil; messages: + msg114483; assignee: orsenthil; resolution: accepted; stage: needs patch
2010-08-21 10:27:32 Albert.Weichselbraun create