Message102351
Alright, what happens is the following:
- the file you're trying to retrieve is actually redirected, so the server sends an HTTP/1.x 302 Moved Temporarily response
- in urllib, when we get a redirect, we call redirect_internal():
    def redirect_internal(self, url, fp, errcode, errmsg, headers, data):
        if 'location' in headers:
            newurl = headers['location']
        elif 'uri' in headers:
            newurl = headers['uri']
        else:
            return
        void = fp.read()
        fp.close()
        # In case the server sent a relative URL, join with original:
        newurl = basejoin(self.type + ":" + url, newurl)
        return self.open(newurl)
The fp.read() call is there to wait for the remote end to close the connection.
The problem here is that with Python 3.1 the request goes out as HTTP/1.1 (via http.client, formerly httplib), whereas version 2.6 sent HTTP/1.0. With HTTP/1.1, the server doesn't close the connection after sending the redirect (as tcpdump shows).
As a result, the process hangs in fp.read().
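To reproduce the hang without the original server, here is a minimal, self-contained sketch. It assumes (this is not taken from the tcpdump capture) a local test server that answers with an HTTP/1.1 302 carrying no Content-Length and then keeps the connection open, so http.client can only find the end of the body at EOF. `start_keepalive_server` and `follow_redirect` are hypothetical names, not urllib APIs, and a client-side socket timeout stands in for the indefinite hang.

```python
import http.client
import socket
import threading

# Assumed server behavior: HTTP/1.1 redirect, no Content-Length and no
# "Connection: close", so the client has to read the body until EOF.
REDIRECT = (
    b"HTTP/1.1 302 Moved Temporarily\r\n"
    b"Location: http://127.0.0.1/elsewhere\r\n"
    b"\r\n"
)


def start_keepalive_server(release: threading.Event) -> int:
    """Serve one redirect, then hold the connection open until `release` is set."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    port = srv.getsockname()[1]

    def serve() -> None:
        conn, _ = srv.accept()
        with conn, srv:
            data = b""
            while b"\r\n\r\n" not in data:  # read the request headers
                chunk = conn.recv(4096)
                if not chunk:
                    return
                data += chunk
            conn.sendall(REDIRECT)
            release.wait(5)  # deliberately do NOT close: emulate a keep-alive server

    threading.Thread(target=serve, daemon=True).start()
    return port


def follow_redirect(drain: bool, timeout: float = 0.5) -> tuple:
    """Return (status, location, timed_out) for one request to the test server."""
    release = threading.Event()
    port = start_keepalive_server(release)
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=timeout)
    timed_out = False
    try:
        conn.request("GET", "/file")
        resp = conn.getresponse()
        location = resp.getheader("Location")
        if drain:
            try:
                resp.read()  # like `void = fp.read()`: blocks until EOF
            except socket.timeout:
                timed_out = True  # with no timeout set, this would block forever
        resp.close()  # closing without draining returns immediately
        return resp.status, location, timed_out
    finally:
        conn.close()
        release.set()  # let the server thread finish
```

With `drain=True` the read gives up only because of the 0.5 s timeout; with `drain=False` the response is closed straight away, which is the behavior the patch aims for.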
Now, in version 3.1, if we simply change Lib/http/client.py:628
from
    class HTTPConnection:
        _http_vsn = 11
        _http_vsn_str = 'HTTP/1.1'
to
    class HTTPConnection:
        _http_vsn = 11
        _http_vsn_str = 'HTTP/1.0'
to use HTTP/1.0 instead, the retrieval works fine.
Obviously, this is not a good solution. Since the RFC doesn't seem to require the server to close the connection after sending a redirect, we should probably close the connection ourselves.
That's what the attached patch does: it simply removes the call to fp.read() before closing the connection. It also removes the same call from http_error_default, since if an error occurs we probably want to close the connection as soon as possible instead of waiting for the server to do so.
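For illustration, the patched redirect step can be sketched as a standalone function. This is a hypothetical helper, not the code in the attached patch: it keeps redirect_internal's header lookup ('location', then 'uri') and simply drops `void = fp.read()`. `urllib.parse.urljoin` stands in for `basejoin`, `url` is assumed to already include its scheme, and `StubResponse` is an invented stand-in that records whether `read()` was called.

```python
from email.message import Message
from urllib.parse import urljoin


def patched_redirect_target(url, fp, headers):
    """Resolve the redirect target and close `fp` without draining it.

    Hypothetical helper: `url` is assumed to be absolute (scheme included),
    and urljoin is used in place of basejoin.
    """
    if "location" in headers:  # header lookup is case-insensitive on Message
        newurl = headers["location"]
    elif "uri" in headers:
        newurl = headers["uri"]
    else:
        return None
    fp.close()  # no fp.read(): don't wait for the server to close the connection
    # In case the server sent a relative URL, join it with the original one.
    return urljoin(url, newurl)


class StubResponse:
    """Stand-in for the response file object; records which methods ran."""

    def __init__(self):
        self.read_called = False
        self.closed = False

    def read(self):
        self.read_called = True
        return b""

    def close(self):
        self.closed = True


headers = Message()
headers["Location"] = "/new/file.txt"  # a relative redirect target
fp = StubResponse()
target = patched_redirect_target("http://example.com/old/file.txt", fp, headers)
print(target, fp.closed, fp.read_called)  # http://example.com/new/file.txt True False
```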

Date | User | Action | Args
2010-04-04 21:01:25 | neologix | set | recipients: + neologix, andyharrington
2010-04-04 21:01:25 | neologix | set | messageid: <1270414885.23.0.714628903416.issue8035@psf.upfronthosting.co.za>
2010-04-04 21:01:23 | neologix | link | issue8035 messages
2010-04-04 21:01:22 | neologix | create |