Message 102351 - Python tracker

Author	neologix
Recipients	andyharrington, neologix
Date	2010-04-04.21:01:21
SpamBayes Score	6.858393e-05
Marked as misclassified	No
Message-id	<1270414885.23.0.714628903416.issue8035@psf.upfronthosting.co.za>
In-reply-to

Content
Alright, what happens is the following:
- the file you're trying to retrieve is actually redirected, so the server sends an HTTP/1.x 302 Moved Temporarily response;
- in urllib, when we get a redirection, we call redirect_internal:
    def redirect_internal(self, url, fp, errcode, errmsg, headers, data):
        if 'location' in headers:
            newurl = headers['location']
        elif 'uri' in headers:
            newurl = headers['uri']
        else:
            return
        void = fp.read()
        fp.close()
        # In case the server sent a relative URL, join with original:
        newurl = basejoin(self.type + ":" + url, newurl)
        return self.open(newurl)

The fp.read() call is there to wait for the remote end to close the connection.
The problem in this case is that in Python 3.1, urllib talks HTTP/1.1 through http.client (formerly httplib), whereas in 2.6 it used HTTP/1.0, and with HTTP/1.1 the server doesn't close the connection after sending the redirect (as shown by tcpdump).
So the process remains stuck in fp.read().
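
To illustrate the hang, here is a minimal sketch that does not use urllib at all: socket.socketpair() stands in for the client/server connection, and read() is called on a file object made from the client socket. It assumes, as the observed hang suggests, that fp behaves like a raw socket file, so read() with no size only returns once the peer closes. The response bytes, the Location URL and the name drain_after_redirect are made up for illustration, and a timeout replaces "stuck forever".

```python
import socket

# Made-up 302 response, as an HTTP/1.1 server might send it.
REDIRECT = (
    b"HTTP/1.1 302 Moved Temporarily\r\n"
    b"Location: http://example.com/new/location\r\n"
    b"Content-Length: 0\r\n"
    b"\r\n"
)


def drain_after_redirect(server_closes, timeout=0.5):
    """Send REDIRECT over a socket pair, then call read() on the client side.

    Returns "EOF" if read() finished, or "timed out" if it was still
    waiting for the peer to close when `timeout` seconds elapsed.
    """
    server, client = socket.socketpair()
    try:
        server.sendall(REDIRECT)
        if server_closes:
            server.close()  # HTTP/1.0 style: close after the response
        # Otherwise the server end stays open, like a keep-alive connection.
        client.settimeout(timeout)  # stands in for "stuck forever"
        fp = client.makefile("rb")
        try:
            fp.read()  # read() with no size only returns at EOF
            return "EOF"
        except socket.timeout:
            return "timed out"
        finally:
            fp.close()
    finally:
        server.close()  # closing an already-closed socket is a no-op
        client.close()


print(drain_after_redirect(server_closes=True))   # EOF
print(drain_after_redirect(server_closes=False))  # timed out
```

Note that read() here ignores Content-Length entirely: it just waits for end of file, which a keep-alive server never sends.
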
Now, in version 3.1, if we simply change Lib/http/client.py:628
from 
class HTTPConnection:

    _http_vsn = 11
    _http_vsn_str = 'HTTP/1.1'

to
class HTTPConnection:

    _http_vsn = 11
    _http_vsn_str = 'HTTP/1.0'

to use HTTP/1.0 instead, the retrieval works fine.
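
For diagnosis only, the same experiment can be run without editing Lib/http/client.py by overriding the class attribute in a subclass. This is a sketch under stated assumptions: HTTP10Connection and request_line are hypothetical names, _http_vsn_str is a private, undocumented attribute that may change between versions, and socket.socketpair() stands in for a real server so we can look at the request line that would be sent.

```python
import http.client
import socket


class HTTP10Connection(http.client.HTTPConnection):
    """Hypothetical diagnostic subclass: identical to HTTPConnection except
    that the request line says HTTP/1.0 (_http_vsn_str is private)."""
    _http_vsn_str = "HTTP/1.0"


def request_line(conn_class):
    """Send a GET through conn_class and return the request line it wrote."""
    ours, peer = socket.socketpair()
    conn = conn_class("example.com")  # host only feeds the Host header here
    conn.sock = ours                  # skip connect(); write into the pair
    try:
        conn.request("GET", "/file")
        data = b""
        while b"\r\n\r\n" not in data:  # read until the end of the headers
            chunk = peer.recv(4096)
            if not chunk:
                break
            data += chunk
        return data.split(b"\r\n", 1)[0].decode("ascii")
    finally:
        conn.close()  # also closes `ours`
        peer.close()


print(request_line(http.client.HTTPConnection))  # GET /file HTTP/1.1
print(request_line(HTTP10Connection))            # GET /file HTTP/1.0
```

Only the string changes, as in the edit above; _http_vsn stays 11, so http.client still sends the Host header it adds for HTTP/1.1 requests.
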

Obviously, this is not a good solution. Since HTTP/1.1 connections are persistent by default (RFC 2616, section 8.1) and the RFC doesn't seem to require the server to close the connection after sending a redirect, we'd better close the connection ourselves.

That's what the attached patch does: it simply removes the call to fp.read() before closing the connection. It also removes the same call from http_error_default, since if an error occurs, we probably want to close the connection as soon as possible instead of waiting for the server to do so.
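
A minimal sketch of the patched behavior, as I understand it. redirect_target is a hypothetical stand-alone function, not the patch itself: it takes a full URL and a plain dict with lower-case header names (the real method gets a message object and prefixes self.type), and it uses urllib.parse.urljoin where the original calls basejoin. A socketpair stands in for a server that keeps the connection open.

```python
import socket
from urllib.parse import urljoin


def redirect_target(url, fp, headers):
    """Return the absolute redirect URL and close fp without draining it.

    Hypothetical simplification of the patched redirect_internal():
    `url` is a full URL and `headers` is a plain dict with lower-case keys.
    """
    if "location" in headers:
        newurl = headers["location"]
    elif "uri" in headers:
        newurl = headers["uri"]
    else:
        return None
    # No fp.read() here: on an HTTP/1.1 keep-alive connection the server
    # may never close, so we close our end instead of waiting for EOF.
    fp.close()
    # In case the server sent a relative URL, join with the original one.
    return urljoin(url, newurl)


# Simulate a server that sent the redirect and keeps the connection open.
server, client = socket.socketpair()
client.settimeout(1.0)  # would raise instead of hanging if we did read()
fp = client.makefile("rb")
try:
    print(redirect_target("http://example.com/old/file.txt", fp,
                          {"location": "../new/file.txt"}))
    # prints http://example.com/new/file.txt
finally:
    client.close()
    server.close()
```

Unlike the real method, the sketch returns the new URL instead of calling self.open(newurl), so it can be checked in isolation.
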

History
Date	User	Action	Args
2010-04-04 21:01:25	neologix	set	recipients: + neologix, andyharrington
2010-04-04 21:01:25	neologix	set	messageid: <1270414885.23.0.714628903416.issue8035@psf.upfronthosting.co.za>
2010-04-04 21:01:23	neologix	link	issue8035 messages
2010-04-04 21:01:22	neologix	create