classification
Title: urllib.request.urlretrieve hangs waiting for connection close after a redirect
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.2
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: amaury.forgeotdarc, andyharrington, eric.araujo, neologix, orsenthil, pitrou, python-dev, r.david.murray
Priority: normal Keywords: needs review, patch

Created on 2010-03-01 19:01 by andyharrington, last changed 2011-12-30 21:30 by eric.araujo. This issue is now closed.

Files
File name Uploaded Description Edit
tests.zip andyharrington, 2010-03-01 19:01 2.6 file that works and 3.1 file that hangs on urlretrieve in some cases
urllib_redirect.diff neologix, 2011-05-20 16:18 review
Messages (13)
msg100259 - (view) Author: Andy Harrington (andyharrington) Date: 2010-03-01 19:01
I give simple 2.6 code test26.py working with urllib.urlretrieve and corresponding 3.1 code test31.py with urllib.request.urlretrieve that hangs (close enough to your category 'crash'?) on Ubuntu 1.9.1 and Windows XP, service pack 2.

Both work on an html page in google sites, 
http://sites.google.com/a/luc.edu/comp150/filebin/
but only the 3.1 verson fails when retrieving a file in a Google filebin.  
http://sites.google.com/a/luc.edu/comp150/filebin/addition1.py
The file in the filebin downloads simply in the Firefox browser GUI and in the corresponding 2.6 program.
msg102351 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2010-04-04 21:01
Alright, what happens is the following:
- the file you're trying to retrieve is actually redirected, so the server send a HTTP/1.X 302 Moved Temporarily
- in urllib, when we get a redirection, we call redirect_internal:
    def redirect_internal(self, url, fp, errcode, errmsg, headers, data):
        if 'location' in headers:
            newurl = headers['location']
        elif 'uri' in headers:
            newurl = headers['uri']
        else:
            return
        void = fp.read()
        fp.close()
        # In case the server sent a relative URL, join with original:
        newurl = basejoin(self.type + ":" + url, newurl)
        return self.open(newurl)

the fp.read() is there to wait for the remote end to close connection
The problem, in this case, is that with Python 3.1, httplib uses HTTP/1.1 instead of HTTP/1.0 in version 2.6, and with HTTP/1.1 the server doesn't close the connection after sending the redirect (shown by tcpdump).
So, the process remains stuck on fp.read().
Now, in version 3.1, if we simply change Lib/http/client.py:628
from 
class HTTPConnection:

    _http_vsn = 11
    _http_vsn_str = 'HTTP/1.1'

to
class HTTPConnection:

    _http_vsn = 11
    _http_vsn_str = 'HTTP/1.0'

to use HTTP/1.0 instead, the retrieval works fine.

Obviously, this is not a good solution. Since the RFC doesn't seem to require the server to close the connection after sending a redirect, we'd probably better close the connection ourselves.

That's what the attached patch does, it simply removes the call to fp.read() before closing the connection. It also removes this for http_error_default, since if an error occurs, we probably want to close the connection as soon as possible instead of waiting for server to do so.
msg102354 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-04-04 22:31
@andyharrington: No, crash is when the interpreter segfaults.
I'm making it priority high, though, since it is a hang during an operation that is likely to happen fairly frequently.  Senthil may want to bump it up even higher.

@neologix: Thanks for figuring this out.
msg111093 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-07-21 17:24
This URL does not seem to return a 302 code. Is there another example?
msg111108 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2010-07-21 19:55
2010/7/21 Amaury Forgeot d'Arc <report@bugs.python.org>
>
> Amaury Forgeot d'Arc <amauryfa@gmail.com> added the comment:
>
> This URL does not seem to return a 302 code. Is there another example?

There are probably many of them, but a simple way to reproduce it
could be to use netcat on your machine and have it send back:

HTTP/1.1 302 Found
Location: http://www.wikipedia.org/index.php

This should do the trick.

>
> ----------
> nosy: +amaury.forgeotdarc
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue8035>
> _______________________________________
msg111129 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-07-21 22:23
I would have hoped a Windows way...
msg126180 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-01-13 16:45
issue10577 is a duplicate. See an URL allowing reproducing in msg122831.
msg136386 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-05-20 16:18
Those URLs don't trigger the problem anymore, but AFAICT from the code, this problem is still present in py3k.
Here's an updated patch.
msg149765 - (view) Author: Roundup Robot (python-dev) Date: 2011-12-18 15:09
New changeset 038616802b65 by Charles-François Natali in branch '2.7':
Issue #8035: urllib: Fix a bug where the client could remain stuck after a
http://hg.python.org/cpython/rev/038616802b65

New changeset a420b27a86d9 by Charles-François Natali in branch '3.2':
Issue #8035: urllib: Fix a bug where the client could remain stuck after a
http://hg.python.org/cpython/rev/a420b27a86d9

New changeset 764234983120 by Charles-François Natali in branch 'default':
Issue #8035: urllib: Fix a bug where the client could remain stuck after a
http://hg.python.org/cpython/rev/764234983120
msg149774 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-12-18 16:12
Alright, should be fixed now.
msg150016 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-12-21 17:01
I think this would need a test.
msg150022 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-12-21 17:28
Yes, but it's not easy: the different URLs provided don't demonstrate
the behavior anymore (even if we do find such an URL, there's no
guarantee it won't change in a couple days/weeks).
It could be possible to set up an ad-hoc httpserver for that purpose,
but that sounds a bit overkill.
msg150396 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-12-30 21:30
> It could be possible to set up an ad-hoc httpserver for that purpose,
> but that sounds a bit overkill.

Oh, I assumed urllib already had a server for tests.  We have one in distutils2/packaging, otherwise we’d have to resort to manual testing against the real PyPI, and I’d feel much less confident about regressions.
History
Date User Action Args
2011-12-30 21:30:21eric.araujosetmessages: + msg150396
versions: - Python 3.1
2011-12-21 17:28:58neologixsetmessages: + msg150022
2011-12-21 17:01:14eric.araujosetnosy: + eric.araujo
messages: + msg150016
2011-12-18 16:12:02neologixsetstatus: open -> closed
resolution: fixed
messages: + msg149774

stage: patch review -> resolved
2011-12-18 15:09:41python-devsetnosy: + python-dev
messages: + msg149765
2011-07-03 09:45:38neologixsetkeywords: + needs review
stage: needs patch -> patch review
2011-05-20 16:18:36neologixsetfiles: + urllib_redirect.diff
nosy: amaury.forgeotdarc, orsenthil, pitrou, andyharrington, r.david.murray, neologix
messages: + msg136386
2011-05-20 16:15:45neologixsetfiles: - urllib_redirect.diff
2011-01-13 16:45:27pitrousetpriority: high -> normal

nosy: + pitrou
title: urllib.request.urlretrieve hangs -> urllib.request.urlretrieve hangs waiting for connection close after a redirect
messages: + msg126180

stage: test needed -> needs patch
2011-01-13 16:42:39pitroulinkissue10577 superseder
2010-07-21 22:23:41amaury.forgeotdarcsetmessages: + msg111129
2010-07-21 19:55:54neologixsetmessages: + msg111108
2010-07-21 17:24:18amaury.forgeotdarcsetnosy: + amaury.forgeotdarc
messages: + msg111093
2010-04-04 22:31:02r.david.murraysetpriority: high

type: crash -> behavior
versions: + Python 3.2
nosy: + r.david.murray, orsenthil

messages: + msg102354
stage: test needed
2010-04-04 21:01:23neologixsetfiles: + urllib_redirect.diff

nosy: + neologix
messages: + msg102351

keywords: + patch
2010-03-01 19:01:52andyharringtoncreate