classification
Title: (Fancy) URL opener stuck when trying to open redirected url
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.1, Python 3.2, Python 2.7
process
Status: closed Resolution: duplicate
Dependencies: Superseder: urllib.request.urlretrieve hangs waiting for connection close after a redirect
View: 8035
Assigned To: orsenthil Nosy List: SilentGhost, neologix, orsenthil, pitrou, xhresko
Priority: normal Keywords:

Created on 2010-11-29 13:59 by xhresko, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (8)
msg122799 - (view) Author: (xhresko) Date: 2010-11-29 13:59
(Fancy) URL opener gets stuck when trying to open a page that is automatically redirected. I tried the URL "http://www.ihned.cz", which gets stuck, while "http://ihned.cz" is OK. This behavior differs from Python 2.7, which works OK.

///// CODE
opener = urllib.FancyURLopener({})
f = opener.open("http://www.ihned.cz/")
/////
msg122804 - (view) Author: SilentGhost (SilentGhost) * (Python triager) Date: 2010-11-29 14:17
@xhresko: This is not valid py3k code.

It is a 302 redirect. I get the following error:

IOError: [Errno socket error] [Errno 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
msg122805 - (view) Author: SilentGhost (SilentGhost) * (Python triager) Date: 2010-11-29 14:22
@xhresko: why are you passing an empty dict to the constructor? It works just fine with opener = urllib.request.FancyURLopener()

resolution: invalid ?
msg122831 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-11-29 17:51
I can reproduce the issue with 3.2 here. Using Wireshark, I see that the request to http://www.ihned.cz is satisfied, but the second request (to http://ihned.cz) is never issued. Here is the Wireshark dump for the TCP session (request, then response):

"""GET / HTTP/1.1
Accept-Encoding: identity
Host: www.ihned.cz
User-Agent: Python-urllib/3.2

HTTP/1.1 302 Found
Server: nginx
Date: Mon, 29 Nov 2010 17:41:23 GMT
Content-Type: text/html; charset=WINDOWS-1250
Transfer-Encoding: chunked
Connection: keep-alive
Location: http://ihned.cz/

0

"""

Looking at the traceback when pressing Control-C, it seems the redirect handler in urllib expects the socket to be closed by the server, but it isn't; so it keeps waiting for more data (despite the "0" signifying the end of the chunked response):

>>> import urllib.request
>>> opener = urllib.request.FancyURLopener()
>>> f = opener.open("http://www.ihned.cz/")
^CTraceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/antoine/py3k/__svn__/Lib/urllib/request.py", line 1504, in open
    return getattr(self, name)(url)
  File "/home/antoine/py3k/__svn__/Lib/urllib/request.py", line 1676, in open_http
    return self._open_generic_http(http.client.HTTPConnection, url, data)
  File "/home/antoine/py3k/__svn__/Lib/urllib/request.py", line 1672, in _open_generic_http
    response.status, response.reason, response.msg, data)
  File "/home/antoine/py3k/__svn__/Lib/urllib/request.py", line 1688, in http_error
    result = method(url, fp, errcode, errmsg, headers)
  File "/home/antoine/py3k/__svn__/Lib/urllib/request.py", line 1876, in http_error_302
    data)
  File "/home/antoine/py3k/__svn__/Lib/urllib/request.py", line 1887, in redirect_internal
    void = fp.read()
  File "/home/antoine/py3k/__svn__/Lib/socket.py", line 267, in readinto
    return self._sock.recv_into(b)
KeyboardInterrupt


However, urllib.request.urlopen() works fine in this case, so perhaps this advocates for deprecating the old stuff? Senthil?
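
As a rough illustration of the urlopen() path mentioned above, the sketch below runs against a local stand-in server rather than the real site. The handler class, the "/old" and "/new" paths, and the response body are assumptions made up for this example; the 302 reply mirrors the dump (chunked, keep-alive, only the terminating "0" chunk). It does not reproduce the FancyURLopener hang; it only shows the redirect being followed.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class RedirectingHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the remote server in the report."""

    protocol_version = "HTTP/1.1"

    def do_GET(self):
        if self.path == "/old":
            host, port = self.server.server_address[:2]
            self.send_response(302)
            self.send_header("Location", f"http://{host}:{port}/new")
            self.send_header("Transfer-Encoding", "chunked")
            self.send_header("Connection", "keep-alive")
            self.end_headers()
            # Empty chunked body: only the terminating zero-size chunk.
            self.wfile.write(b"0\r\n\r\n")
        else:
            body = b"redirect followed"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


# Port 0 asks the OS for any free port.
server = ThreadingHTTPServer(("127.0.0.1", 0), RedirectingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address[:2]

# An empty ProxyHandler keeps environment proxy settings out of the picture.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open(f"http://{host}:{port}/old", timeout=5) as response:
    final_url = response.geturl()
    body = response.read()

server.shutdown()
server.server_close()
```

The timeout is there so that a stalled read would raise an error instead of blocking forever.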
msg122836 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-11-29 18:04
That line of code (`void = fp.read()`) dates back to a commit by Guido in 1995, and isn't motivated by any comment or message. HTTP servers have probably evolved since then :)
msg123192 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-12-03 06:33
On Mon, Nov 29, 2010 at 05:51:35PM +0000, Antoine Pitrou wrote:

> However, urllib.request.urlopen() works fine in this case, so
> perhaps this advocates for deprecating the old stuff? Senthil?

Yes, it should be deprecated. I created a branch to try
removing/refactoring some of the old code (urlretrieve, etc.), for
example. Let me continue with that and come up with something soon.
msg126166 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-01-13 11:53
It's a dupe of http://bugs.python.org/issue8035.

By the way, it works with 2.7 because urllib used HTTP 1.0 by default, and in py3k it now uses HTTP 1.1.
And from what I understood (but I'm by no means an HTTP expert), in HTTP 1.0 the server was required to close the connection following a 302, and this requirement was lifted in HTTP 1.1.
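
To make the keep-alive point concrete, here is a minimal offline sketch, assuming a socket pair as a stand-in for the TCP connection. The response bytes are abridged from the dump earlier in the thread. Constructing http.client.HTTPResponse directly and calling begin() is not how the class is normally used; it is done here only to avoid real network traffic. The "server" end is never closed before the read, yet a chunk-aware reader still returns once it sees the zero-size chunk.

```python
import http.client
import socket

# A connected pair of sockets stands in for the client/server TCP session.
server_side, client_side = socket.socketpair()
client_side.settimeout(5)  # raise instead of hanging if the read stalls

# The server sends its 302 and then leaves the connection open.
server_side.sendall(
    b"HTTP/1.1 302 Found\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"Connection: keep-alive\r\n"
    b"Location: http://ihned.cz/\r\n"
    b"\r\n"
    b"0\r\n"
    b"\r\n"
)

response = http.client.HTTPResponse(client_side, method="GET")
response.begin()                      # parse the status line and headers
body = response.read()                # stops at the terminating chunk
status = response.status
location = response.getheader("Location")
finished = response.isclosed()        # response is done; peer is still open

response.close()
client_side.close()
server_side.close()
```

A plain read-until-EOF on the raw socket would instead wait here until the peer closed the connection or the timeout expired, which matches the hang described in this issue.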
msg126179 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-01-13 16:42
Ok, closing as duplicate.
History
Date User Action Args
2022-04-11 14:57:09 admin set github: 54786
2011-01-13 16:42:39 pitrou set status: open -> closed
superseder: urllib.request.urlretrieve hangs waiting for connection close after a redirect
messages: + msg126179
nosy: orsenthil, pitrou, SilentGhost, neologix, xhresko
resolution: duplicate
2011-01-13 11:53:33 neologix set nosy: + neologix
messages: + msg126166
2010-12-03 06:33:36 orsenthil set messages: + msg123192
2010-11-29 18:04:36 pitrou set messages: + msg122836
2010-11-29 18:01:59 SilentGhost set versions: + Python 2.7
2010-11-29 18:01:46 SilentGhost set versions: - Python 2.7
2010-11-29 17:51:19 pitrou set versions: + Python 2.7, Python 3.2
nosy: + orsenthil, pitrou
messages: + msg122831
assignee: orsenthil
2010-11-29 14:22:38 SilentGhost set messages: + msg122805
2010-11-29 14:17:03 SilentGhost set nosy: + SilentGhost
messages: + msg122804
2010-11-29 14:08:32 SilentGhost set title: (Fancy) URL opener stucks whet try to open page -> (Fancy) URL opener stuck when trying to open redirected url
2010-11-29 13:59:43 xhresko create