Title: urllib2 hangs with some documents.
Components: Library (Lib) Versions: Python 2.5
Status: closed Resolution: out of date
Dependencies: Superseder: infinite loop in httplib
Assigned To: Nosy List: acreature, amaury.forgeotdarc, georg.brandl, jjlee, orsenthil
Created on 2007-08-12 01:22 by acreature, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Attached file, uploaded by acreature on 2007-08-12 01:22: A test case that raises this issue.
Messages (6)
Author: Creature (acreature) Date: 2007-08-12 01:22
While working on a web spider I encountered the following page, which causes the read() call on a urllib2 response to hang. It uses 100% of the CPU and never seems to return. I see this behaviour on Python 2.4.4, and several people on 2.5.1 who tried the code below have reported the same behaviour.

By the way, the page it uses is a porn site, but please don't get hung up on that fact. This is a data processing issue, not a subject matter issue. 

This test case is attached as a file, but is also available at . Please note that the user-agent masquerading is there to rule out the server returning different data to different clients; commenting out that line so that Python sends its standard headers still reproduces the issue.
Author: Creature (acreature) Date: 2007-08-12 01:32
It seems that this issue can be fixed by changing line 525 of httplib.py in Python 2.4.4 to add "or line == ''":

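    # read and discard trailer up to the CRLF terminator
    ### note: we shouldn't have any trailers!
    while True:
        line = self.fp.readline()
        # readline() returns '' at EOF; without this check a response that
        # ends before the final CRLF makes the loop spin forever
        if line == '\r\n' or line == '':
            break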

I'm told that the same code is at line 574 in Python 2.5.
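To see why the extra check matters, here is a minimal sketch of the trailer-discarding loop with the proposed fix applied. It is written for Python 3 so it runs on a current interpreter; the function name `discard_trailers` and the use of `io.StringIO` as a stand-in for the socket file are illustrative assumptions, not httplib's actual API. Once the stream is exhausted, `readline()` returns `''` on every call, so a response that ends before the terminating CRLF would keep the unpatched loop spinning at 100% CPU.

```python
import io


def discard_trailers(fp):
    """Read and discard trailer lines up to the CRLF terminator.

    Hypothetical stand-alone version of the loop quoted above, with the
    proposed `line == ''` check. Returns the number of readline() calls made.
    """
    calls = 0
    while True:
        line = fp.readline()
        calls += 1
        # '\r\n' is the normal terminator; '' means EOF was reached first.
        if line == '\r\n' or line == '':
            break
    return calls


# Well-formed: one trailer line, then the terminating CRLF.
print(discard_trailers(io.StringIO('X-Trailer: 1\r\n\r\n')))  # prints 2

# Truncated: no final CRLF, so the second readline() hits EOF and returns ''.
print(discard_trailers(io.StringIO('X-Trailer: 1\r\n')))      # prints 2
```

Without the `line == ''` test, the second call above would return `''` forever and the function would never return, which matches the hang described in the report.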
Author: Senthil Kumaran (orsenthil) Date: 2007-08-13 03:07
Yes, I was able to verify both the issue and the fix.
Please submit a patch to the patches tracker, or perhaps someone with trunk access can make the change directly.
Author: Georg Brandl (georg.brandl) Date: 2007-08-23 17:34
The fix seems safe to apply.
Author: John J Lee (jjlee) Date: 2008-12-02 19:50
Please close: this is already fixed on trunk and release25-maint
(r60747, issue #1966) (and on release26-maint, which was branched after
the fix).
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) Date: 2008-12-02 20:12
Thanks for the review!
