classification
Title: urllib2 hangs with some documents.
Type:
Stage:
Components: Library (Lib)
Versions: Python 2.5
process
Status: closed
Resolution: out of date
Dependencies:
Superseder: infinite loop in httplib (issue 1966)
Assigned To:
Nosy List: acreature, amaury.forgeotdarc, georg.brandl, jjlee, orsenthil
Priority: normal
Keywords:

Created on 2007-08-12 01:22 by acreature, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name | Uploaded | Description
testcase.py | acreature, 2007-08-12 01:22 | A test case that raises this issue.
Messages (6)
msg32622 - (view) Author: Creature (acreature) Date: 2007-08-12 01:22
While working on a web spider I encountered the following page, which causes the read() call on a urllib2 response to hang: it uses 100% of the CPU and never seems to return. I see this behaviour on Python 2.4.4, and several people on 2.5.1 have tried the code below and reported the same behaviour.

By the way, the page it uses is a porn site, but please don't get hung up on that fact. This is a data processing issue, not a subject matter issue. 

This test case is attached as a file, but is also available at http://pastebin.com/d6f98618f . Please note that the user-agent masquerading is present to rule out any issues with the server returning different data to different clients; commenting out that line so Python sends the standard headers still results in the issue occurring.
msg32623 - (view) Author: Creature (acreature) Date: 2007-08-12 01:32
It seems that a fix for this issue is to add "or line == ''" to the condition on line 525 of httplib.py in Python 2.4.4:

    # read and discard trailer up to the CRLF terminator
    ### note: we shouldn't have any trailers!
    while True:
        line = self.fp.readline()
        if line == '\r\n' or line == '':
            break

I'm told that this is found on line 574 on Python 2.5.
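
To illustrate why the extra check matters, here is a minimal sketch written for Python 3, with io.BytesIO standing in for the socket file object. It is not the actual httplib code: discard_trailer, stop_at_eof and max_reads are hypothetical names used only for this demo, and max_reads exists solely so the unpatched variant terminates. At end of file, readline() returns an empty string on every call, so a loop that only breaks on '\r\n' would never exit if, for example, the response ends before the final CRLF.

```python
import io


def discard_trailer(fp, stop_at_eof=True, max_reads=10000):
    """Discard trailer lines after the last chunk.

    Returns the number of readline() calls made. max_reads is a guard
    added only for this demo so the unpatched variant cannot spin forever.
    """
    reads = 0
    while reads < max_reads:
        line = fp.readline()
        reads += 1
        if line == b"\r\n":
            break  # normal terminator: blank CRLF line
        if stop_at_eof and line == b"":
            break  # the proposed fix: an empty read means EOF
    return reads


# Assumed input: a trailer line with no terminating blank CRLF line.
truncated = b"X-Trailer: 1\r\n"

patched_reads = discard_trailer(io.BytesIO(truncated), stop_at_eof=True)
unpatched_reads = discard_trailer(io.BytesIO(truncated), stop_at_eof=False)
print(patched_reads, unpatched_reads)
```

With the EOF check the loop stops on the second read; without it, only the demo guard stops the loop, which matches the 100% CPU behaviour described above.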
msg32624 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2007-08-13 03:07
Yes, I could verify the issue as well as the fix.
Please submit a patch to the patch tracker, or perhaps someone with trunk access can make the change immediately.
msg55166 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2007-08-23 17:34
The fix seems safe to apply.
msg76768 - (view) Author: John J Lee (jjlee) Date: 2008-12-02 19:50
Please close: this is already fixed on trunk and release25-maint
(r60747, issue #1966) (and on release26-maint, which was branched after
the fix).
msg76772 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-12-02 20:12
Thanks for the review!
History
Date | User | Action | Args
2022-04-11 14:56:25 | admin | set | github: 45298
2008-12-02 20:12:03 | amaury.forgeotdarc | set | status: open -> closed; resolution: out of date; superseder: infinite loop in httplib; messages: + msg76772; nosy: + amaury.forgeotdarc
2008-12-02 19:50:23 | jjlee | set | nosy: + jjlee; messages: + msg76768
2007-08-23 17:34:23 | georg.brandl | set | nosy: + georg.brandl; messages: + msg55166
2007-08-12 01:22:42 | acreature | create |