Message 32622 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	acreature
Recipients
Date	2007-08-12.01:22:42
SpamBayes Score
Marked as misclassified
Message-id
In-reply-to

Content
While working on a web spider I encountered the following page that causes the read() call of a urllib2 response to fail. It uses 100% of the CPU and does not seem to ever return. I have this behaviour on Python 2.4.4, but several people on 2.5.1 have tried the code below and reported the same behaviour. By the way, the page it uses is a porn site, but please don't get hung up on that fact. This is a data processing issue, not a subject matter issue. This test case is attached as a file, but is also available at http://pastebin.com/d6f98618f . Please note that the user-agent masquerading is present to rule out any issues with the server returning different data to different clients; commenting out the line so Python sends the standard headers still results in the issue occuring.

While working on a web spider I encountered the following page that causes the read() call of a urllib2 response to fail. It uses 100% of the CPU and does not seem to ever return. I have this behaviour on Python 2.4.4, but several people on 2.5.1 have tried the code below and reported the same behaviour. 

By the way, the page it uses is a porn site, but please don't get hung up on that fact. This is a data processing issue, not a subject matter issue. 

This test case is attached as a file, but is also available at http://pastebin.com/d6f98618f . Please note that the user-agent masquerading is present to rule out any issues with the server returning different data to different clients; commenting out the line so Python sends the standard headers still results in the issue occuring.

History
Date	User	Action	Args
2007-08-23 14:59:10	admin	link	issue1772481 messages
2007-08-23 14:59:10	admin	create