Message32622
While working on a web spider, I encountered a page that causes the read() call on a urllib2 response to hang: it uses 100% of the CPU and never seems to return. I see this behaviour on Python 2.4.4, and several people on 2.5.1 have tried the code below and reported the same behaviour.
By the way, the page it uses is a porn site, but please don't get hung up on that fact. This is a data processing issue, not a subject matter issue.
This test case is attached as a file, but is also available at http://pastebin.com/d6f98618f . Please note that the user-agent masquerading is there to rule out the server returning different data to different clients; commenting out that line so Python sends its standard headers still results in the issue occurring.
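For readers without the attachment, here is a rough sketch of the general shape of such a fetch. It is not the attached test case. It is written for Python 3, where urllib2's functionality lives in urllib.request, which is an assumption on my part since the report concerns 2.4/2.5. The URL, the User-Agent string, and the helper names build_request, read_bounded and fetch are all hypothetical. Reading in bounded chunks does not diagnose the hang; it only caps how much a spider will try to read.

```python
import io
import urllib.request

# Hypothetical placeholder; the URL from the original report is not reproduced here.
TEST_URL = "http://example.com/"
# Hypothetical browser-like User-Agent string used for masquerading.
BROWSER_UA = "Mozilla/5.0 (compatible; ExampleSpider/0.1)"


def build_request(url, user_agent=None):
    """Build a request; pass user_agent=None to send Python's standard headers."""
    headers = {}
    if user_agent is not None:
        headers["User-Agent"] = user_agent
    return urllib.request.Request(url, headers=headers)


def read_bounded(response, max_bytes=1000000, chunk_size=8192):
    """Read from a file-like response in chunks, stopping at max_bytes."""
    chunks = []
    total = 0
    while total < max_bytes:
        chunk = response.read(min(chunk_size, max_bytes - total))
        if not chunk:  # empty bytes means end of stream
            break
        chunks.append(chunk)
        total += len(chunk)
    return b"".join(chunks)


def fetch(url, user_agent=None, timeout=10):
    """Open the URL with a socket timeout and return at most max_bytes of body."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return read_bounded(response)
```

Calling fetch(TEST_URL, BROWSER_UA) and then fetch(TEST_URL) would compare the masqueraded and default-header cases. Note that the timeout only bounds blocking socket operations, so it may not help if the hang is a busy loop rather than a stalled connection.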
Date | User | Action | Args
2007-08-23 14:59:10 | admin | link | issue1772481 messages
2007-08-23 14:59:10 | admin | create |