Message 49363 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	jjlee
Recipients
Date	2006-01-21.22:10:49
SpamBayes Score
Marked as misclassified
Message-id
In-reply-to

Content
Logged In: YES user_id=261020 In fact the commit message for rev 36871 states the real reason _fileobject is used (handling chunked encoding), showing my workaround is even more harmful than I thought. Moreover, doing a urlopen on 66.117.37.13 shows the response is chunked. The problem seems to be caused by httplib failing to find a CRLF at the end of the chunked response, so the loop at the end of _read_chunked() never terminates. Haven't looked in detail yet, but I'm guessing a) it's the server's fault and b) httplib should work around it. Here's the commit message from 36871: Fix urllib2.urlopen() handling of chunked content encoding. The change to use the newer httplib interface admitted the possibility that we'd get an HTTP/1.1 chunked response, but the code didn't handle it correctly. The raw socket object can't be pass to addinfourl(), because it would read the undecoded response. Instead, addinfourl() must call HTTPResponse.read(), which will handle the decoding. One extra wrinkle is that the HTTPReponse object can't be passed to addinfourl() either, because it doesn't implement readline() or readlines(). As a quick hack, use socket._fileobject(), which implements those methods on top of a read buffer. (suggested by mwh) Finally, add some tests based on test_urllibnet. Thanks to Andrew Sawyers for originally reporting the chunked problem.

Logged In: YES 
user_id=261020

In fact the commit message for rev 36871 states the real
reason _fileobject is used (handling chunked encoding),
showing my workaround is even more harmful than I thought. 
Moreover, doing a urlopen on 66.117.37.13 shows the response
*is* chunked.

The problem seems to be caused by httplib failing to find a
CRLF at the end of the chunked response, so the loop at the
end of _read_chunked() never terminates.  Haven't looked in
detail yet, but I'm guessing a) it's the server's fault and
b) httplib should work around it.


Here's the commit message from 36871:


Fix urllib2.urlopen() handling of chunked content encoding.

The change to use the newer httplib interface admitted the
possibility
that we'd get an HTTP/1.1 chunked response, but the code
didn't handle
it correctly.  The raw socket object can't be pass to
addinfourl(),
because it would read the undecoded response.  Instead,
addinfourl()
must call HTTPResponse.read(), which will handle the decoding.

One extra wrinkle is that the HTTPReponse object can't be
passed to
addinfourl() either, because it doesn't implement readline() or
readlines().  As a quick hack, use socket._fileobject(), which
implements those methods on top of a read buffer. 
(suggested by mwh)

Finally, add some tests based on test_urllibnet.

Thanks to Andrew Sawyers for originally reporting the
chunked problem.

History
Date	User	Action	Args
2007-08-23 15:45:26	admin	link	issue1411097 messages
2007-08-23 15:45:26	admin	create