Author vstinner
Recipients ned.deily, vstinner
Date 2014-02-21.23:19:40
Message-id <1393024781.13.0.142903317619.issue20719@psf.upfronthosting.co.za>
In-reply-to
Content
> It looks like the new python.org web server configuration was just changed to no longer gzip robots.txt so the test is no longer failing for me.

When I check the HTTP headers of http://www.python.org/robots.txt with a small Python script that sends "GET /robots.txt HTTP/1.0" and "Host: www.python.org" (but no Accept-Encoding header), I still see "Content-Encoding: gzip".
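
A check like the one described above could be scripted roughly as follows. This is a minimal sketch, not the original script: the function names (build_request, parse_headers, fetch_headers), the port and the timeout are assumptions chosen for illustration.

```python
import socket


def build_request(host, path):
    """Build a minimal HTTP/1.0 request with a Host header and no Accept-Encoding."""
    lines = [
        "GET {} HTTP/1.0".format(path),
        "Host: {}".format(host),
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_headers(raw):
    """Return (status_line, headers) from raw response bytes.

    Header names are lower-cased so lookups are case-insensitive.
    """
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    status_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return status_line, headers


def fetch_headers(host, path, port=80, timeout=10.0):
    """Send the request over a plain socket and return the parsed response headers."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
            # Stop reading once the header block is complete.
            if b"\r\n\r\n" in b"".join(chunks):
                break
    return parse_headers(b"".join(chunks))
```

Calling fetch_headers("www.python.org", "/robots.txt") and looking up "content-encoding" in the returned dictionary shows whether the server compressed the response even though the request carried no Accept-Encoding header.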

It looks like a bug in the HTTP server serving www.python.org, because my client didn't send "Accept-Encoding: gzip, deflate".

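RFC 2616 (HTTP/1.1), section 14.3, says "If no Accept-Encoding field is present in a request, the server MAY assume that the client will accept any content coding." The same section adds that, in this case, the server SHOULD use the "identity" content-coding if it is available, unless it has additional information that a different content-coding is meaningful to the client.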
http://www.w3.org/Protocols/rfc2616/rfc2616.html

See also:

"HTTP/1.1 (unlike HTTP/1.0) carefully specifies the Accept-Encoding header, used by a client to indicate what content-codings it can handle, and which ones it prefers."
http://www8.org/w8-papers/5c-protocols/key/key.html

The best solution would be to implement #1508475: support gzip in urllib.
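
Until urllib handles this itself, a caller can decompress the body by hand. The sketch below only illustrates that workaround and is not the change proposed in #1508475; decode_body and read_url are hypothetical helper names, and only the gzip coding is handled.

```python
import gzip
import urllib.request


def decode_body(body, content_encoding):
    """Undo a gzip Content-Encoding; return any other body unchanged."""
    if content_encoding and content_encoding.strip().lower() in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    return body


def read_url(url):
    """Fetch url with urllib and decompress the body by hand when needed."""
    with urllib.request.urlopen(url) as response:
        body = response.read()
        encoding = response.headers.get("Content-Encoding")
    return decode_body(body, encoding)
```

With this, read_url("http://www.python.org/robots.txt") would return the plain robots.txt bytes whether or not the server compressed them.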