
Author guest
Recipients guest
Date 2010-08-03.22:20:44
Message-id <1280874046.96.0.44632804656.issue9500@psf.upfronthosting.co.za>
In-reply-to
Content
urllib2 doesn't support any real-world Content-Encoding scheme.

"gzip" and "deflate" are standard compression schemes for HTTP, and all clients are expected to implement them. None of the default urllib2 handlers does.

Common workarounds can be found via Google. For lack of library support, many people resort to fixing up HTTP responses within their application logic, which is not good. Some have written proper urllib2 handlers. Here's one for gzip support, with deflate/zlib support hacked on (the HTTP spec is unclear on zlib vs. raw deflate format, hence some buggy servers):


# http://techknack.net/python-urllib2-handlers/
import urllib2
from gzip import GzipFile
from StringIO import StringIO

class ContentEncodingProcessor(urllib2.BaseHandler):
  """A handler to add gzip capabilities to urllib2 requests """

  # advertise the encodings this handler can decode
  def http_request(self, req):
    req.add_header("Accept-Encoding", "gzip, deflate")
    return req

  # decode
  def http_response(self, req, resp):
    old_resp = resp
    # gzip
    if resp.headers.get("content-encoding") == "gzip":
        gz = GzipFile(
                    fileobj=StringIO(resp.read()),
                    mode="r"
                  )
        resp = urllib2.addinfourl(gz, old_resp.headers, old_resp.url, old_resp.code)
        resp.msg = old_resp.msg
    # deflate (zlib-wrapped or raw; handled by deflate() below)
    elif resp.headers.get("content-encoding") == "deflate":
        gz = StringIO( deflate(resp.read()) )
        resp = urllib2.addinfourl(gz, old_resp.headers, old_resp.url, old_resp.code)  # 'class to add info() and
        resp.msg = old_resp.msg
    return resp

# deflate support
import zlib
def deflate(data):
  # Despite its name, this decompresses a "deflate"-encoded body.
  # Servers send either a raw deflate stream or a zlib-wrapped one,
  # so try raw deflate first (negative wbits) and fall back to zlib.
  try:
    return zlib.decompress(data, -zlib.MAX_WBITS)
  except zlib.error:
    return zlib.decompress(data)
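
To illustrate the zlib vs. raw deflate ambiguity mentioned above, here is a minimal standalone sketch using only the zlib module. The names inflate, payload, wrapped and raw are made up for this illustration and are not part of urllib2 or of the handler above. It builds both variants of a "deflate" body and checks that the try/except fallback decodes either one:

```python
import zlib

def inflate(data):
    """Decompress a "deflate" body, accepting raw or zlib-wrapped streams."""
    try:
        # Negative wbits: treat the input as a raw deflate stream
        # (no zlib header, no trailing checksum).
        return zlib.decompress(data, -zlib.MAX_WBITS)
    except zlib.error:
        # Fall back to the zlib-wrapped format.
        return zlib.decompress(data)

# Hypothetical response body, used only for this demonstration.
payload = b"hello, content-encoding " * 4

# Variant 1: zlib-wrapped stream (header + deflate data + checksum).
wrapped = zlib.compress(payload)

# Variant 2: raw deflate stream, as sent by some servers.
compressor = zlib.compressobj(
    zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -zlib.MAX_WBITS
)
raw = compressor.compress(payload) + compressor.flush()

# Both variants decode to the same bytes.
assert inflate(wrapped) == payload
assert inflate(raw) == payload
```

Trying the raw format first works here because a zlib header fed to the raw decoder fails quickly with zlib.error, at which point the fallback takes over. Whether that ordering is the best choice for every server is an assumption of this sketch.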