classification
Title: urllib2: Content-Encoding
Type: enhancement
Stage: resolved
Components: Library (Lib)
Versions: Python 3.2

process
Status: closed
Resolution: duplicate
Dependencies:
Superseder: transparent gzip compression in urllib (issue 1508475)
Assigned To:
Nosy List: dstanek, guest, orsenthil, r.david.murray
Priority: normal
Keywords: easy

Created on 2010-08-03 22:20 by guest, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (4)
msg112707 - Author: guest (guest) Date: 2010-08-03 22:20
urllib2 doesn't support any real-world Content-Encoding scheme.

"gzip" and "deflate" are standard compression schemes for HTTP and expected to be implemented by all clients. None of the default urllib2 handlers implements them.

Common workarounds can be found via Google. Many people resort to fixing up HTTP responses within their application logic (not good) due to the lack of library support, and some have written proper urllib2 handlers. Here's one for gzip support, with deflate/zlib support hacked on (the HTTP spec is unclear on zlib vs. raw deflate format, hence some buggy servers):


# http://techknack.net/python-urllib2-handlers/    
import urllib2
from gzip import GzipFile
from StringIO import StringIO
class ContentEncodingProcessor(urllib2.BaseHandler):
  """A handler to add gzip capabilities to urllib2 requests """

  # add headers to requests   
  def http_request(self, req):
    req.add_header("Accept-Encoding", "gzip, deflate")
    return req

  # decode
  def http_response(self, req, resp):
    old_resp = resp
    # gzip
    if resp.headers.get("content-encoding") == "gzip":
        gz = GzipFile(
                    fileobj=StringIO(resp.read()),
                    mode="r"
                  )
        resp = urllib2.addinfourl(gz, old_resp.headers, old_resp.url, old_resp.code)
        resp.msg = old_resp.msg
    # deflate
    elif resp.headers.get("content-encoding") == "deflate":
        gz = StringIO( deflate(resp.read()) )
        resp = urllib2.addinfourl(gz, old_resp.headers, old_resp.url, old_resp.code)  # 'class to add info() and
        resp.msg = old_resp.msg
    return resp

# deflate support
import zlib
def deflate(data):
  # Servers disagree on what "deflate" means: some send a raw deflate
  # stream, others a zlib-wrapped one. Try raw first, then fall back to zlib.
  try:
    return zlib.decompress(data, -zlib.MAX_WBITS)
  except zlib.error:
    return zlib.decompress(data)
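
To illustrate the raw deflate vs. zlib ambiguity mentioned above, here is a small self-contained sketch. It assumes nothing beyond the standard zlib module; the helper name `inflate` and the sample payload are made up for this example. It builds both kinds of stream from the same bytes and checks that the try-raw-then-zlib fallback decodes each of them.

```python
import zlib


def inflate(data):
    """Decode a "deflate" body that may be raw deflate or zlib-wrapped.

    Hypothetical helper for illustration; same fallback idea as the
    deflate() function in the handler above.
    """
    try:
        # Negative wbits means: raw deflate stream, no zlib header/checksum.
        return zlib.decompress(data, -zlib.MAX_WBITS)
    except zlib.error:
        # Not a valid raw stream, so assume the zlib-wrapped format.
        return zlib.decompress(data)


payload = b"hello, content-encoding " * 20  # made-up sample body

# zlib-wrapped stream (2-byte header + Adler-32 trailer).
zlib_wrapped = zlib.compress(payload)

# Raw deflate stream, produced by passing a negative wbits value.
raw = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
raw_deflate = raw.compress(payload) + raw.flush()

# Both variants decode to the same bytes.
assert inflate(zlib_wrapped) == payload
assert inflate(raw_deflate) == payload
```

The raw attempt comes first because a zlib-wrapped stream is normally rejected when read as raw deflate, which triggers the fallback.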
msg112744 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-08-04 03:26
Thanks for the suggestion.

New features can only go into Python 3, where urllib and urllib2 have been harmonized into the urllib package.  So what we would need in order to consider this for acceptance is a patch against py3k trunk urllib.  Please see http://python.org/dev for information about how to develop a patch for submission.
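
For illustration only, a port of the handler from the first message to Python 3's urllib.request might look roughly like the sketch below. This is not the patch from any issue and not part of urllib; the `decode_body` helper is a hypothetical name, and the class name is simply reused from the post above. It assumes a Python 3 version that provides `gzip.decompress`.

```python
import gzip
import io
import urllib.request
import urllib.response
import zlib


def decode_body(encoding, data):
    """Decode bytes according to a Content-Encoding value (hypothetical helper)."""
    encoding = (encoding or "").strip().lower()
    if encoding == "gzip":
        return gzip.decompress(data)
    if encoding == "deflate":
        try:
            # Some servers send raw deflate ...
            return zlib.decompress(data, -zlib.MAX_WBITS)
        except zlib.error:
            # ... others send the zlib-wrapped format.
            return zlib.decompress(data)
    return data  # unknown or missing encoding: pass through unchanged


class ContentEncodingProcessor(urllib.request.BaseHandler):
    """Sketch of a handler adding gzip/deflate decoding to urllib.request."""

    def http_request(self, req):
        req.add_header("Accept-Encoding", "gzip, deflate")
        return req

    def http_response(self, req, resp):
        encoding = resp.headers.get("Content-Encoding") or ""
        if encoding.strip().lower() not in ("gzip", "deflate"):
            return resp
        body = decode_body(encoding, resp.read())
        # Wrap the decoded bytes in a new file-like response object.
        new_resp = urllib.response.addinfourl(
            io.BytesIO(body), resp.headers, resp.url, resp.code
        )
        new_resp.msg = resp.msg
        return new_resp

    # Apply the same processing to HTTPS traffic.
    https_request = http_request
    https_response = http_response


# Building an opener does not touch the network.
opener = urllib.request.build_opener(ContentEncodingProcessor())
```

Calling `opener.open(url)` would then return already-decoded bodies for responses marked gzip or deflate. The whole body is read into memory here, which keeps the sketch short but is a design trade-off a real patch would likely revisit.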
msg112796 - Author: guest (guest) Date: 2010-08-04 12:55
Nah, sorry, I only bothered to report it. As I don't run py3, I can't write a patch anyway, and it wouldn't help my current Python 2.x setups either.
I guess it's sufficient if this is googleable, and per-application workarounds are very much OK, as Python 2 isn't that widely used for webapps.

Also, httplib2 supports Content-Encoding. They still have that raw deflate vs. zlib bug, but that can be fixed. And as an externally distributed lib, it will remedy the situation for all apps and Python < 2.8.

However, it might be a better idea to add a note to the urllib/2 documentation instead ("No default handler for Content-Encoding..."), because many people have stumbled on this before (see Google/Stack Overflow).
msg121754 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-11-20 20:17
Issue 1508475 has a patch, though it still needs to be updated.
History
Date                 User            Action  Args
2022-04-11 14:57:04  admin           set     github: 53709
2010-11-20 20:17:57  r.david.murray  set     status: open -> closed
                                             resolution: duplicate
                                             messages: + msg121754
                                             superseder: transparent gzip compression in urllib
                                             stage: test needed -> resolved
2010-08-04 12:55:59  guest           set     messages: + msg112796
2010-08-04 03:26:19  r.david.murray  set     type: enhancement
                                             versions: + Python 3.2, - Python 2.5
                                             keywords: + easy
                                             nosy: + r.david.murray, orsenthil
                                             messages: + msg112744
                                             stage: test needed
2010-08-03 22:32:12  dstanek         set     nosy: + dstanek
2010-08-03 22:20:45  guest           create