classification
Title: transparent gzip compression in urllib
Type: enhancement
Stage: patch review
Components: Library (Lib)
Versions: Python 3.3
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: orsenthil
Nosy List: abacabadabacaba, antialize, eric.araujo, jcea, jcon, jerub, jjlee, nadeem.vawda, orsenthil, ruseel, serhiy.storchaka, thomaspinckney3
Priority: high
Keywords:

Created on 2006-06-19 08:59 by antialize, last changed 2013-01-10 04:33 by jcon.

Files
File name  Uploaded  Description
urllib2-gzip.patch  antialize, 2006-06-19 09:26  urllib2-gzip.patch
issue1508475.diff  orsenthil, 2010-11-25 08:49
Messages (15)
msg50500 - (view) Author: Jakob Truelsen (antialize) Date: 2006-06-19 08:59
Some web servers support gzipping responses before sending
them. This patch adds transparent support for this to
urllib2 (documentation: http://www.http-compression.com/).

This patch *requires* patch #914340 as a
prerequisite, as that patch enables stream support in the
gzip library.
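
For illustration only (this is not code from the attached patch): a minimal sketch of what "stream support" means here, assuming the response body is available as a binary file-like object. The helper name `open_gzip_stream` is hypothetical, and an in-memory buffer stands in for a real HTTP response.

```python
import gzip
import io


def open_gzip_stream(raw_stream):
    """Wrap a binary file-like object holding gzip data so that
    reads return decompressed bytes (hypothetical helper)."""
    return gzip.GzipFile(fileobj=raw_stream, mode="rb")


# Simulate a gzip-encoded response body without touching the network.
payload = b"hello, compressed world\n" * 100
compressed_body = io.BytesIO(gzip.compress(payload))

with open_gzip_stream(compressed_body) as stream:
    first_chunk = stream.read(24)  # read incrementally, as from a socket
    rest = stream.read()           # then drain the remainder
```

The caller reads from the wrapper exactly as it would from the undecorated response, which is what makes the decompression "transparent".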
msg50501 - (view) Author: John J Lee (jjlee) Date: 2007-01-30 01:34
Looks good.

This needs tests and docs.  As a new feature, this could not be released until Python 2.6.

It would be nice to have support for managing content negotiation in general, but that wish isn't an obstacle to this patch.
msg114671 - (view) Author: Mark Lawrence (BreamoreBoy) Date: 2010-08-22 10:53
@Jakob could you provide an updated patch for py3k that includes unit test and doc changes?
msg114725 - (view) Author: Jakob Truelsen (antialize) Date: 2010-08-23 07:43
No, I have long since moved on to other things.
msg114726 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-08-23 07:47
It's okay, Jakob, we will take it forward.
msg122342 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-11-25 08:43
Transparent gzip Content-Encoding support should be implemented at the
http.client level.

Before adding this feature, one question needs to be sorted out.

If we support transparent gzip and wrap the file pointer in a
GzipFile, should we reset the Content-Length value?

What if a user of urllib is relying on the Content-Length of the response
to do something further?

I observed that google-chrome returns the uncompressed output (which
is correct for a browser), but has the Content-Length set to the
compressed output length.
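
To make the mismatch concrete, here is a small sketch (my own illustration, not taken from the patch) that builds a gzip-encoded body in memory and compares the declared Content-Length with the length of the decoded data. `email.message.Message` is used only as a stand-in for the response headers.

```python
import gzip
from email.message import Message

# What the application ultimately wants to see.
body = b"<html>" + b"x" * 5000 + b"</html>"

# What actually travels over the wire when the server gzips the body.
wire_body = gzip.compress(body)

headers = Message()
headers["Content-Encoding"] = "gzip"
headers["Content-Length"] = str(len(wire_body))  # length of the compressed bytes

decoded = gzip.decompress(wire_body)
declared = int(headers["Content-Length"])

# After transparent decoding, the header no longer describes what read() returns.
mismatch = declared != len(decoded)
```

Any caller that reads exactly `declared` bytes from the decoded stream would stop early, which is the situation the question above is about.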
msg122343 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-11-25 08:49
Patch for py3k.
msg122351 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-11-25 11:21
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.13
msg158315 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-04-15 09:51
What if the gzip module is not available?

I think that transparent decompression should delete the Content-Encoding header (to save the user from decompressing again) and the Content-Length header (which would be wrong).
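
A rough sketch of that behaviour, assuming the headers are held in an `email.message.Message`-like object and the whole body is already in memory. The function name `decode_gzip_response` is made up for this example and is not part of either patch.

```python
import gzip
from email.message import Message


def decode_gzip_response(headers, body):
    """Return the decoded body and drop the headers that no longer apply.

    Hypothetical helper: leaves everything untouched unless the
    response declares Content-Encoding: gzip.
    """
    encoding = (headers.get("Content-Encoding") or "").strip().lower()
    if encoding != "gzip":
        return body
    decoded = gzip.decompress(body)
    del headers["Content-Encoding"]  # body is no longer encoded
    del headers["Content-Length"]    # value described the compressed bytes
    return decoded


headers = Message()
headers["Content-Type"] = "text/plain"
headers["Content-Encoding"] = "gzip"
wire_body = gzip.compress(b"some text " * 50)
headers["Content-Length"] = str(len(wire_body))

body = decode_gzip_response(headers, wire_body)
```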
msg158380 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2012-04-15 23:52
In that case, transparent decompression should not be available. (The
request header should not be sent, and the response won't be compressed.)
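
One way this fallback could look, sketched under the assumption that the feature is driven by a module-level import guard. The function `default_request_headers` is hypothetical.

```python
try:
    import gzip
except ImportError:  # e.g. an interpreter built without zlib
    gzip = None


def default_request_headers():
    """Advertise gzip only when the gzip module can actually decode it
    (hypothetical helper)."""
    headers = {}
    if gzip is not None:
        headers["Accept-Encoding"] = "gzip"
    return headers


request_headers = default_request_headers()
```

Without the request header, a conforming server has no reason to compress the response, so the normal request-response cycle proceeds unchanged.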
msg158400 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-04-16 05:41
The patch for py3k also has the disadvantage that the content is decoded even if the user has defined a Content-Encoding and is going to process the compressed response himself.
msg160355 - (view) Author: Tom Pinckney (thomaspinckney3) * Date: 2012-05-10 17:12
What if this gzip decompression was optional and controlled via a flag or handler instead of making it automagic?

It's not entirely trivial to implement so it is nice to have the option of this happening automatically if one wishes.

Then, the caller would be aware that Content-length / Accept-encoding / Content-encoding etc have been modified iff they requested gzip decompression.
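
For what it's worth, a handler-based variant might look roughly like the sketch below. `GzipHandler` is a hypothetical name, not an existing urllib class; the sketch reads the whole body into memory rather than streaming it, and it decodes any gzip response, including one the caller asked for explicitly. A fake response object is used so that nothing touches the network.

```python
import gzip
import io
import urllib.request
import urllib.response
from email.message import Message


class GzipHandler(urllib.request.BaseHandler):
    """Hypothetical opt-in handler; not part of urllib."""

    def http_request(self, request):
        # Only advertise gzip if the caller has not set the header already.
        if not request.has_header("Accept-encoding"):
            request.add_unredirected_header("Accept-encoding", "gzip")
        return request

    def http_response(self, request, response):
        headers = response.headers
        if (headers.get("Content-Encoding") or "").strip().lower() != "gzip":
            return response
        data = gzip.decompress(response.read())
        # These headers described the compressed body, so drop them.
        del headers["Content-Encoding"]
        del headers["Content-Length"]
        return urllib.response.addinfourl(
            io.BytesIO(data), headers, response.url, response.code)

    https_request = http_request
    https_response = http_response


# Exercise the handler directly with a fake response.
handler = GzipHandler()
request = handler.http_request(urllib.request.Request("http://example.invalid/"))

raw = gzip.compress(b"payload")
fake_headers = Message()
fake_headers["Content-Encoding"] = "gzip"
fake_headers["Content-Length"] = str(len(raw))
fake_response = urllib.response.addinfourl(
    io.BytesIO(raw), fake_headers, "http://example.invalid/", 200)

result = handler.http_response(request, fake_response)
body = result.read()
```

A caller who wants the behaviour would opt in with `urllib.request.build_opener(GzipHandler())`; everyone else keeps the current semantics.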
msg160384 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-05-10 23:43
Enabled by default with a knob to turn it off sounds good.  Maybe the original headers could be preserved in some object.
msg163925 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-06-25 10:26
The first step is to answer the fundamental question: at what level will transparent decompression work? At the http.client level or at the urllib level? A patch for the first case will be much more difficult, but other HTTP-based protocols would then benefit from compression as well.
msg163935 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2012-06-25 10:53
I think transparent compression should work at the http.client level. I also agree with the other points made by Serhiy:

- Transparent decompression should delete the Content-Encoding and Content-Length headers (this is as per the RFC too).

- It should not decompress the response if the user has explicitly specified the intent to use Content-Encoding: gzip and is ready to do the decompression himself.

- Transparent compression/decompression requires the gzip module to be available; if it is not, the feature may be disabled and the normal request-response cycle would proceed.

- I think having it 'ON' with a flag to switch it 'OFF' would be more desirable than providing this feature via a Handler. The reason is that it can improve the performance of any request to a server that supports it, and browsers have adopted a similar approach too.
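
To illustrate the "on by default, with an off switch" idea together with the rule about explicit user intent, here is a minimal sketch. The function `negotiate` and its `transparent_gzip` parameter are invented for this example and do not correspond to any existing http.client API.

```python
def negotiate(user_headers, transparent_gzip=True):
    """Return (headers_to_send, should_decode) for one request.

    Hypothetical helper: gzip is requested and decoded automatically
    unless the caller turned the feature off or set Accept-Encoding
    explicitly, in which case the response is passed through untouched.
    """
    headers = dict(user_headers)
    explicit = any(name.lower() == "accept-encoding" for name in headers)
    should_decode = transparent_gzip and not explicit
    if should_decode:
        headers["Accept-Encoding"] = "gzip"
    return headers, should_decode


default_case = negotiate({})
explicit_case = negotiate({"accept-encoding": "gzip"})
disabled_case = negotiate({}, transparent_gzip=False)
```

In the explicit case the caller's header is sent as given and decoding stays in the caller's hands, which avoids decoding the body twice.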
History
Date User Action Args
2013-01-10 04:33:47  jcon  set  nosy: + jcon
2012-07-07 16:27:14  jerub  set  nosy: + jerub
2012-06-25 10:53:45  orsenthil  set  messages: + msg163935
2012-06-25 10:26:50  serhiy.storchaka  set  messages: + msg163925
2012-06-25 05:38:14  rhettinger  set  priority: normal -> high
2012-06-17 14:30:41  jcea  set  nosy: + jcea
2012-06-17 10:32:19  pitrou  link  issue15089 superseder
2012-05-10 23:43:07  eric.araujo  set  keywords: - patch, easy; messages: + msg160384
2012-05-10 17:12:05  thomaspinckney3  set  nosy: + thomaspinckney3; messages: + msg160355
2012-04-16 05:41:12  serhiy.storchaka  set  messages: + msg158400
2012-04-15 23:52:40  orsenthil  set  messages: + msg158380
2012-04-15 09:51:31  serhiy.storchaka  set  messages: + msg158315
2012-04-15 07:07:02  serhiy.storchaka  set  nosy: + serhiy.storchaka; versions: + Python 3.3, - Python 3.2
2012-03-04 20:39:13  abacabadabacaba  set  nosy: + abacabadabacaba
2010-11-25 11:21:10  eric.araujo  set  messages: + msg122351
2010-11-25 08:49:49  orsenthil  set  files: + issue1508475.diff; messages: + msg122343; stage: test needed -> patch review
2010-11-25 08:43:48  orsenthil  set  messages: + msg122342
2010-11-23 21:26:28  nadeem.vawda  set  nosy: + nadeem.vawda
2010-11-23 04:39:56  ruseel  set  nosy: + ruseel
2010-11-20 20:23:25  eric.araujo  set  nosy: + eric.araujo, - BreamoreBoy; components: - Extension Modules; title: transparent gzip compression in liburl2 -> transparent gzip compression in urllib
2010-11-20 20:17:57  r.david.murray  link  issue9500 superseder
2010-08-23 07:47:06  orsenthil  set  assignee: orsenthil; messages: + msg114726
2010-08-23 07:43:26  antialize  set  messages: + msg114725
2010-08-22 10:53:38  BreamoreBoy  set  nosy: + BreamoreBoy; messages: + msg114671; versions: + Python 3.2, - Python 2.7
2009-04-22 18:48:53  ajaksu2  set  keywords: + easy
2009-02-12 17:42:44  ajaksu2  set  nosy: + orsenthil; stage: test needed; type: enhancement; components: + Library (Lib); versions: + Python 2.7, - Python 2.4
2006-06-19 08:59:09  antialize  create