classification
Title: Should urllib2.urlopen send an Accept-Encoding header?
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.1, Python 3.2, Python 2.7
process
Status: pending Resolution:
Dependencies: Superseder:
Assigned To: orsenthil Nosy List: dabrahams, demian.brecht, eric.araujo, karlcow, orsenthil
Priority: normal Keywords:

Created on 2010-05-16 14:47 by dabrahams, last changed 2015-04-02 15:32 by demian.brecht.

Messages (5)
msg105870 - Author: Dave Abrahams (dabrahams) Date: 2010-05-16 14:47
According to the RFC, the server is allowed to send back any encoding it likes when no Accept-Encoding header is supplied, but all the examples of urllib2.urlopen usage I can find assume they are getting plain, unencoded content back. I think it would be better to inject an Accept-Encoding header when none is explicitly supplied, so that nobody else trips over this issue.
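A minimal sketch of how a caller can defend against the situation described, assuming Python 3 (where urllib2 became urllib.request) and assuming gzip is the only non-identity coding worth handling. `decode_body` is a hypothetical helper, not part of any library: it looks at the response's Content-Encoding and gunzips the body if the server chose gzip anyway.

```python
import gzip


def decode_body(content_encoding, body):
    """Hypothetical helper: undo a gzip Content-Encoding, pass 'identity' through.

    Only 'gzip' (plus its legacy alias 'x-gzip') and 'identity' are handled;
    anything else raises, so compressed bytes are never mistaken for text.
    """
    coding = (content_encoding or "identity").strip().lower()
    if coding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if coding == "identity":
        return body
    raise ValueError(f"unsupported Content-Encoding: {coding!r}")


# Offline demonstration: a gzip-encoded payload and a missing header.
compressed = gzip.compress(b"hello")
print(decode_body("gzip", compressed))  # b'hello'
print(decode_body(None, b"hello"))      # b'hello'
```

With a real response the call would be `decode_body(resp.headers.get("Content-Encoding"), resp.read())`. Raising on unknown codings is a deliberate choice here: failing loudly is safer than silently returning compressed bytes.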

See http://support.github.com/discussions/site/1510
msg105937 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-05-17 20:30
The HTTP RFC says that the server can send any encoding if the client does not specify an Accept-Encoding header. But if 'identity' is one of the encodings the server recognizes (?), then it should send the content as identity, which indicates untransformed content.
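Rather than relying on the server's default, a client can ask for untransformed content explicitly. A minimal sketch with Python 3's urllib.request; the URL is only a placeholder, and nothing is sent over the network until `urlopen` is called.

```python
import urllib.request

# Ask explicitly for the 'identity' coding (no compression or other transformation).
req = urllib.request.Request(
    "http://example.com/",
    headers={"Accept-Encoding": "identity"},
)

# Request stores header names via str.capitalize(), so the key becomes "Accept-encoding".
print(req.get_header("Accept-encoding"))  # identity
print(req.has_header("Accept-encoding"))  # True

# To actually send it (requires network access):
# with urllib.request.urlopen(req) as resp:
#     body = resp.read()
```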

I also see that httplib adds Accept-Encoding: 'identity' to the headers at the request level. I will look into what is missing here if it is not being sent for all requests.
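The httplib behaviour mentioned above survives in Python 3 as http.client: `HTTPConnection.putrequest()` adds `Accept-Encoding: identity` to HTTP/1.1 requests unless it is called with `skip_accept_encoding=True`. A minimal sketch that observes this without network traffic; `CapturingConnection` is a hypothetical subclass that overrides `send()` to record the raw bytes instead of opening a socket.

```python
import http.client


class CapturingConnection(http.client.HTTPConnection):
    """Hypothetical test double: records outgoing bytes instead of using a socket."""

    def __init__(self, host):
        super().__init__(host)
        self.sent = b""

    def send(self, data):
        # http.client funnels the assembled request through send(); keep a copy
        # and never call connect(), so no network access happens.
        self.sent += data


# Default: putrequest() adds "Accept-Encoding: identity" on its own.
conn = CapturingConnection("example.com")
conn.request("GET", "/")
print(conn.sent.decode("ascii"))

# Opting out: the same request built by hand with skip_accept_encoding=True.
bare = CapturingConnection("example.com")
bare.putrequest("GET", "/", skip_accept_encoding=True)
bare.endheaders()
print(b"Accept-Encoding" in bare.sent)  # False
```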

By the way, I could not work out the problem you are facing from the URL you mentioned. In particular, I do not see any alternation between gzip and non-gzip behaviour across requests.
msg105959 - Author: Dave Abrahams (dabrahams) Date: 2010-05-18 10:02
How many tests did you run? My two tests were minutes apart. I have a feeling this has something to do with caching behavior on the server.
msg183573 - Author: karl (karlcow) * Date: 2013-03-06 02:32
What was the content of http://support.github.com/discussions/site/1510? I can't find it. Is the issue still occurring?
msg239926 - Author: Demian Brecht (demian.brecht) * Date: 2015-04-02 15:32
This doesn't seem to be an issue in 3.4+; the following headers are injected in a call to urlopen():

GET / HTTP/1.1
Accept-Encoding: identity
Host: example.com
User-Agent: Python-urllib/3.4
Connection: close

However, 2.7 does not behave the same way:

GET / HTTP/1.0
Host: example.com
User-Agent: Python-urllib/1.17

That said, I would see this as a feature request rather than a bug, so it should be considered invalid for 2.7, which only receives bug fixes.

Setting this to pending; it will be closed unless anyone has objections or further details.
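The Python 3 observation can be reproduced without contacting example.com. A minimal sketch, assuming Python 3: a throwaway `http.server` on 127.0.0.1 records the request line and headers of a single urllib.request call. The opener is built with an empty `ProxyHandler` so proxy environment variables cannot reroute the request; header order and the exact User-Agent version will vary with the Python version.

```python
import http.server
import threading
import urllib.request

received = {}


class RecordingHandler(http.server.BaseHTTPRequestHandler):
    """Answers one GET and remembers exactly what the client sent."""

    def do_GET(self):
        received["requestline"] = self.requestline
        received["headers"] = list(self.headers.items())
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


# Port 0 lets the OS pick a free port; the server is already listening here.
server = http.server.HTTPServer(("127.0.0.1", 0), RecordingHandler)
worker = threading.Thread(target=server.handle_request)  # serve exactly one request
worker.start()

host, port = server.server_address
# An empty ProxyHandler keeps http_proxy-style environment variables out of the way.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open(f"http://{host}:{port}/") as resp:
    resp.read()

worker.join()
server.server_close()

print(received["requestline"])
for name, value in received["headers"]:
    print(f"{name}: {value}")
```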
History
Date                 User           Action  Args
2015-04-02 15:32:23  demian.brecht  set     status: open -> pending
                                            nosy: + demian.brecht
                                            messages: + msg239926
2013-03-06 02:32:12  karlcow        set     nosy: + karlcow
                                            messages: + msg183573
2010-12-22 07:48:02  eric.araujo    set     nosy: + eric.araujo
                                            versions: - Python 2.6
2010-05-18 10:02:40  dabrahams      set     messages: + msg105959
2010-05-17 20:30:17  orsenthil      set     messages: + msg105937
2010-05-16 18:24:46  pitrou         set     assignee: orsenthil
                                            type: behavior
                                            nosy: + orsenthil
                                            versions: + Python 3.1, Python 2.7, Python 3.2
2010-05-16 14:47:09  dabrahams      create