urllib doesn't put Accept: */* in the headers #66640

rhettinger · 2014-09-20T23:50:32Z

BPO	22450
Nosy	@rhettinger, @orsenthil, @pitrou, @vadmium, @Lukasa, @kennethreitz
Files	accept.diff

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = <Date 2016-09-09.23:46:16.217>
created_at = <Date 2014-09-20.23:50:32.347>
labels = ['type-bug']
title = "urllib doesn't put Accept: */* in the headers"
updated_at = <Date 2016-09-09.23:48:48.426>
user = 'https://github.com/rhettinger'

bugs.python.org fields:

activity = <Date 2016-09-09.23:48:48.426>
actor = 'orsenthil'
assignee = 'none'
closed = True
closed_date = <Date 2016-09-09.23:46:16.217>
closer = 'rhettinger'
components = []
creation = <Date 2014-09-20.23:50:32.347>
creator = 'rhettinger'
dependencies = []
files = ['36673']
hgrepos = []
issue_num = 22450
keywords = ['patch']
message_count = 19.0
messages = ['227194', '227195', '227196', '227197', '227198', '240468', '253813', '253828', '253834', '253906', '273968', '273970', '273982', '275474', '275487', '275489', '275498', '275499', '275501']
nosy_count = 8.0
nosy_names = ['rhettinger', 'orsenthil', 'pitrou', 'Arfrever', 'python-dev', 'martin.panter', 'Lukasa', 'kennethreitz']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'patch review'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue22450'
versions = ['Python 2.7', 'Python 3.4', 'Python 3.5']

rhettinger · 2014-09-20T23:50:32Z

The use of urllib for REST APIs is impaired in the absence of a "Accept: */*" header such as that added automatically by the requests package or by the CURL command-line tool.

# Example that gets an incorrect result due to the missing header
import urllib
print urllib.urlopen('http://graph.facebook.com/raymondh').headers['Content-Type']

# Equivalent call using CURL
$ curl -v http://graph.facebook.com/raymondh
...

Connected to graph.facebook.com (31.13.75.1) port 80 (#0)

GET /raymondh HTTP/1.1
User-Agent: curl/7.30.0
Host: graph.facebook.com
Accept: */*
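
For comparison, here is a minimal sketch of sending the same header explicitly from Python 3, assuming `urllib.request` rather than the Python 2 `urllib` used above. The helper name `build_request` is hypothetical, and no network call is made so the sketch stays offline.

```python
import urllib.request


def build_request(url):
    """Return a Request that carries the header curl sends by default."""
    return urllib.request.Request(url, headers={"Accept": "*/*"})


req = build_request("http://graph.facebook.com/raymondh")

# Inspect the header that would be sent; passing `req` to
# urllib.request.urlopen() would perform the actual request.
print(req.get_header("Accept"))
```

The header is set per request here, so other callers of `urlopen` are unaffected.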

orsenthil · 2014-09-21T00:00:02Z

Patch looks good. Will need similar addition in urllib2 and inclusion of tests.

pitrou · 2014-09-21T00:00:21Z

Can you explain how the result is incorrect?

>>> f = urllib.request.urlopen('http://graph.facebook.com/raymondh')
>>> json.loads(f.read().decode())
{'link': 'https://www.facebook.com/raymondh', 'id': '562805507', 'last_name': 'Hettinger', 'gender': 'male', 'first_name': 'Raymond', 'name': 'Raymond Hettinger', 'locale': 'en_US', 'username': 'raymondh'}

orsenthil · 2014-09-21T00:03:53Z

Well, the result of loading it with json will be the same. But without sending Accept: */*, the Content-Type returned is text/javascript; charset=UTF-8, whereas with Accept: */* the Content-Type is set to application/json; charset=UTF-8 (which is more desirable).

pitrou · 2014-09-21T00:07:18Z

> The content-type returned is text/javascript; charset=UTF-8 and with
> sending of Accept */* the content-type is set to application/json;
> charset=UTF-8 (which is more desirable).

Is that a bug in urllib, or in Facebook's HTTP implementation?
Frankly, we shouldn't jump to conclusions just because one specific use case is made better by this. Forcing an accept header may totally change the output of other servers and break existing uses.

(and besides, the content-type header is unimportant when you know what to expect, which is normally the case when calling an API)

vadmium · 2015-04-11T10:56:55Z

The RFC <https://tools.ietf.org/html/rfc7231#page-39> says “A request without any Accept header field implies that the user agent will accept any media type in response”, which sounds the same as “Accept: */*”. I don’t understand why adding it should make a real difference.

If you really desire only application/json, you should probably include “Accept: application/json” in the request. Otherwise, it would probably be more robust to make your program accept both types. I have come across the same deal with application/atom+xml vs text/xml vs application/xml.
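
As a rough illustration of accepting both types, the sketch below checks a Content-Type value against a small set of JSON-like media types. The helper name `looks_like_json` and the contents of `JSON_TYPES` are assumptions chosen for this example; the standard library's `email.message.Message` is used only to strip parameters such as `charset`.

```python
from email.message import Message

# Media types this hypothetical client is willing to treat as JSON.
JSON_TYPES = {"application/json", "text/javascript"}


def looks_like_json(content_type_header):
    """Return True if the header's media type is one we treat as JSON."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    # get_content_type() drops parameters and lowercases the media type.
    return msg.get_content_type() in JSON_TYPES


print(looks_like_json("application/json; charset=UTF-8"))
print(looks_like_json("text/javascript; charset=UTF-8"))
print(looks_like_json("text/html"))
```

A client written this way gets the same result whichever of the two types the server picks.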

vadmium · 2015-10-31T22:33:22Z

I propose rejecting this one, in favour of the caller adding their own “Accept: */*” (or more preferably, “Accept: application/json”) header. What do you think, Raymond or Senthil?
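
A caller who wants the header on every request, rather than per request, could set it on an opener. This is a sketch assuming Python 3's `urllib.request`; the function name `make_json_opener` is hypothetical, and nothing is fetched.

```python
import urllib.request


def make_json_opener():
    """Build an opener whose default headers include an explicit Accept."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) pairs that the opener adds to
    # each request unless the request already sets that header itself.
    opener.addheaders.append(("Accept", "application/json"))
    return opener


opener = make_json_opener()
print(opener.addheaders)
```

Calling `opener.open(url)` would then send the header, and `urllib.request.install_opener(opener)` would make `urlopen` use it as well.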

rhettinger · 2015-11-01T07:18:04Z

> What do you think, Raymond

Before dismissing this, we should get a better understanding of why "Accept: */*" is so widely used in practice.

Here's what we know so far:

The header made a difference to the Facebook Graph API.
Curl (a minimalist) includes "Accept: */*", Host, and User-Agent.
Firefox includes "*/*" at the end of its list of acceptable types.
Kenneth Reitz's requests module uses "Accept: */*" by default.
The poolmanager in urllib3 uses "Accept: */*" by default and has a comment that that and the "Host" header are both needed by proxies.
I'm also seeing "Accept: */*" in book examples as well. See https://books.google.com/books?id=fVuWayXLdYIC&pg=PA22 and http://doc.bonfire-project.eu/R1/api/example-session.html

vadmium · 2015-11-01T09:30:43Z

According to all the HTTP 1.1 RFCs, having */* at the end means you accept any other content type if none of the higher priority ones are available (otherwise you risk a 406 Not Acceptable error). So that explains why Firefox has */* tacked on.
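
To make the ordering concrete, here is a small sketch that builds an Accept value with preferred types first and a low-priority `*/*` fallback at the end. The function name `build_accept` and the default weight of 0.1 are assumptions for illustration, not values taken from Firefox or any RFC.

```python
def build_accept(preferred, fallback_q=0.1):
    """Join preferred media types and append a low-weight */* fallback."""
    parts = list(preferred)
    # The trailing wildcard tells the server that any other type is still
    # acceptable, just less preferred than the ones listed before it.
    parts.append(f"*/*;q={fallback_q}")
    return ", ".join(parts)


header = build_accept(["application/json", "text/javascript"])
print(header)
```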

Requests copied from Curl: <kennethreitz/requests@6140fac>. Similarly, it is in urllib3 “because that’s what cURL had by default”. Brief discussion at <https://github.com/shazow/urllib3/pull/93#issuecomment-8209904>, where they decided to leave things as they already were.

So all roads seem to lead to Curl. Curl’s “initial revision” (Dec 1999) had “Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*”, which was changed to “Accept: */*” in <curl/curl@93e32e6> in 2004. I don’t see any reasons given. I just left a question on GitHub about this, so maybe we might get some sort of answer.

Wget also includes “Accept: */*”. But it gives no explanations either, and it was present right from the “initial revision” also in Dec 1999 (presumably SourceForge started about then).

vadmium · 2015-11-02T11:27:45Z

The Curl programmer replied basically saying there was no scientific reason, but since Curl was previously sending a custom Accept header, it was safer to leave a bare-bones Accept header in than completely remove it. Plus he thought it might be slightly more compatible with websites.

rhettinger · 2016-08-30T21:17:56Z

Update: After more research, I learned that while 'Accept: */*' should not have an effect on the origin webserver, it can and does have an effect on proxy servers.

Origin servers are allowed to vary the content-type of responses when given different Accept headers. When they do so, they should also send "Vary: Accept".

Proxy servers such as NGinx and Varnish respond to the "Vary: Accept" by caching the different responses using a combination of url and the accept header as the cache key. If the request has 'Accept: */*', then the cache lookup returns the same result as if the 'Accept: */*' had been passed directly to the server. However, if the Accept header is omitted, the proxy cache can return any of the cached responses (typically the most recent, regardless of content-type).

Accordingly, it is a good practice to include 'Accept: */*' in the request so that you get a consistent result (what the server would have returned) rather than the inconsistent and unpredictable content-types you would receive in the absence of the Accept header. I believe that is why the other tools and book examples use 'Accept: */*' even though the origin wouldn't care.

rhettinger · 2016-08-30T21:22:00Z

Putting it another way: To an origin server, 'Accept: */*' means it can return anything it wants. To a proxy server, the absence of an Accept header means it can return anything it has cached (possibly different from what the origin server would have returned). In contrast, to a proxy server, 'Accept: */*' means return exactly what the origin server would have returned with the same headers.
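
The behaviour described above can be sketched as a toy cache keyed on the URL and the Accept value. This is only a model of the description in this comment, not how nginx or Varnish are actually implemented; the class name `ToyVaryingCache`, its methods, and the example URL are all hypothetical.

```python
class ToyVaryingCache:
    """Toy model of a cache that varies on Accept, as described above."""

    def __init__(self):
        # url -> {accept value: cached content type}, in insertion order
        self._variants = {}

    def store(self, url, accept, content_type):
        variants = self._variants.setdefault(url, {})
        variants.pop(accept, None)  # re-insert so it counts as most recent
        variants[accept] = content_type

    def lookup(self, url, accept=None):
        variants = self._variants.get(url, {})
        if accept is not None:
            # Explicit Accept: only the matching variant is returned.
            return variants.get(accept)
        if not variants:
            return None
        # No Accept header: fall back to the most recently stored variant.
        return list(variants.values())[-1]


cache = ToyVaryingCache()
url = "http://example.invalid/resource"  # placeholder URL
cache.store(url, "*/*", "application/json")
cache.store(url, "text/javascript", "text/javascript")

print(cache.lookup(url, "*/*"))
print(cache.lookup(url))
```

In this model, a lookup with 'Accept: */*' always yields the variant stored for that value, while a lookup without the header depends on whatever was cached last.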

vadmium · 2016-08-31T00:33:27Z

“Proxy servers such as NGinx and Varnish: . . . if the Accept header is omitted, the proxy cache can return any of the cached responses.”

This is not really my area of expertise, but this behaviour is inconsistent with my understanding of how Accept and Vary are supposed to work in general. I would expect a cache to treat a missing Accept field as a separate “value” that does not match any specific Accept value.

See <https://www.w3.org/mid/20040223204041.GA32719@mail.shareable.org>. Also, what about a server that sets “Vary: Cookie”, to send a response that depends on whether the user has already seen the page? Do these NGinx and Varnish caches respond with a random response if Cookie is missing?

I still think if you care about the media type, it is better practice to specify what types you want with a more explicit Accept value. And if you don’t care about the media type, the NGinx/Varnish behaviour may not be a problem anyway.

Lukasa · 2016-09-09T22:24:14Z

So, leaping in on the Requests side of things for a moment, two notes. Firstly: copying curl is rarely a bad thing to do, especially for a behaviour curl has had for a long time.

However, in this case the stronger argument is that just because the RFCs say that Accept: */* is implied doesn't mean it can safely be omitted. In practice, origin servers behave unexpectedly when the header is omitted, and in general behave more predictably when it is emitted. For that reason, it should be added by Python's standard library.

HTTP/1.1 is a protocol where "as deployed" means much more than "as specified", sadly.

vadmium · 2016-09-09T23:09:30Z

I’m still not convinced. But my argument about the user specifying Accept if they care about the media type works both ways, so I am not that fussed if others want to make the change.

Are there any examples of servers that behave worse than the application/json vs text/json example? E.g. returning XML vs JSON or something?

python-dev · 2016-09-09T23:24:36Z

New changeset e84105b48436 by Raymond Hettinger in branch '2.7':
Issue bpo-22450: Use "Accept: */*" in the default headers for urllib
https://hg.python.org/cpython/rev/e84105b48436

kennethreitz · 2016-09-09T23:43:29Z

I fully second Corey's comment.

python-dev · 2016-09-09T23:45:27Z

New changeset 00da8bfa2a60 by Raymond Hettinger in branch '3.5':
Issue bpo-22450: Use "Accept: */*" in the default headers for urllib.request
https://hg.python.org/cpython/rev/00da8bfa2a60

orsenthil · 2016-09-09T23:48:48Z

@martin, I weigh in curl's behavior for de facto things that differ slightly from standards. It's simply what folks have gotten used to, and sometimes expect.

@Raymond, unit-tests will be a good addition too.

rhettinger added the type-bug label Sep 20, 2014

rhettinger closed this as completed Sep 9, 2016

ezio-melotti transferred this issue from another repository Apr 10, 2022
