classification
Title: urllib.request accepts anything as a header parameter for some URLs
Type: behavior
Stage: resolved
Components: IO
Versions: Python 3.6
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: tuxcell, xtreak
Priority: normal
Keywords:

Created on 2018-09-23 16:07 by tuxcell, last changed 2018-09-28 10:09 by xtreak. This issue is now closed.

Files
File name          Uploaded                   Description
header-illegal.py  tuxcell, 2018-09-23 16:07  example of urllib.request with incorrect headers
Messages (6)
msg326162 - (view) Author: Jose Gama (tuxcell) * Date: 2018-09-23 16:07
It is possible to pass a junk header name or value to urllib.request and, in some cases, still get the contents without any warning or error.
The behavior depends on both the URL and the header.
msg326326 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2018-09-25 06:38
Thanks for the report. I tried similar requests, and other tools such as curl behave the same way: Akcept could be a custom header in some use cases, though it is probably a typo in this context. As far as I can see from https://tools.ietf.org/html/rfc2616#section-14.1, there is no predefined set of media types that we need to validate against, and validation depends on the server configuration. It's hard for Python to maintain a list of acceptable MIME types for validation across releases. A periodically updated list of registered MIME types: https://www.iana.org/assignments/media-types/media-types.xhtml. The RFC for registration: https://tools.ietf.org/html/rfc6838

Some sample requests from curl with invalid headers.

curl -X GET https://httpbin.org/get -H 'Authorization: Token bc23f14356c114a8ffa319773583426878b7b37f' -H 'Cache-Control: no-cache' -H 'Content-Type: application/json' -H 'Akcept: tekst/csv'
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Akcept": "tekst/csv",
    "Authorization": "Token bc23f14356c114a8ffa319773583426878b7b37f",
    "Cache-Control": "no-cache",
    "Connection": "close",
    "Content-Type": "application/json",
    "Host": "httpbin.org",
    "User-Agent": "curl/7.37.1"
  },
  "origin": "182.73.135.26",
  "url": "https://httpbin.org/get"
}

curl -X GET https://httpbin.org/get -H 'Authorization: Token bc23f14356c114a8ffa319773583426878b7b37f' -H 'Cache-Control: no-cache' -H 'Content-Type: application/json' -H 'Accept: tekst'
{
  "args": {},
  "headers": {
    "Accept": "tekst",
    "Authorization": "Token bc23f14356c114a8ffa319773583426878b7b37f",
    "Cache-Control": "no-cache",
    "Connection": "close",
    "Content-Type": "application/json",
    "Host": "httpbin.org",
    "User-Agent": "curl/7.37.1"
  },
  "origin": "182.73.135.26",
  "url": "https://httpbin.org/get"
}

Feel free to add in if I am missing something here but I think it's hard for Python to maintain the updated list and adding warning/error might break someone's code.

Thanks
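
The client-side half of this can be checked from Python without any network access. The following is a minimal sketch that assumes only the standard-library Request class; it reuses the misspelled header from the curl samples and shows that the name and value are stored as given when the request is constructed, with no check against any registry of header names or media types.

```python
from urllib.request import Request

# Misspelled header name and unregistered media type, as in the curl samples.
req = Request("https://httpbin.org/get", headers={"Akcept": "tekst/csv"})

# Request normalizes header names with str.capitalize() and keeps values as-is.
print(req.has_header("Akcept"))   # True
print(req.get_header("Akcept"))   # tekst/csv
print(req.header_items())         # [('Akcept', 'tekst/csv')]
```

Nothing is sent until urlopen() is called on the request, so any rejection of such a header would have to come from the server.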
msg326592 - (view) Author: Jose Gama (tuxcell) * Date: 2018-09-27 20:36
Thank you for the quick reply. You are correct about the difficulties of using a universally accepted list. For the record, here is one example that generates errors on the server side.

#!/usr/bin/env python3
from urllib.request import Request, urlopen
from urllib.error import URLError

# process SSB data
url1 = 'https://raw.githubusercontent.com/mapnik/test-data/master/csv/points.csv'
url2 = 'https://gitlab.cncf.ci/kubernetes/kubernetes/raw/c69582dffba33e9f1c08ff2fc67924ea90f1448c/test/test_owners.csv'
url3 = 'http://data.ssb.no/api/klass/v1/classifications/131/changes?from=2016-01-01&to=9999-12-31'
headers1 = {'Accept': 'text/csv'}
headers2 = {'Akcept': 'text/csv'}
headers3 = {'Accept': 'tekst/cxv'}
headers4 = {'Accept': '1234'}
req = Request(url3, headers=headers4)
resp = urlopen(req)
content = resp.read().decode(resp.headers.get_content_charset())  # get the character encoding from the server response
print(content)
'''req = Request(url3, headers=headers3)
urllib.error.HTTPError: HTTP Error 500: Internal Server Error

req = Request(url3, headers=headers4)
urllib.error.HTTPError: HTTP Error 406: Not Acceptable'''
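
Since the server decides whether a header is acceptable, a caller that wants to cope with the 406 and 500 responses above has to catch them. The following is a rough sketch, not a recommended implementation: fetch_text and reject_accept are hypothetical names, and reject_accept is a stand-in for a strict server so that the example runs offline.

```python
from email.message import Message
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def fetch_text(url, headers, opener=urlopen):
    """Return (status, text); status is None if no HTTP response arrived."""
    req = Request(url, headers=headers)
    try:
        with opener(req) as resp:
            # Fall back to UTF-8 when the server does not declare a charset.
            charset = resp.headers.get_content_charset() or "utf-8"
            return resp.status, resp.read().decode(charset)
    except HTTPError as err:
        # The server answered, but with an error status such as 406 or 500.
        return err.code, err.reason
    except URLError as err:
        # No HTTP response at all (DNS failure, refused connection, ...).
        return None, str(err.reason)


def reject_accept(req):
    """Hypothetical stand-in for a server that rejects the Accept header."""
    raise HTTPError(req.full_url, 406, "Not Acceptable", Message(), None)


print(fetch_text("http://example.invalid/", {"Accept": "1234"}, opener=reject_accept))
# (406, 'Not Acceptable')
```

Leaving out the opener argument makes fetch_text use urlopen, so the URLs from the script above can be tried directly. HTTPError is caught first because it is a subclass of URLError.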

msg326604 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2018-09-28 05:21
Thanks for the details. Each server handles these headers differently depending on its configuration, and other clients such as curl return the same result as Python does. So I propose closing this as not a bug, since Python behaves like other clients do.

Thanks again for the report!
msg326618 - (view) Author: Jose Gama (tuxcell) * Date: 2018-09-28 09:29
Yes, I agree, it's not a bug. This note might help other people who run into the same questions, particularly with error handling. Thank you!
msg326625 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2018-09-28 10:09
Sure, thanks for the confirmation. Closing it as not a bug.
History
Date                 User     Action  Args
2018-09-28 10:09:40  xtreak   set     status: open -> closed; resolution: not a bug; messages: + msg326625; stage: resolved
2018-09-28 09:29:52  tuxcell  set     messages: + msg326618
2018-09-28 05:21:02  xtreak   set     messages: + msg326604
2018-09-27 20:36:23  tuxcell  set     messages: + msg326592
2018-09-25 06:38:24  xtreak   set     nosy: + xtreak; messages: + msg326326
2018-09-23 16:07:48  tuxcell  create