Issue34777
Created on 2018-09-23 16:07 by tuxcell, last changed 2022-04-11 14:59 by admin. This issue is now closed.
Files | ||
---|---|---|
File name | Uploaded | Description |
header-illegal.py | tuxcell, 2018-09-23 16:07 | example of urllib.request with incorrect headers |
Messages (6) | |||
---|---|---|---|
msg326162 - (view) | Author: Jose Gama (tuxcell) * | Date: 2018-09-23 16:07 | |
With urllib.request it is possible to send a header that is junk (a misspelled header name or a nonsense value) and, in some cases, still get the contents without any warning or error. The behavior depends on the URL and also on the header. |
|||
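A minimal sketch of why the client side stays silent: `urllib.request.Request` stores whatever header names and values it is given and only normalizes the capitalization of the name. The URL is a placeholder (`example.invalid` is an assumption, not taken from the report), `build_request` is a hypothetical helper, and nothing is sent over the network.

```python
from urllib.request import Request

# Placeholder URL (an assumption); the request is only built, never sent.
URL = "http://example.invalid/data.csv"


def build_request(headers):
    """Build a Request and return the headers it would carry, as a dict."""
    req = Request(URL, headers=headers)
    return dict(req.header_items())


if __name__ == "__main__":
    # A misspelled header name is kept as if it were a custom header.
    print(build_request({"Akcept": "text/csv"}))
    # A nonsense media type is kept verbatim as the Accept value.
    print(build_request({"Accept": "tekst/cxv"}))
```

Neither call raises or warns; any validation of the header is left to the server that eventually receives it.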
msg326326 - (view) | Author: Karthikeyan Singaravelan (xtreak) * | Date: 2018-09-25 06:38 | |
Thanks for the report. I tried similar requests and it works this way for other tools like curl, since Akcept could be a custom header in some use cases, though it could be a typo in this context. There is no predefined set of media types that we need to validate as far as I can see from https://tools.ietf.org/html/rfc2616#section-14.1 and it depends on the server configuration to do validation. It's hard for Python to maintain a list of acceptable MIME types for validation across releases.

A list of registered MIME types that is updated periodically: https://www.iana.org/assignments/media-types/media-types.xhtml and the RFC for registration: https://tools.ietf.org/html/rfc6838

Some sample requests from curl with invalid headers.

    curl -X GET https://httpbin.org/get -H 'Authorization: Token bc23f14356c114a8ffa319773583426878b7b37f' -H 'Cache-Control: no-cache' -H 'Content-Type: application/json' -H 'Akcept: tekst/csv'
    {
      "args": {},
      "headers": {
        "Accept": "*/*",
        "Akcept": "tekst/csv",
        "Authorization": "Token bc23f14356c114a8ffa319773583426878b7b37f",
        "Cache-Control": "no-cache",
        "Connection": "close",
        "Content-Type": "application/json",
        "Host": "httpbin.org",
        "User-Agent": "curl/7.37.1"
      },
      "origin": "182.73.135.26",
      "url": "https://httpbin.org/get"
    }

    curl -X GET https://httpbin.org/get -H 'Authorization: Token bc23f14356c114a8ffa319773583426878b7b37f' -H 'Cache-Control: no-cache' -H 'Content-Type: application/json' -H 'Accept: tekst'
    {
      "args": {},
      "headers": {
        "Accept": "tekst",
        "Authorization": "Token bc23f14356c114a8ffa319773583426878b7b37f",
        "Cache-Control": "no-cache",
        "Connection": "close",
        "Content-Type": "application/json",
        "Host": "httpbin.org",
        "User-Agent": "curl/7.37.1"
      },
      "origin": "182.73.135.26",
      "url": "https://httpbin.org/get"
    }

Feel free to add in if I am missing something here but I think it's hard for Python to maintain the updated list and adding warning/error might break someone's code. Thanks |
|||
msg326592 - (view) | Author: Jose Gama (tuxcell) * | Date: 2018-09-27 20:36 | |
Thank you for the quick reply. You are correct about the difficulties of using a universally accepted list. This is one example that generates errors on the server side. Just for the record.

    #!/usr/bin/env python3
    from urllib.request import Request, urlopen
    from urllib.error import URLError

    # process SSB data
    url1 = 'https://raw.githubusercontent.com/mapnik/test-data/master/csv/points.csv'
    url2 = 'https://gitlab.cncf.ci/kubernetes/kubernetes/raw/c69582dffba33e9f1c08ff2fc67924ea90f1448c/test/test_owners.csv'
    url3 = 'http://data.ssb.no/api/klass/v1/classifications/131/changes?from=2016-01-01&to=9999-12-31'
    headers1 = {'Accept': 'text/csv'}
    headers2 = {'Akcept': 'text/csv'}
    headers3 = {'Accept': 'tekst/cxv'}
    headers4 = {'Accept': '1234'}
    req = Request(url3, headers=headers4)
    resp = urlopen(req)
    content = resp.read().decode(resp.headers.get_content_charset())  # get the character encoding from the server response
    print(content)
    '''
    req = Request(url3, headers=headers3)
    urllib.error.HTTPError: HTTP Error 500: Internal Server Error
    req = Request(url3, headers=headers4)
    urllib.error.HTTPError: HTTP Error 406: Not Acceptable
    '''
 |
|||
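The 500 and 406 responses above reach the caller as `urllib.error.HTTPError` exceptions. A minimal sketch of one way to handle them follows; `fetch_text` and `fake_not_acceptable` are hypothetical names, the `opener` parameter exists only so the sketch can run without a network, and the UTF-8 fallback is an assumption for responses that declare no charset.

```python
import io
from email.message import Message
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def fetch_text(req, opener=urlopen, default_charset="utf-8"):
    """Return (status, text); on failure, text is the error reason."""
    try:
        with opener(req) as resp:
            # get_content_charset() returns None when the server declares
            # no charset, so fall back to an assumed default.
            charset = resp.headers.get_content_charset() or default_charset
            return resp.status, resp.read().decode(charset)
    except HTTPError as err:
        # The server answered, but with an error status such as 406 or 500.
        return err.code, str(err.reason)
    except URLError as err:
        # No HTTP response at all (DNS failure, refused connection, ...).
        return None, str(err.reason)


def fake_not_acceptable(req):
    """Hypothetical opener that always answers 406, for offline use."""
    raise HTTPError(req.full_url, 406, "Not Acceptable", Message(), io.BytesIO())


if __name__ == "__main__":
    req = Request("http://example.invalid/data", headers={"Accept": "1234"})
    print(fetch_text(req, opener=fake_not_acceptable))
```

`HTTPError` is caught before `URLError` because it is a subclass of it; with the order reversed, the status code would never be inspected.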
msg326604 - (view) | Author: Karthikeyan Singaravelan (xtreak) * | Date: 2018-09-28 05:21 | |
Thanks for the details. Each server handles these headers differently, depending on its configuration, and other clients like curl return the same result as Python does. So I would propose closing it as not a bug, since there is no bug in Python and it behaves like other clients do. Thanks again for the report! |
|||
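To illustrate that the outcome is decided by server policy rather than by the client, here is a sketch with a toy local server. `CsvOnlyHandler` and `status_for` are hypothetical, and the policy (serve CSV unless the `Accept` header rules it out) is an assumption for the demo; it does not reproduce the real server's behavior, such as the 500 reported above.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.error import HTTPError
from urllib.request import ProxyHandler, Request, build_opener


class CsvOnlyHandler(BaseHTTPRequestHandler):
    """Toy policy: serve CSV unless the Accept header rules it out."""

    def do_GET(self):
        # A missing Accept header is treated as "anything is fine".
        accept = self.headers.get("Accept", "*/*")
        if "text/csv" in accept or "*/*" in accept:
            body = b"id,name\n1,example\n"
            self.send_response(200)
            self.send_header("Content-Type", "text/csv; charset=utf-8")
        else:
            body = b""
            self.send_response(406)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def status_for(headers):
    """Start the toy server, send one GET with `headers`, return the status."""
    server = HTTPServer(("127.0.0.1", 0), CsvOnlyHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # An empty ProxyHandler keeps environment proxies out of the local call.
    opener = build_opener(ProxyHandler({}))
    url = f"http://127.0.0.1:{server.server_port}/"
    try:
        with opener.open(Request(url, headers=headers), timeout=5) as resp:
            return resp.status
    except HTTPError as err:
        return err.code
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for hdrs in ({"Accept": "text/csv"}, {"Akcept": "text/csv"}, {"Accept": "1234"}):
        print(hdrs, status_for(hdrs))
```

With this policy the misspelled `Akcept` header succeeds, because urllib sends no `Accept` header of its own and the toy server treats a missing one as a wildcard, while a bogus `Accept` value is rejected. A differently configured server could just as well ignore the header entirely.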
msg326618 - (view) | Author: Jose Gama (tuxcell) * | Date: 2018-09-28 09:29 | |
Yes, I agree, it's not a bug. This note might help other people who run into the same questions, particularly with error handling. Thank you! |
|||
msg326625 - (view) | Author: Karthikeyan Singaravelan (xtreak) * | Date: 2018-09-28 10:09 | |
Sure, thanks for the confirmation. Closing it as not a bug. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:59:06 | admin | set | github: 78958 |
2018-09-28 10:09:40 | xtreak | set | status: open -> closed resolution: not a bug messages: + msg326625 stage: resolved |
2018-09-28 09:29:52 | tuxcell | set | messages: + msg326618 |
2018-09-28 05:21:02 | xtreak | set | messages: + msg326604 |
2018-09-27 20:36:23 | tuxcell | set | messages: + msg326592 |
2018-09-25 06:38:24 | xtreak | set | nosy: + xtreak messages: + msg326326 |
2018-09-23 16:07:48 | tuxcell | create |