classification
Title: urllib2 needs to remove scope from IPv6 address when creating Host header
Type: behavior Stage:
Components: Library (Lib) Versions: Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: JonathanGuthrie, gregory.p.smith, martin.panter, ngierman
Priority: normal Keywords:

Created on 2015-02-11 18:40 by ngierman, last changed 2017-01-26 21:22 by JonathanGuthrie.

Messages (3)
msg235762 - Author: Neil Gierman (ngierman) Date: 2015-02-11 18:40
Using a scoped IPv6 address with urllib2 creates an invalid Host header that Apache will not accept.

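        import json
        import urllib2
        data = {}  # placeholder payload; any JSON-serializable object works
        IP = "fe80::0000:0000:0000:0001%eth0"
        req = urllib2.Request("http://[" + IP + "]/")
        req.add_header('Content-Type', 'application/json')
        # urllib2 derives the Host header from the URL, so it still contains "%eth0"
        res = urllib2.urlopen(req, json.dumps(data))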

Apache will reject the above request because the Host header is "[fe80::0000:0000:0000:0001%eth0]". This behavior was reported to Apache at https://issues.apache.org/bugzilla/show_bug.cgi?id=35122, and the Apache developers will not fix it because newer RFCs prohibit scopes in the Host header. Firefox had the same issue; its fix was to strip the scope from the Host header: https://bugzilla.mozilla.org/show_bug.cgi?id=464162 and http://hg.mozilla.org/mozilla-central/rev/bb80e727c531.

My suggestion is to change urllib2.py's do_request_ method from:

        if not request.has_header('Host'):
            request.add_unredirected_header('Host', sel_host)

to:

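        if not request.has_header('Host'):
            # sel_host is bracketed, e.g. "[fe80::1%eth0]:8080"; drop only the
            # "%eth0" zone ID and keep the closing "]" and any port.
            request.add_unredirected_header('Host', re.sub(r"%[^\]]*", "", sel_host, count=1))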

I have not tested this patch to urllib2.py; however, I am now using similar logic in my own code to override the Host header when I create the request:

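        import json
        import re
        import urllib2
        data = {}  # placeholder payload; any JSON-serializable object works
        IP = "fe80::0000:0000:0000:0001%eth0"
        req = urllib2.Request("http://[" + IP + "]/")
        # Override the Host header with the zone ID ("%eth0") removed
        req.add_header('Host', '[' + re.compile(r"%.*").sub("", IP, 1) + ']')
        req.add_header('Content-Type', 'application/json')
        res = urllib2.urlopen(req, json.dumps(data))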
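
A minimal sketch of the Host-header stripping described above, assuming that sel_host in do_request_ is the bracketed authority string (for example "[fe80::1%eth0]" or "[fe80::1%eth0]:8080"). Under that assumption only the text between "%" and "]" should be dropped; a pattern such as r"%.*$" applied to the whole string would also remove the closing bracket and any port. The helper name host_header_without_zone is hypothetical and not part of urllib2.

```python
def host_header_without_zone(sel_host):
    """Return sel_host with any IPv6 zone ID removed (hypothetical helper).

    Assumes sel_host looks like "[addr%zone]" or "[addr%zone]:port";
    hostnames and IPv4 addresses are returned unchanged.
    """
    if not sel_host.startswith('['):
        # Not an IPv6 literal, so there is no zone ID to strip.
        return sel_host
    end = sel_host.find(']')
    if end == -1:
        # Malformed literal: leave it untouched rather than guess.
        return sel_host
    address = sel_host[1:end]
    rest = sel_host[end + 1:]            # "" or ":port"
    address = address.split('%', 1)[0]   # drop "%zone" (or "%25zone")
    return '[' + address + ']' + rest


for host in ["[fe80::1%eth0]", "[fe80::1%eth0]:8080", "[::1]", "example.com:80"]:
    print(host_header_without_zone(host))
```

This prints "[fe80::1]", "[fe80::1]:8080", "[::1]" and "example.com:80": the bracket and port survive, and only the zone ID is removed.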
msg235768 - Author: Martin Panter (martin.panter) (Python committer) Date: 2015-02-11 21:03
I’m no IPv6 expert, but there seem to be a few standards:

* <https://tools.ietf.org/html/rfc6874> (Feb 2013). Encodes as http://[fe80::1%25eth0]/; says Windows uses this form. Also mentions the unencoded http://[fe80::1%eth0]/ form. Says that the HTTP Host header should not include the scope zone identifier, since it is not necessarily relevant to the server.

* <https://tools.ietf.org/html/draft-sweet-uri-zoneid-01> (Nov 2013). Encodes as http://[v1.fe80::1+eth0]/; says CUPS uses this form. Also acknowledges the RFC %25 form. Says that the Host header _should_ include the scope, to help with servers that send back self-referencing absolute URLs.

Also, I would probably find IP.split('%', 1)[0] easier to read than a regular expression.
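
The split-based approach can be sketched as a hypothetical helper, strip_zone (not a urllib2 function). Because the RFC 6874 encoded form "%25eth0" also begins with "%", the same split removes the zone ID from both the encoded and the unencoded literal. The draft-sweet "v1.fe80::1+eth0" form is not handled by this sketch.

```python
def strip_zone(ip):
    # Everything from the first '%' onward is the zone ID. This covers both
    # "fe80::1%eth0" and the RFC 6874 "fe80::1%25eth0" form.
    return ip.split('%', 1)[0]


IP = "fe80::0000:0000:0000:0001%eth0"
host_header = '[' + strip_zone(IP) + ']'
print(host_header)                    # [fe80::0000:0000:0000:0001]
print(strip_zone("fe80::1%25eth0"))   # fe80::1
print(strip_zone("2001:db8::1"))      # 2001:db8::1 (no zone, unchanged)
```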
msg286334 - Author: Jonathan Guthrie (JonathanGuthrie) Date: 2017-01-26 21:22
Michael Sweet's draft RFC, which required that the scope be included in the Host header, expired in May 2014, and I can't find any sign that it progressed further. Does anyone have any updated information?
History
Date | User | Action | Args
2017-01-26 21:22:28 | JonathanGuthrie | set | nosy: + JonathanGuthrie; messages: + msg286334
2017-01-25 22:23:59 | gregory.p.smith | set | nosy: + gregory.p.smith
2015-02-13 01:27:08 | demian.brecht | set | nosy: - demian.brecht
2015-02-11 21:23:46 | demian.brecht | set | nosy: + demian.brecht
2015-02-11 21:03:41 | martin.panter | set | nosy: + martin.panter; messages: + msg235768
2015-02-11 18:40:23 | ngierman | create |