Title: urllib2 needs to remove scope from IPv6 address when creating Host header
Components: Library (Lib) Versions: Python 2.7
Created on 2015-02-11 18:40 by ngierman

msg235762 - Author: Neil Gierman (ngierman) Date: 2015-02-11 18:40
Using a scoped IPv6 address with urllib2 creates an invalid Host header that Apache will not accept.

        import json
        import urllib2

        IP = "fe80::0000:0000:0000:0001%eth0"
        req = urllib2.Request("http://[" + IP + "]/")
        req.add_header('Content-Type', 'application/json')
        # data is the JSON-serializable payload to POST (defined elsewhere)
        res = urllib2.urlopen(req, json.dumps(data))

Apache rejects the above request because the Host header is "[fe80::0000:0000:0000:0001%eth0]". This behavior was reported to Apache, and the Apache devs will not fix it because newer RFCs prohibit scopes in the Host header. Firefox had the same issue, and their fix was to strip the scope from the Host header.

My suggestion is to change the do_request_ method from:

        if not request.has_header('Host'):
            request.add_unredirected_header('Host', sel_host)

to:

        if not request.has_header('Host'):
            request.add_unredirected_header('Host', re.compile(r"%.*$").sub("", sel_host, 1))

I have not tested this patch; however, I am now using similar logic in my own code to override the Host header when I create my request:

        import json
        import re
        import urllib2

        IP = "fe80::0000:0000:0000:0001%eth0"
        req = urllib2.Request("http://[" + IP + "]/")
        # Override the Host header with the scope ("%eth0") stripped
        req.add_header('Host', '[' + re.compile(r"%.*").sub("", IP, 1) + ']')
        req.add_header('Content-Type', 'application/json')
        res = urllib2.urlopen(req, json.dumps(data))
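
One caveat with the proposed pattern: since the reported Host header is "[fe80::0000:0000:0000:0001%eth0]", sel_host evidently still carries its brackets, so r"%.*$" would also remove the closing "]" and any ":port" that follows it. A minimal sketch of a narrower substitution is below. The helper name strip_zone is hypothetical and not part of urllib2, and this is plain string handling rather than a tested patch.

```python
import re

# Hypothetical helper (not part of urllib2): match "%" and everything up to,
# but not including, the closing bracket of an IPv6 literal.
_ZONE_RE = re.compile(r"%[^\]]*")


def strip_zone(host):
    """Drop the scope from a bracketed IPv6 host, keeping "]" and any port."""
    if host.startswith("["):
        return _ZONE_RE.sub("", host, 1)
    # Hostnames and IPv4 addresses pass through unchanged.
    return host
```

Under these assumptions, strip_zone("[fe80::1%eth0]:8080") keeps both the bracket and the port, whereas r"%.*$" applied to the same string would leave "[fe80::1".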
msg235768 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-02-11 21:03
I’m no IPv6 expert, but there seem to be a few standards:

* <> (Feb 2013). Encodes as http://[fe80::1%25eth0]/; says Windows uses this form. Also mentions the unencoded http://[fe80::1%eth0]/ form. Says that the HTTP Host header should not include the scope zone identifier, since it is not necessarily relevant to the server.

* <> (Nov 2013). Encodes as http://[v1.fe80::1+eth0]/; says CUPS uses this form. Also acknowledges the RFC %25 form. Says that the Host header _should_ include the scope, to help with servers that send back self-referencing absolute URLs.
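
To make the competing spellings concrete, here is a small sketch that builds each URL form mentioned above from an address and a zone. The function name and dictionary keys are made up for illustration; the output strings simply mirror the examples quoted from the two documents.

```python
def zone_url_forms(address, zone):
    """Return the URL spellings discussed above for one scoped address.

    Hypothetical illustration only; the key names are not from any standard.
    """
    return {
        # Unencoded form, e.g. http://[fe80::1%eth0]/
        "unencoded": "http://[{0}%{1}]/".format(address, zone),
        # "%" written as "%25", e.g. http://[fe80::1%25eth0]/
        "percent25": "http://[{0}%25{1}]/".format(address, zone),
        # "v1." prefix with "+" separator, e.g. http://[v1.fe80::1+eth0]/
        "v1_plus": "http://[v1.{0}+{1}]/".format(address, zone),
        # Host header value when the scope is left out
        "host_without_zone": "[{0}]".format(address),
    }


forms = zone_url_forms("fe80::1", "eth0")
```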

Also, I would probably find IP.split('%', 1)[0] easier to read than a regular expression.
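
A quick sketch comparing the two spellings on the address from the original report. The variable names are only for illustration; both expressions operate on the bare address, before the brackets are added.

```python
import re

ip = "fe80::0000:0000:0000:0001%eth0"

# Regular-expression version from the original report
via_regex = re.compile(r"%.*").sub("", ip, 1)

# split() version: keep everything before the first "%"
via_split = ip.split('%', 1)[0]

# Either result can then be wrapped in brackets for the Host header
host_header = '[' + via_split + ']'
```

An address with no "%" comes back from split() unchanged, so the same line works for unscoped addresses.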
msg286334 - Author: Jonathan Guthrie (JonathanGuthrie) Date: 2017-01-26 21:22
Michael Sweet's draft RFC requiring that the scope be included in the Host line expired in May 2014, and I can't find any sign that it ever went anywhere. Does anyone have any updated information?
