classification
Title: httplib/http.client HTTPConnection._set_hostport() regression
Type: Stage: patch review
Components: Library (Lib) Versions: Python 3.7, Python 3.6, Python 3.5, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: cfs-pure, martin.panter, zach.ware
Priority: normal Keywords: patch

Created on 2016-10-26 19:12 by cfs-pure, last changed 2016-10-29 02:48 by martin.panter.

Files
File name Uploaded Description Edit
get_hostport.diff cfs-pure, 2016-10-26 19:12 Fix to HTTPConnection._get_hostport() review
Messages (11)
msg279509 - (view) Author: Charles Stephens (cfs-pure) * Date: 2016-10-26 19:12
Back through the mists of time, there was a change to strip square brackets IPv6 address host literals in HTTPConnection._get_hostport():

https://hg.python.org/cpython/diff/433606e9546c/Lib/httplib.py

However, the code mixed tabs and spaces and was "corected" by:

https://hg.python.org/cpython/diff/9e2b94a3b5dc/Lib/httplib.py

However, the intent was changed in the second diff and brackets won't be stripped.  This causes problems when IPv6 address URL's are used with the requests package:

In [1]: import httplib

In [2]: con = httplib.HTTPConnection('[fe80::26a9:37ff:fe00:f764%eth0]', 15482)

In [3]: con.request("GET", "/api/api-version")
---------------------------------------------------------------------------
gaierror                                  Traceback (most recent call last)
<ipython-input-3-28f1bab26dc1> in <module>()
----> 1 con.request("GET", "/api/api-version")

/usr/lib/python2.7/httplib.pyc in request(self, method, url, body, headers)
    977     def request(self, method, url, body=None, headers={}):
    978         """Send a complete request to the server."""
--> 979         self._send_request(method, url, body, headers)
    980 
    981     def _set_content_length(self, body):

/usr/lib/python2.7/httplib.pyc in _send_request(self, method, url, body, headers)
   1011         for hdr, value in headers.iteritems():
   1012             self.putheader(hdr, value)
-> 1013         self.endheaders(body)
   1014 
   1015     def getresponse(self, buffering=False):

/usr/lib/python2.7/httplib.pyc in endheaders(self, message_body)
    973         else:
    974             raise CannotSendHeader()
--> 975         self._send_output(message_body)
    976 
    977     def request(self, method, url, body=None, headers={}):

/usr/lib/python2.7/httplib.pyc in _send_output(self, message_body)
    833             msg += message_body
    834             message_body = None
--> 835         self.send(msg)
    836         if message_body is not None:
    837             #message_body was not a string (i.e. it is a file) and

/usr/lib/python2.7/httplib.pyc in send(self, data)
    795         if self.sock is None:
    796             if self.auto_open:
--> 797                 self.connect()
    798             else:
    799                 raise NotConnected()

/usr/lib/python2.7/httplib.pyc in connect(self)
    776         """Connect to the host and port specified in __init__."""
    777         self.sock = socket.create_connection((self.host,self.port),
--> 778                                              self.timeout, self.source_address)
    779 
    780         if self._tunnel_host:

/usr/lib/python2.7/socket.pyc in create_connection(address, timeout, source_address)
    551     host, port = address
    552     err = None
--> 553     for res in getaddrinfo(host, port, 0, SOCK_STREAM):
    554         af, socktype, proto, canonname, sa = res
    555         sock = None

gaierror: [Errno -2] Name or service not known
msg279511 - (view) Author: Charles Stephens (cfs-pure) * Date: 2016-10-26 19:58
Er, that is HTTPConnection._set_hostport() not _get_hostport()
msg279512 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-10-26 21:23
You say this is a regression. Can you point to a version that worked as you want? The IPv6 URL handling you pointed out seems to have been added to 2.4b1, and then backported to 2.3.5. I expect that earlier versions didn’t support IPv6 URLs at all.

Also, it does not seem to me that parsing the square bracket form was intended if the port is specified separately. Look at the test cases added in revision 433606e9546c. Perhaps you aren’t encoding the URL correctly, or Requests isn’t parsing the hostname correctly.

Also, according to RFC 6874, it would be more proper to encode the percent sign as %25, so the URL would have [fe80::26a9:37ff:fe00:f764%25eth0]. But I don’t think Python supports this; see Issue 23448.
msg279517 - (view) Author: Charles Stephens (cfs-pure) * Date: 2016-10-26 22:03
I misapplied the term 'regression'.  My intent was to describe how original author's change revision 433606e9546c was refactored to make it perform incorrectly.

Without the scope specifier, the outcome is the same when HTTPConnection is instantiated.  When both the host and port arguments are specified, the square brackets are not stripped and are stored in the host attribute.  When the port number is part of the host argument and the port argument is None, the host attribute does not include the square brackets.  Examples:

Python 2.7.10 (default, Jul 30 2016, 18:31:42) 
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import httplib
>>> con1 = httplib.HTTPConnection('[fe80::26a9:37ff:fe00:f764]', 15482)
>>> con1.host, con1.port
('[fe80::26a9:37ff:fe00:f764]', 15482)
>>> con2 = httplib.HTTPConnection('[fe80::26a9:37ff:fe00:f764]:15482')
>>> con2.host, con2.port
('fe80::26a9:37ff:fe00:f764', 15482)

Compare with IPv4 behavior:

>>> con3 = httplib.HTTPConnection('127.0.0.1', 15482)
>>> con3.host, con3.port
('127.0.0.1', 15482)
>>> con4 = httplib.HTTPConnection('127.0.0.1:15482')
>>> con4.host, con4.port
('127.0.0.1', 15482)

Calls to con1.request() will fail in socket.py because getaddrinfo will choke on the square brackets.  Which makes sense since HTTPConnection.host is passed on down the stack as-is until it reaches create_connection() in socket.py.

Moving the indent of that if block up one level makes con1 identical to con2.
msg279518 - (view) Author: Charles Stephens (cfs-pure) * Date: 2016-10-26 22:06
Example with patch applied:

Python 2.7.6 (default, Oct 26 2016, 20:33:50) 
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import httplib
>>> con1 = httplib.HTTPConnection('[fe80::26a9:37ff:fe00:f764]', 15482)
>>> con1.host, con1.port
('fe80::26a9:37ff:fe00:f764', 15482)
msg279519 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-10-26 23:17
I tried the original revision 433606e9546c code in Python 2.7 (which still allows mixed tabs and spaces), and it behaves the same as the current version. Thus I suspect revision 9e2b94a3b5dc did not change any behaviour.

What is your use case? Why can’t you just pass the address without square brackets:

>>> con = HTTPConnection("fe80::26a9:37ff:fe00:f764", 15482)
>>> (con.host, con.port)
('fe80::26a9:37ff:fe00:f764', 15482)

How do you come by an IPv6 address in square brackets, but with the port number separate? You can use urlsplit() to extract the “hostname” and “port” attributes from a URL, but in that case “hostname” does not include the square brackets:

>>> urlsplit("//[fe80::26a9:37ff:fe00:f764]:15482").hostname
'fe80::26a9:37ff:fe00:f764'
msg279520 - (view) Author: Charles Stephens (cfs-pure) * Date: 2016-10-27 00:35
Our internal use case is happening through requests via urllib3 for parsing.

Essentially requests is taking the URL, passing it to urllib3 for parsing.  urllib3 is returning a namedtuple of type Url which includes a host and port property which is being fed to httplib.  Essentially:

>>> import urllib3
>>> import httplib
>>> orig_url = 'http://[2620:125:9014:3240:14:240:128:0]:8080/api/python'
>>> u1 = urllib3.util.parse_url(orig_url)
>>> u1
Url(scheme='http', auth=None, host='[2620:125:9014:3240:14:240:128:0]', port=8080, path='/api/python', query=None, fragment=None)
>>> c1 = httplib.HTTPConnection(u1.host, port=u1.port)
>>> c1.host, c1.port
('[2620:125:9014:3240:14:240:128:0]', 8080)
>>> c1.request('GET', '/api/json')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/httplib.py", line 979, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.7/httplib.py", line 1013, in _send_request
    self.endheaders(body)
  File "/usr/lib/python2.7/httplib.py", line 975, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 835, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 797, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 778, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python2.7/socket.py", line 553, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
socket.gaierror: [Errno -2] Name or service not known
msg279522 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2016-10-27 03:13
That looks like a bug in urllib3 to me.  The host is *not* '[2620:125:9014:3240:14:240:128:0]', the host is '2620:125:9014:3240:14:240:128:0'; the brackets are merely for disambiguating the port.

Compare to urllib.parse.urlsplit:

>>> from urllib.parse import urlsplit
>>> s = urlsplit('http://[2620:125:9014:3240:14:240:128:0]:8080/api/python')
>>> s.hostname
'2620:125:9014:3240:14:240:128:0'
>>> s.port
8080
>>> s.netloc
'[2620:125:9014:3240:14:240:128:0]:8080'

I'd recommend pursuing this with urllib3, but do take a look at existing IPv6 parsing issues there as a cursory glance at their bug tracker shows several of them.

That said, I don't know of any particular reason not to strip brackets in HTTPConnection anyway.  Brackets cannot be part of a valid hostname, can they?
msg279523 - (view) Author: Charles Stephens (cfs-pure) * Date: 2016-10-27 03:29
Not when passing it to getaddrinfo().
msg279524 - (view) Author: Charles Stephens (cfs-pure) * Date: 2016-10-27 03:30
Yes, I'm working on patching urllib3 to preprocess the host argument to HTTPConnection.  However, it makes sense to strip square brackets regardless.
msg279647 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-10-29 02:48
In general, I would be against stripping square brackets if the host and port have already been separated, e.g. in the address tuple used with socket.create_connection() etc. It feels like unnecessary double processing, which should already have been done at a higher level.

But I guess you could make the argument that adding this feature in HTTPConnection simplifies the API by improving consistency with the case where the port is not specified separately. But I would still say it is not a bug fix nor regression, and should be documented as being added in the next Python version (3.7 atm).
History
Date User Action Args
2018-08-28 13:42:34martin.panterlinkissue34516 superseder
2016-10-29 02:48:34martin.pantersetmessages: + msg279647
2016-10-27 03:30:30cfs-puresetmessages: + msg279524
2016-10-27 03:29:27cfs-puresetmessages: + msg279523
2016-10-27 03:13:52zach.waresetnosy: + zach.ware
messages: + msg279522
2016-10-27 00:35:19cfs-puresetmessages: + msg279520
2016-10-26 23:17:35martin.pantersetmessages: + msg279519
2016-10-26 22:06:11cfs-puresetmessages: + msg279518
2016-10-26 22:03:58cfs-puresetmessages: + msg279517
2016-10-26 21:23:57martin.pantersetversions: - Python 3.3, Python 3.4
nosy: + martin.panter

messages: + msg279512

stage: patch review
2016-10-26 19:58:08cfs-puresetmessages: + msg279511
2016-10-26 19:57:39cfs-puresettitle: httplib/http.client HTTPConnection._get_hostport() regression -> httplib/http.client HTTPConnection._set_hostport() regression
2016-10-26 19:12:47cfs-purecreate