Issue28539
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2016-10-26 19:12 by cfs-pure, last changed 2022-04-11 14:58 by admin.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
get_hostport.diff | cfs-pure, 2016-10-26 19:12 | Fix to HTTPConnection._get_hostport() | review |
Messages (11) | |||
---|---|---|---|
msg279509 - (view) | Author: Charles Stephens (cfs-pure) * | Date: 2016-10-26 19:12 | |
Back through the mists of time, there was a change to strip square brackets IPv6 address host literals in HTTPConnection._get_hostport(): https://hg.python.org/cpython/diff/433606e9546c/Lib/httplib.py However, the code mixed tabs and spaces and was "corected" by: https://hg.python.org/cpython/diff/9e2b94a3b5dc/Lib/httplib.py However, the intent was changed in the second diff and brackets won't be stripped. This causes problems when IPv6 address URL's are used with the requests package: In [1]: import httplib In [2]: con = httplib.HTTPConnection('[fe80::26a9:37ff:fe00:f764%eth0]', 15482) In [3]: con.request("GET", "/api/api-version") --------------------------------------------------------------------------- gaierror Traceback (most recent call last) <ipython-input-3-28f1bab26dc1> in <module>() ----> 1 con.request("GET", "/api/api-version") /usr/lib/python2.7/httplib.pyc in request(self, method, url, body, headers) 977 def request(self, method, url, body=None, headers={}): 978 """Send a complete request to the server.""" --> 979 self._send_request(method, url, body, headers) 980 981 def _set_content_length(self, body): /usr/lib/python2.7/httplib.pyc in _send_request(self, method, url, body, headers) 1011 for hdr, value in headers.iteritems(): 1012 self.putheader(hdr, value) -> 1013 self.endheaders(body) 1014 1015 def getresponse(self, buffering=False): /usr/lib/python2.7/httplib.pyc in endheaders(self, message_body) 973 else: 974 raise CannotSendHeader() --> 975 self._send_output(message_body) 976 977 def request(self, method, url, body=None, headers={}): /usr/lib/python2.7/httplib.pyc in _send_output(self, message_body) 833 msg += message_body 834 message_body = None --> 835 self.send(msg) 836 if message_body is not None: 837 #message_body was not a string (i.e. it is a file) and /usr/lib/python2.7/httplib.pyc in send(self, data) 795 if self.sock is None: 796 if self.auto_open: --> 797 self.connect() 798 else: 799 raise NotConnected() /usr/lib/python2.7/httplib.pyc in connect(self) 776 """Connect to the host and port specified in __init__.""" 777 self.sock = socket.create_connection((self.host,self.port), --> 778 self.timeout, self.source_address) 779 780 if self._tunnel_host: /usr/lib/python2.7/socket.pyc in create_connection(address, timeout, source_address) 551 host, port = address 552 err = None --> 553 for res in getaddrinfo(host, port, 0, SOCK_STREAM): 554 af, socktype, proto, canonname, sa = res 555 sock = None gaierror: [Errno -2] Name or service not known |
|||
msg279511 - (view) | Author: Charles Stephens (cfs-pure) * | Date: 2016-10-26 19:58 | |
Er, that is HTTPConnection._set_hostport() not _get_hostport() |
|||
msg279512 - (view) | Author: Martin Panter (martin.panter) * | Date: 2016-10-26 21:23 | |
You say this is a regression. Can you point to a version that worked as you want? The IPv6 URL handling you pointed out seems to have been added to 2.4b1, and then backported to 2.3.5. I expect that earlier versions didn’t support IPv6 URLs at all. Also, it does not seem to me that parsing the square bracket form was intended if the port is specified separately. Look at the test cases added in revision 433606e9546c. Perhaps you aren’t encoding the URL correctly, or Requests isn’t parsing the hostname correctly. Also, according to RFC 6874, it would be more proper to encode the percent sign as %25, so the URL would have [fe80::26a9:37ff:fe00:f764%25eth0]. But I don’t think Python supports this; see Issue 23448. |
|||
msg279517 - (view) | Author: Charles Stephens (cfs-pure) * | Date: 2016-10-26 22:03 | |
I misapplied the term 'regression'. My intent was to describe how original author's change revision 433606e9546c was refactored to make it perform incorrectly. Without the scope specifier, the outcome is the same when HTTPConnection is instantiated. When both the host and port arguments are specified, the square brackets are not stripped and are stored in the host attribute. When the port number is part of the host argument and the port argument is None, the host attribute does not include the square brackets. Examples: Python 2.7.10 (default, Jul 30 2016, 18:31:42) [GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> import httplib >>> con1 = httplib.HTTPConnection('[fe80::26a9:37ff:fe00:f764]', 15482) >>> con1.host, con1.port ('[fe80::26a9:37ff:fe00:f764]', 15482) >>> con2 = httplib.HTTPConnection('[fe80::26a9:37ff:fe00:f764]:15482') >>> con2.host, con2.port ('fe80::26a9:37ff:fe00:f764', 15482) Compare with IPv4 behavior: >>> con3 = httplib.HTTPConnection('127.0.0.1', 15482) >>> con3.host, con3.port ('127.0.0.1', 15482) >>> con4 = httplib.HTTPConnection('127.0.0.1:15482') >>> con4.host, con4.port ('127.0.0.1', 15482) Calls to con1.request() will fail in socket.py because getaddrinfo will choke on the square brackets. Which makes sense since HTTPConnection.host is passed on down the stack as-is until it reaches create_connection() in socket.py. Moving the indent of that if block up one level makes con1 identical to con2. |
|||
msg279518 - (view) | Author: Charles Stephens (cfs-pure) * | Date: 2016-10-26 22:06 | |
Example with patch applied: Python 2.7.6 (default, Oct 26 2016, 20:33:50) [GCC 4.8.4] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import httplib >>> con1 = httplib.HTTPConnection('[fe80::26a9:37ff:fe00:f764]', 15482) >>> con1.host, con1.port ('fe80::26a9:37ff:fe00:f764', 15482) |
|||
msg279519 - (view) | Author: Martin Panter (martin.panter) * | Date: 2016-10-26 23:17 | |
I tried the original revision 433606e9546c code in Python 2.7 (which still allows mixed tabs and spaces), and it behaves the same as the current version. Thus I suspect revision 9e2b94a3b5dc did not change any behaviour. What is your use case? Why can’t you just pass the address without square brackets: >>> con = HTTPConnection("fe80::26a9:37ff:fe00:f764", 15482) >>> (con.host, con.port) ('fe80::26a9:37ff:fe00:f764', 15482) How do you come by an IPv6 address in square brackets, but with the port number separate? You can use urlsplit() to extract the “hostname” and “port” attributes from a URL, but in that case “hostname” does not include the square brackets: >>> urlsplit("//[fe80::26a9:37ff:fe00:f764]:15482").hostname 'fe80::26a9:37ff:fe00:f764' |
|||
msg279520 - (view) | Author: Charles Stephens (cfs-pure) * | Date: 2016-10-27 00:35 | |
Our internal use case is happening through requests via urllib3 for parsing. Essentially requests is taking the URL, passing it to urllib3 for parsing. urllib3 is returning a namedtuple of type Url which includes a host and port property which is being fed to httplib. Essentially: >>> import urllib3 >>> import httplib >>> orig_url = 'http://[2620:125:9014:3240:14:240:128:0]:8080/api/python' >>> u1 = urllib3.util.parse_url(orig_url) >>> u1 Url(scheme='http', auth=None, host='[2620:125:9014:3240:14:240:128:0]', port=8080, path='/api/python', query=None, fragment=None) >>> c1 = httplib.HTTPConnection(u1.host, port=u1.port) >>> c1.host, c1.port ('[2620:125:9014:3240:14:240:128:0]', 8080) >>> c1.request('GET', '/api/json') Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/usr/lib/python2.7/httplib.py", line 979, in request self._send_request(method, url, body, headers) File "/usr/lib/python2.7/httplib.py", line 1013, in _send_request self.endheaders(body) File "/usr/lib/python2.7/httplib.py", line 975, in endheaders self._send_output(message_body) File "/usr/lib/python2.7/httplib.py", line 835, in _send_output self.send(msg) File "/usr/lib/python2.7/httplib.py", line 797, in send self.connect() File "/usr/lib/python2.7/httplib.py", line 778, in connect self.timeout, self.source_address) File "/usr/lib/python2.7/socket.py", line 553, in create_connection for res in getaddrinfo(host, port, 0, SOCK_STREAM): socket.gaierror: [Errno -2] Name or service not known |
|||
msg279522 - (view) | Author: Zachary Ware (zach.ware) * | Date: 2016-10-27 03:13 | |
That looks like a bug in urllib3 to me. The host is *not* '[2620:125:9014:3240:14:240:128:0]', the host is '2620:125:9014:3240:14:240:128:0'; the brackets are merely for disambiguating the port. Compare to urllib.parse.urlsplit: >>> from urllib.parse import urlsplit >>> s = urlsplit('http://[2620:125:9014:3240:14:240:128:0]:8080/api/python') >>> s.hostname '2620:125:9014:3240:14:240:128:0' >>> s.port 8080 >>> s.netloc '[2620:125:9014:3240:14:240:128:0]:8080' I'd recommend pursuing this with urllib3, but do take a look at existing IPv6 parsing issues there as a cursory glance at their bug tracker shows several of them. That said, I don't know of any particular reason not to strip brackets in HTTPConnection anyway. Brackets cannot be part of a valid hostname, can they? |
|||
msg279523 - (view) | Author: Charles Stephens (cfs-pure) * | Date: 2016-10-27 03:29 | |
Not when passing it to getaddrinfo(). |
|||
msg279524 - (view) | Author: Charles Stephens (cfs-pure) * | Date: 2016-10-27 03:30 | |
Yes, I'm working on patching urllib3 to preprocess the host argument to HTTPConnection. However, it makes sense to strip square brackets regardless. |
|||
msg279647 - (view) | Author: Martin Panter (martin.panter) * | Date: 2016-10-29 02:48 | |
In general, I would be against stripping square brackets if the host and port have already been separated, e.g. in the address tuple used with socket.create_connection() etc. It feels like unnecessary double processing, which should already have been done at a higher level. But I guess you could make the argument that adding this feature in HTTPConnection simplifies the API by improving consistency with the case where the port is not specified separately. But I would still say it is not a bug fix nor regression, and should be documented as being added in the next Python version (3.7 atm). |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:58:38 | admin | set | github: 72725 |
2018-08-28 13:42:34 | martin.panter | link | issue34516 superseder |
2016-10-29 02:48:34 | martin.panter | set | messages: + msg279647 |
2016-10-27 03:30:30 | cfs-pure | set | messages: + msg279524 |
2016-10-27 03:29:27 | cfs-pure | set | messages: + msg279523 |
2016-10-27 03:13:52 | zach.ware | set | nosy:
+ zach.ware messages: + msg279522 |
2016-10-27 00:35:19 | cfs-pure | set | messages: + msg279520 |
2016-10-26 23:17:35 | martin.panter | set | messages: + msg279519 |
2016-10-26 22:06:11 | cfs-pure | set | messages: + msg279518 |
2016-10-26 22:03:58 | cfs-pure | set | messages: + msg279517 |
2016-10-26 21:23:57 | martin.panter | set | versions:
- Python 3.3, Python 3.4 nosy: + martin.panter messages: + msg279512 stage: patch review |
2016-10-26 19:58:08 | cfs-pure | set | messages: + msg279511 |
2016-10-26 19:57:39 | cfs-pure | set | title: httplib/http.client HTTPConnection._get_hostport() regression -> httplib/http.client HTTPConnection._set_hostport() regression |
2016-10-26 19:12:47 | cfs-pure | create |