classification
Title: urllib.request.urlopen fails when userinfo is present in URL
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.7, Python 3.6, Python 3.5, Python 3.4
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: orsenthil, xtreak, ytvwld
Priority: normal Keywords:

Created on 2018-09-11 16:33 by ytvwld, last changed 2018-09-12 10:47 by ytvwld.

Messages (3)
msg325022 - (view) Author: Niklas Sombert (ytvwld) Date: 2018-09-11 16:33
Today I tried to access URLs like this one: http://user:1234@example.net:8080.

The result was this:
>>> import urllib.request
>>> urllib.request.urlopen("http://user:1234@example.net:1234/")
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/urllib/request.py", line 1317, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "/usr/local/lib/python3.7/http/client.py", line 1229, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/local/lib/python3.7/http/client.py", line 1275, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.7/http/client.py", line 1224, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.7/http/client.py", line 1016, in _send_output
    self.send(msg)
  File "/usr/local/lib/python3.7/http/client.py", line 956, in send
    self.connect()
  File "/usr/local/lib/python3.7/http/client.py", line 928, in connect
    (self.host,self.port), self.timeout, self.source_address)
  File "/usr/local/lib/python3.7/socket.py", line 707, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
  File "/usr/local/lib/python3.7/socket.py", line 748, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name does not resolve

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/lib/python3.7/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/usr/local/lib/python3.7/urllib/request.py", line 543, in _open
    '_open', req)
  File "/usr/local/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python3.7/urllib/request.py", line 1345, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/usr/local/lib/python3.7/urllib/request.py", line 1319, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno -2] Name does not resolve>


At first, I checked my network connection, but that turned out to be okay. Even funnier:

>>> urllib.request.urlopen("http://user:1234@example.net/")
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/http/client.py", line 877, in _get_hostport
    port = int(host[i+1:])
ValueError: invalid literal for int() with base 10: '1234@example.net'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/lib/python3.7/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/usr/local/lib/python3.7/urllib/request.py", line 543, in _open
    '_open', req)
  File "/usr/local/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python3.7/urllib/request.py", line 1345, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/usr/local/lib/python3.7/urllib/request.py", line 1285, in do_open
    h = http_class(host, timeout=req.timeout, **http_conn_args)
  File "/usr/local/lib/python3.7/http/client.py", line 841, in __init__
    (self.host, self.port) = self._get_hostport(host, port)
  File "/usr/local/lib/python3.7/http/client.py", line 882, in _get_hostport
    raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
http.client.InvalidURL: nonnumeric port: '1234@example.net'

So urllib seems to have problems parsing HTTP URLs that contain a userinfo part. (The same URLs work with Requests.)
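One possible workaround, sketched here rather than taken from the thread, is to strip the userinfo from the URL before handing it to urllib and to send the credentials as a Basic Authorization header instead. The helper names `split_userinfo` and `build_request` are hypothetical, and the sketch assumes the server accepts HTTP Basic authentication.

```python
import base64
import urllib.request
from urllib.parse import unquote, urlsplit


def split_userinfo(url):
    """Return (url_without_userinfo, username, password).

    Hypothetical helper, not part of urllib.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:
        # IPv6 literal: restore the brackets that urlsplit removed.
        host = "[%s]" % host
    if parts.port is not None:
        host = "%s:%d" % (host, parts.port)
    # Userinfo may be percent-encoded in the URL.
    username = unquote(parts.username) if parts.username is not None else None
    password = unquote(parts.password) if parts.password is not None else None
    return parts._replace(netloc=host).geturl(), username, password


def build_request(url):
    """Build a Request for the cleaned URL, with Basic credentials if present."""
    clean_url, username, password = split_userinfo(url)
    request = urllib.request.Request(clean_url)
    if username is not None:
        credentials = "%s:%s" % (username, password or "")
        token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
        request.add_header("Authorization", "Basic " + token)
    return request


request = build_request("http://user:1234@example.net:1234/")
print(request.full_url)  # http://example.net:1234/
# urllib.request.urlopen(request) would then connect to example.net:1234.
```

Note that this sends the credentials with the first request, without waiting for the server to ask for them, which may or may not be what you want.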
msg325130 - (view) Author: Karthikeyan Singaravelan (xtreak) * Date: 2018-09-12 10:26
This seems to be an explicit choice: there is even a test that uses the URL "user:password@www.python.org". Ref: https://github.com/python/cpython/blob/731ff68eeef58babdf2b32dc9a73b141760c2be9/Lib/test/test_httplib.py#L640. You can try the basic auth example in https://docs.python.org/3/library/urllib.request.html#examples

Adding the developer here, who might have more info on adding this support to urllib.

Thanks
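For reference, the basic auth approach mentioned above looks roughly like the following. This is a minimal sketch adapted to the URL from this report; the host, user name, and password are the placeholder values used in the thread, not real credentials.

```python
import urllib.request

# URL without the userinfo part; credentials are registered separately.
base_url = "http://example.net:1234/"

# A realm of None makes these credentials the default for any realm
# the server announces under base_url.
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, base_url, "user", "1234")

handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
opener = urllib.request.build_opener(handler)

# opener.open(base_url) would perform the request; the handler supplies
# the credentials when the server answers with a 401 challenge.
```

Unlike putting the credentials in the URL, this handler only sends them after the server has requested authentication.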
msg325131 - (view) Author: Niklas Sombert (ytvwld) Date: 2018-09-12 10:47
Thank you for your response.

But if this is an explicit choice, I would like to have better exceptions:

>>> from http.client import HTTPConnection
>>> h = HTTPConnection("user:1234@example.net")
    raises http.client.InvalidURL: nonnumeric port: '1234@example.net'
>>> h = HTTPConnection("user:1234@example.net:1234")
    doesn't raise any exception.
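The difference between the two cases is consistent with the host string being split at its last colon, as the `port = int(host[i+1:])` line in the traceback suggests. The sketch below is a simplified illustration under that assumption, not the actual http.client code; `last_colon_split` and `reject_userinfo` are hypothetical names, and `reject_userinfo` shows one way a clearer error could be raised for both inputs.

```python
from http.client import InvalidURL


def last_colon_split(host):
    """Split at the last colon, mimicking the assumed host/port split."""
    name, _, port_text = host.rpartition(":")
    return name, port_text


def reject_userinfo(host):
    """Hypothetical pre-check: raise a descriptive error if host has userinfo."""
    if "@" in host:
        raise InvalidURL("userinfo is not supported in host: %r" % host)
    return host


# The port text is not numeric, which matches the InvalidURL in the report.
print(last_colon_split("user:1234@example.net"))
# The port text is numeric, so the userinfo silently ends up in the host name.
print(last_colon_split("user:1234@example.net:1234"))

for candidate in ("user:1234@example.net", "user:1234@example.net:1234"):
    try:
        reject_userinfo(candidate)
    except InvalidURL as exc:
        print("rejected:", exc)
```

With a check like this, both inputs would fail in the same way and with a message that names the actual problem.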
History
Date                 User    Action  Args
2018-09-12 10:47:26  ytvwld  set     messages: + msg325131
2018-09-12 10:26:54  xtreak  set     nosy: + orsenthil
                                     messages: + msg325130
2018-09-11 17:54:09  xtreak  set     nosy: + xtreak
2018-09-11 16:33:16  ytvwld  create