
classification
Title: urllib.request.urlopen fails when userinfo is present in URL
Type: behavior
Stage:
Components: Library (Lib)
Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Windson Yang, orsenthil, xtreak, ytvwld
Priority: normal
Keywords:

Created on 2018-09-11 16:33 by ytvwld, last changed 2022-04-11 14:59 by admin.

Messages (9)
msg325022 - (view) Author: Niklas Sombert (ytvwld) Date: 2018-09-11 16:33
Today I tried to access URLs like this one: http://user:1234@example.net:8080.

The result was this:
>>> import urllib.request
>>> urllib.request.urlopen("http://user:1234@example.net:1234/")
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/urllib/request.py", line 1317, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "/usr/local/lib/python3.7/http/client.py", line 1229, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/local/lib/python3.7/http/client.py", line 1275, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.7/http/client.py", line 1224, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.7/http/client.py", line 1016, in _send_output
    self.send(msg)
  File "/usr/local/lib/python3.7/http/client.py", line 956, in send
    self.connect()
  File "/usr/local/lib/python3.7/http/client.py", line 928, in connect
    (self.host,self.port), self.timeout, self.source_address)
  File "/usr/local/lib/python3.7/socket.py", line 707, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
  File "/usr/local/lib/python3.7/socket.py", line 748, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name does not resolve

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/lib/python3.7/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/usr/local/lib/python3.7/urllib/request.py", line 543, in _open
    '_open', req)
  File "/usr/local/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python3.7/urllib/request.py", line 1345, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/usr/local/lib/python3.7/urllib/request.py", line 1319, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno -2] Name does not resolve>


At first, I checked my network connection, but that turned out to be okay. Even funnier:

>>> urllib.request.urlopen("http://user:1234@example.net/")     
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/http/client.py", line 877, in _get_hostport
    port = int(host[i+1:])
ValueError: invalid literal for int() with base 10: '1234@example.net'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/lib/python3.7/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/usr/local/lib/python3.7/urllib/request.py", line 543, in _open
    '_open', req)
  File "/usr/local/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python3.7/urllib/request.py", line 1345, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/usr/local/lib/python3.7/urllib/request.py", line 1285, in do_open
    h = http_class(host, timeout=req.timeout, **http_conn_args)
  File "/usr/local/lib/python3.7/http/client.py", line 841, in __init__
    (self.host, self.port) = self._get_hostport(host, port)
  File "/usr/local/lib/python3.7/http/client.py", line 882, in _get_hostport
    raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
http.client.InvalidURL: nonnumeric port: '1234@example.net'

So, urllib seems to have problems parsing HTTP URLs which contain a userinfo part. (Requests works.)
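
For comparison, the standard library's own URL parser does separate the userinfo from the host. A minimal check, using only `urllib.parse.urlsplit` and the URL from the report (no network access is involved):

```python
from urllib.parse import urlsplit

parts = urlsplit("http://user:1234@example.net:1234/")

# netloc still carries the userinfo; the derived attributes split it out.
print(parts.netloc)    # user:1234@example.net:1234
print(parts.username)  # user
print(parts.password)  # 1234
print(parts.hostname)  # example.net
print(parts.port)      # 1234
```

So the information needed to reach the right host is available from the parser; the failure appears to come from the raw netloc being handed on as the host.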
msg325130 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2018-09-12 10:26
This seems to be an explicit choice: there is even a test that uses the URL "user:password@www.python.org" (ref: https://github.com/python/cpython/blob/731ff68eeef58babdf2b32dc9a73b141760c2be9/Lib/test/test_httplib.py#L640). You can try the basic auth example in https://docs.python.org/3/library/urllib.request.html#examples

I am adding the developer who might have more information about adding this support to urllib.

Thanks
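
A rough sketch of that workaround, assuming the server uses HTTP Basic authentication: strip the userinfo from the URL and hand the credentials to `HTTPBasicAuthHandler` instead. The helpers `split_credentials` and `build_opener_for` are hypothetical names made up for this illustration; they are not part of urllib. Percent-encoded credentials are not decoded here, and the handler only sends the credentials after the server answers with a 401 challenge.

```python
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def split_credentials(url):
    """Hypothetical helper: return (url_without_userinfo, username, password)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:
        # IPv6 literal: restore the brackets that urlsplit removed.
        host = f"[{host}]"
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    clean = urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
    return clean, parts.username, parts.password


def build_opener_for(url):
    """Hypothetical helper: build an opener that knows the URL's credentials."""
    clean, user, password = split_credentials(url)
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    if user is not None:
        # realm=None means "use these credentials for any realm at this URL".
        manager.add_password(None, clean, user, password or "")
    handler = urllib.request.HTTPBasicAuthHandler(manager)
    return clean, urllib.request.build_opener(handler)


clean_url, opener = build_opener_for("http://user:1234@example.net:1234/")
print(clean_url)  # http://example.net:1234/
# opener.open(clean_url) would then perform the actual request.
```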
msg325131 - (view) Author: Niklas Sombert (ytvwld) Date: 2018-09-12 10:47
Thank you for your response.

But if this is an explicit choice, I would like to have better exceptions:

>>> from http.client import HTTPConnection
>>> h = HTTPConnection("user:1234@example.net")
    raises http.client.InvalidURL: nonnumeric port: '1234@example.net'
>>> h = HTTPConnection("user:1234@example.net:1234")
    doesn't raise any exception.
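
The difference between the two cases can be inspected without any network access, because constructing an `HTTPConnection` does not connect. The sketch below assumes the host/port splitting behaves as in the tracebacks above (the text after the last ":" is treated as the port); the `host` and `port` values shown are what that logic should produce on the affected versions.

```python
from http.client import HTTPConnection, InvalidURL

first_error = None

# No trailing port: everything after the last ":" is taken as the port,
# and "1234@example.net" is not a number.
try:
    HTTPConnection("user:1234@example.net")
except InvalidURL as exc:
    first_error = str(exc)
print(first_error)  # nonnumeric port: '1234@example.net'

# Trailing port present: the split "succeeds", and the userinfo silently
# becomes part of the host name that is later passed to getaddrinfo().
conn = HTTPConnection("user:1234@example.net:1234")
print(conn.host)  # user:1234@example.net
print(conn.port)  # 1234
```

This would explain the "Name does not resolve" error in the first report: the resolver is asked for a host named "user:1234@example.net".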
msg327152 - (view) Author: Niklas Sombert (ytvwld) Date: 2018-10-05 17:07
Hey, this is almost a month old. (Not a problem, really, but I thought I should bump this.)
msg329549 - (view) Author: Niklas Sombert (ytvwld) Date: 2018-11-09 19:15
Another month has passed, just saying.
msg329586 - (view) Author: Windson Yang (Windson Yang) * Date: 2018-11-10 02:14
First, we can add a check at https://github.com/python/cpython/blob/e42b705188271da108de42b55d9344642170aa2b/Lib/http/client.py#L871 and raise an error if the URL contains a userinfo part. Second, we should catch the resulting exception in urllib.request.urlopen. If we agree, I can create a PR later.
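
The first proposal could look roughly like the following. This is only a sketch of the idea, not the actual patch: `reject_userinfo` is a hypothetical helper name, and the error message wording is an assumption. The message deliberately shows only the part after the "@" so that credentials do not end up in logs.

```python
from http.client import InvalidURL


def reject_userinfo(host):
    """Hypothetical pre-check: refuse a host string that carries userinfo."""
    if "@" in host:
        # Report only the part after the last "@" to avoid echoing credentials.
        host_part = host.rpartition("@")[2]
        raise InvalidURL("userinfo is not supported in host (host part: %r)" % host_part)
    return host


print(reject_userinfo("example.net:8080"))  # example.net:8080

try:
    reject_userinfo("user:1234@example.net:1234")
except InvalidURL as exc:
    print(exc)  # userinfo is not supported in host (host part: 'example.net:1234')
```

With a check like this, both cases from the report would fail in the same way and with the same exception type, instead of one raising a port error and the other a name-resolution error.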
msg329623 - (view) Author: Niklas Sombert (ytvwld) Date: 2018-11-10 15:43
This behaviour would be better than the current one, yes.
msg333434 - (view) Author: Windson Yang (Windson Yang) * Date: 2019-01-11 02:02
I found that the Requests library uses urllib3, which appears to ignore the userinfo part (in request_context, https://github.com/urllib3/urllib3/blob/master/src/urllib3/poolmanager.py#L208). Did I miss something, or should we also ignore it?
msg334413 - (view) Author: Windson Yang (Windson Yang) * Date: 2019-01-27 03:53
The reason the Requests library doesn't raise an error is that urllib3 (the library Requests uses) currently ignores the auth part:

> Currently we expect our users to handle authentication headers themselves. It's unfortunate that we silently strip this information though...

The discussion is also at https://github.com/urllib3/urllib3/issues/1530

Should we ignore the userinfo part or raise an error here?
History
Date User Action Args
2022-04-11 14:59:05 admin set github: 78809
2021-12-10 16:43:46 iritkatriel set versions: + Python 3.9, Python 3.10, Python 3.11, - Python 3.4, Python 3.5, Python 3.6, Python 3.7
2019-01-27 03:53:16 Windson Yang set messages: + msg334413
2019-01-11 02:02:58 Windson Yang set messages: + msg333434
2018-11-10 15:43:37 ytvwld set messages: + msg329623
2018-11-10 02:14:05 Windson Yang set nosy: + Windson Yang
    messages: + msg329586
2018-11-09 19:15:22 ytvwld set messages: + msg329549
2018-10-05 17:07:58 ytvwld set messages: + msg327152
2018-09-12 10:47:26 ytvwld set messages: + msg325131
2018-09-12 10:26:54 xtreak set nosy: + orsenthil
    messages: + msg325130
2018-09-11 17:54:09 xtreak set nosy: + xtreak
2018-09-11 16:33:16 ytvwld create