classification
Title: pip: urllib3 does not encode userinfo section of URL with authentication credentials
Type: behavior
Stage: resolved
Components: Demos and Tools
Versions: Python 3.4
process
Status: closed
Resolution: third party
Dependencies:
Superseder:
Assigned To:
Nosy List: Marcus.Smith, berker.peksag, demian.brecht, dstufft, leotan, martin.panter, ncoghlan, paul.moore
Priority: normal
Keywords:

Created on 2015-02-24 21:58 by leotan, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (7)
msg236550 - Author: Leonardo Tancredi (leotan) Date: 2015-02-24 21:57
I was running pip install with the --proxy switch to authenticate to a proxy server with user "user" and password "pass?word" when I noticed that it fails. It seems to fail when the password contains certain special characters, e.g., ? and #.

Here's the exception I saw:

  Exception:
  Traceback (most recent call last):
    File "/usr/local/lib/python3.3/site-packages/pip-6.0.8-py3.3.egg/pip/basecommand.py", line 232, in main
      status = self.run(options, args)
    File "/usr/local/lib/python3.3/site-packages/pip-6.0.8-py3.3.egg/pip/commands/install.py", line 339, in run
      requirement_set.prepare_files(finder)
    File "/usr/local/lib/python3.3/site-packages/pip-6.0.8-py3.3.egg/pip/req/req_set.py", line 333, in prepare_files
      upgrade=self.upgrade,
    File "/usr/local/lib/python3.3/site-packages/pip-6.0.8-py3.3.egg/pip/index.py", line 305, in find_requirement
      page = self._get_page(main_index_url, req)
    File "/usr/local/lib/python3.3/site-packages/pip-6.0.8-py3.3.egg/pip/index.py", line 783, in _get_page
      return HTMLPage.get_page(link, req, session=self.session)
    File "/usr/local/lib/python3.3/site-packages/pip-6.0.8-py3.3.egg/pip/index.py", line 872, in get_page
      "Cache-Control": "max-age=600",
    File "/usr/local/lib/python3.3/site-packages/pip-6.0.8-py3.3.egg/pip/_vendor/requests/sessions.py", line 473, in get
      return self.request('GET', url, **kwargs)
    File "/usr/local/lib/python3.3/site-packages/pip-6.0.8-py3.3.egg/pip/download.py", line 365, in request
      return super(PipSession, self).request(method, url, *args, **kwargs)
    File "/usr/local/lib/python3.3/site-packages/pip-6.0.8-py3.3.egg/pip/_vendor/requests/sessions.py", line 461, in request
      resp = self.send(prep, **send_kwargs)
    File "/usr/local/lib/python3.3/site-packages/pip-6.0.8-py3.3.egg/pip/_vendor/requests/sessions.py", line 573, in send
      r = adapter.send(request, **kwargs)
    File "/usr/local/lib/python3.3/site-packages/pip-6.0.8-py3.3.egg/pip/_vendor/cachecontrol/adapter.py", line 43, in send
      resp = super(CacheControlAdapter, self).send(request, **kw)
    File "/usr/local/lib/python3.3/site-packages/pip-6.0.8-py3.3.egg/pip/_vendor/requests/adapters.py", line 337, in send
      conn = self.get_connection(request.url, proxies)
    File "/usr/local/lib/python3.3/site-packages/pip-6.0.8-py3.3.egg/pip/_vendor/requests/adapters.py", line 245, in get_connection
      proxy_manager = self.proxy_manager_for(proxy)
    File "/usr/local/lib/python3.3/site-packages/pip-6.0.8-py3.3.egg/pip/_vendor/requests/adapters.py", line 155, in proxy_manager_for
      **proxy_kwargs)
    File "/usr/local/lib/python3.3/site-packages/pip-6.0.8-py3.3.egg/pip/_vendor/requests/packages/urllib3/poolmanager.py", line 265, in proxy_from_url
      return ProxyManager(proxy_url=url, **kw)
    File "/usr/local/lib/python3.3/site-packages/pip-6.0.8-py3.3.egg/pip/_vendor/requests/packages/urllib3/poolmanager.py", line 210, in __init__
      proxy = parse_url(proxy_url)
    File "/usr/local/lib/python3.3/site-packages/pip-6.0.8-py3.3.egg/pip/_vendor/requests/packages/urllib3/util/url.py", line 185, in parse_url
      raise LocationParseError(url)
  pip._vendor.requests.packages.urllib3.exceptions.LocationParseError: Failed to parse: user:pass

AFAICT the problem lies in the parse_url() function in url.py, because it assumes that neither a ? nor a # can appear between the :// and the next /. This does not hold, because a URL can include a username and a password right there, as in http://user:pass?word@host/path. Here's the offending piece of code:

    if '://' in url:
        scheme, url = url.split('://', 1)

    # Find the earliest Authority Terminator
    # (http://tools.ietf.org/html/rfc3986#section-3.2)
    url, path_, delim = split_first(url, ['/', '?', '#'])
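
For illustration, the effect of that split can be reproduced with a small stand-in for split_first(). The helper below is a simplified re-implementation written for this example (an assumption, not urllib3's actual code); it only mirrors the call shown above:

```python
def split_first(s, delims):
    """Split s at whichever delimiter occurs earliest.

    Simplified stand-in written for this example; returns
    (before, after, delimiter) or (s, '', None) if none is found.
    """
    min_idx = None
    min_delim = None
    for d in delims:
        idx = s.find(d)
        if idx < 0:
            continue
        if min_idx is None or idx < min_idx:
            min_idx = idx
            min_delim = d
    if min_idx is None:
        return s, '', None
    return s[:min_idx], s[min_idx + 1:], min_delim


url = 'http://user:pass?word@host/path'
scheme, rest = url.split('://', 1)
# The '?' inside the password is found before the '/' that follows the host.
authority, remainder, delim = split_first(rest, ['/', '?', '#'])
print(repr(authority), repr(remainder), repr(delim))
```

With this input the "authority" is cut off at the question mark, which matches the truncated string in the error message.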


It's funny that this snippet violates precisely the specification given in that comment (RFC 3986 section 3.2), because that section clearly states that this string can contain a userinfo field:

     authority   = [ userinfo "@" ] host [ ":" port ]

For some reason, urlencoding the password did not help either: the error message did not change.
msg236556 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-02-24 23:52
Sounds like this might be in a third-party module, not in Python itself. But see also Issue 23328 and Issue 18140.

The RFC you referenced also says this, which suggests the authority cannot contain a literal question mark:

‘The authority component . . . is terminated by the next slash ("/"), question mark ("?"), or number sign ("#") character, or by the end of the URI.’

Some more definitions from that RFC indicating a literal question mark is not allowed in “userinfo”:

userinfo = *( unreserved / pct-encoded / sub-delims / ":" )
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded = "%" HEXDIG HEXDIG
sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
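
As a rough sketch of what that grammar permits, the rules above can be written as a regular expression. The names USERINFO_RE and is_valid_userinfo are made up for this example, and the check only covers the userinfo production quoted above:

```python
import re

# userinfo = *( unreserved / pct-encoded / sub-delims / ":" )
# Character class: unreserved + sub-delims + ":"; alternative: pct-encoded.
USERINFO_RE = re.compile(
    r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:]|%[0-9A-Fa-f]{2})*"
)


def is_valid_userinfo(s):
    """Return True if s matches the RFC 3986 userinfo production."""
    return USERINFO_RE.fullmatch(s) is not None


for candidate in ('user:pass?word', 'user:pass#word', 'user:pass%3Fword'):
    print(candidate, is_valid_userinfo(candidate))
```

Only the percent-encoded form passes, since "?" and "#" are neither unreserved characters nor sub-delims.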
msg236558 - Author: Demian Brecht (demian.brecht) * (Python triager) Date: 2015-02-25 01:25
> Sounds like this might be in a third-party module

+1. urllib3's parse_url() doesn't make use of the standard library.

> userinfo = *( unreserved / pct-encoded / sub-delims / ":" )

This leads me to believe that using something like this might work:

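from urllib.parse import quote
# user and password are the raw (unencoded) credentials.
# safe='' makes quote() also encode '/', which it leaves untouched by default.
userinfo = '{}:{}'.format(quote(user, safe=''), quote(password, safe=''))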

That said, there's another relevant passage in the RFC that is worth noting:

   Use of the format "user:password" in the userinfo field is
   deprecated.  Applications should not render as clear text any data
   after the first colon (":") character found within a userinfo
   subcomponent unless the data after the colon is the empty string
   (indicating no password).  Applications may choose to ignore or
   reject such data when it is received as part of a reference and
   should reject the storage of such data in unencrypted form.  The
   passing of authentication information in clear text has proven to be
   a security risk in almost every case where it has been used.

In any event, this issue should be closed as it's not related to the standard library.
msg236562 - Author: Leonardo Tancredi (leotan) Date: 2015-02-25 04:09
OK, firstly you'll have to excuse me, a mere sporadic Python user, for not having a clear idea about how Python development is structured. I can't tell how to label this bug report because I don't know where pip comes from: as far as I knew this is a bug in something called urllib3, which seemed to me that was related to Python itself, or maybe in the way pip is calling it, and I assumed pip was part of the Python project too. I couldn't really tell what it is that you call Extension Modules, or whether this urllib3 thing is part of what you call the "Library (Lib)" component. I didn't want to have to research in depth how this project is developed just to report what seems to be a glaring bug, at least from a user's viewpoint. I could've just dropped this thing but I thought Python would be better served by a bug report, however misdirected. I hope that you can relabel this bug according to its relevant component so that it gets attention from the relevant team. Thank you for mentioning issue 23328, which did not come up in a search I did previous to filing my report: it does seem quite related. And that issue is marked as "Library (Lib)" and nobody complained about it so maybe this issue should be marked like that too. Please somebody who knows what's the best way to move this forward relabel it as necessary.

Now, I understand the argument for not allowing unencoded passwords, but as I mentioned in the last line of my report, I did also try urlencoding it and got the exact same error message. Should pip not allow the use of authenticated proxies at all?  I don't think that would be the best option.
msg236564 - Author: Demian Brecht (demian.brecht) * (Python triager) Date: 2015-02-25 06:48
> I can't tell how to label this bug report because I don't know where pip comes from: as far as I knew this is a bug in something called urllib3, which seemed to me that was related to Python itself, or maybe in the way pip is calling it, and I assumed pip was part of the Python project too.

What you’re seeing in your stack trace is an issue with packages developed independently of the Python standard library. pip (https://github.com/pypa/pip), requests (https://github.com/kennethreitz/requests) and urllib3 (https://github.com/shazow/urllib3) are all maintained externally and hosted on PyPI (https://pypi.python.org/pypi). Issues here are for bugs, enhancements and such for the Python standard library. Each of the aforementioned projects (with the caveat of pip in Python 3.4+) maintains its own set of issues on GitHub.

> Thank you for mentioning issue 23328, which did not come up in a search I did previous to filing my report: it does seem quite related.
> And that issue is marked as "Library (Lib)" and nobody complained about it so maybe this issue should be marked like that too.

Indeed it does seem related and it’s possible that urllib3 took parts of the code from urllib, which /is/ part of the standard library and is what 23328 was reported against, hence the “Lib” tag for that one.

> Please somebody who knows what's the best way to move this forward relabel it as necessary.

I’ll add a few people who may be interested in this issue, but I imagine that the issue would likely be moved to the urllib3 project.

Hope that all makes sense.
msg236626 - Author: Demian Brecht (demian.brecht) * (Python triager) Date: 2015-02-25 22:17
FWIW, with a local authenticated (ncsa_auth) squid proxy set up, this breaks using pip 6.0.8:

pip --proxy http://special:my?password@localhost:3128 install <package>

While the percent-encoded version is successful:

pip --proxy http://special:my%3Fpassword@localhost:3128 install <package>
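
For anyone building such a URL programmatically, here is a minimal sketch using only the standard library. build_proxy_url() is a hypothetical helper made up for this example (it is not part of pip, requests, or urllib3), and the credentials and host are the test values from the commands above:

```python
from urllib.parse import quote, unquote, urlsplit


def build_proxy_url(user, password, host, port, scheme='http'):
    """Hypothetical helper: percent-encode credentials before embedding them."""
    # safe='' so that '/' is encoded too; quote() leaves it alone by default.
    userinfo = '{}:{}'.format(quote(user, safe=''), quote(password, safe=''))
    return '{}://{}@{}:{}'.format(scheme, userinfo, host, port)


proxy_url = build_proxy_url('special', 'my?password', 'localhost', 3128)
print(proxy_url)

# Round trip: the standard library parser sees the intended components,
# and unquote() recovers the original password.
parts = urlsplit(proxy_url)
print(parts.username, unquote(parts.password), parts.hostname, parts.port)
```

The resulting string can then be passed to --proxy as in the second command.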


It's odd that you would encounter the same error with an encoded password. It might be helpful if you could supply an example of the full proxy URL you're experiencing the problem with when using an encoded password.

It seems to me that it is functioning as expected based on the RFC, but could definitely use some better detection and error reporting around malformed URLs (as Senthil mentions in #23328). I'm setting the status of this issue to pending (assuming it will be closed as a fix for this would be done outside of the standard library) until someone with more expertise with pip takes a look.
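
A rough sketch of the kind of detection that could help here, assuming a hypothetical check_proxy_url() helper that is not part of pip, requests, or urllib3. It is only a heuristic: if the authority is cut short by a "?" or "#" and an "@" shows up before the next "/", the delimiter was probably meant to be part of the credentials:

```python
import re


def check_proxy_url(url):
    """Raise ValueError with a hint if credentials seem to hold a raw '?' or '#'.

    Hypothetical helper for illustration; returns the URL unchanged otherwise.
    """
    rest = url.split('://', 1)[-1]
    match = re.search(r'[/?#]', rest)
    if match is None:
        return url
    tail = rest[match.end():]
    # An '@' before the next '/' suggests the delimiter belonged to user:password.
    if match.group() in '?#' and '@' in tail.split('/', 1)[0]:
        raise ValueError(
            "proxy URL %r seems to contain an unencoded %r in its credentials; "
            "percent-encode it (for example with urllib.parse.quote)"
            % (url, match.group())
        )
    return url


try:
    check_proxy_url('http://special:my?password@localhost:3128')
except ValueError as exc:
    print(exc)
```

A check like this would give false positives for ordinary URLs whose query string contains "@", so it would only make sense for values that are known to be proxy URLs.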
msg264066 - Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2016-04-23 18:23
This was reported on the urllib3 issue tracker: https://github.com/shazow/urllib3/issues/814
History
Date User Action Args
2022-04-11 14:58:13 admin set github: 67704
2016-04-23 18:23:43 berker.peksag set status: open -> closed; nosy: + berker.peksag; messages: + msg264066; resolution: third party; stage: resolved
2015-02-25 23:18:17 demian.brecht set status: pending -> open; title: pip: urllib3 does not encode userinfo section of requests: parse_url() mishandles special characters when the URL specifies authentication credentials -> pip: urllib3 does not encode userinfo section of URL with authentication credentials
2015-02-25 22:17:32 demian.brecht set status: open -> pending; messages: + msg236626; components: + Demos and Tools, - Extension Modules; title: requests: parse_url() mishandles special characters when the URL specifies authentication credentials -> pip: urllib3 does not encode userinfo section of requests: parse_url() mishandles special characters when the URL specifies authentication credentials
2015-02-25 06:49:15 demian.brecht set nosy: + paul.moore, ncoghlan, dstufft, Marcus.Smith
2015-02-25 06:48:52 demian.brecht set messages: + msg236564
2015-02-25 04:09:59 leotan set messages: + msg236562
2015-02-25 01:25:08 demian.brecht set messages: + msg236558
2015-02-25 01:15:40 demian.brecht set nosy: + demian.brecht
2015-02-24 23:52:42 martin.panter set nosy: + martin.panter; messages: + msg236556
2015-02-24 21:58:00 leotan create