urllib.request fails for proxy credentials that contain a '/' character #67517

Closed
AndyReitz mannequin opened this issue Jan 27, 2015 · 13 comments
Labels: 3.8 (only security fixes), 3.9 (only security fixes), 3.10 (only security fixes), stdlib (Python modules in the Lib dir), type-bug (An unexpected behavior, bug, or error)

@AndyReitz
Mannequin

AndyReitz mannequin commented Jan 27, 2015

BPO 23328
Nosy @orsenthil, @vadmium, @demianbrecht, @miss-islington
PRs
  • bpo-23328 Allow / character in username,password fields in _PROXY envvars. #23973
  • [3.8] bpo-23328 Allow / character in username,password fields in _PROXY envvars. #23992
  • [3.9] bpo-23328 Allow / character in username,password fields in _PROXY envvars. (GH-23973) #23993
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = 'https://github.com/orsenthil'
    closed_at = <Date 2020-12-29.13:17:51.986>
    created_at = <Date 2015-01-27.07:03:55.054>
    labels = ['3.8', 'type-bug', 'library', '3.9', '3.10']
    title = "urllib.request fails for proxy credentials that contain a '/' character"
    updated_at = <Date 2020-12-29.13:17:51.986>
    user = 'https://bugs.python.org/AndyReitz'

    bugs.python.org fields:

    activity = <Date 2020-12-29.13:17:51.986>
    actor = 'orsenthil'
    assignee = 'orsenthil'
    closed = True
    closed_date = <Date 2020-12-29.13:17:51.986>
    closer = 'orsenthil'
    components = ['Library (Lib)']
    creation = <Date 2015-01-27.07:03:55.054>
    creator = 'Andy.Reitz'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 23328
    keywords = ['patch']
    message_count = 13.0
    messages = ['234809', '234810', '234823', '235587', '235588', '235590', '235628', '235631', '235649', '235666', '235673', '383881', '383998']
    nosy_count = 6.0
    nosy_names = ['orsenthil', 'martin.panter', 'demian.brecht', 'Andy.Reitz', 'takis', 'miss-islington']
    pr_nums = ['23973', '23992', '23993']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue23328'
    versions = ['Python 3.8', 'Python 3.9', 'Python 3.10']

    On Python 2.7.9, if I set an http_proxy environment variable whose password contains a '/' character, urllib2 fails. Given this test code:

      import os, urllib
      os.environ['http_proxy'] = "http://someuser:a/b@10.11.12.13:1234"
      f = urllib.urlopen('http://www.python.org')
      data = f.read()
      print data

    I expect this error message (because my sample proxy is totally bogus):

    [areitz@SOMEHOST ~]$ python2.7 test.py
    Traceback (most recent call last):
      File "test.py", line 3, in <module>
        f = urllib.urlopen('http://www.python.org')
      File "/usr/lib64/python2.7/urllib.py", line 87, in urlopen
        return opener.open(url)
      File "/usr/lib64/python2.7/urllib.py", line 213, in open
        return getattr(self, name)(url)
      File "/usr/lib64/python2.7/urllib.py", line 350, in open_http
        h.endheaders(data)
      File "/usr/lib64/python2.7/httplib.py", line 997, in endheaders
        self._send_output(message_body)
      File "/usr/lib64/python2.7/httplib.py", line 850, in _send_output
        self.send(msg)
      File "/usr/lib64/python2.7/httplib.py", line 812, in send
        self.connect()
      File "/usr/lib64/python2.7/httplib.py", line 793, in connect
        self.timeout, self.source_address)
      File "/usr/lib64/python2.7/socket.py", line 571, in create_connection
        raise err
    IOError: [Errno socket error] [Errno 101] Network is unreachable

    Instead, I receive this error:

    [areitz@SOMEHOST ~]$ python2.7 test.py
    Traceback (most recent call last):
      File "test.py", line 3, in <module>
        f = urllib.urlopen('http://www.python.org')
      File "/usr/lib64/python2.7/urllib.py", line 87, in urlopen
        return opener.open(url)
      File "/usr/lib64/python2.7/urllib.py", line 213, in open
        return getattr(self, name)(url)
      File "/usr/lib64/python2.7/urllib.py", line 339, in open_http
        h = httplib.HTTP(host)
      File "/usr/lib64/python2.7/httplib.py", line 1107, in __init__
        self._setup(self._connection_class(host, port, strict))
      File "/usr/lib64/python2.7/httplib.py", line 712, in __init__
        (self.host, self.port) = self._get_hostport(host, port)
      File "/usr/lib64/python2.7/httplib.py", line 754, in _get_hostport
        raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
    httplib.InvalidURL: nonnumeric port: 'a'

    Note that from the error, it seems as if urllib2 is incorrectly parsing the password from the proxy URL. When trying this with curl 7.19.7, I see the proper behavior (the correct password is parsed from the proxy URL).

    @AndyReitz AndyReitz mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Jan 27, 2015
    @AndyReitz
    Mannequin Author

    AndyReitz mannequin commented Jan 27, 2015

    Sorry, went a bit too quickly -- here is the sample code that I meant to use:

      import os, urllib2
      os.environ['http_proxy'] = "http://someuser:a/b@10.11.12.13:1234"
      f = urllib2.urlopen('http://www.python.org')
      data = f.read()
      print data

    And the stack trace that I receive:

    Traceback (most recent call last):
      File "test.py", line 3, in <module>
        f = urllib2.urlopen('http://www.python.org')
      File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen
        return opener.open(url, data, timeout)
      File "/usr/lib64/python2.7/urllib2.py", line 431, in open
        response = self._open(req, data)
      File "/usr/lib64/python2.7/urllib2.py", line 449, in _open
        '_open', req)
      File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain
        result = func(*args)
      File "/usr/lib64/python2.7/urllib2.py", line 1227, in http_open
        return self.do_open(httplib.HTTPConnection, req)
      File "/usr/lib64/python2.7/urllib2.py", line 1166, in do_open
        h = http_class(host, timeout=req.timeout, **http_conn_args)
      File "/usr/lib64/python2.7/httplib.py", line 712, in __init__
        (self.host, self.port) = self._get_hostport(host, port)
      File "/usr/lib64/python2.7/httplib.py", line 754, in _get_hostport
        raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
    httplib.InvalidURL: nonnumeric port: 'a'

    It actually looks the same -- so I suppose this issue affects both urllib and urllib2.

    @orsenthil
    Member

    Yup, I can confirm that this is a problem. As Andy recognized, there is a parsing error that fails on a '/' character in the password.

    Environment-based proxies are used by urllib rather than urllib2. (A test case that relies on the environment proxy should use urllib.urlopen().) However, the failure comes from parsing done in httplib, so it affects both urllib and urllib2.

    @orsenthil orsenthil self-assigned this Jan 27, 2015
    @vadmium
    Member

    vadmium commented Feb 9, 2015

    Related: bpo-18140. The slash character is meant to be a reserved character in URLs, so why hasn’t it been encoded? Where does the environment variable come from?

    @AndyReitz
    Mannequin Author

    AndyReitz mannequin commented Feb 9, 2015

    The proxy credentials are supplied by our sysadmin. My understanding is that the http_proxy env variable doesn't require URI encoding. In addition, the same credentials work fine with curl.

    @vadmium
    Member

    vadmium commented Feb 9, 2015

    The relevant code looks like it is _parse_proxy() at Lib/urllib/request.py:693. It has custom code to search for a slash (/), so it wouldn’t be hard to make it search after the last at (@) symbol. (I previously assumed it would use urlsplit() or similar, which would be harder to adjust.)

    Even Curl seems to require an @ symbol in the username or password to be encoded, i.e. the following doesn’t work, so you still need to encode the fields in general to work with Curl.

    http_proxy=http://a@x:b@localhost curl . . .
    http_proxy=http://a:b@x@localhost curl . . .
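
The idea of searching for the slash only after the last at (@) symbol can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patch that was eventually merged: `split_proxy_url` is a hypothetical helper name that does not exist in the standard library, and it only handles proxy values of the form `scheme://...`.

```python
def split_proxy_url(proxy):
    """Hypothetical helper: split a proxy URL into
    (scheme, user, password, hostport), tolerating '/' in the credentials."""
    scheme, sep, rest = proxy.partition("://")
    if not sep:
        raise ValueError("expected a proxy URL of the form scheme://...")

    # Start looking for the path separator only after the last '@',
    # so a '/' inside the user name or password is not mistaken for it.
    at = rest.rfind("@")
    end = rest.find("/", at + 1)
    authority = rest if end == -1 else rest[:end]

    userinfo, _, hostport = authority.rpartition("@")
    if not userinfo:
        return scheme, None, None, hostport

    user, colon, password = userinfo.partition(":")
    return scheme, user, (password if colon else None), hostport


print(split_proxy_url("http://someuser:a/b@10.11.12.13:1234"))
print(split_proxy_url("http://localhost:3128/"))
```

With the value from the original report, the password comes back as `a/b` and the host and port are left intact. An unencoded @ inside the credentials is still ambiguous in general, which matches the Curl behaviour described above.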

    @takis
    Mannequin

    takis mannequin commented Feb 9, 2015

    RFC3986 seems to state that a '/' character should be encoded:

    """...
    reserved = gen-delims / sub-delims
    gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
    sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
    / "*" / "+" / "," / ";" / "="
    ...
    The user information, if present, is followed by a
    commercial at-sign ("@") that delimits it from the host.
    userinfo = *( unreserved / pct-encoded / sub-delims / ":" )
    """

    @AndyReitz
    Mannequin Author

    AndyReitz mannequin commented Feb 9, 2015

    Sure, but the question is who should do the encoding -- the user, or python? I think it would be better for python to read the password from the environment variable, and encode it before using it. I think this is what users expect.

    @vadmium
    Member

    vadmium commented Feb 10, 2015

    To comply with the RFC on URLs, whoever is setting the environment variable _should_ do the encoding, and then Python will _decode_ it. But I suspect this case is more about how Python should handle an environment variable that hasn’t been encoded correctly.
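
A minimal sketch of what "encode when setting, decode when reading" could look like, using `urllib.parse.quote` and `unquote` from Python 3. The helper name `build_proxy_url` is hypothetical and only for illustration; `safe=''` is passed because `quote` leaves '/' unescaped by default.

```python
from urllib.parse import quote, unquote


def build_proxy_url(user, password, hostport, scheme="http"):
    """Hypothetical helper: percent-encode the credentials before
    embedding them in a proxy URL."""
    return "{}://{}:{}@{}".format(
        scheme, quote(user, safe=""), quote(password, safe=""), hostport
    )


url = build_proxy_url("someuser", "a/b", "10.11.12.13:1234")
print(url)                # the '/' in the password is now %2F
print(unquote("a%2Fb"))   # the reader of the variable recovers 'a/b'
```

A value built this way could then be assigned to `os.environ['http_proxy']` without any literal slash in the credentials.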

    @orsenthil
    Member

    I thought the initial report mentioned that curl reads the same http_proxy variable properly. It would be good to have a correct curl test case to ascertain that.

    But in all the places where an @ character is allowed in URLs (netrc, git configs), I see that @ should be encoded. In that case, this bug report is more about detecting bad URLs and presenting a better error message.

    @vadmium
    Member

    vadmium commented Feb 10, 2015

    This should demonstrate that Curl does parse literal slashes in the username and password fields:

    $ http_proxy=http://user/name:pass/word@localhost:22 curl -v http://example.net/
    *   Trying ::1...
    * Connected to localhost (::1) port 22 (#0)
    * Proxy auth using Basic with user 'user/name'
    > GET http://example.net/ HTTP/1.1
    > Proxy-Authorization: Basic dXNlci9uYW1lOnBhc3Mvd29yZA==
    > User-Agent: curl/7.40.0
    > Host: example.net
    > Accept: */*
    > Connection: TE
    > TE: gzip
    > Proxy-Connection: Keep-Alive
    > 
    SSH-2.0-OpenSSH_6.2
    Protocol mismatch.
    * Recv failure: Connection reset by peer
    * Closing connection 0
    curl: (56) Recv failure: Connection reset by peer
    [Exit 56]
    $ base64 -d <<< dXNlci9uYW1lOnBhc3Mvd29yZA==
    user/name:pass/word$
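
The same Proxy-Authorization value can be reproduced in Python, which shows that the literal slashes reach the proxy unchanged once the credentials have been split out correctly. This is a small sketch; `basic_proxy_auth` is a hypothetical helper name, not a standard-library function.

```python
import base64


def basic_proxy_auth(user, password):
    """Hypothetical helper: build a Basic Proxy-Authorization value."""
    raw = "{}:{}".format(user, password).encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


# Matches the header in the Curl trace above.
print(basic_proxy_auth("user/name", "pass/word"))
# prints: Basic dXNlci9uYW1lOnBhc3Mvd29yZA==
```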

    @orsenthil orsenthil added the 3.10 only security fixes label Dec 23, 2020
    @orsenthil orsenthil changed the title urllib2 fails for proxy credentials that contain a '/' character urllib.request fails for proxy credentials that contain a '/' character Dec 23, 2020
    @orsenthil
    Member

    #23973 will resolve this issue. The issue was localized to the _parse_proxy function in urllib.request (formerly urllib2).

    @orsenthil orsenthil added 3.8 only security fixes 3.9 only security fixes labels Dec 29, 2020
    @orsenthil
    Member

    Merged in

    3.10 - 030a713

    3.9 - df79440

    3.8 - 741f22d
