classification
Title: urllib2 fails for proxy credentials that contain a '/' character
Type: behavior Stage: needs patch
Components: Library (Lib) Versions: Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: orsenthil Nosy List: Andy.Reitz, demian.brecht, martin.panter, orsenthil, takis
Priority: normal Keywords:

Created on 2015-01-27 07:03 by Andy.Reitz, last changed 2015-03-07 02:36 by demian.brecht.

Messages (11)
msg234809 - Author: Andy Reitz (Andy.Reitz) Date: 2015-01-27 07:03
On Python 2.7.9, if I set an http_proxy environment variable whose password contains a '/' character, urllib2 fails. Given this test code:

  import os, urllib
  os.environ['http_proxy'] = "http://someuser:a/b@10.11.12.13:1234"
  f = urllib.urlopen('http://www.python.org')
  data = f.read()
  print data

I expect this error message (because my sample proxy is totally bogus):

[areitz@SOMEHOST ~]$ python2.7 test.py
Traceback (most recent call last):
  File "test.py", line 3, in <module>
    f = urllib.urlopen('http://www.python.org')
  File "/usr/lib64/python2.7/urllib.py", line 87, in urlopen
    return opener.open(url)
  File "/usr/lib64/python2.7/urllib.py", line 213, in open
    return getattr(self, name)(url)
  File "/usr/lib64/python2.7/urllib.py", line 350, in open_http
    h.endheaders(data)
  File "/usr/lib64/python2.7/httplib.py", line 997, in endheaders
    self._send_output(message_body)
  File "/usr/lib64/python2.7/httplib.py", line 850, in _send_output
    self.send(msg)
  File "/usr/lib64/python2.7/httplib.py", line 812, in send
    self.connect()
  File "/usr/lib64/python2.7/httplib.py", line 793, in connect
    self.timeout, self.source_address)
  File "/usr/lib64/python2.7/socket.py", line 571, in create_connection
    raise err
IOError: [Errno socket error] [Errno 101] Network is unreachable

Instead, I receive this error:

[areitz@SOMEHOST ~]$ python2.7 test.py
Traceback (most recent call last):
  File "test.py", line 3, in <module>
    f = urllib.urlopen('http://www.python.org')
  File "/usr/lib64/python2.7/urllib.py", line 87, in urlopen
    return opener.open(url)
  File "/usr/lib64/python2.7/urllib.py", line 213, in open
    return getattr(self, name)(url)
  File "/usr/lib64/python2.7/urllib.py", line 339, in open_http
    h = httplib.HTTP(host)
  File "/usr/lib64/python2.7/httplib.py", line 1107, in __init__
    self._setup(self._connection_class(host, port, strict))
  File "/usr/lib64/python2.7/httplib.py", line 712, in __init__
    (self.host, self.port) = self._get_hostport(host, port)
  File "/usr/lib64/python2.7/httplib.py", line 754, in _get_hostport
    raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
httplib.InvalidURL: nonnumeric port: 'a'

Judging from the error, urllib2 seems to be parsing the password out of the proxy URL incorrectly. When I try this with curl 7.19.7, I see the proper behavior (the correct password is parsed from the proxy URL).
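For illustration only, the sketch below re-creates the suspected parsing step in plain Python 3, with no network access. The helper `naive_hostport` is hypothetical, and the assumption that the authority is cut at the first '/' after `//` is inferred from the traceback rather than copied from urllib2. Under that assumption it yields exactly the host and port that httplib then rejects:

```python
def naive_hostport(proxy_url):
    """Hypothetical re-creation of the suspected buggy step: the authority
    is assumed to end at the first '/' after the leading '//'."""
    _scheme, rest = proxy_url.split(":", 1)    # e.g. "//someuser:a/b@10.11.12.13:1234"
    end = rest.find("/", 2)                    # first '/' after '//'
    authority = rest[2:end] if end != -1 else rest[2:]
    # The truncated authority "someuser:a" contains no '@', so no
    # credentials are split off and the whole string becomes host:port.
    _userinfo, _sep, hostport = authority.rpartition("@")
    return hostport


hostport = naive_hostport("http://someuser:a/b@10.11.12.13:1234")
host, _, port = hostport.partition(":")
print(host, port)  # prints: someuser a
```

The port string 'a' matches the one in `nonnumeric port: 'a'`.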
msg234810 - Author: Andy Reitz (Andy.Reitz) Date: 2015-01-27 07:28
Sorry, I went a bit too quickly. Here is the sample code that I meant to use:

  import os, urllib2
  os.environ['http_proxy'] = "http://someuser:a/b@10.11.12.13:1234"
  f = urllib2.urlopen('http://www.python.org')
  data = f.read()
  print data

And the stack trace that I receive:

Traceback (most recent call last):
  File "test.py", line 3, in <module>
    f = urllib2.urlopen('http://www.python.org')
  File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib64/python2.7/urllib2.py", line 431, in open
    response = self._open(req, data)
  File "/usr/lib64/python2.7/urllib2.py", line 449, in _open
    '_open', req)
  File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.7/urllib2.py", line 1227, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib64/python2.7/urllib2.py", line 1166, in do_open
    h = http_class(host, timeout=req.timeout, **http_conn_args)
  File "/usr/lib64/python2.7/httplib.py", line 712, in __init__
    (self.host, self.port) = self._get_hostport(host, port)
  File "/usr/lib64/python2.7/httplib.py", line 754, in _get_hostport
    raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
httplib.InvalidURL: nonnumeric port: 'a'

It actually looks the same -- so I suppose this issue affects both urllib and urllib2.
msg234823 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2015-01-27 15:22
Yup, I can confirm that this is a problem. As Andy recognized, there is a parsing error when the password contains a '/' character.

Environment-based proxies are used by urllib rather than urllib2 (a test case that relies on an environment proxy should use urllib.urlopen()), but the failure comes from parsing done in httplib, so it affects both urllib and urllib2.
msg235587 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-02-09 04:38
Related: Issue 18140. The slash character is meant to be a reserved character in URLs, so why hasn’t it been encoded? Where does the environment variable come from?
msg235588 - Author: Andy Reitz (Andy.Reitz) Date: 2015-02-09 05:10
The proxy credentials are supplied by our sysadmin. My understanding is that the http_proxy env variable doesn't require URI encoding. In addition, the same credentials work fine with curl.
msg235590 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-02-09 05:44
The relevant code looks like it is _parse_proxy() at Lib/urllib/request.py:693. It has custom code to search for a slash (/), so it wouldn’t be hard to make it search after the last at (@) symbol. (I previously assumed it would use urlsplit() or similar, which would be harder to adjust.)
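A minimal sketch of the adjustment described above, assuming Python 3: the '/' that would start a path is searched for only after the last '@'. `split_proxy_url` is a hypothetical helper, not the library's `_parse_proxy()`, and unlike that function it requires an explicit scheme:

```python
def split_proxy_url(proxy_url):
    """Hypothetical helper: return (scheme, user, password, hostport).

    The '/' that would start a path is searched for only after the last
    '@', so a literal '/' in the username or password is kept intact.
    """
    scheme, sep, rest = proxy_url.partition("://")
    if not sep:
        raise ValueError("expected scheme://authority, got %r" % proxy_url)
    at = rest.rfind("@")              # -1 when there are no credentials
    end = rest.find("/", at + 1)      # first '/' after the credentials
    authority = rest if end == -1 else rest[:end]
    userinfo, sep, hostport = authority.rpartition("@")
    if not sep:
        return scheme, None, None, hostport
    user, colon, password = userinfo.partition(":")
    return scheme, user, (password if colon else None), hostport


print(split_proxy_url("http://someuser:a/b@10.11.12.13:1234"))
# ('http', 'someuser', 'a/b', '10.11.12.13:1234')
print(split_proxy_url("http://proxy.example:3128/"))
# ('http', None, None, 'proxy.example:3128')
```

Using the last '@' means a '@' inside a path would be misread, but proxy URLs normally have no path. Splitting at the first '@' instead would break an unencoded '@' in the credentials.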

Even Curl seems to require an @ symbol in the username or password to be encoded; for example, neither of the following works, so you still need to encode the fields in general to work with Curl.

http_proxy=http://a@x:b@localhost curl . . .
http_proxy=http://a:b@x@localhost curl . . .
msg235628 - Author: Panagiotis Issaris (takis) Date: 2015-02-09 20:19
RFC 3986 seems to state that a '/' character should be encoded:

"""...
reserved    = gen-delims / sub-delims
gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
                  / "*" / "+" / "," / ";" / "="
...
The user information, if present, is followed by a
commercial at-sign ("@") that delimits it from the host.
userinfo    = *( unreserved / pct-encoded / sub-delims / ":" )
"""
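Following the grammar quoted above, a '/' inside the userinfo has to be percent-encoded as %2F. A minimal sketch of how whoever sets http_proxy could do that with Python 3's `urllib.parse.quote`; `build_proxy_url` is a hypothetical helper, and `safe=""` is what forces '/' to be encoded (by default `quote` leaves '/' alone):

```python
from urllib.parse import quote


def build_proxy_url(user, password, hostport, scheme="http"):
    """Hypothetical helper: percent-encode the credentials of a proxy URL."""
    # safe="" encodes every reserved character, including '/', ':' and '@',
    # so none of them can be mistaken for a delimiter when parsed again.
    return "%s://%s:%s@%s" % (
        scheme,
        quote(user, safe=""),
        quote(password, safe=""),
        hostport,
    )


print(build_proxy_url("someuser", "a/b", "10.11.12.13:1234"))
# http://someuser:a%2Fb@10.11.12.13:1234
```

`quote()` also encodes some sub-delims that the RFC would allow unencoded (for example '!'); that is harmless, since pct-encoded characters are always permitted in userinfo.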
msg235631 - Author: Andy Reitz (Andy.Reitz) Date: 2015-02-09 21:04
Sure, but the question is who should do the encoding -- the user, or Python? I think it would be better for Python to read the password from the environment variable and encode it before using it. I think this is what users expect.
msg235649 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-02-10 00:16
To comply with the RFC on URLs, whoever is setting the environment variable _should_ do the encoding, and then Python will _decode_ it. But I suspect this case is more about how Python should handle an environment variable that hasn’t been encoded correctly.
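On the decoding side, a sketch using Python 3's `urllib.parse.urlsplit` and `unquote` (not urllib2's internal parser) shows that a correctly encoded variable round-trips to the original password:

```python
from urllib.parse import unquote, urlsplit

# Value as it would look if the setter had percent-encoded the '/'.
proxy = "http://someuser:a%2Fb@10.11.12.13:1234"
parts = urlsplit(proxy)

# urlsplit() leaves percent-escapes in place; unquote() decodes them.
user = unquote(parts.username)
password = unquote(parts.password)
print(user, password, parts.hostname, parts.port)
# someuser a/b 10.11.12.13 1234
```

An unencoded value such as `http://someuser:a/b@...` cannot be recovered this way, because `urlsplit()` ends the network location at the first '/', which is exactly the malformed-input case under discussion.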
msg235666 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2015-02-10 05:16
I thought the initial report mentioned that curl reads the same http_proxy variable properly. It would be good to have a correct curl test case to confirm that.

But everywhere the @ character can appear in URLs (netrc, git configs), I see that @ should be encoded. In that case, this bug report is more about detecting bad URLs and presenting a better error message.
msg235673 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-02-10 06:58
This should demonstrate that Curl does parse literal slashes in the username and password fields:

$ http_proxy=http://user/name:pass/word@localhost:22 curl -v http://example.net/
*   Trying ::1...
* Connected to localhost (::1) port 22 (#0)
* Proxy auth using Basic with user 'user/name'
> GET http://example.net/ HTTP/1.1
> Proxy-Authorization: Basic dXNlci9uYW1lOnBhc3Mvd29yZA==
> User-Agent: curl/7.40.0
> Host: example.net
> Accept: */*
> Connection: TE
> TE: gzip
> Proxy-Connection: Keep-Alive
> 
SSH-2.0-OpenSSH_6.2
Protocol mismatch.
* Recv failure: Connection reset by peer
* Closing connection 0
curl: (56) Recv failure: Connection reset by peer
[Exit 56]
$ base64 -d <<< dXNlci9uYW1lOnBhc3Mvd29yZA==
user/name:pass/word$
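The same check can be done in Python. `basic_proxy_auth` below is a hypothetical helper that builds a Basic credential the way the header above was built (base64 of `user:password`, assuming UTF-8 text); it shows that the slashes are carried literally, with no URL encoding:

```python
import base64


def basic_proxy_auth(user, password):
    """Hypothetical helper: build a Basic Proxy-Authorization value."""
    raw = ("%s:%s" % (user, password)).encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


print(basic_proxy_auth("user/name", "pass/word"))
# Basic dXNlci9uYW1lOnBhc3Mvd29yZA==
print(base64.b64decode("dXNlci9uYW1lOnBhc3Mvd29yZA==").decode("ascii"))
# user/name:pass/word
```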
History
Date                 User           Action  Args
2015-03-07 02:36:51  demian.brecht  set     nosy: + demian.brecht
2015-02-10 06:58:07  martin.panter  set     messages: + msg235673
2015-02-10 05:16:42  orsenthil      set     messages: + msg235666
2015-02-10 00:16:25  martin.panter  set     messages: + msg235649
2015-02-09 21:04:13  Andy.Reitz     set     messages: + msg235631
2015-02-09 20:19:54  takis          set     nosy: + takis
                                            messages: + msg235628
2015-02-09 05:44:16  martin.panter  set     messages: + msg235590
2015-02-09 05:10:46  Andy.Reitz     set     messages: + msg235588
2015-02-09 04:38:52  martin.panter  set     nosy: + martin.panter
                                            messages: + msg235587
2015-01-27 15:22:45  orsenthil      set     nosy: + orsenthil
                                            messages: + msg234823
                                            assignee: orsenthil
                                            stage: needs patch
2015-01-27 07:28:54  Andy.Reitz     set     messages: + msg234810
2015-01-27 07:03:55  Andy.Reitz     create