Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

urllib2 cannot handle https with proxy requiring auth #51540

Open
tsujikawa mannequin opened this issue Nov 9, 2009 · 24 comments
Open

urllib2 cannot handle https with proxy requiring auth #51540

tsujikawa mannequin opened this issue Nov 9, 2009 · 24 comments
Assignees
Labels
3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes 3.10 only security fixes stdlib Python modules in the Lib dir

Comments

@tsujikawa
Copy link
Mannequin

tsujikawa mannequin commented Nov 9, 2009

BPO 7291
Nosy @ronaldoussoren, @orsenthil, @vadmium, @yan12125, @OneMoreZanuda
Files
  • https_proxy_auth.patch: patch to send Proxy-Authorization header in CONNECT method(https proxy)
  • urllib2_with_proxy_auth_comparison.py
  • 2_7_x.patch: 2.7 maintenance branch patch
  • monkey_2_6_4.py: 2.6.4 monkey patch
  • urllib2_tests.tar.gz: Test code, results and instructions
  • http_proxy_https.patch: Fix handling of 407 and 401 in urllib2 and httplib
  • new_http_proxy_patch.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/orsenthil'
    closed_at = None
    created_at = <Date 2009-11-09.04:38:16.601>
    labels = ['3.8', '3.7', 'library', '3.9', '3.10']
    title = 'urllib2 cannot handle https with proxy requiring auth'
    updated_at = <Date 2020-08-02.19:06:17.855>
    user = 'https://bugs.python.org/tsujikawa'

    bugs.python.org fields:

    activity = <Date 2020-08-02.19:06:17.855>
    actor = 'alexey.namyotkin'
    assignee = 'orsenthil'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2009-11-09.04:38:16.601>
    creator = 'tsujikawa'
    dependencies = []
    files = ['15296', '15669', '20798', '20799', '20822', '20824', '34174']
    hgrepos = []
    issue_num = 7291
    keywords = ['patch']
    message_count = 24.0
    messages = ['95058', '95060', '95373', '95377', '96659', '96660', '96661', '96844', '96845', '96846', '100840', '103090', '128861', '128862', '128952', '128953', '128955', '211850', '211860', '228406', '244513', '247805', '283306', '374687']
    nosy_count = 11.0
    nosy_names = ['ronaldoussoren', 'orsenthil', 'mbeachy', 'dieresys', 'tsujikawa', 'b.a.scott', 'martin.panter', 'vzafzal', 'yan12125', 'alexey.namyotkin', 'Jessica Ridgley']
    pr_nums = []
    priority = 'normal'
    resolution = 'accepted'
    stage = 'needs patch'
    status = 'open'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue7291'
    versions = ['Python 2.7', 'Python 3.5', 'Python 3.6', 'Python 3.7', 'Python 3.8', 'Python 3.9', 'Python 3.10']

    @tsujikawa
    Copy link
    Mannequin Author

    tsujikawa mannequin commented Nov 9, 2009

    urllib2 cannot handle https with proxy requiring authorization.

    After https_proxy is set correctly,

    Python 2.6.4 (r264:75706, Oct 29 2009, 15:38:25)
    [GCC 4.4.1] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import urllib2
    >>> c=urllib2.urlopen("https://sourceforge.net")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.6/urllib2.py", line 124, in urlopen
        return _opener.open(url, data, timeout)
      File "/usr/lib/python2.6/urllib2.py", line 389, in open
        response = self._open(req, data)
      File "/usr/lib/python2.6/urllib2.py", line 407, in _open
        '_open', req)
      File "/usr/lib/python2.6/urllib2.py", line 367, in _call_chain
        result = func(*args)
      File "/usr/lib/python2.6/urllib2.py", line 1154, in https_open
        return self.do_open(httplib.HTTPSConnection, req)
      File "/usr/lib/python2.6/urllib2.py", line 1121, in do_open
        raise URLError(err)
    urllib2.URLError: <urlopen error Tunnel connection failed: 407 Proxy
    Authentication Required>

    This is because HTTPConnection::_tunnel() in httplib.py doesn't send
    Proxy-Authorization header.

    @tsujikawa tsujikawa mannequin added the stdlib Python modules in the Lib dir label Nov 9, 2009
    @tsujikawa
    Copy link
    Mannequin Author

    tsujikawa mannequin commented Nov 9, 2009

    I created a patch.
    I added additional argument 'headers' to HTTPConnection::set_tunnel()
    method,
    which is a mapping of HTTP headers to sent with CONNECT method. Since
    authorization
    credential is already set to Request object, in
    AbstractHTTPHandler::do_open(),
    if "Proxy-Authorization" header is found, pass it to set_tunnel().

    It works fine for me.

    @orsenthil orsenthil self-assigned this Nov 15, 2009
    @ronaldoussoren
    Copy link
    Contributor

    The patch looks good to me.

    IMHO this should be backported to 2.6 as well.

    @ronaldoussoren
    Copy link
    Contributor

    I've tested a backport of the patch to 2.6 (just replace set_proxy by
    _set_proxy in the patch) and the resulting version of urllib2 can login to
    the proxy (as expected).

    Thanks for the patch.

    @orsenthil
    Copy link
    Member

    Fixed and Committed revision 76908 in the trunk.

    @orsenthil
    Copy link
    Member

    Fixed through reversions r76908, r76909, r76910, r76911

    Thanks for the patch, Tatsuhiro Tsujikawa.

    @orsenthil
    Copy link
    Member

    meant revisions.

    @dieresys
    Copy link
    Mannequin

    dieresys mannequin commented Dec 23, 2009

    Hi! 2.6 backport is missing an argument in _set_tunnel definition. It
    should be:

        def _set_tunnel(self, host, port=None, headers=None):

    @dieresys
    Copy link
    Mannequin

    dieresys mannequin commented Dec 23, 2009

    The patch fixes only when you pass the authentication info in the proxy
    handler's URL. Like:

        proxy_handler = urllib2.ProxyHandler({'https':
    'http://user:pass@proxy-example.com:3128/'})

    But setting the authentication using a ProxyBasicAuthHandler is still
    broken:

        proxy_auth_handler = urllib2.ProxyBasicAuthHandler()
        proxy_auth_handler.add_password('realm', 'proxy-example.com:3128',
    'user', 'pass')

    In the attached file (urllib2_with_proxy_auth_comparison.py) we've wrote
    a comparison between what works with HTTP and HTTPS.

    The problem is the 407 error never reaches the ProxyBasicAuthHandler
    because HTTPConnection._tunnel raises an exception when the http
    response status code is not 200.

    @orsenthil
    Copy link
    Member

    Thanks for the note, Manuel. Fixed it in revision 77013.

    @orsenthil
    Copy link
    Member

    In this ticket, setting the authentication using a ProxyBasicAuthHandler is not yet addressed yet. (this was informed in the last note). Reopening this one to track it.

    @orsenthil orsenthil reopened this Mar 11, 2010
    @mbeachy
    Copy link
    Mannequin

    mbeachy mannequin commented Apr 13, 2010

    I have worked up a monkey patch for urllib2/httplib for the issue of setting the authentication using a Proxy(Basic|Digest)AuthHandler.

    The basic approach was to create a new httplib exception (ProxyTunnelError) and raise that with the http response attached so that the HTTPSHandler can determine when 407 Proxy authentication required is present, and then reroute the urllib2.OpenerDirector to error handling mode.

    Unfortunately, there is a backwards compatibility issue - cases where a socket.error was previously being raised now get an ProxyTunnelError. Not that you could do much useful with the socket.error in the first place, but I suppose you could look for '407' in the text. Ugh.

    If you think this might prove useful, let me know and I can rework it into a real patch - just let me know what branch/version to base it off of. (My monkey patch is for python 2.6.4.)

    @mbeachy
    Copy link
    Mannequin

    mbeachy mannequin commented Feb 19, 2011

    I've been in contact w/ Barry Scott offline re: the monkey patch previously mentioned. I'm attaching a 2.7 maintenance branch patch that he has needed to extend, and plans to follow up on.

    @mbeachy
    Copy link
    Mannequin

    mbeachy mannequin commented Feb 19, 2011

    Attached to this comment (can you attach multiple files at once?) is the somewhat moldy 2.6.4 monkey patch, mercilessly ripped from my own code and probably not good for much.

    @bascott
    Copy link
    Mannequin

    bascott mannequin commented Feb 21, 2011

    The attached patch builds on Mike's work.

    The core of the problem is that the Request object
    did not know what was going on. This means that it
    was not possible for get_authorization() to work
    for proxy-auth and www-auth.

    I change Request to know which of the four types of
    connection it represents. There are new methods on
    Request that return the right information based on
    the connection type.

    To understand how to make this work I needed to
    instrument the code. There is now a set_debuglevel
    on the OpenerDirector object that turns on debug in
    all the handlers and the director. I have added
    more debug messages to help understand this code.

    This code now passes the 72 test cases I run. I'll
    attach the code I used to test as a follow up to this.

    @bascott
    Copy link
    Mannequin

    bascott mannequin commented Feb 21, 2011

    Attached is the code I used to test these changes.
    See the README.txt file for details include
    the results of a test run.

    @bascott
    Copy link
    Mannequin

    bascott mannequin commented Feb 21, 2011

    I left out some white space changes to match the style
    of the std lib code. Re posting with white space cleanup.

    @vzafzal
    Copy link
    Mannequin

    vzafzal mannequin commented Feb 21, 2014

    I've found that for the Python2.6.x patch to play nicely with the popular rquests library, I've had to set some defaults on the modified __init__ function so that it reads as follows:

        def __init__(self, *args, **kwargs):
            _orig_init(self, *args, **kwargs)
            self._tunnel_headers = {}
            self._tunnel_host = ''
            self._tunnel_port = ''

    Also seems to work with python 2.6.1. Note: Change the entry condition to:

    if os.environ.get('https_proxy', None) and sys.version_info[:2]  == (2, 6) :

    @vzafzal
    Copy link
    Mannequin

    vzafzal mannequin commented Feb 21, 2014

    Also needed to make another minor update to the monkey patch.
    Have uploaded the new files as new_http_proxy_patch.py for use with python 2.6.x

    @BreamoreBoy
    Copy link
    Mannequin

    BreamoreBoy mannequin commented Oct 3, 2014

    Is there any work still needed here? Surely the 2.6.x patches can't be applied unless there are security issues?

    @vadmium
    Copy link
    Member

    vadmium commented May 31, 2015

    I believe this also affects Python 3; see bpo-24333.

    I think making the CONNECT response object available to the caller is the right general approach. But I really dislike raising an exception that holds a socket connection to be closed. (I know this is already done with urllib.error.HTTPError; let’s not repeat this in the “http.client” module!)

    Ideally, at the “http.client” level I would prefer to avoid all the special set_tunnel() calls, and use the usual request() and getresponse() API to make the CONNECT request. This way the response status and header fields would be available just like any other response. For this approach to work, we would probably need to add a new HTTPConnection.detach() method that released the original socket reader and writer, and add a way to create a new HTTPConnection instance using these socket objects. This enhancement probably wouldn’t be appropriate for Python 2 or a bug fix release. But it seems the cleanest approach to me, and may also allow using HTTPConnection with the Upgrade header (e.g. for opportunistic encryption, HTTP 2, Web etc), and proxying non-HTTP connections, as bonuses.

    A less revolutionary approach might be to add a HTTPSConnection.tunnel() method, that always returns the proxy’s response, but only does the SSL wrapping for a successful response.

    @vadmium
    Copy link
    Member

    vadmium commented Aug 1, 2015

    For the record, a while ago I think I made a patch implementing my HTTPConnection.detach() proposal. I can probably dig it up if anyone is interested.

    However I gave up on fixing this bug in “urllib.request”. As far as I understand it, the framework does not distinguish the 407 Proxy Authentication Required error of the initial proxy CONNECT request from any potential 407 response from a tunnelled connection. Perhaps a special case could be made; I think there are already lots of special cases. But the maze of urlopen() handlers is already too complicated and I decided this was too hard to bother working on. Sorry :)

    @yan12125
    Copy link
    Mannequin

    yan12125 mannequin commented Dec 15, 2016

    Modify target versions to bugfix and feature branches

    @yan12125 yan12125 mannequin added the 3.7 (EOL) end of life label Dec 15, 2016
    @JessicaRidgley JessicaRidgley mannequin added 3.8 only security fixes 3.9 only security fixes labels Jul 29, 2020
    @JessicaRidgley JessicaRidgley mannequin added the 3.10 only security fixes label Jul 29, 2020
    @OneMoreZanuda
    Copy link
    Mannequin

    OneMoreZanuda mannequin commented Aug 2, 2020

    It has been 5 years, now the urllib3 is actively used, but it also inherited this problem: if no authentication data has been received, then the method _tunnel raises an exception OSError that does not contain response headers. Accordingly, this exception cannot be handled. And perhaps this is an obstacle to building a convenient system of authentication on a proxy server in a widely used library requests (it would be nice to be able to just provide an argument proxy_auth, similar to how it is done for server authorization). Now, if a user wants to send a https request through a proxy that requires complex authentication (Kerberos, NTLM, Digest, other) using the urllib3, he must first send a separate request to the proxy, receive a response, extract the necessary data to form the header Proxy-Authorization, then generate this header and pass it to the ProxyManager. And if we are talking about Requests, then the situation there is worse, because you cannot pass proxy headers directly (psf/requests#2708).
    If we were to aim to simplify the authentication procedure on the proxy server for the user, then where would we start, do we need to change the http.client so that the error returned by the method _tunnel contains headers? Or maybe it's not worth changing anything at all and the path with preliminary preparation by user of the header Proxy-Authorization is the only correct one? Martin Panter, could you also give your opinion? Thank you in advance.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes 3.10 only security fixes stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants