classification
Title: urllib2 failing with squid proxy and digest authentication
Type: behavior
Stage: patch review
Components: Library (Lib)
Versions: Python 3.6, Python 3.4, Python 3.5, Python 2.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: orsenthil
Nosy List: alexwe, martin.panter, orsenthil, toobaz
Priority: normal
Keywords: patch

Created on 2012-09-30 16:41 by toobaz, last changed 2022-04-11 14:57 by admin.

Files
File name Uploaded Description Edit
proxy-digest.patch martin.panter, 2015-06-21 07:40 review
Messages (6)
msg171650 - Author: Pietro Battiston (toobaz) Date: 2012-09-30 16:41
If you run the following code:

#! /usr/bin/python
import urllib2

MyHTTPPasswordMgr = urllib2.HTTPPasswordMgr
proxy = urllib2.ProxyHandler({'http': 'http://proxybiblio2.si.unimib.it:8080'})
auth = urllib2.ProxyDigestAuthHandler(MyHTTPPasswordMgr())
auth.add_password(None, "proxybiblio2.si.unimib.it", "a", "b" )
opener = urllib2.build_opener(proxy, auth, urllib2.HTTPHandler)
urllib2.install_opener(opener)
conn = urllib2.urlopen('http://webofknowledge.com')

an "HTTP Error 407: Proxy Authentication Required" is raised, and under the hood here's what's happening:
- the request is made without authentication
- the server replies it must be made with digest authentication, and gives the nonce
- the error is raised.
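
As a rough illustration of the second step: the 407 response carries a Proxy-Authenticate header whose Digest challenge includes the nonce. The sketch below (Python 3) parses such a header with the helper functions in urllib.request. The header value is a made-up example, not one captured from the proxy in this report.

```python
from urllib.request import parse_http_list, parse_keqv_list

# Hypothetical challenge, as it might appear in a Proxy-Authenticate header.
challenge = 'Digest realm="Example proxy", nonce="abc123", qop="auth", stale=false'

# Split off the scheme name, then parse the comma-separated key=value pairs.
scheme, _, params = challenge.partition(" ")
fields = parse_keqv_list(parse_http_list(params))

print(scheme)           # authentication scheme requested by the proxy
print(fields["realm"])  # realm that is later passed to the password manager
print(fields["nonce"])  # nonce used to compute the digest response
```

The realm extracted here is what the digest handler hands to find_user_password() before it can retry the request.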

Instead, urllib2 should now retry with the username and password, up to 5 times (the limit hardcoded in urllib2.AbstractDigestAuthHandler.http_error_auth_reqed), and only then raise an "HTTP Error 401: digest auth failed". That is indeed what it does if you replace the line "MyHTTPPasswordMgr = urllib2.HTTPPasswordMgr" with


class MyHTTPPasswordMgr(urllib2.HTTPPasswordMgr):
    def find_user_password(self, realm, authuri):
        return "a", "b"


So the problem seems to be in HTTPPasswordMgr, which is apparently unable to match the stored credentials to the realm. Some tests¹ suggest that this varies with the proxy software and with the format of the proxy address (it works with Apache, but stops working if you then add "http://" in front of the proxy address).

This is somewhat reminiscent of bug 680577. In particular, I noticed that the (possibly unrelated) behaviour reported in the following message:
http://bugs.python.org/msg14444
has not changed:

In [1]: import urllib2

In [2]: urllib2.HTTPPasswordMgr().is_suburi("/foo/spam", "/foo/eggs")
Out[2]: True
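
For what it's worth, is_suburi() appears to expect its arguments in the reduced (authority, path) form produced by reduce_uri(), not plain path strings, which may explain the result above. A small sketch under that assumption, using Python 3 and a hypothetical host name:

```python
from urllib.request import HTTPPasswordMgr

mgr = HTTPPasswordMgr()

# reduce_uri() turns a URI into an (authority, path) pair.
spam = mgr.reduce_uri("http://example.com/foo/spam")
eggs = mgr.reduce_uri("http://example.com/foo/eggs")
foo = mgr.reduce_uri("http://example.com/foo")

print(spam)                       # reduced form of the first URI
print(mgr.is_suburi(spam, eggs))  # siblings: neither is below the other
print(mgr.is_suburi(foo, spam))   # /foo/spam is below /foo
```

Both helpers are internal to the password manager, so this is only meant to show the argument shape, not a supported API.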


This also affects Python 3.2; you can try the following:

#! /usr/bin/python
from urllib import request
MyHTTPPasswordMgr = request.HTTPPasswordMgr
proxy = request.ProxyHandler({'http': 'http://proxybiblio2.si.unimib.it:8080'})
auth = request.ProxyDigestAuthHandler(MyHTTPPasswordMgr())
auth.add_password(None, "proxybiblio2.si.unimib.it", "a", "b" )
opener = request.build_opener(proxy, auth, request.HTTPHandler)
request.install_opener(opener)
conn = request.urlopen('http://webofknowledge.com')


¹ http://lists.python.it/pipermail/python/2012-September/013309.html (in Italian)
msg222132 - Author: Mark Lawrence (BreamoreBoy) * Date: 2014-07-02 21:35
Can we have an update on this, please?
msg222137 - Author: Pietro Battiston (toobaz) Date: 2014-07-02 22:20
The bug is still present in 2.7.7 and 3.4.1.

By the way, under Python 3 the workaround takes the form


import urllib.request

class MyHTTPPasswordMgr(urllib.request.HTTPPasswordMgr):
    def find_user_password(self, realm, authuri):
        # Return the same credentials regardless of realm and URI.
        return "a", "b"

Finally, note that the wrong behaviour of "is_suburi()" mentioned in http://bugs.python.org/msg14444 is still present (and I still suspect it has something to do with this).
msg226913 - Author: Alexander Weidinger (alexwe) Date: 2014-09-15 13:00
So, I analyzed the error and I think I found the problem. (Line numbers refer to urllib.request in Python 3.5.0.)

It all starts at l. 1079, where the 407 error gets handled. Everything is OK here; in l. 1081, http_error_auth_reqed(...) gets executed.

Next we are in l. 939. Everything is correct here as well, and retry_http_digest_auth gets executed in l. 953. (Whoops, "http_digest_auth"?! Nah, let's see what comes next.)

From l. 953 we follow the code to the get_authorization(...) call.

Now we are in l. 981, and this is where the problem lies.
To get the username and password for the proxy, find_user_password(realm, req.full_url) gets executed.

For example, if my proxy has the address abc.com:8080 and my request is for xyz.com, the function tries to find a password for the xyz.com URL instead of the abc.com:8080 URL. So it can't find a password, and the whole auth process stops with the 407 error.

But I suspect that if you just change that line to use the host, normal HTTP digest auth won't work anymore.

This also makes it obvious why toobaz's workaround works.
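
The mismatch can be sketched at the password manager level, without any network access. This is only an illustration using the example addresses above. It calls find_user_password() directly instead of going through the handler, and it uses HTTPPasswordMgrWithDefaultRealm so that the realm does not interfere with the lookup.

```python
from urllib.request import HTTPPasswordMgrWithDefaultRealm

mgr = HTTPPasswordMgrWithDefaultRealm()
# Credentials registered for the proxy (address taken from the example above).
mgr.add_password(None, "abc.com:8080", "a", "b")

# Lookup keyed on the request URL, as described for get_authorization().
by_request_url = mgr.find_user_password("some realm", "http://xyz.com/")
# Lookup keyed on the proxy address instead.
by_proxy = mgr.find_user_password("some realm", "abc.com:8080")

print(by_request_url)  # no credentials found for the target site
print(by_proxy)        # credentials found for the proxy
```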

--------------------------------------------------------

To solve the problem, two auth handlers would be needed: one for the proxy and one for normal HTTP auth.
Two different handlers are used for basic auth, so I think this would be a consistent solution?
msg245582 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-06-21 07:40
There are two problems with the test case, and one bug in Python:

1. HTTPPasswordMgr doesn’t handle realm=None; it has to be a string. You can use HTTPPasswordMgrWithDefaultRealm though.

2. The password managers won’t match a proxy with a non-standard port number against a hostname without a port. So you have to include the port in the add_password() call.

3. AbstractDigestAuthHandler.get_authorization() is using the wrong URL, as Alexander already discovered. I made proxy-digest.patch which should fix this.

The wrong URL (final URL rather than proxy) was actually tested for in test_urllib2_localnet.ProxyAuthTests. So I fixed those tests.
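
A minimal sketch of points 1 and 2, assuming Python 3 and a hypothetical proxy at proxy.example:8080. It exercises only the password managers, so it says nothing about point 3, which still decides whether the digest handler looks up the proxy rather than the final URL.

```python
from urllib import request

PROXY = "proxy.example:8080"  # hypothetical proxy address

# Point 1: HTTPPasswordMgr needs a real realm string; the default-realm
# variant treats None as a catch-all.
plain = request.HTTPPasswordMgr()
plain.add_password(None, PROXY, "a", "b")
default = request.HTTPPasswordMgrWithDefaultRealm()
default.add_password(None, PROXY, "a", "b")

# Point 2: an entry registered without the port does not match host:port.
no_port = request.HTTPPasswordMgrWithDefaultRealm()
no_port.add_password(None, "proxy.example", "a", "b")

for name, mgr in [("plain", plain), ("default", default), ("no_port", no_port)]:
    print(name, mgr.find_user_password("proxy realm", PROXY))

# Opener wiring as in the original report, with both adjustments applied.
proxy = request.ProxyHandler({"http": "http://" + PROXY})
auth = request.ProxyDigestAuthHandler(default)
opener = request.build_opener(proxy, auth)
```

Only the manager named "default" should return the credentials; the other two illustrate the two test case problems.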
msg245583 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-06-21 07:47
I should point out my patch also adds add_password() methods for the authentication handlers to the documentation. These were only documented by example, but everyone seems to prefer using them rather than the equivalent password manager method.
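
As a small illustration of that equivalence (Python 3, with a hypothetical proxy address): credentials added through the handler end up in the password manager it was constructed with.

```python
from urllib import request

mgr = request.HTTPPasswordMgrWithDefaultRealm()
auth = request.ProxyDigestAuthHandler(mgr)

# Registering credentials through the handler...
auth.add_password(None, "proxy.example:8080", "a", "b")

# ...stores them in the password manager the handler was built with.
found = mgr.find_user_password("any realm", "proxy.example:8080")
print(found)
```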
History
Date User Action Args
2022-04-11 14:57:36 admin set github: 60299
2019-03-15 21:58:58 BreamoreBoy set nosy: - BreamoreBoy
2015-06-21 07:47:23 martin.panter set messages: + msg245583
2015-06-21 07:40:24 martin.panter set files: + proxy-digest.patch; versions: + Python 3.6; keywords: + patch; nosy: + martin.panter; messages: + msg245582; stage: patch review
2014-09-15 13:00:33 alexwe set nosy: + alexwe; messages: + msg226913
2014-07-02 22:20:09 toobaz set messages: + msg222137
2014-07-02 21:35:15 BreamoreBoy set nosy: + BreamoreBoy; messages: + msg222132; versions: + Python 3.4, Python 3.5, - Python 3.2
2012-09-30 17:08:42 orsenthil set assignee: orsenthil; nosy: + orsenthil
2012-09-30 16:41:50 toobaz set versions: + Python 3.2
2012-09-30 16:41:31 toobaz create