New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
http.client.HTTPConnection tunneling is broken #52024
Comments
I'm trying to do HTTPS via a proxy in Python 2.6.4 (which is supposed to incorporate this fix from bpo-1424152). While trying to debug this starting from the suds library I've been reading httplib.py and urllib2.py to figure out what's going wrong _tunnel() is broken because _set_hostport() has side effects. _tunnel() starts with: However, _set_hostport() sets the .host and .port attributes in The next action _tunnel() takes is to send the CONNECT HTTP command, It seems to me that _tunnel() should be grabbing the original host and port before calling _set_hostport(), thus: ohost, oport = self.host, self.port In fact the situation seems even worse: _tunnel() calls send(), send() calls connect(), and connect() calls _tunnel() in an infinite regress.
|
Amendment: regarding the infinite regress, it looks like there will not be a recursion if the caller leaps straight to the .connect() method. However, if they do that then the call to _tunnel() from within connect() will happen _after_ the socket is made directly to the origin host, not via the proxy. So the behaviour seems incorrect then also; it looks very much like _tunnel() must always be called before the real socket connection is established, and .connect() calls _tunnel() afterwards, not before. |
It's looking like I have my idea of .host versus ._tunnel_host swapped. I think things are still buggy, but my interpretation of the bug is wrong or misleading. I gather that after _set_tunnel(), .host is the proxy host and that ._tunnel_host is the original target host. I'll follow up here in a bit when I've better characterised the problem. |
Well, I've established a few things:
To the first item: _tunnel() feels really fragile with that recursion issue, though it doesn't recurse called from urllib2. For the second, here's my test script using httplib: H = httplib.HTTPSConnection("localhost", 3128)
print H
H._set_tunnel("localhost", 443)
H.request("GET", "/boguspath")
os.system("lsof -p %d | grep IPv4" % (os.getpid(),))
R = H.getresponse()
print R.status, R.reason As you can see, one builds the HTTPSConnection object with the proxy's details instead of those of the target URL, and then put the target URL details in with _set_tunnel(). Am I alone in find this strange? For the third, my test code is this: U = urllib2.Request('https://localhost/boguspath')
U.set_proxy('localhost:3128', 'https')
f = urllib2.urlopen(R)
print f.read() which fails like this: Traceback (most recent call last):
File "thttp.py", line 15, in <module>
f = urllib2.urlopen(R)
File "/opt/python-2.6.4/lib/python2.6/urllib2.py", line 131, in urlopen
return _opener.open(url, data, timeout)
File "/opt/python-2.6.4/lib/python2.6/urllib2.py", line 395, in open
protocol = req.get_type()
AttributeError: HTTPResponse instance has no attribute 'get_type' The line numbers are slightly off because I've got some debugging statements in there. Finally, I flat out do not understand urllib2's set_proxy() method:
When my code calls set_proxy, self.type is None. Now, I had naively expected the first branch to be the only branch. Could someone explain what's happening here, and what is meant to happen? I'm thinking that this bug may turn into a doc fix instead of a behaviour fix, but I'm finding it surprisingly hard to know how urllib2 is supposed to be used. |
As you noticed, the _set_tunnel method is a private method not intended to be used directly. Its being used by urllib2 when https through proxy is required. How do think the docs can be improved? If you have any suggestions please upload a patch. |
Well, following your description I've backed out my urllib2 test case to this: f = urllib2.urlopen('https://localhost/boguspath')
os.system("lsof -p %d | grep IPv4" % (os.getpid(),))
f = urllib2.urlopen(R)
print f.read() and it happily runs HTTPS through the proxy if I set the https_proxy envvar. So it's all well and good for the "just do what the environment suggests" use case. However, my older test: U = urllib2.Request('https://localhost/boguspath')
U.set_proxy('localhost:3128', 'https')
f = urllib2.urlopen(R)
print f.read() still blows up with: File "/opt/python-2.6.4/lib/python2.6/urllib2.py", line 381, in open Now, this is the use case for "I have a custom proxy setup for this activity". It seems a little dd that "req" above is an HTTPResponse instead of a Request, and that my be why there's no .ettype() method available. I also see nothing obviously wrong with my set_proxy() call above based on the docs for the .set_proxy() method, though obviously it fails. I think what may be needed is a small expansion of the section in the Examples are on proxies. There's an description of the use of the *_proxy envvars there (and not elsewhere, which seems wrong) and an example of providing a proxy Handler. An addition example with a functioning use of a bare .set_proxy() might help. |
Cameron could you provide a patch for this? |
Ok, this is a bit of a mess. Issue bpo-11448 contains a patch for the documentation. But the functionality itself is still not working. With the (I believe) intended usage, we have:
What happens then is this:
|
I have attached a patch that should fix the issue. |
New patch revision, this time includes unit tests. |
I've updated the patch again to fix a problem with HTTPSConnection. |
Rebased patch on current tip. |
A few comments
|
Hmm. I think I found another problem... please wait for another patch revision. |
Ok, I've attached yet another patch revision. This revision is less complex, because it gets rid of the ability to set up chains of tunnels. The only reason that I put that in was to preserve backward compatibility -- but upon reviewing the old implementation again, I found out that this actually did not work in the past either. |
Refreshed patch. |
I am reviewing this patch right now and you will see my action soon. It is completely and I am reviewing to validating the technical details/fix. Thanks for patch, Nikolaus. |
I verified the patch and this indeed corrects a nasty bug in sending a wrong header when doing it a lower level HTTPSConnection to proxy and set_tunnel (bad term) to the end host..I was worried as why we did not observe this earlier and it seems to me that the advertised way to do HTTPS CONNECT is via Proxy Handler or urllib.request and when doing it via a ProxyHandler, these wierdly named action (set_tunnel) happen underneath, but the skip_hosts bit is set as we got headers from the higher level method. and the host header is carried transparently to the tunnel connection request and thus we escaped this. The patch fixes the problem and cleans up a bit. Thanks for that , Nikolaus. This code (http/client.py) will require more attention beyond this bug too. |
New changeset 39ee3286d187 by Senthil Kumaran in branch '3.4': New changeset 2c9af09ba7b8 by Senthil Kumaran in branch 'default': |
This is fixed in 3.4 and 3.5 (finally). I will port it to 2.7 as well. |
I get errors when using pip with a proxy in 3.4.1rc1 on Windows, that does not happen on 3.4.0. C:\Python34\Scripts>set HTTP_PROXY=http://openwrt.lan:8888 C:\Python34\Scripts>set HTTPS_PROXY=http://openwrt.lan:8888 C:\Python34\Scripts>pip -v install simplejson |
On 05/09/2014 02:02 PM, Cybjit wrote:
This looks as if pip tries to match the hostname in the certificate from Is pip maybe doing its own certificate check, and relying on |
On 2014-05-10 00:23, nikratio wrote:
I think the culprit might be here https://github.com/pypa/pip/blob/1.5.4/pip/_vendor/requests/packages/urllib3/connection.py#L172 |
Cybjit <report@bugs.python.org> writes:
Yes, that's the problem. I guess that nicely demonstrates why using I guess we nevertheless need to repair/work around this in Python 3.4? Alternatively, maybe we could also do nothing, because if pip is Thinking about this, I think we should just revert the entire patch for Best, --
|
Should this be a release blocker regression for 3.4.1? |
dstufft, what do you think? |
Let me raise the issue with urllib3 and see if maybe we can get a quick turn around and just fix it for real. |
This is going to break existing versions of urllib3 (and thus requests and thus pip) when using verified TLS + a proxy, however future versions can work around it and a fix is being looked at right now. Once it's fixed there it can propagate to requests and then to pip. Urllib3 issue is here urllib3/urllib3#385 As far as what CPython should do. I personally don't think 3.4.1 should go out with this broken. That'll mean either getting a new pip out with the fix and bump the bundled version in CPython or revert this patch and wait till 3.5 (or 3.4.2 if you don't want to hold up 3.4.1). |
Just an update, the issue is fixed in urllib3 and that has been pulled into requests. Requests is currently prepping to release a new version which I'll pull into pip and issue a pip 1.5.6 release which can be pulled into CPython which should fix this. |
I am glad that issues with 3rdparty libs which dependent on the previous wrong behavior has been resolved. As indicated previously, I think, it makes sense to have this in 2.7 as well. I created a patch and tested it 2.7 and it is all good. I plan to commit it before the next 2.7 update (which should be tomorrow). |
New changeset 568041fd8090 by Senthil Kumaran in branch '2.7': |
This is fixed in 2.7 as well here (changeset 568041fd8090). We shall close this ticket after @dstufft pulls in the updated pip for 3.4 Thanks! |
Requests has been released and I've pulled it into the pip tree. I'll be releasing tonight probably, or maybe tomorrow. |
I tag 3.4.1 final in less than 24 hours. I really would prefer that the embedded pip not contain such, uh, fresh software. But let's try it and hope for the best. |
Well you're the RM Larry :) I'll do whatever you think is best. I would greatly prefer it if the pip shipped with CPython 3.4.1 wasn't broken with proxies. I think the choices are
Whichever you think is the better option is fine with me. |
Yeah, I'd like to see the diff. |
I just released pip 1.5.6. The ensurepip package currently has 1.5.4 inside of it. 1.5.5 has been out for 2 weeks or so and there haven't been any reported regressions. The only difference between 1.5.5 and 1.5.6 is that requests has been upgraded. Here's the changelog items for 1.5.5 and 1.5.6: https://github.com/pypa/pip/blob/master/CHANGES.txt#L1-L20 Here's the diff from 1.5.5 to 1.5.6: pypa/pip@1.5.5...1.5.6 If you want me to update the bundled ensurepip let me know what branch I should do it on. I'm going to go ahead and do it for 3.5 though. |
Just FYI, I upgraded setuptools and pip in 3.5: http://hg.python.org/cpython/rev/acb5cc616052 If you decide to go that way dunno if you can just cherry pick or not. |
I prefer we update the ensurepip in 3.4.1 |
Is there anything else I need to do? |
@dstufft - should you commit it in 3.4 branch (since the change is already in 3.5) and then wait for larry's approval or rejection? |
Okay, this has my blessing to be merged for 3.4.1. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: