classification
Title: urllib.parse.urlopen shouldn't ignore installed opener when called with any SSL argument
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 3.7, Python 3.6, Python 3.2, Python 3.3, Python 3.4, Python 3.5, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Alecz, David Ford (FirefighterBlu3), ber, martin.panter, moritzs, r.david.murray
Priority: normal Keywords: patch

Created on 2013-07-24 12:26 by moritzs, last changed 2018-01-10 09:51 by ber.

Files
File name Uploaded Description Edit
urlopen-explained.py Alecz, 2014-07-24 20:46
urllib.request.py-do-not-overwrite-existing-opener.diff David Ford (FirefighterBlu3), 2015-07-02 20:29 patch is obsolete
urllib.request.py-do-not-overwrite-existing-opener.diff David Ford (FirefighterBlu3), 2015-07-02 23:40 patch is obsolete
urllib.request.py-do-not-overwrite-existing-opener.diff David Ford (FirefighterBlu3), 2015-07-03 04:32 patch is obsolete
ssl_context_tests.py David Ford (FirefighterBlu3), 2015-07-03 04:34
urllib.request.py-do-not-overwrite-existing-opener.diff David Ford (FirefighterBlu3), 2015-07-04 17:59
Messages (12)
msg193640 - (view) Author: Moritz Sichert (moritzs) * Date: 2013-07-24 12:26
If you pass any of cafile, capath or cadefault to urllib.parse.urlopen it creates a new opener that contains the HTTPSHandler that was according to the ca* arguments. It then uses this new opener to execute the request.
If you installed a custom opener with urllib.parse.install_opener it won't be used.

urlopen should either add a HTTPSHandler to the installed opener or not accept any Handler specific arguments at all.
msg223894 - (view) Author: Alecz (Alecz) Date: 2014-07-24 20:46
I just want to point out that the documentation states that an opener can be used with urlopen:

 urllib.request.install_opener(opener)

    Install an OpenerDirector instance as the default global opener. Installing an opener is only necessary if you want urlopen to use that opener; 

Reference:
https://docs.python.org/3/library/urllib.request.html?highlight=urllib.request#urllib.request.install_opener

Issue 17483 was closed (rejected) because it was considered that a custom opener can be used when opening https links.
msg246096 - (view) Author: David Ford (FirefighterBlu3) (David Ford (FirefighterBlu3)) * Date: 2015-07-02 20:29
the attached diff (for py3) fixes the (badly) broken urlopen :}

previously, if SSL args were detected, all installed handlers via _opener got wiped out. now, we check if _opener already exists and if so we just make sure the HTTPSHandler is added thus preserving installed handlers.
msg246105 - (view) Author: David Ford (FirefighterBlu3) (David Ford (FirefighterBlu3)) * Date: 2015-07-02 23:40
Unfortunately more breakage exists within urllib.request. A context supplied to urlopen() is useless in the following pseudo code:

build_some_openers()
context = ssl.foo()
urlopen('foo.com', context=context)

<test against foo.com -- foo.com ssl setup is munged with non-verify, out of date, or something that doesn't make happy with a default context>

When urlopen() runs, it does indeed (with my earlier patch) add another HTTPSHandler(context). However, the default added HTTPSHandler is called first in the chain (see the bisect.insort) and will cause the urlopen attempt to fail if the SSL connection does not work with a default or void context.

The end-user specified context will never be reached whether they attempt to install their own HTTPSHandler or not since the default installed HTTPSHandler will raise an exception.

Therefore, I've attached another patch to urllib.request which ensures that (a) existing opener chain is not discarded and (b) a default opener chain is not made with an HTTPSHandler in it, only adding the HTTPSHandler at urlopen() time if 'https' is found in the URL.
msg246108 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-07-03 00:30
See also issue 23166.
msg246120 - (view) Author: David Ford (FirefighterBlu3) (David Ford (FirefighterBlu3)) * Date: 2015-07-03 01:51
perhaps an HTTPSHandler() should only merged into the handler chain if an https URI is found and no existing handler is found that has an https_open() defined
msg246126 - (view) Author: David Ford (FirefighterBlu3) (David Ford (FirefighterBlu3)) * Date: 2015-07-03 04:32
Third version of this patch and a short test suite specifically for this problem.

per awareness of :issue:`23166` I rewrote my patch to handle subclassed HTTPS handlers. There are also exceptions raised if an attempt to install more than one HTTPS handler is done. HTTPS does not play well with multiple handlers trying to process data and will immediately fail on a second attempt to negotiate a handshake on an established socket.

Additionally, if (a) an HTTPSHandler is already in the chain and (b) doesn't have an SSL context, then we create an appropriate context and attach it to the existing handler
msg246127 - (view) Author: David Ford (FirefighterBlu3) (David Ford (FirefighterBlu3)) * Date: 2015-07-03 04:34
Test cases for SSL _opener, contexts and HTTPSHandlers in this file
msg246268 - (view) Author: David Ford (FirefighterBlu3) (David Ford (FirefighterBlu3)) * Date: 2015-07-04 17:59
In my quest for completeness, I discovered a lack of handling given HTTP->HTTPS redirect. So I've attached another version of this patch which ensures an HTTPS handler is installed if such a redirect is found.
msg257133 - (view) Author: David Ford (FirefighterBlu3) (David Ford (FirefighterBlu3)) * Date: 2015-12-28 22:07
can we get a review on this please? i'd love to move forward and get the fixes in place.
msg257282 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-01-01 08:22
I closed Issue 18543 as a duplicate, which points out this also affects Python 2.7.9+, and can also be triggered by setting the “context” parameter.

Overall, I am not comfortable with the complication and hacks in the current patch. Maybe it would be simpler to not fix the bug, and just document that you cannot use the SSL context parameters with a custom opener. If necessary, we could add a new API to pass the context to the handler properly, as suggested by Benjamin in <https://bugs.python.org/issue23166#msg233484>, but that might have to be in 3.6+ only.

Some specific comments on the patch:

The url[:5] check should probably check for a colon as well.

What’s the problem with multiple HTTPS handlers? Is it a side effect of your patch, or did it already exist? Is it only a problem for urlopen(), or also for calling open() directly?

You can use any() instead of building a list and testing if it is empty.

I think you can handle https: URLs without defining https_open(), for instance I have a handler that defines default_open() and checks the request manually. Does your patch account for that?

The patch seems to modify the default installed opener with the supplied SSL parameters. Also I wonder if it would break any multithreaded usage that currently works.

It would be good to have test cases integrated into the existing suite.
msg309761 - (view) Author: Bernhard Reiter (ber) (Python committer) Date: 2018-01-10 09:51
Yesterday I ran into the same problem and I agree that it should be solved, by either documenting the issue or fixing it.

As the code in https://github.com/python/cpython/blob/master/Lib/urllib/request.py#L199
still shows the branch and
https://docs.python.org/3.7/library/urllib.request.html
today does not hint upon it, I'm adding the Python 3.7 Version indicator.

In 3.7 (dev) docs context is indicated as preferred so the situation may be a little bit better, but still not resolved.

Maybe a fix should change OpenerDirector() to be able to replace or remove a handler.
History
Date User Action Args
2018-01-10 09:51:33bersetnosy: + ber

messages: + msg309761
versions: + Python 3.7
2017-01-27 00:21:38martin.panterlinkissue29379 superseder
2017-01-27 00:14:31martin.panterlinkissue15769 superseder
2016-01-01 08:22:55martin.pantersetnosy: + martin.panter
title: urllib.parse.urlopen shouldn't ignore installed opener when called with any ca* argument -> urllib.parse.urlopen shouldn't ignore installed opener when called with any SSL argument
messages: + msg257282

versions: + Python 2.7
stage: patch review
2016-01-01 08:14:34martin.panterlinkissue23166 superseder
2015-12-28 22:07:14David Ford (FirefighterBlu3)setmessages: + msg257133
versions: + Python 3.5, Python 3.6
2015-07-04 17:59:36David Ford (FirefighterBlu3)setfiles: + urllib.request.py-do-not-overwrite-existing-opener.diff

messages: + msg246268
2015-07-03 04:34:23David Ford (FirefighterBlu3)setfiles: + ssl_context_tests.py

messages: + msg246127
2015-07-03 04:32:31David Ford (FirefighterBlu3)setfiles: + urllib.request.py-do-not-overwrite-existing-opener.diff

messages: + msg246126
2015-07-03 01:51:22David Ford (FirefighterBlu3)setmessages: + msg246120
2015-07-03 00:30:22r.david.murraysetmessages: + msg246108
2015-07-02 23:40:17David Ford (FirefighterBlu3)setfiles: + urllib.request.py-do-not-overwrite-existing-opener.diff

messages: + msg246105
2015-07-02 20:30:00David Ford (FirefighterBlu3)setfiles: + urllib.request.py-do-not-overwrite-existing-opener.diff

nosy: + David Ford (FirefighterBlu3)
messages: + msg246096

keywords: + patch
2014-07-24 20:46:47Aleczsetfiles: + urlopen-explained.py
nosy: + Alecz
messages: + msg223894

2013-07-24 15:11:36r.david.murraysetnosy: + r.david.murray
2013-07-24 12:26:58moritzscreate