classification
Title: Prioritize lowercase proxy variables in urllib.request
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.6, Python 3.5, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: frispete, martin.panter, orsenthil, python-dev
Priority: normal Keywords: patch

Created on 2016-04-19 08:18 by frispete, last changed 2016-06-15 03:01 by orsenthil. This issue is now closed.

Files
File name Uploaded Description Edit
prioritize_lowercase_proxy_vars_in_urllib_request.diff frispete, 2016-04-19 08:18 Prioritize lowercase proxy variables in urllib.request review
python-urllib-prefer-lowercase-proxies.diff frispete, 2016-04-20 19:13 review
python-urllib-prefer-lowercase-proxies-v3.diff frispete, 2016-04-21 08:53 review
python-urllib-prefer-lowercase-proxies-v4.diff frispete, 2016-04-21 11:28 review
python-urllib-prefer-lowercase-proxies-v5.diff frispete, 2016-04-22 07:00 review
python-urllib-prefer-lowercase-proxies-v6.diff frispete, 2016-04-24 16:40 review
python-urllib-prefer-lowercase-proxies-v7.diff frispete, 2016-04-25 08:41 review
Messages (21)
msg263720 - (view) Author: Hans-Peter Jansen (frispete) * Date: 2016-04-19 08:18
During programming a function, that replaces a wget call, I noticed, that something is wrong with urllibs proxy handling. 

I usually use the scheme "http_proxy= wget -N -nd URL" when I need to bypass the proxy. Hence I was pretty confused, that this doesn't work with python(3). Creating an empty ProxyHandler isn't the real Mc Coy either. Diving into the issue, I found getproxies_environment, but couldn't make much sense out of its behavior, up until I noticed, that
my OS (openSUSE ) creates both variants of environment variables behind the scenes: uppercase and lowercase. 

Consequence: python3 needs the scheme "http_proxy= HTTP_PROXY= python3 ..."

Since I, like everyone else, prefer gentle tones over the loud, and want to spare this surprise for others in the future, I propose the attached patch.

Process environment variables in two passes, first uppercase, then lowercase, allowing an empty lowercase value to overrule any uppercase value.

Please consider applying this.
msg263791 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-04-20 02:46
I think this should be applied as a bug fix to 2.7 and 3.5 as well. What do you think? Lowercase is the normal way to use these variables.

I left some comments on the code review.

A similar bug seems to exist for the “no_proxy” variable. Also, it would be nice to add a test case for this.
msg263796 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2016-04-20 05:34
Hi Hans-Peter,

I agree with Martin's comments and suggestion.

If I understand the suggestion correctly, the only change will be a documentation change. Isn't it? 

Because getproxies_environment() in it's current form already fetches the lower_case proxy which you want to highlight that is a winner in case there are two environment variables with different cases.

Thanks!
msg263801 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-04-20 06:26
It’s not just documentation, it is a real bug. If you run

http_proxy="" HTTP_PROXY=http://bad-proxy python . . .

the empty value of lowercase “http_proxy” should have priority over the other one. At the moment it depends on the order of os.environ.items(), which I understand is random.
msg263806 - (view) Author: Hans-Peter Jansen (frispete) * Date: 2016-04-20 07:38
Hi Martin, hi Senthil,

thanks for the valuable comments. 

Will incorporate your suggestions later today.

Yes, Martin, it's a bug, and should be fixed for 2.7 and 3.5 as well, but I was unsure, if I get some feedback at all... Hence, this is a very nice experience for me. 

I'm out for jogging now,
Pete
msg263859 - (view) Author: Hans-Peter Jansen (frispete) * Date: 2016-04-20 19:13
Hi Martin, hi Senthil,

please find a new patch attached, that incorporates your suggestions.

 * added a comment to get_proxies doc in urllib.rst
 * documented and fixed the mixed case scheme 
 * added a note to proxy_bypass_environment, that behaves slightly 
   different in this respect

Yes, mixed case situations are not handled in proxy_bypass_environment,
just lowercase and uppercase, while lowercase is preferred correctly.
I think, that the mixed case situation is pathologic enough and deserves to be ignored here. 

BTW, while looking at the code, I noticed, that most docstrings of the callers of proxy_bypass_environment are wrong: they say, that the proxies dict is returned, but they return the value of proxy_bypass_environment(), not get_proxies().

A follow up patch could do this in order to clean up this mess:
since there's always a call to get_proxies preceding the call to proxy_bypass_environment, we could add a second argument to the latter, passing in the proxy dict, that is thrown away at the moment. Since that carries a 'no' key already, if it exists, using it here would fix this ambiguity. While at it, fix up the affected docstrings.

What do you think about the attached patch and the last paragraph?
msg263868 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-04-21 02:02
The second patch looks reasonable (I left one minor grammar comment).

About getproxies_environment(), I meant that if you set no_proxy="", it may be ignored if NO_PROXY=. . . is also set. E.g. if you already have these set:

http_proxy=http://proxy
no_proxy=example.net
NO_PROXY=example.net

you should be able to override no_proxy="" to cancel bypassing for example.net.

I agree the proxy_bypass() docstring looks wrong. If you want to write a patch to fix that, or to make the code more efficient or easier to understand, that would be good.
msg263897 - (view) Author: Hans-Peter Jansen (frispete) * Date: 2016-04-21 08:53
Here we go:

v3 fixes following issues:
 * prefer lowercase proxy environment settings, if multiple (disagreeing) settings are specified with differing case schemes (e.g. HTTP_PROXY vs. http_proxy)
 * an empty proxy variable value resets the related setting correctly 
 * apply this logic to no_proxy, too
 * document this behaviour 
 * fix misleading docstrings related to proxy_bypass
msg263912 - (view) Author: Hans-Peter Jansen (frispete) * Date: 2016-04-21 11:28
Here's the finalized version of this patch, including unit tests.
msg263969 - (view) Author: Hans-Peter Jansen (frispete) * Date: 2016-04-22 07:00
v5: don't require the proxies argument in proxy_bypass_environment()
msg264011 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-04-22 13:43
I found two bugs; see the comments.

In Python 2, it looks like the proxy_bypass_etc() functions are defined in urllib and imported into urllib2, so it makes sense to include the tests in test_urllib rather than test_urllib2.

Technically I think proxy_bypass_environment() is meant to be an internal function, but it is safer to keep the changes to in minimal for bug fixes. So I think the optional proxies parameter should be okay.
msg264114 - (view) Author: Hans-Peter Jansen (frispete) * Date: 2016-04-24 16:40
* blatant error fixed
* one test case added
msg264115 - (view) Author: Hans-Peter Jansen (frispete) * Date: 2016-04-24 16:43
> In Python 2, it looks like the proxy_bypass_etc() functions are defined
> in urllib and imported into urllib2, so it makes sense to include the
> tests in test_urllib rather than test_urllib2.

The tests are in test_urllib. test_urllib2 is testing proxy behaviour on a higher level, so I think, they're in the correct module, aren't they?
msg264140 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-04-25 00:57
Yes that was my rambling way of saying that I had checked to see if they were in the right place, and reporting that it was all okay :)

New patch seems okay to me.
msg264165 - (view) Author: Hans-Peter Jansen (frispete) * Date: 2016-04-25 08:41
v7:
 - reorder test code in order to improve edibility
msg264175 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-04-25 13:56
V7 looks good to me
msg264180 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-04-25 15:35
New changeset 49b975122022 by Senthil Kumaran in branch '3.5':
Issue #26804: urllib.request will prefer lower_case proxy environment variables
https://hg.python.org/cpython/rev/49b975122022

New changeset 316593f5bf73 by Senthil Kumaran in branch 'default':
merge 3.5
https://hg.python.org/cpython/rev/316593f5bf73
msg264185 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-04-25 16:18
New changeset c502deb19cb0 by Senthil Kumaran in branch '2.7':
backport fix for Issue #26804.
https://hg.python.org/cpython/rev/c502deb19cb0
msg264186 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2016-04-25 16:18
This is fixed in all versions of Python.

Thank you for your contribution, Hans-Peter Jansen.
msg268566 - (view) Author: Hans-Peter Jansen (frispete) * Date: 2016-06-14 16:22
In a couple of systems, I have to stick with 3.4. Is there a chance to have this patch in 3.4 as well, if a new release 3.4 is made?
msg268601 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2016-06-15 03:01
Unfortunately no. 3.4 is only in security fixes mode and this doesn't qualify as a security fix. Usually the bug fixes and feature additions are incentives for developers to upgrade their python codebase.
History
Date User Action Args
2016-06-15 03:01:25orsenthilsetmessages: + msg268601
2016-06-14 16:22:20frispetesetmessages: + msg268566
2016-04-25 16:18:48orsenthilsetstatus: open -> closed
resolution: fixed
messages: + msg264186

stage: commit review -> resolved
2016-04-25 16:18:03python-devsetmessages: + msg264185
2016-04-25 15:35:44python-devsetnosy: + python-dev
messages: + msg264180
2016-04-25 13:56:21martin.pantersetmessages: + msg264175
stage: patch review -> commit review
2016-04-25 08:41:21frispetesetfiles: + python-urllib-prefer-lowercase-proxies-v7.diff

messages: + msg264165
2016-04-25 00:57:52martin.pantersetmessages: + msg264140
2016-04-24 16:43:31frispetesetmessages: + msg264115
2016-04-24 16:40:16frispetesetfiles: + python-urllib-prefer-lowercase-proxies-v6.diff

messages: + msg264114
2016-04-22 13:43:52martin.pantersetmessages: + msg264011
2016-04-22 07:00:28frispetesetfiles: + python-urllib-prefer-lowercase-proxies-v5.diff

messages: + msg263969
2016-04-21 11:28:32frispetesetfiles: + python-urllib-prefer-lowercase-proxies-v4.diff

messages: + msg263912
2016-04-21 08:53:19frispetesetfiles: + python-urllib-prefer-lowercase-proxies-v3.diff

messages: + msg263897
2016-04-21 02:02:08martin.pantersetmessages: + msg263868
2016-04-20 19:13:46frispetesetfiles: + python-urllib-prefer-lowercase-proxies.diff

messages: + msg263859
2016-04-20 07:38:51frispetesetmessages: + msg263806
versions: + Python 2.7, Python 3.5
2016-04-20 06:26:16martin.pantersetmessages: + msg263801
2016-04-20 05:34:11orsenthilsetmessages: + msg263796
2016-04-20 02:46:10martin.pantersetnosy: + martin.panter
messages: + msg263791
2016-04-19 08:37:46SilentGhostsetnosy: + orsenthil

components: + Library (Lib), - Extension Modules
stage: patch review
2016-04-19 08:24:09frispetesetversions: + Python 3.6, - Python 3.5
2016-04-19 08:18:29frispetecreate