Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prioritize lowercase proxy variables in urllib.request #70991

Closed
frispete mannequin opened this issue Apr 19, 2016 · 21 comments
Closed

Prioritize lowercase proxy variables in urllib.request #70991

frispete mannequin opened this issue Apr 19, 2016 · 21 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@frispete
Copy link
Mannequin

frispete mannequin commented Apr 19, 2016

BPO 26804
Nosy @orsenthil, @vadmium
Files
  • prioritize_lowercase_proxy_vars_in_urllib_request.diff: Prioritize lowercase proxy variables in urllib.request
  • python-urllib-prefer-lowercase-proxies.diff
  • python-urllib-prefer-lowercase-proxies-v3.diff
  • python-urllib-prefer-lowercase-proxies-v4.diff
  • python-urllib-prefer-lowercase-proxies-v5.diff
  • python-urllib-prefer-lowercase-proxies-v6.diff
  • python-urllib-prefer-lowercase-proxies-v7.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2016-04-25.16:18:48.839>
    created_at = <Date 2016-04-19.08:18:29.814>
    labels = ['type-bug', 'library']
    title = 'Prioritize lowercase proxy variables in urllib.request'
    updated_at = <Date 2016-06-15.03:01:25.727>
    user = 'https://bugs.python.org/frispete'

    bugs.python.org fields:

    activity = <Date 2016-06-15.03:01:25.727>
    actor = 'orsenthil'
    assignee = 'none'
    closed = True
    closed_date = <Date 2016-04-25.16:18:48.839>
    closer = 'orsenthil'
    components = ['Library (Lib)']
    creation = <Date 2016-04-19.08:18:29.814>
    creator = 'frispete'
    dependencies = []
    files = ['42516', '42544', '42548', '42552', '42565', '42582', '42586']
    hgrepos = []
    issue_num = 26804
    keywords = ['patch']
    message_count = 21.0
    messages = ['263720', '263791', '263796', '263801', '263806', '263859', '263868', '263897', '263912', '263969', '264011', '264114', '264115', '264140', '264165', '264175', '264180', '264185', '264186', '268566', '268601']
    nosy_count = 4.0
    nosy_names = ['orsenthil', 'frispete', 'python-dev', 'martin.panter']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue26804'
    versions = ['Python 2.7', 'Python 3.5', 'Python 3.6']

    @frispete
    Copy link
    Mannequin Author

    frispete mannequin commented Apr 19, 2016

    During programming a function, that replaces a wget call, I noticed, that something is wrong with urllibs proxy handling.

    I usually use the scheme "http_proxy= wget -N -nd URL" when I need to bypass the proxy. Hence I was pretty confused, that this doesn't work with python(3). Creating an empty ProxyHandler isn't the real Mc Coy either. Diving into the issue, I found getproxies_environment, but couldn't make much sense out of its behavior, up until I noticed, that
    my OS (openSUSE ) creates both variants of environment variables behind the scenes: uppercase and lowercase.

    Consequence: python3 needs the scheme "http_proxy= HTTP_PROXY= python3 ..."

    Since I, like everyone else, prefer gentle tones over the loud, and want to spare this surprise for others in the future, I propose the attached patch.

    Process environment variables in two passes, first uppercase, then lowercase, allowing an empty lowercase value to overrule any uppercase value.

    Please consider applying this.

    @frispete frispete mannequin added extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error labels Apr 19, 2016
    @SilentGhost SilentGhost mannequin added stdlib Python modules in the Lib dir and removed extension-modules C modules in the Modules dir labels Apr 19, 2016
    @vadmium
    Copy link
    Member

    vadmium commented Apr 20, 2016

    I think this should be applied as a bug fix to 2.7 and 3.5 as well. What do you think? Lowercase is the normal way to use these variables.

    I left some comments on the code review.

    A similar bug seems to exist for the “no_proxy” variable. Also, it would be nice to add a test case for this.

    @orsenthil
    Copy link
    Member

    Hi Hans-Peter,

    I agree with Martin's comments and suggestion.

    If I understand the suggestion correctly, the only change will be a documentation change. Isn't it?

    Because getproxies_environment() in it's current form already fetches the lower_case proxy which you want to highlight that is a winner in case there are two environment variables with different cases.

    Thanks!

    @vadmium
    Copy link
    Member

    vadmium commented Apr 20, 2016

    It’s not just documentation, it is a real bug. If you run

    http_proxy="" HTTP_PROXY=http://bad-proxy python . . .

    the empty value of lowercase “http_proxy” should have priority over the other one. At the moment it depends on the order of os.environ.items(), which I understand is random.

    @frispete
    Copy link
    Mannequin Author

    frispete mannequin commented Apr 20, 2016

    Hi Martin, hi Senthil,

    thanks for the valuable comments.

    Will incorporate your suggestions later today.

    Yes, Martin, it's a bug, and should be fixed for 2.7 and 3.5 as well, but I was unsure, if I get some feedback at all... Hence, this is a very nice experience for me.

    I'm out for jogging now,
    Pete

    @frispete
    Copy link
    Mannequin Author

    frispete mannequin commented Apr 20, 2016

    Hi Martin, hi Senthil,

    please find a new patch attached, that incorporates your suggestions.

    • added a comment to get_proxies doc in urllib.rst
    • documented and fixed the mixed case scheme
    • added a note to proxy_bypass_environment, that behaves slightly
      different in this respect

    Yes, mixed case situations are not handled in proxy_bypass_environment,
    just lowercase and uppercase, while lowercase is preferred correctly.
    I think, that the mixed case situation is pathologic enough and deserves to be ignored here.

    BTW, while looking at the code, I noticed, that most docstrings of the callers of proxy_bypass_environment are wrong: they say, that the proxies dict is returned, but they return the value of proxy_bypass_environment(), not get_proxies().

    A follow up patch could do this in order to clean up this mess:
    since there's always a call to get_proxies preceding the call to proxy_bypass_environment, we could add a second argument to the latter, passing in the proxy dict, that is thrown away at the moment. Since that carries a 'no' key already, if it exists, using it here would fix this ambiguity. While at it, fix up the affected docstrings.

    What do you think about the attached patch and the last paragraph?

    @vadmium
    Copy link
    Member

    vadmium commented Apr 21, 2016

    The second patch looks reasonable (I left one minor grammar comment).

    About getproxies_environment(), I meant that if you set no_proxy="", it may be ignored if NO_PROXY=. . . is also set. E.g. if you already have these set:

    http_proxy=http://proxy
    no_proxy=example.net
    NO_PROXY=example.net

    you should be able to override no_proxy="" to cancel bypassing for example.net.

    I agree the proxy_bypass() docstring looks wrong. If you want to write a patch to fix that, or to make the code more efficient or easier to understand, that would be good.

    @frispete
    Copy link
    Mannequin Author

    frispete mannequin commented Apr 21, 2016

    Here we go:

    v3 fixes following issues:

    • prefer lowercase proxy environment settings, if multiple (disagreeing) settings are specified with differing case schemes (e.g. HTTP_PROXY vs. http_proxy)
    • an empty proxy variable value resets the related setting correctly
    • apply this logic to no_proxy, too
    • document this behaviour
    • fix misleading docstrings related to proxy_bypass

    @frispete
    Copy link
    Mannequin Author

    frispete mannequin commented Apr 21, 2016

    Here's the finalized version of this patch, including unit tests.

    @frispete
    Copy link
    Mannequin Author

    frispete mannequin commented Apr 22, 2016

    v5: don't require the proxies argument in proxy_bypass_environment()

    @vadmium
    Copy link
    Member

    vadmium commented Apr 22, 2016

    I found two bugs; see the comments.

    In Python 2, it looks like the proxy_bypass_etc() functions are defined in urllib and imported into urllib2, so it makes sense to include the tests in test_urllib rather than test_urllib2.

    Technically I think proxy_bypass_environment() is meant to be an internal function, but it is safer to keep the changes to in minimal for bug fixes. So I think the optional proxies parameter should be okay.

    @frispete
    Copy link
    Mannequin Author

    frispete mannequin commented Apr 24, 2016

    • blatant error fixed
    • one test case added

    @frispete
    Copy link
    Mannequin Author

    frispete mannequin commented Apr 24, 2016

    In Python 2, it looks like the proxy_bypass_etc() functions are defined
    in urllib and imported into urllib2, so it makes sense to include the
    tests in test_urllib rather than test_urllib2.

    The tests are in test_urllib. test_urllib2 is testing proxy behaviour on a higher level, so I think, they're in the correct module, aren't they?

    @vadmium
    Copy link
    Member

    vadmium commented Apr 25, 2016

    Yes that was my rambling way of saying that I had checked to see if they were in the right place, and reporting that it was all okay :)

    New patch seems okay to me.

    @frispete
    Copy link
    Mannequin Author

    frispete mannequin commented Apr 25, 2016

    v7:

    • reorder test code in order to improve edibility

    @vadmium
    Copy link
    Member

    vadmium commented Apr 25, 2016

    V7 looks good to me

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Apr 25, 2016

    New changeset 49b975122022 by Senthil Kumaran in branch '3.5':
    Issue bpo-26804: urllib.request will prefer lower_case proxy environment variables
    https://hg.python.org/cpython/rev/49b975122022

    New changeset 316593f5bf73 by Senthil Kumaran in branch 'default':
    merge 3.5
    https://hg.python.org/cpython/rev/316593f5bf73

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Apr 25, 2016

    New changeset c502deb19cb0 by Senthil Kumaran in branch '2.7':
    backport fix for Issue bpo-26804.
    https://hg.python.org/cpython/rev/c502deb19cb0

    @orsenthil
    Copy link
    Member

    This is fixed in all versions of Python.

    Thank you for your contribution, Hans-Peter Jansen.

    @frispete
    Copy link
    Mannequin Author

    frispete mannequin commented Jun 14, 2016

    In a couple of systems, I have to stick with 3.4. Is there a chance to have this patch in 3.4 as well, if a new release 3.4 is made?

    @orsenthil
    Copy link
    Member

    Unfortunately no. 3.4 is only in security fixes mode and this doesn't qualify as a security fix. Usually the bug fixes and feature additions are incentives for developers to upgrade their python codebase.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants