Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

urllib.request no_proxy check differs from curl #71051

Closed
DanielMorrison mannequin opened this issue Apr 26, 2016 · 13 comments
Closed

urllib.request no_proxy check differs from curl #71051

DanielMorrison mannequin opened this issue Apr 26, 2016 · 13 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@DanielMorrison
Copy link
Mannequin

DanielMorrison mannequin commented Apr 26, 2016

BPO 26864
Nosy @vadmium, @zhangyangyu
Files
  • issue26864.patch
  • issue26864_v2.patch
  • issue26864_v3.patch
  • issue26864_v4.patch
  • issue26864_v5.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2016-04-30.12:47:56.872>
    created_at = <Date 2016-04-26.14:43:47.730>
    labels = ['type-bug', 'library']
    title = 'urllib.request no_proxy check differs from curl'
    updated_at = <Date 2016-04-30.12:47:56.871>
    user = 'https://bugs.python.org/DanielMorrison'

    bugs.python.org fields:

    activity = <Date 2016-04-30.12:47:56.871>
    actor = 'martin.panter'
    assignee = 'docs@python'
    closed = True
    closed_date = <Date 2016-04-30.12:47:56.872>
    closer = 'martin.panter'
    components = ['Library (Lib)']
    creation = <Date 2016-04-26.14:43:47.730>
    creator = 'Daniel Morrison'
    dependencies = []
    files = ['42621', '42622', '42624', '42625', '42627']
    hgrepos = []
    issue_num = 26864
    keywords = ['patch']
    message_count = 13.0
    messages = ['264297', '264311', '264313', '264318', '264339', '264340', '264345', '264348', '264354', '264356', '264364', '264372', '264540']
    nosy_count = 4.0
    nosy_names = ['python-dev', 'martin.panter', 'xiang.zhang', 'Daniel Morrison']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue26864'
    versions = ['Python 2.7', 'Python 3.5', 'Python 3.6']

    @DanielMorrison
    Copy link
    Mannequin Author

    DanielMorrison mannequin commented Apr 26, 2016

    The no_proxy environment variable works in python as a case
    sensitive suffix check.

    Curl handles this variable as a case insensitive hostname check.

    Case sensitivity appears to be in conflict with the DNS Case
    Insensitivity RFC (https://tools.ietf.org/html/rfc4343).

    While the suffix check is documented
    (https://docs.python.org/3/library/urllib.request.html), this
    seems to be problematic and inconsistent with other tools on the
    system.

    I believe the ideal solution would be to have proxy_bypass be a
    method of ProxyHandler so that it can be overridden without
    dependence on undocumented methods. This would also allow
    for the requested behavior to be added without breaking backwards
    compatibility.

    @DanielMorrison DanielMorrison mannequin added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Apr 26, 2016
    @zhangyangyu
    Copy link
    Member

    No, urllib.request does not handle the no_proxy environment in case-sensitive way. It first checks if any environment_variable.lower() ends with "_proxy", and then checks if there is any environment_variable ends with "_proxy". So it works as a case insensitive check, but the lowercase one will override others, e.g, no_proxy overrides no_PROXY. But the document is misleading, it should be clarified.

    @zhangyangyu zhangyangyu added docs Documentation in the Doc dir and removed stdlib Python modules in the Lib dir labels Apr 26, 2016
    @DanielMorrison
    Copy link
    Mannequin Author

    DanielMorrison mannequin commented Apr 26, 2016

    I believe there was a misunderstanding.

    While the environment variable name is handled in a case
    insensitive way, the contents of the environment variable is not.

    Please see some examples below:

        >>> os.environ['no_proxy'] = 'example.com'
        >>> urllib.request.proxy_bypass('EXAMPLE.com')
        0
        >>> urllib.request.proxy_bypass('example.com')
        1

    Also to clarify the meaning of suffix check:

        >>> os.environ['no_proxy'] = 'example.com'
        >>> urllib.request.proxy_bypass('myexample.com')
        1

    My apologies for my lack of clarity.

    @zhangyangyu
    Copy link
    Member

    Ohh, it's my fault to misunderstand. I think your thinking is reasonable. Curl handles the suffix check insensitively.

    @zhangyangyu zhangyangyu added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error and removed docs Documentation in the Doc dir type-feature A feature request or enhancement labels Apr 26, 2016
    @zhangyangyu
    Copy link
    Member

    I write a patch to fix this.

    It seems on Windows and MacOS the behaviour is right. On MacOS it leaves the matching to fnmatch. On Windows it uses case insensitive regular matching.

    @zhangyangyu
    Copy link
    Member

    The code has changed recently. I update the patch to reveal the change.

    @vadmium
    Copy link
    Member

    vadmium commented Apr 27, 2016

    I think this patch looks okay. It fixes the case insensitivity problem, but there is also the string suffix problem (see the myexample.com demo). I think it is reasonable for example.com to match my.example.com, but not myexample.com.

    @zhangyangyu
    Copy link
    Member

    Yes, you are right. I don't notice that. I will try to fix it.

    @zhangyangyu
    Copy link
    Member

    I update the patch to use regular expression to handle both case and suffix. This regular expression has the function of curl's check_noproxy. And I think a separate test case is better here.

    @zhangyangyu
    Copy link
    Member

    Update to improve the test case.

    @zhangyangyu
    Copy link
    Member

    Thanks for your comment martin. Using re.escape is really a good advise.

    @vadmium
    Copy link
    Member

    vadmium commented Apr 27, 2016

    This version looks okay to me, thanks.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Apr 30, 2016

    New changeset ca882ee68d46 by Martin Panter in branch '3.5':
    Issue bpo-26864: Fix case insensitivity and suffix comparison with no_proxy
    https://hg.python.org/cpython/rev/ca882ee68d46

    New changeset 1ceb91974dc4 by Martin Panter in branch 'default':
    Issue bpo-26864: Merge no_proxy fixes from 3.5
    https://hg.python.org/cpython/rev/1ceb91974dc4

    New changeset a1aad42f1195 by Martin Panter in branch '2.7':
    Issue bpo-26864: Fix case insensitivity and suffix comparison with no_proxy
    https://hg.python.org/cpython/rev/a1aad42f1195

    @vadmium vadmium closed this as completed Apr 30, 2016
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants