Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

urllib.request.getproxies() misparses Windows registry proxy settings #86793

Closed
benrg mannequin opened this issue Dec 12, 2020 · 14 comments
Closed

urllib.request.getproxies() misparses Windows registry proxy settings #86793

benrg mannequin opened this issue Dec 12, 2020 · 14 comments
Labels
3.9 only security fixes 3.10 only security fixes 3.11 only security fixes OS-windows stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@benrg
Copy link
Mannequin

benrg mannequin commented Dec 12, 2020

BPO 42627
Nosy @pfmoore, @tjguk, @zware, @zooba, @corona10, @chrisxiao, @NateScarlet, @CrazyBoyFeng, @keuin
PRs
  • bpo-42627: Fix wrong parsing of Windows registry proxy settings #26307
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2020-12-12.20:27:52.875>
    labels = ['type-bug', '3.9', '3.10', '3.11', 'library', 'OS-windows']
    title = 'urllib.request.getproxies() misparses Windows registry proxy settings'
    updated_at = <Date 2021-11-20.14:38:47.145>
    user = 'https://bugs.python.org/benrg'

    bugs.python.org fields:

    activity = <Date 2021-11-20.14:38:47.145>
    actor = 'CrazyBoyFeng'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)', 'Windows']
    creation = <Date 2020-12-12.20:27:52.875>
    creator = 'benrg'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 42627
    keywords = ['patch']
    message_count = 10.0
    messages = ['382921', '387468', '388149', '388152', '393927', '394668', '395276', '401215', '401995', '406658']
    nosy_count = 11.0
    nosy_names = ['paul.moore', 'tim.golden', 'benrg', 'zach.ware', 'steve.dower', 'corona10', 'chrisxiao', 'kotori', 'NateScarlet', 'CrazyBoyFeng', 'keuin']
    pr_nums = ['26307']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue42627'
    versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']

    @benrg
    Copy link
    Mannequin Author

    benrg mannequin commented Dec 12, 2020

    If HKCU\Software\Microsoft\Windows\CurrentVersion\Internet Settings\ProxyServer contains the string http=host:123;https=host:456;ftp=host:789, then getproxies_registry() should return

    {'http': 'http://host:123', 'https': 'http://host:456', 'ftp': 'http://host:789'}
    

    for consistency with WinInet and Chromium, but it actually returns

    {'http': 'http://host:123', 'https': 'https://host:456', 'ftp': 'ftp://host:789'}
    

    This bug has existed for a very long time (since Python 2.0.1 if not earlier), but it was exposed recently when urllib3 added support for HTTPS-in-HTTPS proxies in version 1.26. Before that, an https prefix on the HTTPS proxy url was silently treated as http, accidentally resulting in the correct behavior.

    There are additional bugs in the treatment of single-proxy strings (the case when the string contains no = character).

    The Chromium code for parsing the ProxyServer string can be found here: https://source.chromium.org/chromium/chromium/src/+/refs/tags/89.0.4353.1:net/proxy_resolution/proxy_config.cc;l=86

    Below is my attempt at modifying the code from getproxies_registry to approximately match Chromium's behavior. I could turn this into a patch, but I'd like feedback on the corner cases first.

        if '=' not in proxyServer and ';' not in proxyServer:
            # Use one setting for all protocols.
            # Chromium treats this as a separate category, and some software
            # uses the ALL_PROXY environment variable for a similar purpose,
            # so arguably this should be 'all={}'.format(proxyServer),
            # but this is more backward compatible.
            proxyServer = 'http={0};https={0};ftp={0}'.format(proxyServer)
    
        for p in proxyServer.split(';'):
            # Chromium and WinInet are inconsistent in their treatment of
            # invalid strings with the wrong number of = characters. It
            # probably doesn't matter.
            protocol, addresses = p.split('=', 1)
            protocol = protocol.strip()
    
            # Chromium supports more than one proxy per protocol. I don't
            # know how many clients support the same, but handling it is at
            # least no worse than leaving the commas uninterpreted.
            for address in addresses.split(','):
                if protocol in {'http', 'https', 'ftp', 'socks'}:
                    # See if address has a type:// prefix
                    if not re.match('(?:[^/:]+)://', address):
                        if protocol == 'socks':
                            # Chromium notes that the correct protocol here
                            # is SOCKS4, but "socks://" is interpreted
                            # as SOCKS5 elsewhere. I don't know whether
                            # prepending socks4:// here would break code.
                            address = 'socks://' + address
                        else:
                            address = 'http://' + address
    
                # A string like 'http=foo;http=bar' will produce a
                # comma-separated list, while previously 'bar' would
                # override 'foo'. That could potentially break something.
                if protocol not in proxies:
                    proxies[protocol] = address
                else:
                    proxies[protocol] += ',' + address

    @benrg benrg mannequin added OS-windows stdlib Python modules in the Lib dir 3.7 (EOL) end of life 3.8 only security fixes 3.10 only security fixes 3.9 only security fixes type-bug An unexpected behavior, bug, or error labels Dec 12, 2020
    @kotori
    Copy link
    Mannequin

    kotori mannequin commented Feb 21, 2021

    I came across this issue as well.
    I checked Microsoft documentations and it seems InternetGetProxyInfo in WinInet is deprecated, while WinHttpGetIEProxyConfigForCurrentUser in WinHTTP will return the exact same string as what it stored registery.

    https://docs.microsoft.com/en-US/troubleshoot/windows-client/networking/configure-client-proxy-server-settings-by-registry-file
    Also from this documentation, a proxy server could have "http://" prefix, so I guess it could also support "https://" prefix if a user set a https proxy.

    @kotori kotori mannequin added OS-windows stdlib Python modules in the Lib dir and removed OS-windows stdlib Python modules in the Lib dir labels Feb 21, 2021
    @CrazyBoyFeng
    Copy link
    Mannequin

    CrazyBoyFeng mannequin commented Mar 5, 2021

    We know Windows reslove:
    http=host:port;https=host:port;ftp=host:port;socks=host:port
    as:
    http=http://host:port;https=http://host:port;ftp=http://host:port;socks=socks://host:port
    means:
    Using HTTP type proxy for HTTP, HTTPS and FTP requests, but Socks4/4a type proxy for the other TCP requests.
    We notice that socks are different from the others.

    Now I want to know what if Windows slove this:
    http=socks://host:port;ftp=https://host:port;socks=https://host:port
    Does it mean using Socks4/4a type proxy for HTTP requests, HTTPS type proxy for FTP requests, and HTTP type proxy for the other TCP requests?
    Or just invalid settings?

    @CrazyBoyFeng
    Copy link
    Mannequin

    CrazyBoyFeng mannequin commented Mar 5, 2021

    I make some black box tests with the HTTPS proxy.
    Set system proxy http=https://host:port and start a HTTPS proxy, then IE, Edge (Chromium) and benrg's code (using urllib3) work fine while fetching http://website. Then I shutdown the HTTPS proxy and start a HTTP proxy on the same port, I get SSLError. That's it.

    benrg should add your patch as soon as possible.
    Too few people actually use HTTPS proxy. Even if there is a bug with HTTPS proxy, it will not be too late to fix it when it is discovered.

    @chrisxiao
    Copy link
    Mannequin

    chrisxiao mannequin commented May 19, 2021

    I found this problem too.
    expecting for fixing

    @CrazyBoyFeng
    Copy link
    Mannequin

    CrazyBoyFeng mannequin commented May 28, 2021

    I removed the multi-proxies-per-protocol support from the PR I submitted. The reason is that:

    1. This code reads the proxy settings from the Windows registry. Multi-proxies-per-protocol cannot be resolved by Windows system.
    2. Using a comma-separated string for the multi-proxies-per-protocol support, is not backward compatible. Not only Windows, but Linux is also one proxy per protocol.

    If you want to implement the multi-proxies-per-protocol support, then I think you should open another issue and make changes to all OS platforms.

    @CrazyBoyFeng
    Copy link
    Mannequin

    CrazyBoyFeng mannequin commented Jun 7, 2021

    We should have no problem with how to parse HTTP proxy and HTTPS proxy. But I recently discovered an additional problem when using requests, that is, how to parse the SOCKS proxy correctly.

    I investigated the parsing of some commonly used software:
    curl will try to read the all_proxy variable first, if not, then read the two system environment variables http_proxy and https_proxy.
    Only these two environment variables related to proxy exist in most Linux distributions.
    The following formats are valid for curl:

    export http_proxy=socks4h://127.0.0.1:5050
    export https_proxy=socks5://1.1.1.1:4040
    

    The h in socks4h means doing DNS query on the proxy server.

    requests also follows this format.
    If you want to use the socks proxy in requests[socks], you need to specify the proxies parameter:

    proxies['http']='socks4://127.0.0.1:6060'
    proxies['https']='socks5h://127.0.0.1:7070'
    

    Since it is customary to resolve SOCKS proxies in this way, I think CPython can consider doing the same with Windows registry.

    In previous versions, CPython parsed the SOCKS proxy in the Windows registry socks=localhost:8080 as:

    proxies['socks']='socks://localhost:8080'
    

    I think it can be changed to

    proxies['http']=proxies['http'] or'socks4://localhost:8080'
    proxies['https']=proxies['http'] or'socks4://localhost:8080'
    proxies['socks']='socks://localhost:8080'
    

    The use of or is to prevent the existing HTTP and HTTPS proxy settings from being overwritten.
    I tried it on my Win10 PC. When I set up HTTP and SOCKS proxies at the same time, IE and Edge will use HTTP proxy.
    The reason using socks4 instead of socks4h or socks5 is that, when I change system proxy setting to a remote SOCKS proxy, I found that IE and Edge do not query DNS from the proxy but from local DNS polluted by the ISP. And if the running socks proxy version is SOCKS5, it will output errors of unsupported socks version 0x4. So Windows only supports the SOCKS proxy socks4://.
    It's a pity that we don't know the code of Windows, and I can't prove it with the codes.
    proxies['socks'] is reserved for backward compatibility.

    If you have any comments or suggestions, please let me know.

    @keuin
    Copy link
    Mannequin

    keuin mannequin commented Sep 7, 2021

    The fix is available as a pull request on GitHub for months (#26307). However, it seems that this pull request needs an approval from one maintainer before running any test. Could anyone help this out?

    @zooba
    Copy link
    Member

    zooba commented Sep 16, 2021

    I think the PR is basically ready, unfortunately it's stuck behind a CI issue we only just fixed, and needs to merge from main before it'll clear.

    Posting here once CI is green will get attention faster than on GitHub.

    @zooba zooba added 3.11 only security fixes and removed 3.7 (EOL) end of life 3.8 only security fixes labels Sep 16, 2021
    @CrazyBoyFeng
    Copy link
    Mannequin

    CrazyBoyFeng mannequin commented Nov 20, 2021

    Sorry I didn't see this comment before.

    Can it be merged now?

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @SekiBetu
    Copy link

    I think the PR is basically ready, unfortunately it's stuck behind a CI issue we only just fixed, and needs to merge from main before it'll clear.

    Posting here once CI is green will get attention faster than on GitHub.

    any updates?

    @zooba
    Copy link
    Member

    zooba commented Apr 28, 2022

    I retriggered the CI to see if it passes now.

    @TTimo
    Copy link

    TTimo commented May 10, 2022

    Hello,

    I am curious if this will also address https://stackoverflow.com/questions/65931275/python-requests-module-does-it-use-system-level-on-windows-proxy-settings or if it's yet another problem?

    @zooba zooba closed this as completed May 12, 2022
    @zooba
    Copy link
    Member

    zooba commented May 12, 2022

    @TTimo From a quick glance, I think it'll deal with part of the second half of that question. It should be able to pick up manually customised proxies, but I don't think it'll get automatically configured ones. And anything that's bypassing urllib here won't get any benefit, naturally.

    coletdjnz added a commit to coletdjnz/yt-dlp-dev that referenced this issue Jul 1, 2022
    coletdjnz added a commit to yt-dlp/yt-dlp that referenced this issue May 27, 2023
    Convert proxies extracted from windows registry to http for older Python versions.
    See: python/cpython#86793
    
    Authored by: coletdjnz
    aalsuwaidi pushed a commit to aalsuwaidi/yt-dlp that referenced this issue Apr 21, 2024
    Convert proxies extracted from windows registry to http for older Python versions.
    See: python/cpython#86793
    
    Authored by: coletdjnz
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes 3.10 only security fixes 3.11 only security fixes OS-windows stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants