classification
Title: urllib.request.getproxies() misparses Windows registry proxy settings
Type: behavior Stage:
Components: Library (Lib), Windows Versions: Python 3.10, Python 3.9, Python 3.8, Python 3.7, Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: CrazyBoyFeng, NateScarlet, benrg, corona10, kotori, paul.moore, steve.dower, tim.golden, zach.ware
Priority: normal Keywords:

Created on 2020-12-12 20:27 by benrg, last changed 2021-03-05 12:15 by CrazyBoyFeng.

Messages (4)
msg382921 - (view) Author: (benrg) Date: 2020-12-12 20:27
If `HKCU\Software\Microsoft\Windows\CurrentVersion\Internet Settings\ProxyServer` contains the string `http=host:123;https=host:456;ftp=host:789`, then getproxies_registry() should return

    {'http': 'http://host:123', 'https': 'http://host:456', 'ftp': 'http://host:789'}

for consistency with WinInet and Chromium, but it actually returns

    {'http': 'http://host:123', 'https': 'https://host:456', 'ftp': 'ftp://host:789'}

This bug has existed for a very long time (since Python 2.0.1 if not earlier), but it was exposed recently when urllib3 added support for HTTPS-in-HTTPS proxies in version 1.26. Before that, an `https` prefix on the HTTPS proxy url was silently treated as `http`, accidentally resulting in the correct behavior.

There are additional bugs in the treatment of single-proxy strings (the case when the string contains no `=` character).

The Chromium code for parsing the ProxyServer string can be found here: https://source.chromium.org/chromium/chromium/src/+/refs/tags/89.0.4353.1:net/proxy_resolution/proxy_config.cc;l=86

Below is my attempt at modifying the code from `getproxies_registry` to approximately match Chromium's behavior. I could turn this into a patch, but I'd like feedback on the corner cases first.

    if '=' not in proxyServer and ';' not in proxyServer:
        # Use one setting for all protocols.
        # Chromium treats this as a separate category, and some software
        # uses the ALL_PROXY environment variable for a similar purpose,
        # so arguably this should be 'all={}'.format(proxyServer),
        # but this is more backward compatible.
        proxyServer = 'http={0};https={0};ftp={0}'.format(proxyServer)

    for p in proxyServer.split(';'):
        # Chromium and WinInet are inconsistent in their treatment of
        # invalid strings with the wrong number of = characters. It
        # probably doesn't matter.
        protocol, addresses = p.split('=', 1)
        protocol = protocol.strip()

        # Chromium supports more than one proxy per protocol. I don't
        # know how many clients support the same, but handling it is at
        # least no worse than leaving the commas uninterpreted.
        for address in addresses.split(','):
            if protocol in {'http', 'https', 'ftp', 'socks'}:
                # See if address has a type:// prefix
                if not re.match('(?:[^/:]+)://', address):
                    if protocol == 'socks':
                        # Chromium notes that the correct protocol here
                        # is SOCKS4, but "socks://" is interpreted
                        # as SOCKS5 elsewhere. I don't know whether
                        # prepending socks4:// here would break code.
                        address = 'socks://' + address
                    else:
                        address = 'http://' + address

            # A string like 'http=foo;http=bar' will produce a
            # comma-separated list, while previously 'bar' would
            # override 'foo'. That could potentially break something.
            if protocol not in proxies:
                proxies[protocol] = address
            else:
                proxies[protocol] += ',' + address
msg387468 - (view) Author: 双草酸酯 (kotori) Date: 2021-02-21 17:49
I came across this issue as well.
I checked Microsoft documentations and it seems `InternetGetProxyInfo` in WinInet is deprecated, while `WinHttpGetIEProxyConfigForCurrentUser` in WinHTTP will return the exact same string as what it stored registery. 

https://docs.microsoft.com/en-US/troubleshoot/windows-client/networking/configure-client-proxy-server-settings-by-registry-file
Also from this documentation, a proxy server could have "http://" prefix, so I guess it could also support "https://" prefix if a user set a https proxy.
msg388149 - (view) Author: 狂男风 (CrazyBoyFeng) Date: 2021-03-05 07:49
We know Windows reslove:
`http=host:port;https=host:port;ftp=host:port;socks=host:port`
as:
`http=http://host:port;https=http://host:port;ftp=http://host:port;socks=socks://host:port`
means:  
Using HTTP type proxy for HTTP, HTTPS and FTP requests, but Socks4/4a type proxy for the other TCP requests.
We notice that socks are different from the others.

Now I want to know what if Windows slove this:
`http=socks://host:port;ftp=https://host:port;socks=https://host:port`
Does it mean using Socks4/4a type proxy for HTTP requests, HTTPS type proxy for FTP requests, and HTTP type proxy for the other TCP requests? 
Or just invalid settings?
msg388152 - (view) Author: 狂男风 (CrazyBoyFeng) Date: 2021-03-05 12:15
I make some black box tests with the HTTPS proxy. 
Set system proxy `http=https://host:port` and start a HTTPS proxy, then IE, Edge (Chromium) and benrg's code (using `urllib3`) work fine while fetching `http://website`. Then I shutdown the HTTPS proxy and start a HTTP proxy on the same port, I get SSLError. That's it.

benrg should add your patch as soon as possible.
Too few people actually use HTTPS proxy. Even if there is a bug with HTTPS proxy, it will not be too late to fix it when it is discovered.
History
Date User Action Args
2021-03-05 12:15:56CrazyBoyFengsetmessages: + msg388152
2021-03-05 07:49:58CrazyBoyFengsetmessages: + msg388149
2021-03-05 05:52:56CrazyBoyFengsetnosy: + CrazyBoyFeng
2021-03-01 07:09:51NateScarletsetnosy: + NateScarlet
2021-02-23 12:07:33corona10setnosy: + corona10
2021-02-22 19:50:17kotorisetcomponents: + Library (Lib)
2021-02-22 19:49:53kotorisetcomponents: + Windows, - Library (Lib)
2021-02-21 17:49:09kotorisetnosy: + kotori
messages: + msg387468
components: - Windows
2020-12-12 20:27:52benrgcreate