classification
Title: urllib.request.getproxies() misparses Windows registry proxy settings
Type: behavior Stage: patch review
Components: Library (Lib), Windows Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: CrazyBoyFeng, NateScarlet, benrg, chrisxiao, corona10, keuin, kotori, paul.moore, steve.dower, tim.golden, zach.ware
Priority: normal Keywords: patch

Created on 2020-12-12 20:27 by benrg, last changed 2021-09-16 22:43 by steve.dower.

Pull Requests
URL Status Linked Edit
PR 26307 open CrazyBoyFeng, 2021-05-22 07:55
Messages (9)
msg382921 - (view) Author: (benrg) Date: 2020-12-12 20:27
If `HKCU\Software\Microsoft\Windows\CurrentVersion\Internet Settings\ProxyServer` contains the string `http=host:123;https=host:456;ftp=host:789`, then getproxies_registry() should return

    {'http': 'http://host:123', 'https': 'http://host:456', 'ftp': 'http://host:789'}

for consistency with WinInet and Chromium, but it actually returns

    {'http': 'http://host:123', 'https': 'https://host:456', 'ftp': 'ftp://host:789'}

This bug has existed for a very long time (since Python 2.0.1 if not earlier), but it was exposed recently when urllib3 added support for HTTPS-in-HTTPS proxies in version 1.26. Before that, an `https` prefix on the HTTPS proxy url was silently treated as `http`, accidentally resulting in the correct behavior.

There are additional bugs in the treatment of single-proxy strings (the case when the string contains no `=` character).

The Chromium code for parsing the ProxyServer string can be found here: https://source.chromium.org/chromium/chromium/src/+/refs/tags/89.0.4353.1:net/proxy_resolution/proxy_config.cc;l=86

Below is my attempt at modifying the code from `getproxies_registry` to approximately match Chromium's behavior. I could turn this into a patch, but I'd like feedback on the corner cases first.

    if '=' not in proxyServer and ';' not in proxyServer:
        # Use one setting for all protocols.
        # Chromium treats this as a separate category, and some software
        # uses the ALL_PROXY environment variable for a similar purpose,
        # so arguably this should be 'all={}'.format(proxyServer),
        # but this is more backward compatible.
        proxyServer = 'http={0};https={0};ftp={0}'.format(proxyServer)

    for p in proxyServer.split(';'):
        # Chromium and WinInet are inconsistent in their treatment of
        # invalid strings with the wrong number of = characters. It
        # probably doesn't matter.
        protocol, addresses = p.split('=', 1)
        protocol = protocol.strip()

        # Chromium supports more than one proxy per protocol. I don't
        # know how many clients support the same, but handling it is at
        # least no worse than leaving the commas uninterpreted.
        for address in addresses.split(','):
            if protocol in {'http', 'https', 'ftp', 'socks'}:
                # See if address has a type:// prefix
                if not re.match('(?:[^/:]+)://', address):
                    if protocol == 'socks':
                        # Chromium notes that the correct protocol here
                        # is SOCKS4, but "socks://" is interpreted
                        # as SOCKS5 elsewhere. I don't know whether
                        # prepending socks4:// here would break code.
                        address = 'socks://' + address
                    else:
                        address = 'http://' + address

            # A string like 'http=foo;http=bar' will produce a
            # comma-separated list, while previously 'bar' would
            # override 'foo'. That could potentially break something.
            if protocol not in proxies:
                proxies[protocol] = address
            else:
                proxies[protocol] += ',' + address
msg387468 - (view) Author: 双草酸酯 (kotori) Date: 2021-02-21 17:49
I came across this issue as well.
I checked Microsoft documentations and it seems `InternetGetProxyInfo` in WinInet is deprecated, while `WinHttpGetIEProxyConfigForCurrentUser` in WinHTTP will return the exact same string as what it stored registery. 

https://docs.microsoft.com/en-US/troubleshoot/windows-client/networking/configure-client-proxy-server-settings-by-registry-file
Also from this documentation, a proxy server could have "http://" prefix, so I guess it could also support "https://" prefix if a user set a https proxy.
msg388149 - (view) Author: 狂男风 (CrazyBoyFeng) * Date: 2021-03-05 07:49
We know Windows reslove:
`http=host:port;https=host:port;ftp=host:port;socks=host:port`
as:
`http=http://host:port;https=http://host:port;ftp=http://host:port;socks=socks://host:port`
means:  
Using HTTP type proxy for HTTP, HTTPS and FTP requests, but Socks4/4a type proxy for the other TCP requests.
We notice that socks are different from the others.

Now I want to know what if Windows slove this:
`http=socks://host:port;ftp=https://host:port;socks=https://host:port`
Does it mean using Socks4/4a type proxy for HTTP requests, HTTPS type proxy for FTP requests, and HTTP type proxy for the other TCP requests? 
Or just invalid settings?
msg388152 - (view) Author: 狂男风 (CrazyBoyFeng) * Date: 2021-03-05 12:15
I make some black box tests with the HTTPS proxy. 
Set system proxy `http=https://host:port` and start a HTTPS proxy, then IE, Edge (Chromium) and benrg's code (using `urllib3`) work fine while fetching `http://website`. Then I shutdown the HTTPS proxy and start a HTTP proxy on the same port, I get SSLError. That's it.

benrg should add your patch as soon as possible.
Too few people actually use HTTPS proxy. Even if there is a bug with HTTPS proxy, it will not be too late to fix it when it is discovered.
msg393927 - (view) Author: Chris Xiao (chrisxiao) * Date: 2021-05-19 07:20
I found this problem too.
expecting for fixing
msg394668 - (view) Author: 狂男风 (CrazyBoyFeng) * Date: 2021-05-28 16:26
I removed the multi-proxies-per-protocol support from the PR I submitted. The reason is that:

1. This code reads the proxy settings from the Windows registry. Multi-proxies-per-protocol cannot be resolved by Windows system.
2. Using a comma-separated string for the multi-proxies-per-protocol support, is not backward compatible. Not only Windows, but Linux is also one proxy per protocol.

If you want to implement the multi-proxies-per-protocol support, then I think you should open another issue and make changes to all OS platforms.
msg395276 - (view) Author: 狂男风 (CrazyBoyFeng) * Date: 2021-06-07 19:10
We should have no problem with how to parse HTTP proxy and HTTPS proxy. But I recently discovered an additional problem when using `requests`, that is, how to parse the SOCKS proxy correctly.

I investigated the parsing of some commonly used software:
`curl` will try to read the `all_proxy` variable first, if not, then read the two system environment variables `http_proxy` and `https_proxy`.
Only these two environment variables related to proxy exist in most Linux distributions.
The following formats are valid for `curl`:
```
export http_proxy=socks4h://127.0.0.1:5050
export https_proxy=socks5://1.1.1.1:4040
```
The `h` in `socks4h` means doing DNS query on the proxy server.

`requests` also follows this format.
If you want to use the socks proxy in `requests[socks]`, you need to specify the proxies parameter:
```
proxies['http']='socks4://127.0.0.1:6060'
proxies['https']='socks5h://127.0.0.1:7070'
```

Since it is customary to resolve SOCKS proxies in this way, I think CPython can consider doing the same with Windows registry.

In previous versions, CPython parsed the SOCKS proxy in the Windows registry `socks=localhost:8080` as:
```
proxies['socks']='socks://localhost:8080'
```
I think it can be changed to
```
proxies['http']=proxies['http'] or'socks4://localhost:8080'
proxies['https']=proxies['http'] or'socks4://localhost:8080'
proxies['socks']='socks://localhost:8080'
```
The use of `or` is to prevent the existing HTTP and HTTPS proxy settings from being overwritten.  
I tried it on my Win10 PC. When I set up HTTP and SOCKS proxies at the same time, IE and Edge will use HTTP proxy.  
The reason using `socks4` instead of `socks4h` or `socks5` is that, when I change system proxy setting to a remote SOCKS proxy, I found that IE and Edge do not query DNS from the proxy but from local DNS polluted by the ISP. And if the running socks proxy version is SOCKS5, it will output errors of `unsupported socks version 0x4`. So Windows only supports the SOCKS proxy `socks4://`.  
It's a pity that we don't know the code of Windows, and I can't prove it with the codes.  
`proxies['socks']` is reserved for backward compatibility.

If you have any comments or suggestions, please let me know.
msg401215 - (view) Author: Keuin (keuin) Date: 2021-09-07 08:40
The fix is available as a pull request on GitHub for months (https://github.com/python/cpython/pull/26307). However, it seems that this pull request needs an approval from one maintainer before running any test. Could anyone help this out?
msg401995 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2021-09-16 22:43
I think the PR is basically ready, unfortunately it's stuck behind a CI issue we only just fixed, and needs to merge from main before it'll clear.

Posting here once CI is green will get attention faster than on GitHub.
History
Date User Action Args
2021-09-16 22:43:23steve.dowersetmessages: + msg401995
versions: + Python 3.11, - Python 3.6, Python 3.7, Python 3.8
2021-09-07 08:40:33keuinsetnosy: + keuin
messages: + msg401215
2021-06-07 19:10:54CrazyBoyFengsetmessages: + msg395276
2021-05-28 16:26:40CrazyBoyFengsetmessages: + msg394668
2021-05-22 07:55:02CrazyBoyFengsetkeywords: + patch
stage: patch review
pull_requests: + pull_request24906
2021-05-19 07:20:12chrisxiaosetnosy: + chrisxiao
messages: + msg393927
2021-03-05 12:15:56CrazyBoyFengsetmessages: + msg388152
2021-03-05 07:49:58CrazyBoyFengsetmessages: + msg388149
2021-03-05 05:52:56CrazyBoyFengsetnosy: + CrazyBoyFeng
2021-03-01 07:09:51NateScarletsetnosy: + NateScarlet
2021-02-23 12:07:33corona10setnosy: + corona10
2021-02-22 19:50:17kotorisetcomponents: + Library (Lib)
2021-02-22 19:49:53kotorisetcomponents: + Windows, - Library (Lib)
2021-02-21 17:49:09kotorisetnosy: + kotori
messages: + msg387468
components: - Windows
2020-12-12 20:27:52benrgcreate