Not sure if this is a documentation or behavior bug, but... the docs for urllib.request.Request.set_proxy (https://docs.python.org/3/library/urllib.request.html#urllib.request.Request.set_proxy) say
> Prepare the request by connecting to a proxy server. *The host and type will replace those of the instance*, and the instance’s selector will be the original URL given in the constructor.
(Emphasis mine.) In practice, behavior is more nuanced than that:
>>> from urllib.request import Request
>>> req = Request('http://hostname:port/some/path')
>>> req.host, req.type
('hostname:port', 'http')
>>> req.set_proxy('proxy:other-port', 'https')
>>> req.host, req.type # So far, so good...
('proxy:other-port', 'https')
>>>
>>> req = Request('https://hostname:port/some/path')
>>> req.host, req.type
('hostname:port', 'https')
>>> req.set_proxy('proxy:other-port', 'http')
>>> req.host, req.type # Type doesn't change!
('proxy:other-port', 'https')
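Looking at the source (https://github.com/python/cpython/blob/v3.7.0/Lib/urllib/request.py#L397) it's obvious that https is treated specially: when the request's type is already https and no tunnel host has been recorded yet, set_proxy stores the original host as the tunnel host and leaves the type unchanged.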
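For anyone who wants to see the special case without reading the source, here is a small sketch. It peeks at `_tunnel_host`, which is a private attribute of `Request` in CPython 3.7 (an implementation detail, not documented API), and the host names and ports are made up for illustration.

```python
from urllib.request import Request

# Hypothetical host names and ports, chosen only for illustration.
req = Request('https://example.com:8443/some/path')
req.set_proxy('proxy.example:3128', 'http')

# The proxy replaces the host, but the https type is kept.
print(req.host, req.type)

# Private attribute: the original host is remembered as the tunnel target.
print(req._tunnel_host)
```

So the type passed to set_proxy is ignored for an https request on the first call; only the host is swapped.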
The behavior is consistent with how things worked on py2...
>>> from urllib2 import Request
>>> req = Request('http://hostname:port/some/path')
>>> req.get_host(), req.get_type()
('hostname:port', 'http')
>>> req.set_proxy('proxy:other-port', 'https')
>>> req.get_host(), req.get_type()
('proxy:other-port', 'https')
>>>
>>> req = Request('https://hostname:port/some/path')
>>> req.get_host(), req.get_type()
('hostname:port', 'https')
>>> req.set_proxy('proxy:other-port', 'http')
>>> req.get_host(), req.get_type()
('proxy:other-port', 'https')
... but only if you're actually inspecting host/type along the way!
>>> from urllib2 import Request
>>> req = Request('https://hostname:port/some/path')
>>> req.set_proxy('proxy:other-port', 'http')
>>> req.get_host(), req.get_type()
('proxy:other-port', 'http')
(FWIW, this came up while porting an application from py2 to py3; there was a unit test expecting that last behavior of proxying an https connection through an http proxy.)
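One plausible explanation for the order dependence on py2, assuming urllib2 parsed the scheme lazily (only on the first get_type() call) while set_proxy compared the stored type directly: an uninspected request would never look like https at the time of the check. The class below is a hypothetical model of that idea, written for Python 3; it is not the urllib2 source, and the names `LazyRequest` and `_url` are invented for the sketch.

```python
from urllib.parse import urlsplit


class LazyRequest:
    """Hypothetical model of lazy scheme parsing; not the urllib2 source."""

    def __init__(self, url):
        self._url = url
        self.type = None          # not parsed until get_type() is called
        self.host = None
        self._tunnel_host = None

    def get_type(self):
        # Parse the scheme on first use only.
        if self.type is None:
            self.type = urlsplit(self._url).scheme
        return self.type

    def set_proxy(self, host, type):
        # Compares the stored type directly, so an unparsed request
        # (type is None) never takes the https branch.
        if self.type == 'https' and not self._tunnel_host:
            self._tunnel_host = self.host
        else:
            self.type = type
        self.host = host


# Inspecting the type first keeps https ...
inspected = LazyRequest('https://hostname:8443/some/path')
inspected.get_type()
inspected.set_proxy('proxy:3128', 'http')

# ... while skipping the inspection lets the proxy type win.
uninspected = LazyRequest('https://hostname:8443/some/path')
uninspected.set_proxy('proxy:3128', 'http')

print(inspected.get_type(), uninspected.get_type())
```

If that assumption holds, py3 parsing the URL eagerly in the constructor would explain why only the "type doesn't change" result is reachable there.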