
Author tburke
Recipients tburke
Date 2018-09-15.17:17:31
Message-id <1537031851.88.0.956365154283.issue34698@psf.upfronthosting.co.za>
Content
Not sure if this is a documentation or behavior bug, but... the docs for urllib.request.Request.set_proxy (https://docs.python.org/3/library/urllib.request.html#urllib.request.Request.set_proxy) say

> Prepare the request by connecting to a proxy server. *The host and type will replace those of the instance*, and the instance’s selector will be the original URL given in the constructor.

(Emphasis mine.) In practice, behavior is more nuanced than that:

>>> from urllib.request import Request
>>> req = Request('http://hostname:port/some/path')
>>> req.host, req.type
('hostname:port', 'http')
>>> req.set_proxy('proxy:other-port', 'https')
>>> req.host, req.type # So far, so good...
('proxy:other-port', 'https')
>>>
>>> req = Request('https://hostname:port/some/path')
>>> req.host, req.type
('hostname:port', 'https')
>>> req.set_proxy('proxy:other-port', 'http')
>>> req.host, req.type # Type doesn't change!
('proxy:other-port', 'https')

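Looking at the source (https://github.com/python/cpython/blob/v3.7.0/Lib/urllib/request.py#L397), it's obvious why: https is treated specially. When the request's type is already https and no tunnel host has been recorded, set_proxy stores the original host as the tunnel host and leaves the type untouched; only otherwise does it overwrite the type.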
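
For anyone who wants to reproduce this outside the REPL, here is a minimal sketch. The helper name proxy_state and the example.com / proxy.example hosts are made up for illustration, and _tunnel_host is a private attribute that I'm reading only to show where the original host ends up; it isn't part of the documented API and could change.

```python
from urllib.request import Request


def proxy_state(url, proxy_host, proxy_type):
    """Build a Request, call set_proxy once, and report the resulting state."""
    req = Request(url)
    req.set_proxy(proxy_host, proxy_type)
    # _tunnel_host is private; it is read here for illustration only.
    return req.host, req.type, getattr(req, "_tunnel_host", None)


# Hypothetical hosts and ports, chosen only for the demonstration.
http_result = proxy_state(
    "http://example.com:8080/some/path", "proxy.example:3128", "https"
)
https_result = proxy_state(
    "https://example.com:8443/some/path", "proxy.example:3128", "http"
)

print("http request: ", http_result)
print("https request:", https_result)
```

On the versions I've looked at, both requests end up with the proxy as host, but the https request keeps its type and carries the original host in the tunnel attribute instead.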

The behavior is consistent with how things worked on py2...

>>> from urllib2 import Request
>>> req = Request('http://hostname:port/some/path')
>>> req.get_host(), req.get_type()
('hostname:port', 'http')
>>> req.set_proxy('proxy:other-port', 'https')
>>> req.get_host(), req.get_type()
('proxy:other-port', 'https')
>>>
>>> req = Request('https://hostname:port/some/path')
>>> req.get_host(), req.get_type()
('hostname:port', 'https')
>>> req.set_proxy('proxy:other-port', 'http')
>>> req.get_host(), req.get_type()
('proxy:other-port', 'https')


... but only if you're actually inspecting host/type along the way!

>>> from urllib2 import Request
>>> req = Request('https://hostname:port/some/path')
>>> req.set_proxy('proxy:other-port', 'http')
>>> req.get_host(), req.get_type()
('proxy:other-port', 'http')
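
One plausible explanation for the order dependence, and this is an assumption rather than something the transcripts above prove, is that the py2 Request parses its URL lazily: type and host stay None until get_type() / get_host() is first called, so an early set_proxy never sees 'https'. The class below is a hypothetical, heavily simplified model written in Python 3 to illustrate that idea; LazyRequest and its crude URL splitting are mine and are not the real urllib2 code.

```python
class LazyRequest:
    """Hypothetical model of a request that parses its URL on first access."""

    def __init__(self, url):
        self._original = url
        self.type = None          # filled in by get_type()
        self.host = None          # filled in by get_host()
        self._tunnel_host = None

    def get_type(self):
        if self.type is None:
            # Crude parse: everything before the first ':' is the scheme.
            self.type = self._original.split(":", 1)[0]
        return self.type

    def get_host(self):
        if self.host is None:
            # Crude parse: text between '//' and the next '/'.
            self.host = self._original.split("//", 1)[1].split("/", 1)[0]
        return self.host

    def set_proxy(self, host, type):
        # Reads the raw attribute, so an unparsed request has type None here.
        if self.type == "https" and not self._tunnel_host:
            self._tunnel_host = self.host
        else:
            self.type = type
        self.host = host


url = "https://hostname:port/some/path"

# Case 1: set_proxy before anything has been inspected.
untouched = LazyRequest(url)
untouched.set_proxy("proxy:other-port", "http")
untouched_state = (untouched.get_host(), untouched.get_type())

# Case 2: inspect host/type first, then set_proxy.
inspected = LazyRequest(url)
inspected.get_host(), inspected.get_type()
inspected.set_proxy("proxy:other-port", "http")
inspected_state = (inspected.get_host(), inspected.get_type())

print(untouched_state)
print(inspected_state)
```

In this model the uninspected request takes the proxy's type while the inspected one stays https, which matches the two py2 transcripts above.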

(FWIW, this came up while porting an application from py2 to py3; there was a unit test expecting that last behavior of proxying an https connection through an http proxy.)
History
Date                 User    Action  Args
2018-09-15 17:17:31  tburke  set     recipients: + tburke
2018-09-15 17:17:31  tburke  set     messageid: <1537031851.88.0.956365154283.issue34698@psf.upfronthosting.co.za>
2018-09-15 17:17:31  tburke  link    issue34698 messages
2018-09-15 17:17:31  tburke  create