
classification
Title: urllib.request.Request.set_proxy doesn't (necessarily) replace type
Type: Stage:
Components: Library (Lib) Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: orsenthil, tburke
Priority: normal Keywords:

Created on 2018-09-15 17:17 by tburke, last changed 2022-04-11 14:59 by admin.

Messages (1)
msg325449 - Author: Tim Burke (tburke) Date: 2018-09-15 17:17
Not sure if this is a documentation or behavior bug, but... the docs for urllib.request.Request.set_proxy (https://docs.python.org/3/library/urllib.request.html#urllib.request.Request.set_proxy) say

> Prepare the request by connecting to a proxy server. *The host and type will replace those of the instance*, and the instance’s selector will be the original URL given in the constructor.

(Emphasis mine.) In practice, behavior is more nuanced than that:

>>> from urllib.request import Request
>>> req = Request('http://hostame:port/some/path')
>>> req.host, req.type
('hostame:port', 'http')
>>> req.set_proxy('proxy:other-port', 'https')
>>> req.host, req.type # So far, so good...
('proxy:other-port', 'https')
>>>
>>> req = Request('https://hostame:port/some/path')
>>> req.host, req.type
('hostame:port', 'https')
>>> req.set_proxy('proxy:other-port', 'http')
>>> req.host, req.type # Type doesn't change!
('proxy:other-port', 'https')

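Looking at the source (https://github.com/python/cpython/blob/v3.7.0/Lib/urllib/request.py#L397) the cause is clear: when the request's type is already https and no tunnel host has been recorded yet, set_proxy saves the original host as the tunnel host and leaves type alone; only in the other branch does it assign the new type.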
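
For anyone who wants to check this outside the REPL, here is a minimal sketch that runs both cases and also reads the private _tunnel_host attribute. The helper name proxy_result and the example.* host names are made up for illustration, and _tunnel_host is an implementation detail that could change between releases.

```python
from urllib.request import Request

# Placeholder host names; constructing a Request opens no connection.
ORIGIN = "origin.example:8443"
PROXY = "proxy.example:3128"


def proxy_result(url, proxy_type):
    """Return (host, type, tunnel_host) after calling set_proxy.

    Hypothetical helper, not part of urllib.
    """
    req = Request(url)
    req.set_proxy(PROXY, proxy_type)
    # _tunnel_host is private; it is read here only to show where the
    # original host ends up for https requests.
    return req.host, req.type, req._tunnel_host


# http request, https proxy type: type is replaced, no tunnel host.
http_case = proxy_result("http://" + ORIGIN + "/some/path", "https")

# https request, http proxy type: type stays https, and the original
# host is kept as the tunnel host.
https_case = proxy_result("https://" + ORIGIN + "/some/path", "http")

print(http_case)
print(https_case)
```

In the https case the requested proxy type is dropped rather than stored, which is the part the documentation does not mention.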

The behavior is consistent with how things worked on py2...

>>> from urllib2 import Request
>>> req = Request('http://hostame:port/some/path')
>>> req.get_host(), req.get_type()
('hostame:port', 'http')
>>> req.set_proxy('proxy:other-port', 'https')
>>> req.get_host(), req.get_type()
('proxy:other-port', 'https')
>>>
>>> req = Request('https://hostame:port/some/path')
>>> req.get_host(), req.get_type()
('hostame:port', 'https')
>>> req.set_proxy('proxy:other-port', 'http')
>>> req.get_host(), req.get_type()
('proxy:other-port', 'https')


... but only if you're actually inspecting host/type along the way!

>>> from urllib2 import Request
>>> req = Request('https://hostame:port/some/path')
>>> req.set_proxy('proxy:other-port', 'http')
>>> req.get_host(), req.get_type()
('proxy:other-port', 'http')
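
The difference between the last two py2 transcripts is consistent with urllib2 filling in type lazily, on the first get_type() call, so that set_proxy sees type as None when nothing has been inspected yet. The toy class below is a sketch of that assumption written for Python 3; LazyRequest is a made-up name and a simplified model, not the urllib2 source.

```python
class LazyRequest:
    """Toy model of lazy URL parsing; not the real urllib2.Request."""

    def __init__(self, url):
        self._original = url
        self.type = None          # filled in on the first get_type()
        self.host = None          # filled in on the first get_host()
        self._tunnel_host = None

    def get_type(self):
        if self.type is None:
            self.type = self._original.split("://", 1)[0]
        return self.type

    def get_host(self):
        if self.host is None:
            rest = self._original.split("://", 1)[1]
            self.host = rest.split("/", 1)[0]
        return self.host

    def set_proxy(self, host, type):
        # Same branch structure as the Python 3 method, but self.type
        # is still None here unless get_type() has already been called.
        if self.type == "https" and not self._tunnel_host:
            self._tunnel_host = self.host
        else:
            self.type = type
        self.host = host


# Inspected before set_proxy: the https branch is taken, type is kept.
inspected = LazyRequest("https://hostame:port/some/path")
inspected.get_host(), inspected.get_type()
inspected.set_proxy("proxy:other-port", "http")

# Not inspected: type is still None, so the new type is assigned.
untouched = LazyRequest("https://hostame:port/some/path")
untouched.set_proxy("proxy:other-port", "http")

print(inspected.get_host(), inspected.get_type())
print(untouched.get_host(), untouched.get_type())
```

Under this model, Python 3 always behaves like the "inspected" case because it parses the URL eagerly in the constructor.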

(FWIW, this came up while porting an application from py2 to py3; there was a unit test expecting that last behavior of proxying an https connection through an http proxy.)
History
Date User Action Args
2022-04-11 14:59:06 admin set github: 78879
2019-04-10 12:38:01 cheryl.sabella set nosy: + orsenthil; versions: + Python 3.8
2018-09-15 17:17:31 tburke create