classification
Title: urllib2.build_opener() skips ProxyHandler
Type: behavior Stage: needs patch
Components: Library (Lib) Versions: Python 2.7, Python 2.6, Python 2.5
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: barry (1)
Priority: high Keywords:

Created on 2009-10-16 19:06 by barry, last changed 2009-10-16 21:06 by barry.

Messages (2)
msg94144 - Author: Barry A. Warsaw (barry) Date: 2009-10-16 19:06
Try this:

>>> from urllib2 import build_opener
>>> build_opener().handlers

In Python 2.4, you will see ProxyHandler as the first handler, but this
handler is missing from the list in Python 2.5, 2.6, and 2.7, despite this
text in the documentation:

    urllib2.build_opener([handler, ...])

    Return an OpenerDirector instance, which chains the handlers in the
    order given. handlers can be either instances of BaseHandler, or
    subclasses of BaseHandler (in which case it must be possible to call
    the constructor without any parameters). Instances of the following
    classes will be in front of the handlers, unless the handlers contain
    them, instances of them or subclasses of them: ProxyHandler,
    UnknownHandler, HTTPHandler, HTTPDefaultErrorHandler,
    HTTPRedirectHandler, FTPHandler, FileHandler, HTTPErrorProcessor.
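To compare that documented list against a real opener, it can help to print class names instead of instance reprs. A minimal sketch follows; the helper name `handler_class_names` is hypothetical, and whether `ProxyHandler` shows up depends on the interpreter version and on any `*_proxy` environment variables.

```python
try:
    from urllib2 import build_opener  # Python 2, as in this report
except ImportError:
    from urllib.request import build_opener  # Python 3 home of urllib2's API


def handler_class_names(opener):
    """Hypothetical helper: class names of an opener's handlers, in order."""
    # __class__ also works for the old-style handler classes of Python 2,
    # where type(h) would just be 'instance'.
    return [h.__class__.__name__ for h in opener.handlers]


names = handler_class_names(build_opener())
print(names)
print("ProxyHandler" in names)  # depends on proxy settings in the environment
```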

In fact, there is no way to add a ProxyHandler at all using the public API.
This is because the following code was added to Python 2.5, purportedly as a
fix for bug 972322:

    http://bugs.python.org/issue972322

# urllib2.py:307

            if meth in ["redirect_request", "do_open", "proxy_open"]:
                # oops, coincidental match
                continue

Because of this, neither of the following is a workaround:

>>> from urllib2 import build_opener, ProxyHandler
>>> opener = build_opener()
>>> opener.add_handler(ProxyHandler())
>>> build_opener(ProxyHandler())
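Why these calls have no effect can be seen with a simplified model of the `<protocol>_open` branch of `add_handler()`. This is a sketch, not the stdlib code: the helper `open_methods` is hypothetical and ignores the error, request and response branches. It relies on `ProxyHandler.__init__` creating one `<scheme>_open` attribute per entry in its proxies mapping, so a handler with no proxies offers only `proxy_open`, which the guard quoted above skips. The proxy URL is a placeholder.

```python
try:
    from urllib2 import ProxyHandler  # Python 2, as in this report
except ImportError:
    from urllib.request import ProxyHandler  # Python 3 equivalent

# Exclusion list copied from the add_handler() fragment quoted above.
SKIPPED = ["redirect_request", "do_open", "proxy_open"]


def open_methods(handler):
    """Hypothetical helper: method names add_handler() would file as
    <protocol>_open handlers. Only the "open" branch is modelled."""
    found = []
    for meth in dir(handler):
        if meth in SKIPPED:
            continue  # the "oops, coincidental match" guard
        i = meth.find("_")
        if meth[i + 1:] == "open":  # same protocol/condition split
            found.append(meth)
    return found


# Explicit mappings keep the result independent of *_proxy variables.
print(open_methods(ProxyHandler({})))                                  # []
print(open_methods(ProxyHandler({"http": "http://localhost:3128"})))  # ['http_open']
```

With an empty mapping nothing qualifies, so `add_handler()` never sets its internal "added" flag and the handler is silently dropped.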

In fact, as near as I can tell, the only way to get a ProxyHandler in there
is to do an end-run around .add_handler():

>>> proxy_handler = ProxyHandler()
>>> opener.handlers.insert(0, proxy_handler)
>>> proxy_handler.add_parent(opener)
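Note that `add_handler()` is also what fills the opener's dispatch tables, such as `handle_open`; the end-run above only edits the `.handlers` list. A sketch wrapping the end-run in a hypothetical helper, `force_proxy_handler`, shows that the inserted handler is listed first but is never registered for any protocol, so the opener will not call it when opening a URL.

```python
try:
    from urllib2 import ProxyHandler, build_opener  # Python 2, as in this report
except ImportError:
    from urllib.request import ProxyHandler, build_opener  # Python 3 equivalent


def force_proxy_handler(opener, proxy_handler):
    """Hypothetical helper wrapping the end-run shown above.

    Only the .handlers list and the parent link are touched; the
    per-protocol tables (handle_open and friends) are left alone.
    """
    opener.handlers.insert(0, proxy_handler)
    proxy_handler.add_parent(opener)
    return opener


forced = ProxyHandler({})  # empty mapping: no proxies configured
opener = force_proxy_handler(build_opener(), forced)
print(opener.handlers[0] is forced)  # True
# The forced handler never appears in any protocol's dispatch list.
print(any(h is forced for hs in opener.handle_open.values() for h in hs))  # False
```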

I'm actually quite shocked this has never been reported before.

ISTM that the right fix is what was originally suggested in bug 972322:

    http://bugs.python.org/msg46172

"The alternative would be to rename do_open and proxy_open, and leave the
redirect_request case unchanged (see below for why)."

The intent of this patch could not have been to completely prevent
ProxyHandler from being included in the list of handlers; otherwise, why keep
ProxyHandler at all? If that was the intent, then the documentation for
urllib2 is broken, and it should have described this change as occurring in
Python 2.5.
msg94150 - Author: Barry A. Warsaw (barry) Date: 2009-10-16 21:06
This may end up being just a documentation issue.  If the environment
has http_proxy set, you do get a ProxyHandler automatically.

>>> import os
>>> os.environ['http_proxy'] = 'localhost'
>>> from urllib2 import build_opener
>>> build_opener().handlers
[<urllib2.ProxyHandler instance at 0x7fb664ec6e18>,
 <urllib2.UnknownHandler instance at 0x7fb664eca050>,
 <urllib2.HTTPHandler instance at 0x7fb664eca710>,
 <urllib2.HTTPDefaultErrorHandler instance at 0x7fb664ecaa70>,
 <urllib2.HTTPRedirectHandler instance at 0x7fb664ecad88>,
 <urllib2.FTPHandler instance at 0x7fb664ecae60>,
 <urllib2.FileHandler instance at 0x7fb664ecaf38>,
 <urllib2.HTTPSHandler instance at 0x7fb664ece3b0>,
 <urllib2.HTTPErrorProcessor instance at 0x7fb664ece128>]
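This fits how `ProxyHandler()` works when called with no argument: it takes its mapping from `getproxies()`, usually the `*_proxy` environment variables, and `add_handler()` keeps it only if that mapping yields at least one `<scheme>_open` method. A sketch with explicit mappings makes the behaviour independent of the environment; the proxy URL is a placeholder and `has_proxy_handler` is a hypothetical helper.

```python
try:
    from urllib2 import ProxyHandler, build_opener  # Python 2, as in this report
except ImportError:
    from urllib.request import ProxyHandler, build_opener  # Python 3 equivalent

# Placeholder proxy address, not taken from the report.
PROXIES = {"http": "http://localhost:3128"}


def has_proxy_handler(opener):
    """Hypothetical helper: True if any handler is a ProxyHandler."""
    return any(isinstance(h, ProxyHandler) for h in opener.handlers)


# A non-empty mapping gives the handler an http_open method, so it is kept,
# and its handler_order of 100 sorts it to the front.
with_proxy = build_opener(ProxyHandler(PROXIES))
print(has_proxy_handler(with_proxy))              # True
print(with_proxy.handlers[0].__class__.__name__)  # ProxyHandler

# Passing an instance suppresses the default ProxyHandler, and an empty
# mapping leaves nothing for add_handler() to register.
without_proxy = build_opener(ProxyHandler({}))
print(has_proxy_handler(without_proxy))           # False
```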
History
Date                 User   Action  Args
2009-10-16 21:06:23  barry  set     messages: + msg94150
2009-10-16 19:06:42  barry  create