
Author stefano-m
Recipients martin.panter, r.david.murray, stefano-m
Date 2015-07-10.10:13:53
Message-id <1436523235.26.0.761301452598.issue24599@psf.upfronthosting.co.za>
Martin, thanks for elaborating on my thoughts!

I have dug a bit deeper into Python 2's urllib code with pdb, and I think I have narrowed the issue down to what open_http does.

In my example code, replacing opener.open(url) with opener.open_http(url) produces the same error.

I realize I did not provide you with the output of the script, so here it is:

* Python 2.7.10

python urllib_error.py
('Trying to open', 'https://www.python.org')
Traceback (most recent call last):
  File "urllib_error.py", line 30, in <module>
    opener.open_http((host, selector))
  File "/home/mazzucco/.pyenv/versions/2.7.10/lib/python2.7/urllib.py", line 364, in open_http
    return self.http_error(url, fp, errcode, errmsg, headers)
  File "/home/mazzucco/.pyenv/versions/2.7.10/lib/python2.7/urllib.py", line 381, in http_error
    return self.http_error_default(url, fp, errcode, errmsg, headers)
  File "/home/mazzucco/.pyenv/versions/2.7.10/lib/python2.7/urllib.py", line 386, in http_error_default
    raise IOError, ('http error', errcode, errmsg, headers)
IOError: ('http error', 501, 'Not Implemented', <httplib.HTTPMessage instance at 0x7f875a67b950>)

* Python 3.4.3

python urllib_error.py
Trying to open https://www.python.org
Traceback (most recent call last):
  File "urllib_error.py", line 30, in <module>
    opener.open_http((host, selector))
  File "/home/mazzucco/.pyenv/versions/3.4.3/lib/python3.4/urllib/request.py", line 1805, in open_http
    return self._open_generic_http(http.client.HTTPConnection, url, data)
  File "/home/mazzucco/.pyenv/versions/3.4.3/lib/python3.4/urllib/request.py", line 1801, in _open_generic_http
    response.status, response.reason, response.msg, data)
  File "/home/mazzucco/.pyenv/versions/3.4.3/lib/python3.4/urllib/request.py", line 1821, in http_error
    return self.http_error_default(url, fp, errcode, errmsg, headers)
  File "/home/mazzucco/.pyenv/versions/3.4.3/lib/python3.4/urllib/request.py", line 1826, in http_error_default
    raise HTTPError(url, errcode, errmsg, headers, None)
urllib.error.HTTPError: HTTP Error 501: Not Implemented

When I unwrap the contents of httplib.HTTPMessage, the error page returned by the Squid proxy says:

-------------------------------------------------------
ERROR
The requested URL could not be retrieved

The following error was encountered while trying to retrieve the URL: https://www.python.org

    Unsupported Request Method and Protocol

Squid does not support all request methods for all access protocols. For example, you can not POST a Gopher request.
-------------------------------------------------------
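Squid's objection is to the combination of method and scheme: a GET whose target is an absolute https:// URL. Under the usual forward-proxy convention (CONNECT is specified in RFC 7231, section 4.3.6), a client asks for an HTTPS resource by first opening a tunnel with CONNECT host:443 and then speaking TLS through it; only plain http:// URLs are forwarded as GET http://... . The sketch below, assuming Python 3, maps a URL to the request line a client would send to the proxy. proxy_request_line is a hypothetical helper for illustration, not part of urllib or httplib.

```python
from urllib.parse import urlsplit


def proxy_request_line(url, version="HTTP/1.1"):
    """Return the first line a client would send to a forward HTTP proxy.

    Hypothetical helper for illustration only.
    - http://  URLs are forwarded as "GET <absolute URL>".
    - https:// URLs need "CONNECT host:port" (authority form); the real GET
      then travels over TLS inside the tunnel, so the proxy only relays
      encrypted bytes.
    """
    parts = urlsplit(url)
    if parts.scheme == "http":
        return "GET {} {}".format(url, version)
    if parts.scheme == "https":
        port = parts.port or 443  # default HTTPS port
        return "CONNECT {}:{} {}".format(parts.hostname, port, version)
    raise ValueError("unsupported scheme: {!r}".format(parts.scheme))


for url in ("http://www.python.org/", "https://www.python.org"):
    print(proxy_request_line(url))
```

Python 2's URLopener.open_http sends the GET form even for https:// URLs, as the httplib reproduction below does, which is exactly the request Squid refuses.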

Looking at Python 2's implementation of URLopener.open_http, I can get an even more minimal failing example that uses only httplib:


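import httplib

host = 'proxy.corp.com:8181'  # this is not the actual proxy

# With an HTTP proxy configured, URLopener.open_http sends the full URL
# (including the https:// scheme) to the proxy as the request target.
selector = 'https://www.python.org'

print("Trying to open", selector)

# httplib.HTTP is the legacy HTTP/1.0 interface that open_http uses.
h = httplib.HTTP(host)
h.putrequest('GET', selector)  # i.e. "GET https://www.python.org HTTP/1.0"
h.putheader('User-Agent', 'Python-urllib/1.17')
h.endheaders(None)
errcode, errmsg, headers = h.getreply()

print(errcode, errmsg)
print(headers.items())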


Running the script on Python 2.7.10 prints:

('Trying to open', 'https://www.python.org')
(501, 'Not Implemented')
[('content-length', '3069'), ('via', '1.0 proxy.corp.com (squid/3.1.6)'), ('x-cache', 'MISS from proxy.corp.com'), ('content-language', 'en'), ('x-squid-error', 'ERR_UNSUP_REQ 0'), ('x-cache-lookup', 'NONE from proxy.corp.com:8181'), ('vary', 'Accept-Language'), ('server', 'squid/3.1.6'), ('proxy-connection', 'close'), ('date', 'Fri, 10 Jul 2015 09:27:14 GMT'), ('content-type', 'text/html'), ('mime-version', '1.0')]
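These headers show that the 501 is generated by the proxy rather than by www.python.org: server and via both name squid/3.1.6, and x-squid-error carries ERR_UNSUP_REQ, Squid's code for the "Unsupported Request Method and Protocol" page quoted above. A small hypothetical helper (not part of httplib) that extracts that code from the (name, value) pairs printed above, for instance to tell a proxy refusal apart from an origin-server error; it assumes the header value has the "ERR_NAME detail" shape seen here.

```python
def squid_error(headers):
    """Return the Squid error name from (name, value) header pairs, or None.

    Hypothetical helper. Header names are compared case-insensitively because
    Python 2's HTTPMessage.items() lower-cases them, while Python 3's
    http.client keeps the casing the server sent.
    """
    for name, value in headers:
        if name.lower() == "x-squid-error":
            # e.g. "ERR_UNSUP_REQ 0": the error name comes first.
            fields = value.split()
            return fields[0] if fields else None
    return None


# A subset of the headers printed by the reproduction above.
headers = [
    ("via", "1.0 proxy.corp.com (squid/3.1.6)"),
    ("x-squid-error", "ERR_UNSUP_REQ 0"),
    ("server", "squid/3.1.6"),
]
print(squid_error(headers))  # ERR_UNSUP_REQ
```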


As I said, I found out about this when using buildout to download files over HTTPS.

Buildout uses urllib.urlretrieve on Python 2 and urllib.request.urlretrieve on Python 3. I guess the latter was fixed in issue 1424152, which would explain why I can download with buildout on Python 3.
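If issue 1424152 is indeed what makes Python 3 work, the difference should be visible on the wire. As far as I can tell from its source, urllib.request handles an https:// URL behind a proxy by calling http.client's set_tunnel(), which sends CONNECT www.python.org:443 first, whereas Python 2's URLopener sends GET https://www.python.org straight to the proxy. The offline sketch below, assuming Python 3, compares the two against a local stand-in. StandInProxy, StandInServer, fetch() and demo() are hypothetical names, and the stand-in only imitates the behaviour reported above (501 with X-Squid-Error for an absolute https:// GET, 200 for CONNECT); it is not Squid and does no real proxying.

```python
import http.client
import socketserver
import threading


class StandInProxy(socketserver.StreamRequestHandler):
    """Hypothetical stand-in imitating the reported proxy behaviour (not Squid).

    - "CONNECT host:port"           -> 200, pretend the tunnel is open
    - "GET https://..." (absolute)  -> 501 with an X-Squid-Error header
    - anything else (inside tunnel) -> 200 with body b"ok"
    """

    def handle(self):
        while True:
            line = self.rfile.readline().decode("latin-1").rstrip("\r\n")
            if not line:
                return  # client closed the connection
            # Discard request headers up to the blank line.
            while self.rfile.readline() not in (b"\r\n", b"\n", b""):
                pass
            self.server.seen.append(line)
            method, target = line.split(" ")[:2]
            if method == "CONNECT":
                self.wfile.write(b"HTTP/1.1 200 Connection established\r\n\r\n")
            elif target.startswith("https://"):
                self.wfile.write(b"HTTP/1.1 501 Not Implemented\r\n"
                                 b"X-Squid-Error: ERR_UNSUP_REQ 0\r\n"
                                 b"Content-Length: 0\r\n"
                                 b"Connection: close\r\n\r\n")
                return
            else:
                self.wfile.write(b"HTTP/1.1 200 OK\r\n"
                                 b"Content-Length: 2\r\n\r\nok")


class StandInServer(socketserver.ThreadingTCPServer):
    daemon_threads = True

    def __init__(self, *args):
        super().__init__(*args)
        self.seen = []  # request lines received, in order


def fetch(proxy_port, tunnel):
    """Request https://www.python.org through the stand-in proxy."""
    conn = http.client.HTTPConnection("127.0.0.1", proxy_port, timeout=5)
    try:
        if tunnel:
            # CONNECT first, then an origin-form request inside the tunnel.
            conn.set_tunnel("www.python.org", 443)
            conn.request("GET", "/")
        else:
            # Python 2 urllib style: absolute https:// URL sent to the proxy.
            conn.request("GET", "https://www.python.org")
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()


def demo():
    server = StandInServer(("127.0.0.1", 0), StandInProxy)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        results = {"absolute GET": fetch(port, tunnel=False),
                   "CONNECT tunnel": fetch(port, tunnel=True)}
    finally:
        server.shutdown()
        server.server_close()
    return results, server.seen


if __name__ == "__main__":
    results, seen = demo()
    print(results)
    for line in seen:
        print(line)
```

Running it should report 501 for the absolute GET and 200 for the tunnelled request, followed by the request lines the stand-in received. The tunnelled request uses plain HTTPConnection only so the stand-in can answer without TLS; a real client would call set_tunnel() on an HTTPSConnection so that TLS runs inside the tunnel.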