urllib URLopener().open https url returns 501 Not Implemented when https_proxy env var is http:// #68787

Closed
stefano-m mannequin opened this issue Jul 9, 2015 · 10 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@stefano-m
Mannequin

stefano-m mannequin commented Jul 9, 2015

BPO 24599
Nosy @bitdancer, @vadmium

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = <Date 2020-07-27.16:36:21.105>
created_at = <Date 2015-07-09.18:14:15.937>
labels = ['type-bug', 'library']
title = 'urllib URLopener().open  https url returns 501 Not Implemented when https_proxy env var is http://'
updated_at = <Date 2020-07-27.16:36:21.104>
user = 'https://bugs.python.org/stefano-m'

bugs.python.org fields:

activity = <Date 2020-07-27.16:36:21.104>
actor = 'stefano-m'
assignee = 'none'
closed = True
closed_date = <Date 2020-07-27.16:36:21.105>
closer = 'stefano-m'
components = ['Library (Lib)']
creation = <Date 2015-07-09.18:14:15.937>
creator = 'stefano-m'
dependencies = []
files = []
hgrepos = []
issue_num = 24599
keywords = []
message_count = 10.0
messages = ['246515', '246517', '246519', '246535', '246554', '246556', '246557', '246563', '247429', '374401']
nosy_count = 4.0
nosy_names = ['r.david.murray', 'martin.panter', 'l', 'stefano-m']
pr_nums = []
priority = 'normal'
resolution = 'wont fix'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue24599'
versions = ['Python 2.7', 'Python 3.4']

@stefano-m
Mannequin Author

stefano-m mannequin commented Jul 9, 2015

Hello,

At work, I am behind a proxy (squid) that is only reachable over plain HTTP, so I have to set both the http_proxy and https_proxy environment variables to something like "http://proxy.corp.com:8181".
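As a minimal Python 3 sketch of how urllib picks up this configuration (the proxy address is the placeholder from this report, not a real host), urllib.request.getproxies_environment() maps each *_proxy variable to a URL scheme:

```python
import os
import urllib.request

# Placeholder proxy address from the report; not a real host.
PROXY = "http://proxy.corp.com:8181"

# Both variables point at the same plain-HTTP squid proxy.
os.environ["http_proxy"] = PROXY
os.environ["https_proxy"] = PROXY

# getproxies_environment() builds a {scheme: proxy_url} dict from the *_proxy variables.
proxies = urllib.request.getproxies_environment()
print(proxies["https"])  # http://proxy.corp.com:8181
```

The value stored under "https" still uses the http:// scheme: it says how to reach the proxy, not how to reach the target site.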

Now, when I try to use urllib to open an "https" URL, the proxy returns "501 Not Implemented".

The quick and dirty script below is a minimal demonstration of the problem, which appears on both Python 2.7 (tested on 2.7.6, 2.7.9 and 2.7.10) and Python 3.4 (tested on 3.4.0 and 3.4.4):

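try:
    # Python 2: URLopener lives in the top-level urllib module
    import urllib
    opener = urllib.URLopener()
except AttributeError:
    # Python 3: urllib has no URLopener attribute; it moved to urllib.request
    import urllib.request
    opener = urllib.request.URLopener()

url = 'https://www.python.org'

print("Trying to open", url)

opener.open(url)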

Changing the url to "http://" works OK.

Thanks,

-- Stefano

@stefano-m stefano-m mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Jul 9, 2015
@bitdancer
Member

It is not clear from your description whether you actually tested it with Python 3. In Python 2, I believe urllib does not support this (see bpo-1424152), while urllib2 does.

Assuming you have, I wonder if the not implemented error is your squid saying it doesn't support CONNECT (which would be a bit surprising, granted). Have you looked at the error text or the squid logs?
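For comparison, a minimal Python 3 sketch of the urllib2-style route, assuming the placeholder proxy from the report: build_opener() with an explicit ProxyHandler. This is the same opener machinery that urllib.request.urlopen() uses by default, and for https URLs it asks the proxy for a CONNECT tunnel rather than sending a plain GET. No request is made in the snippet.

```python
import urllib.request

# Placeholder proxy from the report; substitute the real proxy URL.
PROXY = "http://proxy.corp.com:8181"

# Route both schemes through the same plain-HTTP proxy.
handler = urllib.request.ProxyHandler({"http": PROXY, "https": PROXY})
opener = urllib.request.build_opener(handler)

# opener.open("https://www.python.org") would go through this handler, which
# hands https requests to the proxy as a CONNECT tunnel instead of a plain GET.
```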

@stefano-m
Mannequin Author

stefano-m mannequin commented Jul 9, 2015

I have run the minimal example provided on both Python2 and Python3 with the same results. Sorry if that was not clear.

I did look at bpo-1424152 but it seemed to me that I was experiencing a different problem.

When I try and open the page, I get a squid error page with a somewhat vague error saying that "the method is not supported" (even though it's a simple GET and I can get the same page with other tools like wget or a web browser). Unfortunately, I don't have access to the proxied environment right now, and I will need to ask for the squid logs anyway since I can't access them.

I have to say that I have experienced this problem while using buildout as zc.buildout.download uses urllib.urlretrieve. Surprisingly, it succeeds on Python3, but it fails with Python2 which is our supported version (so there's currently no way that I can use Python3 at work).

@vadmium
Member

vadmium commented Jul 10, 2015

David: the original patch made in bpo-1424152 fixed Python 3’s urllib.request.urlopen() and Python 2’s urllib2.urlopen(). But Stefano is using URLopener, which I understand comes from Python 2’s older “urllib” module.

When I run the demonstration, the request to the proxy looks like this:

GET https://www.python.org HTTP/1.1
Host: www.python.org
Accept-Encoding: identity
Connection: close
User-Agent: Python-urllib/3.4

I think Stefano requires a “CONNECT www.python.org:443” request instead. There is apparently a patch which sounds like it does this. See bpo-1424152-py27-urllib.diff and the messages beginning with <https://bugs.python.org/issue1424152#msg194704>. I suggest any further work (e.g. tests and documentation) continue here, since the other issue has been closed and mainly discusses “urllib2”.
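To make the difference concrete, here is a small illustrative helper (hypothetical, not part of urllib) that returns the first request line a client would send to a plain-HTTP forward proxy: an absolute-URI GET for http URLs, and a CONNECT to host:port for https URLs, after which TLS runs inside the tunnel. Per the analysis above, URLopener sends the GET form even for https, which is what squid rejects here.

```python
from urllib.parse import urlsplit


def proxy_request_line(url, method="GET"):
    """Return the first line of a request to a plain-HTTP forward proxy.

    Hypothetical helper for illustration; it is not part of urllib.
    """
    parts = urlsplit(url)
    host = parts.hostname
    if host and ":" in host:
        # IPv6 literals must be bracketed in a host:port authority string.
        host = "[{}]".format(host)
    if parts.scheme == "https":
        # The proxy only relays bytes; the real GET is later sent over TLS inside the tunnel.
        return "CONNECT {}:{} HTTP/1.1".format(host, parts.port or 443)
    if parts.scheme == "http":
        # Plain HTTP: the proxy is given the full absolute URI.
        return "{} {} HTTP/1.1".format(method, url)
    raise ValueError("unsupported scheme: {!r}".format(parts.scheme))
```

For example, proxy_request_line("https://www.python.org") gives "CONNECT www.python.org:443 HTTP/1.1", while an http:// URL keeps the absolute-URI GET that already works for Stefano.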

@stefano-m
Mannequin Author

stefano-m mannequin commented Jul 10, 2015

Martin, thanks for elaborating my thoughts!

I have dug a bit deeper into Python 2's urllib code with pdb, and I think I have narrowed the issue down to what open_http does.

In my example code, replacing opener.open(url) with opener.open_http(url) gives the same problem.

I realize I did not provide you with the output of the script, so here it is:

  • Python 2.7.10
python urllib_error.py
('Trying to open', 'https://www.python.org')
Traceback (most recent call last):
  File "urllib_error.py", line 30, in <module>
    opener.open_http((host, selector))
  File "/home/mazzucco/.pyenv/versions/2.7.10/lib/python2.7/urllib.py", line 364, in open_http
    return self.http_error(url, fp, errcode, errmsg, headers)
  File "/home/mazzucco/.pyenv/versions/2.7.10/lib/python2.7/urllib.py", line 381, in http_error
    return self.http_error_default(url, fp, errcode, errmsg, headers)
  File "/home/mazzucco/.pyenv/versions/2.7.10/lib/python2.7/urllib.py", line 386, in http_error_default
    raise IOError, ('http error', errcode, errmsg, headers)
IOError: ('http error', 501, 'Not Implemented', <httplib.HTTPMessage instance at 0x7f875a67b950>)
  • Python 3.4.3
python urllib_error.py
Trying to open https://www.python.org
Traceback (most recent call last):
  File "urllib_error.py", line 30, in <module>
    opener.open_http((host, selector))
  File "/home/mazzucco/.pyenv/versions/3.4.3/lib/python3.4/urllib/request.py", line 1805, in open_http
    return self._open_generic_http(http.client.HTTPConnection, url, data)
  File "/home/mazzucco/.pyenv/versions/3.4.3/lib/python3.4/urllib/request.py", line 1801, in _open_generic_http
    response.status, response.reason, response.msg, data)
  File "/home/mazzucco/.pyenv/versions/3.4.3/lib/python3.4/urllib/request.py", line 1821, in http_error
    return self.http_error_default(url, fp, errcode, errmsg, headers)
  File "/home/mazzucco/.pyenv/versions/3.4.3/lib/python3.4/urllib/request.py", line 1826, in http_error_default
    raise HTTPError(url, errcode, errmsg, headers, None)
urllib.error.HTTPError: HTTP Error 501: Not Implemented

When I unwrap the contents of httplib.HTTPMessage, the error page returned by the squid proxy says:

-------------------------------------------------------
ERROR
The requested URL could not be retrieved

The following error was encountered while trying to retrieve the URL: https://www.python.org

Unsupported Request Method and Protocol

Squid does not support all request methods for all access protocols. For example, you can not POST a Gopher request.
-------------------------------------------------------
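In Python 3 the same details can be read straight off the HTTPError raised in the traceback above, without unwrapping the message by hand. A minimal sketch: describe_proxy_error is a hypothetical helper, and the X-Squid-Error header name is taken from the header dump later in this comment, not from any urllib API.

```python
import urllib.error


def describe_proxy_error(err):
    """Summarise an HTTPError returned by a proxy (hypothetical helper).

    Returns (status code, reason phrase, X-Squid-Error header value or None).
    """
    headers = err.headers
    # Real responses carry an http.client.HTTPMessage, whose .get() is case-insensitive.
    squid_error = headers.get("X-Squid-Error") if headers is not None else None
    return err.code, err.reason, squid_error
```

A caller would wrap opener.open(url) in try/except urllib.error.HTTPError and pass the exception in; for the failure reported here the result would presumably be (501, 'Not Implemented', 'ERR_UNSUP_REQ 0').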

Looking at Python2's implementation of URLopener's open_http, I can get an even more minimal failing example limited to httplib:

import httplib

host = 'proxy.corp.com:8181'  # this is not the actual proxy

selector = 'https://www.python.org'

print("Trying to open", selector)

h = httplib.HTTP(host)
h.putrequest('GET', selector)
h.putheader('User-Agent', 'Python-urllib/1.17')
h.endheaders(None)
errcode, errmsg, headers = h.getreply()

print(errcode, errmsg)
print(headers.items())

Running the script on Python 2.7.10 prints:

('Trying to open', 'https://www.python.org')
(501, 'Not Implemented')
[('content-length', '3069'), ('via', '1.0 proxy.corp.com (squid/3.1.6)'), ('x-cache', 'MISS from proxy.corp.com'), ('content-language', 'en'), ('x-squid-error', 'ERR_UNSUP_REQ 0'), ('x-cache-lookup', 'NONE from proxy.corp.com:8181'), ('vary', 'Accept-Language'), ('server', 'squid/3.1.6'), ('proxy-connection', 'close'), ('date', 'Fri, 10 Jul 2015 09:27:14 GMT'), ('content-type', 'text/html'), ('mime-version', '1.0')]
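For contrast, a Python 3 sketch of the same minimal request sent the way a proxy expects for https: http.client.HTTPSConnection connects to the proxy, and HTTPSConnection.set_tunnel() makes it issue "CONNECT www.python.org:443" first, with TLS negotiated through the tunnel. The proxy host and port are the placeholders from the script above, and make_tunnelled_connection is a hypothetical name; nothing is sent until request() is called.

```python
import http.client

# Placeholder proxy address from the script above; not a real host.
PROXY_HOST = "proxy.corp.com"
PROXY_PORT = 8181


def make_tunnelled_connection(target_host, target_port=443):
    """Return an HTTPSConnection that reaches target_host through a CONNECT tunnel.

    No network I/O happens here; the proxy is contacted on the first request().
    """
    conn = http.client.HTTPSConnection(PROXY_HOST, PROXY_PORT)
    # Ask the proxy for a raw tunnel to the real server; TLS then runs end-to-end.
    conn.set_tunnel(target_host, target_port)
    return conn

# Usage (needs the real proxy to be reachable):
# conn = make_tunnelled_connection("www.python.org")
# conn.request("GET", "/", headers={"User-Agent": "Python-urllib/3.4"})
# resp = conn.getresponse()
# print(resp.status, resp.reason)
```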

As I said, I found out about this when using buildout to download files over HTTPS.

Buildout uses urllib.urlretrieve on Python2 and urllib.request.urlretrieve on Python3. I guess that the latter has been fixed in bpo-1424152, so that's why I can download with buildout on Python3.

@vadmium
Member

vadmium commented Jul 10, 2015

Perhaps you might be able to test out the patch <https://bugs.python.org/file31201> to see if that fixes your problem? Though there is a good chance the patch needs updating, since it is fairly old.

@stefano-m
Mannequin Author

stefano-m mannequin commented Jul 10, 2015

Martin,

I have applied the patch <https://bugs.python.org/file31201> to my Python 2.7.10 installation and it seems to work OK.

@l
Mannequin

l mannequin commented Jul 10, 2015

Thank you Martin for referencing my patch. It still applies cleanly with --fuzz=0 to 2.7.10. Would be awesome if this fix would finally get merged.

@stefano-m
Mannequin Author

stefano-m mannequin commented Jul 26, 2015

Any thoughts from the core Python developers?

It seems to me that this is a confirmed bug with a working fix (that may need further review, but indeed works). So, hopefully, this issue could be resolved fairly quickly.

@stefano-m
Mannequin Author

stefano-m mannequin commented Jul 27, 2020

Closing as this bug refers to versions of Python that have been EOL'd.

@stefano-m stefano-m mannequin closed this as completed Jul 27, 2020
@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022