This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Flawed handling of URL redirect
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Jeffrey.Kintscher, bugburger, orsenthil
Priority: normal Keywords:

Created on 2019-03-19 19:48 by bugburger, last changed 2022-04-11 14:59 by admin.

Messages (5)
msg338404 - (view) Author: Brrr Grrr (bugburger) Date: 2019-03-19 19:48
I'm unable to use `urlopen` to open 'https://www.annemergmed.com/article/S0196-0644(99)70271-4/abstract' with Python 3.7. I believe this to be flawed URL redirection, possibly due to flawed URL parsing.

```python
from sys import version
version
'3.7.2 (default, Dec 25 2018, 03:50:46) \n[GCC 7.3.0]'
from urllib.request import urlopen
urlopen('https://www.annemergmed.com/article/S0196-0644(99)70271-4/abstract')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/usr/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.7/urllib/request.py", line 563, in error
    result = self._call_chain(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 755, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.7/urllib/request.py", line 563, in error
    result = self._call_chain(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 755, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.7/urllib/request.py", line 563, in error
    result = self._call_chain(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 755, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.7/urllib/request.py", line 563, in error
    result = self._call_chain(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 755, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.7/urllib/request.py", line 563, in error
    result = self._call_chain(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 755, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.7/urllib/request.py", line 563, in error
    result = self._call_chain(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 755, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.7/urllib/request.py", line 563, in error
    result = self._call_chain(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 755, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.7/urllib/request.py", line 563, in error
    result = self._call_chain(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 755, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.7/urllib/request.py", line 563, in error
    result = self._call_chain(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 755, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.7/urllib/request.py", line 563, in error
    result = self._call_chain(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 755, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.7/urllib/request.py", line 563, in error
    result = self._call_chain(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 745, in http_error_302
    self.inf_msg + msg, headers, fp)
urllib.error.HTTPError: HTTP Error 302: The HTTP server returned a redirect error that would lead to an infinite loop.
The last 30x error message was:
Found
```
msg338405 - (view) Author: Brrr Grrr (bugburger) Date: 2019-03-19 19:54
Please note that the `requests` package, for example, has no trouble reading this URL. I don't want to use that package for this task for certain other reasons though.

```python
>>> import requests
>>> requests.__version__
'2.21.0'
>>> requests.get('https://www.annemergmed.com/article/S0196-0644(99)70271-4/abstract')
<Response [200]>
```
msg338408 - (view) Author: Brrr Grrr (bugburger) Date: 2019-03-19 19:58
This error is not due to cookies either. I tried `HTTPCookieProcessor` with no luck. Cookies help with opening certain other URLs but evidently not with this one.
msg338409 - (view) Author: Brrr Grrr (bugburger) Date: 2019-03-19 20:04
That's not to say that cookies are not needed for this URL. They may very well be needed using HTTPCookieProcessor. I'm saying that cookies alone won't solve this issue.
msg338423 - (view) Author: Brrr Grrr (bugburger) Date: 2019-03-19 23:52
I now used a custom HTTPRedirectHandler with `max_redirections = 20`. The default is 10. This workaround addresses the issue, although it doesn't rule out a cleaner fix.
History
Date User Action Args
2022-04-11 14:59:12adminsetgithub: 80553
2019-05-22 09:47:27Jeffrey.Kintschersetnosy: + Jeffrey.Kintscher
2019-03-19 23:52:04bugburgersetmessages: + msg338423
2019-03-19 20:04:12bugburgersetmessages: + msg338409
2019-03-19 19:58:13bugburgersetmessages: + msg338408
2019-03-19 19:54:44bugburgersetmessages: + msg338405
2019-03-19 19:52:35SilentGhostsetnosy: + orsenthil
type: crash -> behavior
components: + Library (Lib), - IO
2019-03-19 19:48:59bugburgercreate