Issue36372
Created on 2019-03-19 19:48 by bugburger, last changed 2022-04-11 14:59 by admin.
Messages (5) | |||
---|---|---|---|
msg338404 - (view) | Author: Brrr Grrr (bugburger) | Date: 2019-03-19 19:48 | |
I'm unable to use `urlopen` to open 'https://www.annemergmed.com/article/S0196-0644(99)70271-4/abstract' with Python 3.7. I believe this to be flawed URL redirection, possibly due to flawed URL parsing.

```python
>>> from sys import version
>>> version
'3.7.2 (default, Dec 25 2018, 03:50:46) \n[GCC 7.3.0]'
>>> from urllib.request import urlopen
>>> urlopen('https://www.annemergmed.com/article/S0196-0644(99)70271-4/abstract')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/usr/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.7/urllib/request.py", line 563, in error
    result = self._call_chain(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 755, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.7/urllib/request.py", line 563, in error
    result = self._call_chain(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 755, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.7/urllib/request.py", line 563, in error
    result = self._call_chain(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 755, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.7/urllib/request.py", line 563, in error
    result = self._call_chain(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 755, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.7/urllib/request.py", line 563, in error
    result = self._call_chain(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 755, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.7/urllib/request.py", line 563, in error
    result = self._call_chain(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 755, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.7/urllib/request.py", line 563, in error
    result = self._call_chain(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 755, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.7/urllib/request.py", line 563, in error
    result = self._call_chain(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 755, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.7/urllib/request.py", line 563, in error
    result = self._call_chain(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 755, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.7/urllib/request.py", line 563, in error
    result = self._call_chain(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 755, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.7/urllib/request.py", line 563, in error
    result = self._call_chain(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 745, in http_error_302
    self.inf_msg + msg, headers, fp)
urllib.error.HTTPError: HTTP Error 302: The HTTP server returned a redirect error that would lead to an infinite loop.
The last 30x error message was:
Found
```
|
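For context on where this error is raised: the last frame of the traceback is `HTTPRedirectHandler.http_error_302`, which stops following redirects once a chain exceeds the class attributes `max_redirections` or `max_repeats`. The sketch below is a simplified paraphrase of that bookkeeping as I understand it, not a copy of the CPython source; `would_abort` is a hypothetical helper and the `/hop…` paths are made up for illustration.

```python
from urllib.request import HTTPRedirectHandler


def would_abort(redirect_targets, handler=HTTPRedirectHandler):
    """Return True if following these redirect targets in order hits a limit.

    Simplified paraphrase of the check in http_error_302 (an assumption
    about its logic, not the actual library code).
    """
    visited = {}  # redirect target -> number of times it has been followed
    for target in redirect_targets:
        if (visited.get(target, 0) >= handler.max_repeats
                or len(visited) >= handler.max_redirections):
            return True
        visited[target] = visited.get(target, 0) + 1
    return False


# Hypothetical chains of distinct redirect targets.
print(would_abort([f"/hop{i}" for i in range(10)]))  # False
print(would_abort([f"/hop{i}" for i in range(11)]))  # True
```

Under this reading, a long chain of distinct redirects produces the same "infinite loop" message as a genuine loop, which would be consistent with the ten `line 755` frames in the traceback above.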
|||
msg338405 - (view) | Author: Brrr Grrr (bugburger) | Date: 2019-03-19 19:54 | |
Please note that the `requests` package, for example, has no trouble reading this URL. I don't want to use that package for this task for certain other reasons though.

```python
>>> import requests
>>> requests.__version__
'2.21.0'
>>> requests.get('https://www.annemergmed.com/article/S0196-0644(99)70271-4/abstract')
<Response [200]>
```
|
|||
msg338408 - (view) | Author: Brrr Grrr (bugburger) | Date: 2019-03-19 19:58 | |
This error is not due to cookies either. I tried `HTTPCookieProcessor` with no luck. Cookies help with opening certain other URLs but evidently not with this one. |
|||
msg338409 - (view) | Author: Brrr Grrr (bugburger) | Date: 2019-03-19 20:04 | |
That's not to say that cookies are not needed for this URL. They may very well be needed, in which case `HTTPCookieProcessor` would supply them. I'm saying that cookies alone won't solve this issue. |
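For readers unfamiliar with the cookie attempt described in the two messages above, a minimal sketch of an opener that uses `HTTPCookieProcessor` might look like this. The names `cookie_jar`, `cookie_opener`, and `open_with_cookies` are hypothetical, and this is not necessarily the exact code the reporter ran.

```python
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, build_opener

# In-memory cookie store; cookies set by one response are sent on later requests.
cookie_jar = CookieJar()

# build_opener keeps the default handlers and adds the cookie processor.
cookie_opener = build_opener(HTTPCookieProcessor(cookie_jar))


def open_with_cookies(url, timeout=10):
    """Open url through the cookie-aware opener (hypothetical helper)."""
    return cookie_opener.open(url, timeout=timeout)
```

As the messages note, an opener like this still uses the default redirect limit, so on its own it would not be expected to change the outcome for this URL.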
|||
msg338423 - (view) | Author: Brrr Grrr (bugburger) | Date: 2019-03-19 23:52 | |
I have now used a custom `HTTPRedirectHandler` with `max_redirections = 20`; the default is 10. This workaround addresses the issue, although it doesn't rule out a cleaner fix. |
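The workaround in the last message can be sketched roughly as follows. The class name `PatientRedirectHandler` and the helper `open_following_redirects` are hypothetical; only `HTTPRedirectHandler`, `max_redirections`, and the value 20 come from the report.

```python
import urllib.request


class PatientRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Redirect handler that tolerates longer chains (hypothetical name)."""

    max_redirections = 20  # value from the report; the library default is 10


# Passing a subclass makes build_opener leave out the default redirect handler.
opener = urllib.request.build_opener(PatientRedirectHandler)


def open_following_redirects(url, timeout=10):
    """Open url with the raised redirect limit (hypothetical helper)."""
    return opener.open(url, timeout=timeout)
```

Other handlers, such as an `HTTPCookieProcessor`, can be passed to the same `build_opener` call if cookies also turn out to be needed.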
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:59:12 | admin | set | github: 80553 |
2019-05-22 09:47:27 | Jeffrey.Kintscher | set | nosy: + Jeffrey.Kintscher |
2019-03-19 23:52:04 | bugburger | set | messages: + msg338423 |
2019-03-19 20:04:12 | bugburger | set | messages: + msg338409 |
2019-03-19 19:58:13 | bugburger | set | messages: + msg338408 |
2019-03-19 19:54:44 | bugburger | set | messages: + msg338405 |
2019-03-19 19:52:35 | SilentGhost | set | nosy: + orsenthil; type: crash -> behavior; components: + Library (Lib), - IO |
2019-03-19 19:48:59 | bugburger | create |