Running this simple test script produces the traceback show below.
import urllib.request
page = urllib.request.urlopen('http://legacy.biblegateway.com/versions/?vid=DN1933&action=getVersionInfo#books')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.4/urllib/request.py", line 153, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python3.4/urllib/request.py", line 461, in open
response = meth(req, response)
File "/usr/lib/python3.4/urllib/request.py", line 571, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python3.4/urllib/request.py", line 493, in error
result = self._call_chain(*args)
File "/usr/lib/python3.4/urllib/request.py", line 433, in _call_chain
result = func(*args)
File "/usr/lib/python3.4/urllib/request.py", line 676, in http_error_302
return self.parent.open(new, timeout=req.timeout)
File "/usr/lib/python3.4/urllib/request.py", line 455, in open
response = self._open(req, data)
File "/usr/lib/python3.4/urllib/request.py", line 473, in _open
'_open', req)
File "/usr/lib/python3.4/urllib/request.py", line 433, in _call_chain
result = func(*args)
File "/usr/lib/python3.4/urllib/request.py", line 1258, in http_open
return self.do_open(http.client.HTTPConnection, req)
File "/usr/lib/python3.4/urllib/request.py", line 1232, in do_open
h.request(req.get_method(), req.selector, req.data, headers)
File "/usr/lib/python3.4/http/client.py", line 1065, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python3.4/http/client.py", line 1093, in _send_request
self.putrequest(method, url, **skips)
File "/usr/lib/python3.4/http/client.py", line 957, in putrequest
self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 31-32: ordinal not in range(128)
Using curl we can see that there is a redirect to an url with a special char:
$ curl -vs "http://legacy.biblegateway.com/versions/?vid=DN1933&action=getVersionInfo#books" >DN1933
* Hostname was NOT found in DNS cache
* Trying 23.23.93.211...
* Connected to legacy.biblegateway.com (23.23.93.211) port 80 (#0)
> GET /versions/?vid=DN1933&action=getVersionInfo HTTP/1.1
> User-Agent: curl/7.35.0
> Host: legacy.biblegateway.com
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
* Server nginx/1.4.7 is not blacklisted
< Server: nginx/1.4.7
< Date: Fri, 22 Aug 2014 08:35:30 GMT
< Content-Type: text/html; charset=UTF-8
< Content-Length: 0
< Connection: keep-alive
< X-Powered-By: PHP/5.5.7
< Set-Cookie: bg_id=1b9a80d5e6d545487cfd153d6df65c4e; path=/; domain=.biblegateway.com
< Set-Cookie: a9gl=0; path=/; domain=.biblegateway.com
< Location: http://legacy.biblegateway.com/versions/Dette-er-Biblen-på-dansk-1933/
<
* Connection #0 to host legacy.biblegateway.com left intact
When the redirect-url doesn't contain special chars everything works as expected, like with this url: "http://legacy.biblegateway.com/versions/?vid=DNB1930&action=getVersionInfo#books"
|