
classification
Title: urllib.request.urlopen raises exception when 30X-redirect url contains non-ascii chars
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.4
process
Status: closed Resolution: duplicate
Dependencies: Superseder: http.client.HTTPConnection.putrequest encode error
View: 17214
Assigned To: Nosy List: martin.panter, orsenthil, tomasgroth
Priority: normal Keywords:

Created on 2014-08-22 08:58 by tomasgroth, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (3)
msg225651 - Author: Tomas Groth (tomasgroth) Date: 2014-08-22 08:58
Running this simple test script produces the traceback shown below.

import urllib.request
page = urllib.request.urlopen('http://legacy.biblegateway.com/versions/?vid=DN1933&action=getVersionInfo#books')

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.4/urllib/request.py", line 153, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.4/urllib/request.py", line 461, in open
    response = meth(req, response)
  File "/usr/lib/python3.4/urllib/request.py", line 571, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.4/urllib/request.py", line 493, in error
    result = self._call_chain(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 433, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 676, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python3.4/urllib/request.py", line 455, in open
    response = self._open(req, data)
  File "/usr/lib/python3.4/urllib/request.py", line 473, in _open
    '_open', req)
  File "/usr/lib/python3.4/urllib/request.py", line 433, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 1258, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/usr/lib/python3.4/urllib/request.py", line 1232, in do_open
    h.request(req.get_method(), req.selector, req.data, headers)
  File "/usr/lib/python3.4/http/client.py", line 1065, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python3.4/http/client.py", line 1093, in _send_request
    self.putrequest(method, url, **skips)
  File "/usr/lib/python3.4/http/client.py", line 957, in putrequest
    self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 31-32: ordinal not in range(128)


Using curl, we can see that the server redirects to a URL containing a non-ASCII character (the "å" in the Location header):
$ curl -vs "http://legacy.biblegateway.com/versions/?vid=DN1933&action=getVersionInfo#books" >DN1933
* Hostname was NOT found in DNS cache
*   Trying 23.23.93.211...
* Connected to legacy.biblegateway.com (23.23.93.211) port 80 (#0)
> GET /versions/?vid=DN1933&action=getVersionInfo HTTP/1.1
> User-Agent: curl/7.35.0
> Host: legacy.biblegateway.com
> Accept: */*
> 
< HTTP/1.1 301 Moved Permanently
* Server nginx/1.4.7 is not blacklisted
< Server: nginx/1.4.7
< Date: Fri, 22 Aug 2014 08:35:30 GMT
< Content-Type: text/html; charset=UTF-8
< Content-Length: 0
< Connection: keep-alive
< X-Powered-By: PHP/5.5.7
< Set-Cookie: bg_id=1b9a80d5e6d545487cfd153d6df65c4e; path=/; domain=.biblegateway.com
< Set-Cookie: a9gl=0; path=/; domain=.biblegateway.com
< Location: http://legacy.biblegateway.com/versions/Dette-er-Biblen-på-dansk-1933/
< 
* Connection #0 to host legacy.biblegateway.com left intact
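
The "position 31-32" in the traceback can be reproduced offline. The following is a minimal sketch, not taken from the report: it assumes that http.client decodes header bytes as ISO-8859-1 and builds the request line as "METHOD path HTTP/1.1", so the two UTF-8 bytes of "å" reach putrequest as two separate non-ASCII characters. The helper name first_non_ascii_span is hypothetical.

```python
# Path taken from the Location header in the curl output above.
path = "/versions/Dette-er-Biblen-på-dansk-1933/"

# Assumption: header bytes are decoded as ISO-8859-1, so the UTF-8
# encoding of "å" (0xC3 0xA5) turns into two characters.
header_value = path.encode("utf-8").decode("iso-8859-1")

# Assumption: the request line has the form "<method> <path> <version>".
request_line = "GET %s HTTP/1.1" % header_value


def first_non_ascii_span(text):
    """Return (first, last) index of the first run of non-ASCII characters, or None."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        # exc.end is exclusive; the error message reports an inclusive range.
        return exc.start, exc.end - 1
    return None


span = first_non_ascii_span(request_line)
print(span)  # (31, 32)
```

If those assumptions hold, the failure does not depend on the network at all: any redirect target with non-ASCII characters in its path would hit the same encode step.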


When the redirect URL contains no non-ASCII characters, everything works as expected, as with this URL: "http://legacy.biblegateway.com/versions/?vid=DNB1930&action=getVersionInfo#books"
msg225652 - Author: Tomas Groth (tomasgroth) Date: 2014-08-22 09:39
Small correction. Use this url for a working redirect instead of the one given at the end of the first comment:
http://legacy.biblegateway.com/versions/?vid=ESV&action=getVersionInfo#books
msg240472 - Author: Martin Panter (martin.panter) (Python committer) Date: 2015-04-11 11:58
Same as Issue 17214
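
For anyone hitting this on an affected version, one possible workaround is to percent-encode the redirect target before it is reopened. This is a sketch under stated assumptions, not necessarily the fix adopted for Issue 17214: quote_redirect_url and QuotingRedirectHandler are hypothetical names (not part of urllib), and the code assumes the Location value was decoded as ISO-8859-1, so encoding it back recovers the original bytes.

```python
import urllib.parse
import urllib.request

# Characters left untouched so the URL structure and existing %XX escapes survive.
SAFE_CHARS = "/:?=&#%@+,;!$'()*~[]"


def quote_redirect_url(url):
    """Percent-encode non-ASCII characters in a redirect URL (hypothetical helper)."""
    try:
        # Assumption: the header was decoded as ISO-8859-1; recover the raw bytes.
        raw = url.encode("iso-8859-1")
    except UnicodeEncodeError:
        # Fallback for text that did not come from an ISO-8859-1 decode.
        raw = url.encode("utf-8")
    return urllib.parse.quote(raw, safe=SAFE_CHARS)


class QuotingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Redirect handler that quotes the new URL before building the request."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return super().redirect_request(
            req, fp, code, msg, headers, quote_redirect_url(newurl)
        )


# Passing the subclass replaces the default HTTPRedirectHandler in the opener.
opener = urllib.request.build_opener(QuotingRedirectHandler)

# Usage (requires network access, so it is left commented out):
# page = opener.open("http://legacy.biblegateway.com/versions/?vid=DN1933&action=getVersionInfo#books")
```

Because "%" is in the safe set, a URL that is already percent-encoded passes through unchanged, so the handler should be harmless on versions where the redirect URL is already quoted.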
History
Date                 User           Action  Args
2022-04-11 14:58:07  admin          set     github: 66444
2015-04-11 11:58:11  martin.panter  set     status: open -> closed; nosy: + martin.panter; messages: + msg240472; superseder: http.client.HTTPConnection.putrequest encode error; resolution: duplicate
2014-08-22 19:48:52  ned.deily      set     nosy: + orsenthil
2014-08-22 09:39:51  tomasgroth     set     messages: + msg225652
2014-08-22 08:58:03  tomasgroth     create