classification
Title: encoding to ascii in http/client.py
Type: enhancement Stage:
Components: Library (Lib) Versions: Python 3.5
process
Status: closed Resolution: rejected
Dependencies: Superseder: urllib.request.urlopen does not handle non-ASCII characters
View: 3991
Assigned To: Nosy List: Babe Hardy, martin.panter
Priority: normal Keywords:

Created on 2017-01-18 08:41 by Babe Hardy, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (2)
msg285707 - (view) Author: Babe Hardy (Babe Hardy) Date: 2017-01-18 08:41
I used urlopen('...{}'.format(v)).read(), where v is a string.
When v contains non-ASCII characters, the call fails with
>>UnicodeEncodeError: 'ascii' codec can't encode characters<<
at line 984, in putrequest.

After changing
>>self._output(request.encode('ascii'))<<
into
>>self._output(request.encode('utf-8'))<<

my script worked again.
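
For reference, the failure can probably be reproduced without any network access, since it comes down to encoding a str that contains non-ASCII characters. The sketch below is only an illustration: the request line is a made-up example, not the actual string built inside putrequest.

```python
# Hypothetical request line; the real one is assembled inside http/client.py.
request = 'GET /search?q=caf\u00e9 HTTP/1.1'

try:
    request.encode('ascii')          # what the library does
    error = None
except UnicodeEncodeError as exc:    # raised because of the non-ASCII character
    error = exc

# The local patch described above: UTF-8 can encode any str, so no error here.
utf8_bytes = request.encode('utf-8')
```

This only shows why the patch makes the exception go away; it does not show that sending raw UTF-8 bytes in the request line is correct HTTP.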
msg285722 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2017-01-18 11:53
In general, HTTP URLs are supposed to be ASCII only. Newer protocols (e.g. RTSP, which is based on HTTP) specifically allow UTF-8 encoding. But it would be wrong for Python’s HTTP library to assume UTF-8 is wanted everywhere. This is especially true for a domain name (e.g. in the full-URL request to a proxy), which should not be UTF-8 encoded.

I suggest working on handling IRIs (<https://tools.ietf.org/html/rfc3987>, basically Unicode URLs) in higher-level places like “urllib”. See Issue 3991.
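
Until something like that exists, a caller can convert the Unicode URL to an ASCII-only one before passing it to urlopen. The following is a minimal sketch, not part of urllib: iri_to_uri is a hypothetical helper, it assumes an absolute URL with a host, it drops any user:password part, and the sets of "safe" characters are a simplification of RFC 3987. The host is encoded with IDNA rather than UTF-8, and the path, query and fragment are percent-encoded as UTF-8.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri):
    """Hypothetical helper: turn a Unicode URL into an ASCII-only URL."""
    parts = urlsplit(iri)

    # Domain names use IDNA (Punycode), not UTF-8 percent-encoding.
    host = parts.hostname.encode('idna').decode('ascii')
    netloc = host if parts.port is None else '{}:{}'.format(host, parts.port)

    # Percent-encode the remaining components as UTF-8; '%' is kept safe
    # so that escapes already present are not encoded a second time.
    path = quote(parts.path, safe='/%')
    query = quote(parts.query, safe='=&%+')
    fragment = quote(parts.fragment, safe='%')

    return urlunsplit((parts.scheme, netloc, path, query, fragment))


# Example with a made-up URL; the result contains only ASCII characters.
uri = iri_to_uri('http://b\u00fccher.example/caf\u00e9?q=\u00f1')
```

The resulting string can then be given to urlopen unchanged, because request.encode('ascii') no longer has anything to reject.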
History
Date User Action Args
2022-04-11 14:58:42 admin set github: 73491
2017-01-18 11:53:01 martin.panter set status: open -> closed
superseder: urllib.request.urlopen does not handle non-ASCII characters
nosy: + martin.panter
title: encoding to ascii in client.py -> encoding to ascii in http/client.py
messages: + msg285722
type: compile error -> enhancement
resolution: rejected
2017-01-18 08:41:01 Babe Hardy create