
Author demian.brecht
Recipients Bob.Chen, demian.brecht, orsenthil, vstinner
Date 2015-01-02.23:23:07
A few notes:

1. Unicode hosts are not automatically IDNA-encoded (which they /could/ be rather than relying on the programmer to be aware of this), but this really has no bearing on this specific issue
2. Unicode paths are not automatically IRI-encoded, which should also likely be handled automatically when unicode objects are encountered as the path.
3. When a single unicode element is contained within a list, string_join will defer to PyUnicode_Join.
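
As an illustration of note 1, here is a minimal sketch of IDNA-encoding a host by hand with Python's built-in idna codec. The host name is made up for the example, and this is not something httplib does on its own:

```python
# Hypothetical host name, chosen only for illustration.
host = u'bücher.example'

# The built-in 'idna' codec converts each non-ASCII label to its
# ASCII-compatible ("xn--") form and leaves ASCII labels untouched.
encoded_host = host.encode('idna')
```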

The problem here is that your pre-joined request elements look like this: [u'POST HTTP/1.1', 'Host:', 'Accept-Encoding: identity', 'Content-Length: 44', 'notes: \xe5\x91\xb5\xe5\x91\xb5', 'Content-type: application/x-www-form-urlencoded', 'Accept: text/plain', '', '']

Because there's a unicode object contained in the list at index 0, the entire list is converted to unicode, which results in the error when \xe5 is encountered by the ascii decoder.
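
The failure can be reproduced in isolation. The sketch below is written for Python 3, which no longer mixes the two types implicitly, so it emulates the Python 2 promotion by hand. The helper name py2_style_join and the '/path' request line are assumptions made for the example, not anything taken from httplib:

```python
def py2_style_join(sep, parts):
    """Emulate (in Python 3) how Python 2's str.join treated mixed lists.

    If any element is text, every byte string is implicitly decoded
    with the ASCII codec before joining, as PyUnicode_Join does.
    """
    if any(isinstance(p, str) for p in parts):
        sep = sep.decode('ascii') if isinstance(sep, bytes) else sep
        return sep.join(
            p.decode('ascii') if isinstance(p, bytes) else p
            for p in parts
        )
    # No text element present: a plain bytes join, no decoding involved.
    return sep.join(parts)


# Shortened, hypothetical version of the request elements:
# one text element followed by a UTF-8 encoded header value.
parts = ['POST /path HTTP/1.1', b'notes: \xe5\x91\xb5\xe5\x91\xb5']

try:
    py2_style_join(b'\r\n', parts)
    error = None
except UnicodeDecodeError as exc:
    # The ASCII decoder stops at the first byte >= 0x80, here \xe5.
    error = exc
```

With only byte strings in the list, the same call succeeds, which is why the error appears only once a unicode object enters the request.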

The proposed solution won't work: unicode characters are legal in paths (see RFC 3987), and it will fail whenever anything outside of the ASCII character set is present.

I think that the correct way to solve this issue is to automatically encode unicode paths (or IRIs) using urllib.quote, passing the reserved characters defined in RFC 3987 as the safe parameter:

>>> urllib.quote(u'/foo/呵/bar'.encode('utf-8'),':/?#[]@!$&\'()*+,;=')
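
For comparison, a rough Python 3 equivalent of the call above, where urllib.quote lives at urllib.parse.quote and accepts text directly (encoding it as UTF-8 by default). The helper name encode_path is an assumption for the example only, not a proposed API:

```python
from urllib.parse import quote

# Same reserved characters as passed in the call above.
RESERVED = ":/?#[]@!$&'()*+,;="


def encode_path(path):
    """Percent-encode non-ASCII characters of a text path as UTF-8,
    leaving the reserved characters untouched. Hypothetical helper."""
    return quote(path, safe=RESERVED)


encoded = encode_path('/foo/呵/bar')
```

Note that '%' is not in the safe set, so a path that is already percent-encoded would be encoded a second time by this sketch.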
Date | User | Action | Args
2015-01-02 23:23:09 | demian.brecht | set | recipients: + demian.brecht, orsenthil, vstinner, Bob.Chen
2015-01-02 23:23:08 | demian.brecht | set | messageid: <>
2015-01-02 23:23:08 | demian.brecht | link | issue22231 messages
2015-01-02 23:23:07 | demian.brecht | create |