
Author Lukasa
Recipients Lukasa, r.david.murray
Date 2016-08-09.13:49:51
Message-id <1470750592.09.0.993035418111.issue27716@psf.upfronthosting.co.za>
Content
Honestly, David, everything's a mess on this front. The authoritative document here is RFC 7230 Section 3.2.4 (https://tools.ietf.org/html/rfc7230#section-3.2.4). The last paragraph of that section reads:

   Historically, HTTP has allowed field content with text in the
   ISO-8859-1 charset [ISO-8859-1], supporting other charsets only
   through use of [RFC2047] encoding.  In practice, most HTTP header
   field values use only a subset of the US-ASCII charset [USASCII].
   Newly defined header fields SHOULD limit their field values to
   US-ASCII octets.  A recipient SHOULD treat other octets in field
   content (obs-text) as opaque data.

In the case of http.client, this actually maps pretty closely to Python 3's bytes object: header field values are basically ASCII plus arbitrary opaque bytes. UTF-8 is not explicitly allowed, but it is not forbidden either.
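As a rough illustration of "ASCII plus opaque bytes", here is a minimal sketch that splits a raw header block into (name, value) pairs while keeping every value as bytes. The helper names `split_header_block` and `has_obs_text` are hypothetical, not part of http.client, and the sketch deliberately ignores obs-fold (line folding) and other edge cases a real parser must handle.

```python
from __future__ import annotations


def split_header_block(block: bytes) -> list[tuple[bytes, bytes]]:
    """Hypothetical helper: split a raw header block without decoding it.

    Values stay as bytes, so any obs-text octets (0x80-0xFF) are carried
    through untouched as opaque data. obs-fold is NOT handled here.
    """
    fields = []
    for line in block.split(b"\r\n"):
        if not line:
            break  # an empty line terminates the header block
        name, sep, value = line.partition(b":")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        # Optional whitespace (SP / HTAB) around the value is not part of it.
        fields.append((name, value.strip(b" \t")))
    return fields


def has_obs_text(value: bytes) -> bool:
    """True if the value contains any octet outside US-ASCII (obs-text)."""
    return not value.isascii()
```

For example, `split_header_block(b"X-Name: caf\xc3\xa9\r\n\r\n")` yields `[(b"X-Name", b"caf\xc3\xa9")]`: the UTF-8 bytes survive intact because nothing has tried to interpret them yet.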

So I'd say there's no need to be too pedantic about Latin 1 at this stage in the pipeline. Python 3 is welcome to decode using Latin 1 *after* the header block has been split: Latin 1 maps every byte to exactly one code point, so the original bytes can always be recovered and re-decoded (for example as UTF-8) later. But decoding here, before the split, seems to just confuse the email parser.
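To make the round-tripping argument concrete, here is a minimal sketch of decoding an already-split field value as Latin 1 and later fixing it up as UTF-8. Both function names are hypothetical illustrations, not existing http.client or email APIs, and falling back to the Latin 1 reading when the bytes are not valid UTF-8 is an assumed policy, not something the tracker message specifies.

```python
def decode_field_value(value: bytes) -> str:
    # Latin-1 maps each of the 256 byte values to exactly one code point,
    # so this decode never fails and loses no information.
    return value.decode("latin-1")


def reinterpret_as_utf8(text: str) -> str:
    # Re-encoding as Latin-1 recovers the original octets exactly;
    # then try to read them as UTF-8 instead.
    raw = text.encode("latin-1")
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumed policy: keep the Latin-1 reading if it isn't valid UTF-8.
        return text
```

With these, `decode_field_value(b"caf\xc3\xa9")` gives the mojibake `"cafÃ©"`, and `reinterpret_as_utf8` on that result gives `"café"`. The fix-up is only possible because the Latin 1 step was lossless for every byte value.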