Message272250
Honestly, David, everything's a mess on this front. The authoritative document here is RFC 7230 Section 3.2.4 (https://tools.ietf.org/html/rfc7230#section-3.2.4). The last paragraph of that section reads:
Historically, HTTP has allowed field content with text in the
ISO-8859-1 charset [ISO-8859-1], supporting other charsets only
through use of [RFC2047] encoding. In practice, most HTTP header
field values use only a subset of the US-ASCII charset [USASCII].
Newly defined header fields SHOULD limit their field values to
US-ASCII octets. A recipient SHOULD treat other octets in field
content (obs-text) as opaque data.
In the case of http.client, this actually maps pretty closely to Python 3's bytes object: header field values are basically ASCII plus arbitrary opaque bytes. While UTF-8 is not strictly called out as allowed, neither is it called out as forbidden.
In this case, I'd say that there's no need to be too pedantic about Latin 1 at this stage in the pipeline. Python 3 is welcome to decode using Latin 1 *after* the header block has been split, because at least then the value can be fixed up later, thanks to the round-tripping nature of Latin 1. But doing it here seems to just confuse the email parser.
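To make the ordering argument concrete, here is a rough sketch of "split as bytes first, decode as Latin 1 afterwards". The header block, the X-Name field and the split_headers helper are all made up for illustration; this is not how http.client is actually structured. The last few lines show one way that decoding too early can trip up a str-based line splitter, which may or may not be the exact failure seen in this issue.

```python
# Hypothetical raw header block; the X-Name value carries UTF-8 bytes,
# i.e. obs-text in RFC 7230 terms.
raw = b"Host: example.com\r\nX-Name: caf\xc3\xa9\r\n\r\n"


def split_headers(block):
    """Illustrative helper: split a header block while it is still bytes."""
    pairs = []
    for line in block.split(b"\r\n"):
        if not line:
            break  # a blank line ends the header block
        name, _, value = line.partition(b":")
        pairs.append((name, value.strip(b" \t")))
    return pairs


headers = split_headers(raw)

# Latin 1 maps every byte to exactly one code point, so this never fails.
decoded = [(n.decode("ascii"), v.decode("latin-1")) for n, v in headers]

# The decode is lossless: re-encoding returns the original bytes, which can
# then be reinterpreted, here as UTF-8.
name_value = decoded[1][1]
recovered = name_value.encode("latin-1")
fixed = recovered.decode("utf-8")

# Decoding before splitting is riskier: str.splitlines() treats U+0085 as a
# line boundary, while bytes.splitlines() only splits on CR and LF.
as_bytes = b"a\x85b".splitlines()
as_text = b"a\x85b".decode("latin-1").splitlines()
```

Here name_value is mojibake, but recovered is byte-for-byte what came off the wire, so nothing has been lost by decoding late.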

Date | User | Action | Args
2016-08-09 13:49:52 | Lukasa | set | recipients: + Lukasa, r.david.murray
2016-08-09 13:49:52 | Lukasa | set | messageid: <1470750592.09.0.993035418111.issue27716@psf.upfronthosting.co.za>
2016-08-09 13:49:52 | Lukasa | link | issue27716 messages
2016-08-09 13:49:51 | Lukasa | create |