Message97071
David: I think it's a little bit more complicated. RFC 2616 says that
the value of a header is *TEXT, which is defined as:

    The TEXT rule is only used for descriptive field contents and values
    that are not intended to be interpreted by the message parser. Words
    of *TEXT MAY contain characters from character sets other than
    ISO-8859-1 only when encoded according to the rules of RFC 2047.

So I think send_header should change in the following way:
a) if isinstance(value, bytes): send value as-is
b) if value can be encoded in latin-1: encode in latin-1, then send as-is
c) otherwise: MIME-encode as UTF-8, using the following algorithm:
   1. count the non-ASCII characters by encoding with the ascii codec
      and the "ignore" error handler, then comparing the length of the
      result with the length of the original
   2. if fewer than 10% of the characters are non-ASCII, use the Q encoding
   3. otherwise, use the B encoding
The purpose of the algorithm in c) is that text containing only a few
non-ASCII characters still comes out mostly readable (thanks to the Q
encoding) even if the receiver fails to decode the header.
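
To make the proposal concrete, here is a rough sketch of steps a) to c). The names encode_header_value, q_encode and b_encode are hypothetical and not part of http.server. The set of bytes left unescaped by the Q encoding is a deliberately conservative choice, and the sketch ignores the 75-character limit RFC 2047 places on encoded words.

```python
import base64
import string

# Bytes emitted literally inside a Q-encoded word (a conservative
# subset of what RFC 2047 allows; this choice is an assumption).
_Q_SAFE = (string.ascii_letters + string.digits + "!*+-/").encode("ascii")


def q_encode(text):
    """Hypothetical helper: wrap text in one UTF-8 'Q' encoded word."""
    out = []
    for byte in text.encode("utf-8"):
        if byte in _Q_SAFE:
            out.append(chr(byte))
        elif byte == 0x20:
            out.append("_")            # space is written as underscore
        else:
            out.append("=%02X" % byte)  # everything else is hex-escaped
    return ("=?utf-8?q?" + "".join(out) + "?=").encode("ascii")


def b_encode(text):
    """Hypothetical helper: wrap text in one UTF-8 'B' encoded word."""
    return b"=?utf-8?b?" + base64.b64encode(text.encode("utf-8")) + b"?="


def encode_header_value(value, threshold=0.10):
    """Return the bytes to send for a header value (sketch only)."""
    # a) bytes are sent as-is
    if isinstance(value, bytes):
        return value
    # b) anything representable in latin-1 is sent as latin-1
    try:
        return value.encode("latin-1")
    except UnicodeEncodeError:
        pass
    # c) otherwise MIME-encode as UTF-8; count the non-ASCII characters
    #    by comparing lengths before and after an ascii/ignore encode
    non_ascii = len(value) - len(value.encode("ascii", "ignore"))
    if non_ascii < threshold * len(value):
        return q_encode(value)
    return b_encode(value)
```

With this sketch, a mostly-ASCII value such as "twenty euros is 20 €" stays largely legible on the wire as a Q encoded word, while a value consisting only of non-latin-1 characters is sent as a B encoded word.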
The same change would also apply to the client side when sending headers.
On the receiving side, we should offer an option to decode headers (both
for client and server); this should be an option because senders may not
comply with RFC 2616. Reading should then proceed as follows:
1. check whether there are MIME markers in the text
2. if so, MIME-decode
3. if not, decode as latin-1
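
The reading steps could look roughly like the following sketch. The function name decode_header_value and the decode_mime flag are hypothetical. Using email.header.decode_header for the MIME decoding and returning the plain latin-1 text when the option is off or decoding fails are assumptions on my part, not something the proposal above spells out.

```python
from email.errors import HeaderParseError
from email.header import decode_header


def decode_header_value(raw, decode_mime=False):
    """Decode header bytes received from the wire (sketch only).

    decode_mime is the hypothetical opt-in switch: senders may not
    comply with RFC 2616, so MIME decoding is off unless requested.
    """
    # latin-1 maps every byte to a character, so this never fails
    text = raw.decode("latin-1")

    # 1. check whether there are MIME markers in the text
    if not decode_mime or "=?" not in text:
        # 3. no markers (or option off): the latin-1 text is the result
        return text

    # 2. markers present: MIME-decode
    try:
        chunks = decode_header(text)
    except HeaderParseError:
        return text  # malformed encoded word: fall back to latin-1

    parts = []
    for chunk, charset in chunks:
        if isinstance(chunk, bytes):
            try:
                chunk = chunk.decode(charset or "latin-1", "replace")
            except LookupError:
                # unknown charset label: keep the bytes readable
                chunk = chunk.decode("latin-1")
        parts.append(chunk)
    return "".join(parts)
```

Keeping the flag off by default means existing callers would see no change in behavior; only code that opts in gets RFC 2047 decoding.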

Date | User | Action | Args
2009-12-30 23:49:25 | loewis | set | recipients: + loewis, pitrou, r.david.murray
2009-12-30 23:49:25 | loewis | set | messageid: <1262216965.73.0.255021516016.issue7606@psf.upfronthosting.co.za>
2009-12-30 23:49:24 | loewis | link | issue7606 messages
2009-12-30 23:49:23 | loewis | create |