Author loewis
Recipients loewis, pitrou, r.david.murray
Date 2009-12-30.23:49:23
Message-id <1262216965.73.0.255021516016.issue7606@psf.upfronthosting.co.za>
Content
David: I think it's a bit more complicated than that. RFC 2616 says that
the value of a header is *TEXT, and describes that rule as follows:

   The TEXT rule is only used for descriptive field contents and values 
   that are not intended to be interpreted by the message parser. Words 
   of *TEXT MAY contain characters from character sets other than 
   ISO-8859-1 only when encoded according to the rules of RFC 2047.

So I think send_header should change in the following way:

a) if isinstance(value, bytes): send value as-is
b) if value can be encoded in Latin-1: encode it in Latin-1, then send as-is
c) otherwise: MIME-encode it (RFC 2047) as UTF-8, using the following algorithm
   1. count the number of non-ASCII characters, by encoding with
      "ascii" and errors="ignore", and comparing the length of the
      result with the original length
   2. if fewer than 10% of the characters are non-ASCII, use the Q encoding
   3. otherwise, use the B encoding
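The rules a) to c) could be sketched in Python roughly as follows. This is not existing http.server code: encode_header_value, q_encode and b_encode are hypothetical names, the Q encoder uses the most conservative character set RFC 2047 allows (letters, digits and "!*+-/"), and the function simply returns the bytes that would go on the wire.

```python
import base64

# Bytes allowed literally inside a Q-encoded word. This is the most
# conservative set RFC 2047 permits: letters, digits and "!*+-/".
_Q_SAFE = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!*+-/"
)


def q_encode(data):
    """Wrap UTF-8 bytes in a Q-encoded RFC 2047 encoded-word."""
    out = []
    for byte in data:
        if byte in _Q_SAFE:
            out.append(chr(byte))       # safe ASCII stays readable
        elif byte == 0x20:
            out.append("_")             # Q encoding writes a space as "_"
        else:
            out.append("=%02X" % byte)  # everything else becomes =XX
    return "=?utf-8?q?" + "".join(out) + "?="


def b_encode(data):
    """Wrap UTF-8 bytes in a B-encoded (base64) RFC 2047 encoded-word."""
    return "=?utf-8?b?" + base64.b64encode(data).decode("ascii") + "?="


def encode_header_value(value):
    """Hypothetical helper: return the bytes to send for a header value."""
    # a) bytes are sent as-is
    if isinstance(value, bytes):
        return value
    # b) text representable in Latin-1 is sent as Latin-1
    try:
        return value.encode("latin-1")
    except UnicodeEncodeError:
        pass
    # c) otherwise, MIME-encode as UTF-8
    # c1) count non-ASCII characters: errors="ignore" drops them
    non_ascii = len(value) - len(value.encode("ascii", "ignore"))
    data = value.encode("utf-8")
    # c2) fewer than 10% non-ASCII characters: Q encoding
    if non_ascii * 10 < len(value):
        word = q_encode(data)
    # c3) otherwise: B encoding
    else:
        word = b_encode(data)
    return word.encode("ascii")
```

For example, a mostly-ASCII value containing one euro sign comes out as `=?utf-8?q?Hello_world=2C_this_is_a_test_=E2=82=AC?=`, while a value consisting only of euro signs becomes `=?utf-8?b?4oKs4oKs?=`. The sketch does not split long values into several encoded-words, so it ignores the 75-character limit RFC 2047 places on a single encoded-word.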

The purpose of the algorithm in c) would be that text containing only a
few non-Latin characters still comes out mostly readable even if the
receiver fails to decode the header, since the Q encoding leaves most
ASCII characters unchanged.

The same change would also apply to the client side, when it sends request headers.
On the receiving side, we should offer an option to decode headers (both
for client and server); this should be an option because senders may not
comply with RFC 2616. Reading should then proceed as follows:
1. check whether the text contains MIME markers, i.e. RFC 2047
   encoded-words of the form =?charset?encoding?text?=
2. if so, MIME-decode
3. if not, decode as Latin-1
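On the reading side, a minimal sketch could lean on the standard library's email.header.decode_header and make_header for the actual RFC 2047 decoding. decode_header_value and the regular expression are hypothetical, the raw header value is assumed to arrive as bytes, and falling back to plain Latin-1 when MIME-decoding fails is an extra assumption beyond the three steps above.

```python
import re
from email.errors import HeaderParseError
from email.header import decode_header, make_header

# Rough pattern for an RFC 2047 encoded-word: =?charset?Q-or-B?text?=
_ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[QqBb]\?[^?\s]*\?=")


def decode_header_value(raw):
    """Hypothetical helper: decode a received header value (bytes) to str."""
    # Latin-1 maps every byte to a character, so this never fails.
    text = raw.decode("latin-1")
    # 1. look for MIME markers (RFC 2047 encoded-words)
    if _ENCODED_WORD.search(text):
        # 2. MIME-decode; fall back to plain Latin-1 if the sender produced
        #    encoded-words that cannot be decoded (bad base64, invalid bytes
        #    for the declared charset, unknown charset)
        try:
            return str(make_header(decode_header(text)))
        except (HeaderParseError, UnicodeDecodeError, LookupError):
            return text
    # 3. no markers: the Latin-1 decoding is the result
    return text
```

With this, `b"caf\xe9"` decodes to "café", and `b"Hello =?utf-8?q?=E2=82=AC?="` decodes to "Hello €". Keeping the whole function behind an option, as suggested above, would let applications that talk to non-compliant senders skip it entirely.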
History
Date                 User    Action  Args
2009-12-30 23:49:25  loewis  set     recipients: + loewis, pitrou, r.david.murray
2009-12-30 23:49:25  loewis  set     messageid: <1262216965.73.0.255021516016.issue7606@psf.upfronthosting.co.za>
2009-12-30 23:49:24  loewis  link    issue7606 messages
2009-12-30 23:49:23  loewis  create