
Author pitrou
Recipients Dmitry.Jemerov, pitrou, r.david.murray
Date 2010-09-07.18:59:18
Message-id <1283885960.7.0.291028602153.issue9360@psf.upfronthosting.co.za>
Content
Note that according to RFC 3977, “The character set for all NNTP commands is UTF-8”.

But it also says this about multi-line data blocks:

   Note that texts using an encoding (such as UTF-16 or UTF-32) that may
   contain the octets NUL, LF, or CR other than a CRLF pair cannot be
   reliably conveyed in the above format (that is, they violate the MUST
   requirement above).  However, except when stated otherwise, this
   specification does not require the content to be UTF-8, and therefore
   (subject to that same requirement) it MAY include octets above and
   below 128 mixed arbitrarily.
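
As a rough illustration of the RFC's point (the sample string is made up, not taken from any real article), UTF-16 puts NUL octets and bare CR / LF octets into the stream, while UTF-8 does not:

```python
# Hypothetical sample line; any text ending in CRLF would do.
line = "Hi\r\n"

utf16 = line.encode("utf-16-le")
utf8 = line.encode("utf-8")

# UTF-16 pads every ASCII character with a NUL octet, so CR and LF
# no longer sit next to each other as a CRLF pair.
assert utf16 == b"H\x00i\x00\r\x00\n\x00"
assert b"\x00" in utf16
assert b"\r\n" not in utf16

# UTF-8 leaves ASCII untouched, so the CRLF terminator survives,
# and non-ASCII characters only ever produce octets >= 0x80.
assert utf8 == b"Hi\r\n"
assert all(octet >= 0x80 for octet in "é".encode("utf-8"))
```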

IMO, it should encode and decode using UTF-8 by default (with the "surrogateescape" error handler, so that data from non-compliant servers round-trips easily), except for raw articles (bodies / envelopes), which should be returned as bytes.
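
A minimal sketch of the round-tripping idea, using only the built-in codecs. The helper names `decode_line` and `encode_line` and the sample data are hypothetical, chosen for illustration; this is not the module's actual API:

```python
def decode_line(raw: bytes) -> str:
    """Decode a protocol line as UTF-8, preserving undecodable bytes."""
    # Invalid bytes become lone surrogates (U+DC80..U+DCFF) instead of
    # raising UnicodeDecodeError.
    return raw.decode("utf-8", errors="surrogateescape")


def encode_line(text: str) -> bytes:
    """Encode a protocol line as UTF-8, restoring escaped bytes."""
    return text.encode("utf-8", errors="surrogateescape")


# A compliant server sends UTF-8; this decodes normally.
well_formed = "Subject: café\r\n".encode("utf-8")
assert decode_line(well_formed) == "Subject: café\r\n"

# A non-compliant server sends Latin-1; 0xE9 is not valid UTF-8 here.
latin1_line = b"Subject: caf\xe9\r\n"
decoded = decode_line(latin1_line)
assert decoded == "Subject: caf\udce9\r\n"

# Encoding with the same handler gives back the original bytes.
assert encode_line(decoded) == latin1_line
```

The escaped string cannot be encoded with the default "strict" handler, so code that writes such text elsewhere (a file, a terminal) has to pick an error handler explicitly.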
History
Date                 User    Action  Args
2010-09-07 18:59:20  pitrou  set     recipients: + pitrou, r.david.murray, Dmitry.Jemerov
2010-09-07 18:59:20  pitrou  set     messageid: <1283885960.7.0.291028602153.issue9360@psf.upfronthosting.co.za>
2010-09-07 18:59:19  pitrou  link    issue9360 messages
2010-09-07 18:59:18  pitrou  create