Message 115794 - Python tracker

Author	pitrou
Recipients	Dmitry.Jemerov, pitrou, r.david.murray
Date	2010-09-07.18:59:18
Message-id	<1283885960.7.0.291028602153.issue9360@psf.upfronthosting.co.za>

Content
Note that according to RFC 3977, “The character set for all NNTP commands is UTF-8”.

But it also says this about multi-line data blocks:

   Note that texts using an encoding (such as UTF-16 or UTF-32) that may
   contain the octets NUL, LF, or CR other than a CRLF pair cannot be
   reliably conveyed in the above format (that is, they violate the MUST
   requirement above).  However, except when stated otherwise, this
   specification does not require the content to be UTF-8, and therefore
   (subject to that same requirement) it MAY include octets above and
   below 128 mixed arbitrarily.
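
To make the quoted restriction concrete, here is a minimal sketch of a check for octets that cannot be conveyed in a multi-line data block. The helper name `has_unsafe_octets` is hypothetical and is not part of any library; it only illustrates why UTF-16 text is a problem while UTF-8 or Latin-1 text is not.

```python
def has_unsafe_octets(data: bytes) -> bool:
    """Return True if data contains NUL, or a CR/LF outside a CRLF pair.

    Hypothetical helper, for illustration only.
    """
    # Drop well-formed CRLF pairs, then look for any leftover NUL, CR or LF.
    stripped = data.replace(b"\r\n", b"")
    return any(octet in stripped for octet in (b"\x00", b"\r", b"\n"))


# UTF-16 puts a NUL octet next to every ASCII character.
utf16_ascii = "A".encode("utf-16-le")        # b"A\x00"
# U+010A encodes to the octets 0x0A 0x01 in UTF-16-LE, i.e. a bare LF.
utf16_lf = "\u010a".encode("utf-16-le")      # b"\n\x01"
# UTF-8 and Latin-1 only use octets >= 0x80 for non-ASCII characters.
utf8_line = "caf\u00e9\r\n".encode("utf-8")
latin1_line = "caf\u00e9".encode("latin-1")
```

With these inputs, the two UTF-16 samples are flagged and the UTF-8 and Latin-1 samples are not, which matches the RFC's remark that octets above and below 128 may otherwise be mixed freely.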

IMO, it should decode and encode using UTF-8 by default (with the "surrogateescape" error handler, so that data from non-compliant servers round-trips easily), except for raw articles (bodies / envelopes), where bytes should be returned.
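
As a rough illustration of the round-tripping idea, the sketch below decodes a line that mixes a Latin-1 octet with valid UTF-8 and then encodes it back unchanged. The function names `decode_line` and `encode_line` are hypothetical, not an existing API; only `bytes.decode`, `str.encode` and the "surrogateescape" error handler are standard.

```python
def decode_line(raw: bytes) -> str:
    # Octets that are not valid UTF-8 become lone surrogates (U+DC80..U+DCFF)
    # instead of raising UnicodeDecodeError.
    return raw.decode("utf-8", errors="surrogateescape")


def encode_line(text: str) -> bytes:
    # The same handler turns those surrogates back into the original octets.
    return text.encode("utf-8", errors="surrogateescape")


# A header line from a non-compliant server: 0xE9 is Latin-1 "e acute",
# followed by a correctly encoded UTF-8 euro sign.
raw = b"Subject: caf\xe9 \xe2\x82\xac"

text = decode_line(raw)
round_tripped = encode_line(text)
```

Here `round_tripped` equals `raw`, so nothing is lost even though the line is not valid UTF-8. Encoding `text` with the default strict handler would raise `UnicodeEncodeError`, which is one reason to keep raw article bodies as bytes rather than decoding them.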
