Author loewis
Recipients ezio.melotti, loewis, petri.lehtinen, pitrou
Date 2011-11-03.17:28:58
Message-id <1320341339.76.0.186711693523.issue13333@psf.upfronthosting.co.za>
Content
RFC 2152 talks about encoding 16-bit Unicode, and clarifies:

 Surrogate pairs (UTF-16) are converted by treating each half 
 of the pair as a separate 16 bit quantity (i.e., no special
 treatment).

So lone surrogates clearly should be supported.

This text could be interpreted as saying that decoding surrogate pairs should also keep them (rather than combining them). However, the RFC also assumes that the decoded form will use 16-bit code units; for Python, I think we should continue combining surrogate pairs on decoding UTF-7 when we find them.
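
To illustrate the two points above, here is a minimal sketch of a single RFC 2152 shifted block, written by hand rather than taken from CPython's codec. The helper names (utf7_shifted_block, decode_shifted_block) are hypothetical, and the sketch assumes the whole input goes into one '+...-' block, ignoring directly encoded ASCII characters and validation of trailing bits. Encoding treats each surrogate as an independent 16-bit unit, while decoding combines a high/low pair into one code point.

```python
import base64


def utf7_shifted_block(text):
    """Encode text as one '+...-' shifted block (hypothetical helper)."""
    # UTF-16-BE gives one 16-bit unit per BMP character and two units
    # (a surrogate pair) per non-BMP character. 'surrogatepass' lets a
    # lone surrogate through as a plain 16-bit quantity.
    units = text.encode("utf-16-be", "surrogatepass")
    # Modified base64: standard alphabet, '=' padding dropped.
    body = base64.b64encode(units).rstrip(b"=")
    return b"+" + body + b"-"


def decode_shifted_block(block):
    """Decode one '+...-' shifted block (hypothetical helper)."""
    body = block[1:-1]
    # Restore the padding that modified base64 leaves out.
    padded = body + b"=" * (-len(body) % 4)
    units = base64.b64decode(padded)
    # Decoding UTF-16 combines a valid surrogate pair into a single
    # code point; a lone surrogate is kept as-is by 'surrogatepass'.
    return units.decode("utf-16-be", "surrogatepass")


pair_block = utf7_shifted_block("\U000104A0")    # one non-BMP character
halves_block = utf7_shifted_block("\ud801\udca0")  # the same two halves, given separately
lone_block = utf7_shifted_block("\ud801")        # a lone high surrogate
```

In this sketch the non-BMP character and its two separately supplied halves produce the same bytes, the lone surrogate round-trips unchanged, and decoding the pair yields the single combined character.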
History
Date User Action Args
2011-11-03 17:28:59 loewis set recipients: + loewis, pitrou, ezio.melotti, petri.lehtinen
2011-11-03 17:28:59 loewis set messageid: <1320341339.76.0.186711693523.issue13333@psf.upfronthosting.co.za>
2011-11-03 17:28:59 loewis link issue13333 messages
2011-11-03 17:28:58 loewis create