
Author serhiy.storchaka
Recipients Brian.Merrell, belopolsky, ezio.melotti, merrellb, petri.lehtinen, pitrou, rhettinger, serhiy.storchaka, tchrist, vstinner
Date 2012-08-28.13:58:28
Message-id <1346162309.53.0.965792304651.issue11489@psf.upfronthosting.co.za>
Content
> It's Unicode that considers unpaired surrogates invalid, not UTF-8 by itself.

It's UTF-8 too. See RFC 3629:

   The definition of UTF-8 prohibits encoding character numbers between
   U+D800 and U+DFFF, which are reserved for use with the UTF-16
   encoding form (as surrogate pairs) and do not directly represent
   characters.  When encoding in UTF-8 from UTF-16 data, it is necessary
   to first decode the UTF-16 data to obtain character numbers, which
   are then encoded in UTF-8 as described above.
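
As a rough illustration of the RFC text quoted above, here is a minimal sketch assuming a Python 3 interpreter (older versions may behave differently). The sample character U+1F600 and the variable names are chosen only for this example. It shows that the strict UTF-8 codec rejects a lone surrogate in both directions, and that UTF-16 data is first decoded to a character number before being encoded as UTF-8.

```python
# Assumption: Python 3, where the "utf-8" codec is strict about surrogates.

# A lone surrogate code point cannot be encoded as UTF-8.
lone = "\ud800"
try:
    lone.encode("utf-8")
    encode_ok = True
except UnicodeEncodeError:
    encode_ok = False

# The 3-byte sequence that would represent U+D800 is rejected on decode too.
try:
    b"\xed\xa0\x80".decode("utf-8")
    decode_ok = True
except UnicodeDecodeError:
    decode_ok = False

# UTF-16 data: decode the surrogate pair D83D DE00 to the character
# number U+1F600 first, then encode that character as UTF-8.
utf16_be = b"\xd8\x3d\xde\x00"
text = utf16_be.decode("utf-16-be")
utf8 = text.encode("utf-8")

print(encode_ok, decode_ok)   # False False
print(hex(ord(text)), utf8)   # 0x1f600 b'\xf0\x9f\x98\x80'
```

The "surrogatepass" error handler exists for code that deliberately needs to round-trip lone surrogates; the result is not valid UTF-8 in the RFC 3629 sense.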
History
Date                 User              Action  Args
2012-08-28 13:58:29  serhiy.storchaka  set     recipients: + serhiy.storchaka, rhettinger, belopolsky, pitrou, vstinner, ezio.melotti, merrellb, Brian.Merrell, petri.lehtinen, tchrist
2012-08-28 13:58:29  serhiy.storchaka  set     messageid: <1346162309.53.0.965792304651.issue11489@psf.upfronthosting.co.za>
2012-08-28 13:58:29  serhiy.storchaka  link    issue11489 messages
2012-08-28 13:58:28  serhiy.storchaka  create