Message169283
> It's Unicode that considers unpaired surrogates invalid, not UTF-8 by itself.
It's UTF-8 too. See RFC 3629:
    The definition of UTF-8 prohibits encoding character numbers between
    U+D800 and U+DFFF, which are reserved for use with the UTF-16
    encoding form (as surrogate pairs) and do not directly represent
    characters. When encoding in UTF-8 from UTF-16 data, it is necessary
    to first decode the UTF-16 data to obtain character numbers, which
    are then encoded in UTF-8 as described above.
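A minimal Python sketch of the two-step process the RFC describes: UTF-16 code units are first decoded to character numbers, combining each high/low surrogate pair, and only then encoded with the RFC 3629 bit layout. A surrogate that is left over at either step is rejected. The function names are hypothetical and chosen for illustration; this is not CPython's codec implementation.

```python
def utf16_units_to_code_points(units):
    """Decode UTF-16 code units into Unicode scalar values.

    Surrogate pairs are combined into one code point; an unpaired
    surrogate raises ValueError because it does not represent a character.
    """
    points = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:  # high (lead) surrogate
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                low = units[i + 1]
                points.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
                i += 2
                continue
            raise ValueError(f"unpaired high surrogate U+{u:04X} at index {i}")
        if 0xDC00 <= u <= 0xDFFF:  # low (trail) surrogate without a lead
            raise ValueError(f"unpaired low surrogate U+{u:04X} at index {i}")
        points.append(u)
        i += 1
    return points


def encode_utf8(cp):
    """Encode one code point using the RFC 3629 byte layout."""
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:04X} is a surrogate; UTF-8 forbids encoding it")
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp <= 0x10FFFF:
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError(f"U+{cp:X} is outside the Unicode code space")


def utf16_units_to_utf8(units):
    # Step 1: decode UTF-16 to character numbers; step 2: encode each in UTF-8.
    return b"".join(encode_utf8(cp) for cp in utf16_units_to_code_points(units))


# U+1F600 arrives as the surrogate pair D83D DE00 and becomes 4 UTF-8 bytes.
print(utf16_units_to_utf8([0x0041, 0xD83D, 0xDE00]))  # b'A\xf0\x9f\x98\x80'
```

Python 3's strict UTF-8 codec takes the same position: `"\ud800".encode("utf-8")` raises `UnicodeEncodeError`, and producing the bytes for a lone surrogate requires opting in with the `"surrogatepass"` error handler.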

Date | User | Action | Args
2012-08-28 13:58:29 | serhiy.storchaka | set | recipients: + serhiy.storchaka, rhettinger, belopolsky, pitrou, vstinner, ezio.melotti, merrellb, Brian.Merrell, petri.lehtinen, tchrist
2012-08-28 13:58:29 | serhiy.storchaka | set | messageid: <1346162309.53.0.965792304651.issue11489@psf.upfronthosting.co.za>
2012-08-28 13:58:29 | serhiy.storchaka | link | issue11489 messages
2012-08-28 13:58:28 | serhiy.storchaka | create |