Message380910
BTW, unicodeFromTclStringAndSize() basically undoes the special treatment of \0 in Modified UTF-8 [1]. That page says that all known implementations of MUTF-8 treat surrogate pairs the same way as CESU-8 [2], which is UTF-8 with characters outside of the BMP encoded as surrogate pairs, each half of which is then converted to UTF-8.
Neither encoding is currently supported by Python.
[1] https://en.wikipedia.org/wiki/UTF-8#Modified_UTF-8
[2] https://en.wikipedia.org/wiki/CESU-8
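
Since neither encoding has a built-in codec, here is a rough sketch of how MUTF-8/CESU-8 bytes could be decoded in pure Python. The function name `decode_mutf8` is hypothetical and this is not what `_tkinter` does internally; it only illustrates the two differences described above (the two-byte form of \0 and the UTF-8-encoded surrogate pairs), assuming the input is otherwise well-formed.

```python
def decode_mutf8(data: bytes) -> str:
    """Decode Modified UTF-8 / CESU-8 bytes to str (illustrative sketch)."""
    # MUTF-8 encodes U+0000 as the overlong pair C0 80; map it back to a
    # real NUL byte. 0xC0 never appears in valid MUTF-8 for any other reason.
    data = data.replace(b"\xc0\x80", b"\x00")
    # "surrogatepass" lets the UTF-8 codec accept the three-byte sequences
    # that encode individual surrogate code points (U+D800..U+DFFF).
    text = data.decode("utf-8", "surrogatepass")
    # A round trip through UTF-16 joins each high/low surrogate pair into
    # the single non-BMP character it represents.
    return text.encode("utf-16-le", "surrogatepass").decode(
        "utf-16-le", "surrogatepass"
    )


# Assumed sample inputs: an embedded NUL, and U+1F600 encoded as CESU-8.
nul_sample = decode_mutf8(b"a\xc0\x80b")
emoji_sample = decode_mutf8(b"\xed\xa0\xbd\xed\xb8\x80")
```

Plain UTF-8 input passes through unchanged, so a helper like this could be applied to any string coming back from Tcl.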
Date | User | Action | Args
2020-11-13 17:04:27 | ronaldoussoren | set | recipients: + ronaldoussoren, ned.deily, serhiy.storchaka, Pixmew
2020-11-13 17:04:27 | ronaldoussoren | set | messageid: <1605287067.48.0.407397194607.issue42318@roundup.psfhosted.org>
2020-11-13 17:04:27 | ronaldoussoren | link | issue42318 messages
2020-11-13 17:04:27 | ronaldoussoren | create |