This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Author ronaldoussoren
Recipients Pixmew, ned.deily, ronaldoussoren, serhiy.storchaka
Date 2020-11-13.17:04:27
Message-id <1605287067.48.0.407397194607.issue42318@roundup.psfhosted.org>
In-reply-to
Content
BTW, unicodeFromTclStringAndSize() basically undoes the special treatment of \0 in Modified UTF-8 (MUTF-8) [1]. That page says that all known implementations of MUTF-8 treat surrogate pairs the same way as CESU-8 [2], which is UTF-8 with characters outside the BMP first represented as surrogate pairs, each surrogate then being encoded separately as UTF-8.

Neither encoding is currently supported by Python.
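
For illustration only, here is a rough pure-Python sketch of what the two encodings do, based on the descriptions in [1] and [2]. The function names are hypothetical (nothing like them exists in the stdlib or in _tkinter), and this is not what unicodeFromTclStringAndSize() actually does internally; it just uses the existing "surrogatepass" error handler to get the same effect.

```python
def encode_cesu8(text: str) -> bytes:
    """Hypothetical helper: encode text as CESU-8."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # Non-BMP character: split into a UTF-16 surrogate pair,
            # then encode each surrogate as its own 3-byte sequence.
            cp -= 0x10000
            pair = chr(0xD800 + (cp >> 10)) + chr(0xDC00 + (cp & 0x3FF))
            out += pair.encode("utf-8", "surrogatepass")
        else:
            out += ch.encode("utf-8", "surrogatepass")
    return bytes(out)


def encode_mutf8(text: str) -> bytes:
    """Hypothetical helper: CESU-8 plus NUL stored as the overlong C0 80."""
    return encode_cesu8(text).replace(b"\x00", b"\xc0\x80")


def decode_mutf8(data: bytes) -> str:
    """Hypothetical helper: decode MUTF-8 (also accepts plain CESU-8)."""
    # C0 is never a valid UTF-8 lead byte, so C0 80 can only mean NUL.
    data = data.replace(b"\xc0\x80", b"\x00")
    # Let the individually encoded surrogates through as code points ...
    text = data.decode("utf-8", "surrogatepass")
    # ... then round-trip through UTF-16 so that pairs are combined.
    return text.encode("utf-16", "surrogatepass").decode("utf-16", "surrogatepass")
```

With this sketch, encode_mutf8("a\x00b") gives b"a\xc0\x80b", and a non-BMP character such as "\U0001F600" becomes six bytes (two 3-byte surrogate sequences) instead of the four bytes regular UTF-8 would use.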

[1] https://en.wikipedia.org/wiki/UTF-8#Modified_UTF-8
[2] https://en.wikipedia.org/wiki/CESU-8
History
Date User Action Args
2020-11-13 17:04:27 ronaldoussoren set recipients: + ronaldoussoren, ned.deily, serhiy.storchaka, Pixmew
2020-11-13 17:04:27 ronaldoussoren set messageid: <1605287067.48.0.407397194607.issue42318@roundup.psfhosted.org>
2020-11-13 17:04:27 ronaldoussoren link issue42318 messages
2020-11-13 17:04:27 ronaldoussoren create