Author epaine
Recipients Ben Griffin, epaine, ned.deily, ronaldoussoren
Date 2020-07-06.09:04:08
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1594026248.34.0.493004127043.issue41212@roundup.psfhosted.org>
In-reply-to
Content
Sorry, the point I was trying to make was that, unlike UTF-8, Tcl doesn't support variable-length characters; they are instead fixed at 16 bits (by default). So, while Python and UTF-8 are perfectly happy with the emoji, Tcl will not process the character correctly unless it is compiled with a particular build flag (which is why I said it was surprising that Chip showed at all). I have tested on Tcl 8.6.10 and encountered the same problem as described.

A further quote (granted, also old, but I cannot find anything to suggest this behaviour has been changed):
"Tcl can (currently) only represent characters within the Basic Multilingual Plane of Unicode, so there's no way that you can even feed an U+10000 into encoding convertto :-(. Fixing that is non-trivial, since some parts of Tcl (the C library) require a representation of strings where all characters take up the same number of bytes. It is possible to compile Tcl with that "number of bytes" set to 4 (meaning 32 bits per character), but it's rather wasteful, and has been reported not entirely compatible with Tk." [https://wiki.tcl-lang.org/page/utf-8]
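
To illustrate the point on the Python side, here is a minimal sketch (pure Python, it does not touch Tcl or tkinter; the helper names `utf16_code_units` and `is_outside_bmp` are my own, chosen for illustration). It shows that an emoji is a single code point to Python, but lies outside the Basic Multilingual Plane and so cannot fit in one 16-bit unit:

```python
def utf16_code_units(text: str) -> int:
    """Return how many 16-bit code units UTF-16 needs to encode text."""
    # utf-16-le emits no byte-order mark, so every 2 bytes is one code unit.
    return len(text.encode("utf-16-le")) // 2


def is_outside_bmp(ch: str) -> bool:
    """Return True if the single character ch lies above U+FFFF."""
    return ord(ch) > 0xFFFF


emoji = "\U0001F600"  # an example emoji; any code point above U+FFFF behaves the same

print(len(emoji))                  # code points as seen by Python
print(len(emoji.encode("utf-8")))  # bytes in UTF-8
print(utf16_code_units(emoji))     # 16-bit units in UTF-16 (a surrogate pair)
print(is_outside_bmp(emoji))       # outside the BMP?
print(is_outside_bmp("\u00e9"))    # a BMP character, for comparison
```

A check like `any(is_outside_bmp(c) for c in s)` could presumably be used to spot strings that a 16-bit Tcl build is likely to mishandle, though I have not verified how every Tcl/Tk version reacts.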

If I can find the build flag mentioned, I will post it here for future reference.
History
Date User Action Args
2020-07-06 09:04:08 epaine set recipients: + epaine, ronaldoussoren, ned.deily, Ben Griffin
2020-07-06 09:04:08 epaine set messageid: <1594026248.34.0.493004127043.issue41212@roundup.psfhosted.org>
2020-07-06 09:04:08 epaine link issue41212 messages
2020-07-06 09:04:08 epaine create