Author terry.reedy
Recipients ezio.melotti, ned.deily, serhiy.storchaka, terry.reedy
Date 2018-06-10.21:10:35
Message-id <1528665036.04.0.592728768989.issue13153@psf.upfronthosting.co.za>
In-reply-to
Content
AFAIK, the big new feature of Tcl/Tk 9.0 is intended to be full Unicode support. We can hope that 9.0 appears in time to be included in the 3.8 installers.

Until then, I think filenames, user program output, and clipboard content should be checked for astral characters before being sent to a tk widget. For this issue, that means replacing the built-in <<Paste>> handler with one that replaces astral chars with \U000nnnn escapes. If the widget is a Text, tag the escape as 'Astral' and color it with the code context colors to distinguish it from escapes originally in the string.
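A minimal sketch of the escaping step, assuming the replacement handler gets the clipboard text as a Python str. The names ASTRAL and escape_astral are hypothetical, not existing IDLE code, and the 8-digit \U form is used because that is what Python's own string literals accept.

```python
import re

# Matches any single code point above the Basic Multilingual Plane.
ASTRAL = re.compile('[\U00010000-\U0010FFFF]')

def escape_astral(text):
    """Return (escaped, spans) for text.

    escaped is text with each astral char replaced by a \\UXXXXXXXX escape.
    spans lists (start, end) offsets of each inserted escape within escaped,
    so a caller could tag just those ranges in a Text widget.
    """
    parts = []
    spans = []
    pos = 0       # current position in the input
    out_len = 0   # length of the output built so far
    for m in ASTRAL.finditer(text):
        literal = text[pos:m.start()]
        escape = '\\U%08x' % ord(m.group())
        parts.append(literal)
        out_len += len(literal)
        spans.append((out_len, out_len + len(escape)))
        parts.append(escape)
        out_len += len(escape)
        pos = m.end()
    parts.append(text[pos:])
    return ''.join(parts), spans
```

The returned offsets are relative to the start of the escaped string. A paste handler could add them to the insertion index (tk indices accept a "+Nc" modifier) and apply the 'Astral' tag to each range, which leaves escapes that were already in the pasted string untagged.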

Strings know their kind, but a request to expose that has been rejected. Pyshell currently compares the max codepoint to 'ffff'. But it appears that we can detect kind with an O(1) expression. For 3.6 and 3.7, "sys.getsizeof(s) == 76 + len(s)". For 3.8, "sys.getsizeof(s) == 48 + len(s)". Does anyone know why the constants differ?
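For comparison, the max-codepoint check can be sketched as below. The function name has_astral is hypothetical and this is not the actual Pyshell code. It scans the whole string, so it is O(n), but it does not depend on size constants that change between versions.

```python
def has_astral(s):
    """Return True if s contains any code point above U+FFFF."""
    # max() raises ValueError on an empty string, so guard for that first.
    return bool(s) and max(s) > '\uffff'
```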
History
Date User Action Args
2018-06-10 21:10:36 terry.reedy set recipients: + terry.reedy, ned.deily, ezio.melotti, serhiy.storchaka
2018-06-10 21:10:36 terry.reedy set messageid: <1528665036.04.0.592728768989.issue13153@psf.upfronthosting.co.za>
2018-06-10 21:10:36 terry.reedy link issue13153 messages
2018-06-10 21:10:35 terry.reedy create