
Author asvetlov
Recipients asvetlov, ezio.melotti, loewis, ned.deily, python-dev, roger.serwy, terry.reedy, vbr, vstinner
Date 2012-03-25.20:07:09
SpamBayes Score 0.0
Marked as misclassified No
Message-id <1332706030.95.0.200107208467.issue14200@psf.upfronthosting.co.za>
In-reply-to
Content
After experimenting with non-BMP characters, I found the following:
— Tk text widgets (Entry, Text, etc.) handle non-BMP characters differently. For example, Entry may display a non-BMP character followed by extra spaces after the glyph, while Text reduces the character to a BMP one. Editing such text is also erratic.
— It looks like the Tk event loop passes non-BMP input directly to tkinter as is.
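
The observations above are easier to follow with a precise definition of "non-BMP": any code point above U+FFFF. Such a code point needs two 16-bit units (a surrogate pair) in UTF-16. One plausible explanation for Tk's inconsistent handling is that the Tcl/Tk builds in question store text as 16-bit units, so a single non-BMP character cannot fit in one unit. That explanation is our assumption, not something established in this message. The helper names `is_non_bmp` and `utf16_units` are illustrative only.

```python
# Sketch: what makes a character "non-BMP", and why it cannot be stored in a
# single 16-bit unit. Assumption (not verified here): the Tcl/Tk build in
# question keeps text internally as 16-bit units.

BMP_MAX = 0xFFFF  # last code point of the Basic Multilingual Plane


def is_non_bmp(ch):
    """Return True if the single character ch lies outside the BMP."""
    return ord(ch) > BMP_MAX


def utf16_units(ch):
    """Return the list of 16-bit UTF-16 code units needed to store ch."""
    data = ch.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


for ch in ("\u00e9", "\U0001F600"):
    units = " ".join(f"{u:04X}" for u in utf16_units(ch))
    print(f"U+{ord(ch):04X} non-BMP={is_non_bmp(ch)} UTF-16 units: {units}")
# U+00E9 non-BMP=False UTF-16 units: 00E9
# U+1F600 non-BMP=True UTF-16 units: D83D DE00
```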

Evidently, Tk does not support non-BMP characters by specification, yet it does not strictly reject them either. The details are implementation-specific and depend not only on the Tcl/Tk version but also on the concrete widget class.

Given this, my position is:
— Implement a utf8-bmp codec.
— The first implementation of utf8-bmp can be done in pure Python, using the utf-8 codec plus explicit checks for non-BMP characters. This approach is simple, though it may cost some performance. That does not matter if the codec is used only for converting relatively short strings in Tk widgets.
— Use it in the _tkinter AsObj/FromObj functions with the 'replace' error handler.
— My approach is slightly backward-incompatible in the dark corner of non-BMP characters. These characters are not supported, yet they are silently passed to the low-level platform API with odd conversions along the way. I do not think this is a problem at all.
— With the utf-8-bmp codec, IDLE can still use 'strict' mode in its .write function (I mean `print` and displayhook) to keep the current behavior. Alternatively, it can use escaping for displayhook and 'replace' for regular `print`. In the implementation of #14326 we can also use an explicitly specified encoding for `print`.
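
The pure-Python route proposed above can be sketched as follows. The sketch follows the "utf-8 codec plus checks" idea. BMP text goes straight through the stdlib utf-8 codec. Each non-BMP character (when encoding) or 4-byte UTF-8 sequence (when decoding) is routed through the standard codecs error-handler protocol, so 'strict', 'replace' and 'backslashreplace' all behave as callers expect. Several details are assumptions of this sketch, not the actual _tkinter or IDLE code: registration through `codecs.register`, the helper names `_encode`/`_decode`/`_search`, and the exact replacement output.

```python
import codecs
import re

# "utf8-bmp" is the name proposed above; it is NOT a stdlib codec.
# Importing this module registers it with the codecs machinery.
CODEC_NAME = "utf8-bmp"

# Code points above U+FFFF lie outside the Basic Multilingual Plane.
_NON_BMP_CHAR = re.compile("[\U00010000-\U0010FFFF]")
# In UTF-8 exactly those characters use 4-byte sequences, whose lead byte is
# 0xF0..0xF4. A lead byte is never a continuation byte, so splitting the input
# just before one never cuts a valid sequence in half.
_LEAD_BYTE_4 = re.compile(rb"[\xf0-\xf4]")


def _encode(text, errors="strict"):
    """str -> UTF-8 bytes; non-BMP characters go through the error handler."""
    handler = codecs.lookup_error(errors)
    out, pos = [], 0
    while pos < len(text):
        m = _NON_BMP_CHAR.search(text, pos)
        if m is None:
            out.append(text[pos:].encode("utf-8", errors))
            break
        out.append(text[pos:m.start()].encode("utf-8", errors))
        exc = UnicodeEncodeError(CODEC_NAME, text, m.start(), m.end(),
                                 "character outside the BMP")
        replacement, pos = handler(exc)  # the 'strict' handler raises exc here
        if pos < 0:  # handlers may return a position relative to the end
            pos += len(text)
        if isinstance(replacement, str):
            replacement = replacement.encode("utf-8")
        out.append(replacement)
    return b"".join(out), len(text)


def _decode(data, errors="strict"):
    """UTF-8 bytes -> str; 4-byte (non-BMP) sequences go through the handler."""
    data = bytes(data)
    handler = codecs.lookup_error(errors)
    out, pos = [], 0
    while pos < len(data):
        m = _LEAD_BYTE_4.search(data, pos)
        if m is None:
            out.append(data[pos:].decode("utf-8", errors))
            break
        start = m.start()
        out.append(data[pos:start].decode("utf-8", errors))
        try:
            data[start:start + 4].decode("utf-8")
        except UnicodeDecodeError as bad:
            # Malformed sequence: let the plain utf-8 codec report/replace it.
            end = start + bad.end
            out.append(data[start:end].decode("utf-8", errors))
            pos = end
            continue
        exc = UnicodeDecodeError(CODEC_NAME, data, start, start + 4,
                                 "character outside the BMP")
        replacement, pos = handler(exc)
        if pos < 0:
            pos += len(data)
        out.append(replacement)
    return "".join(out), len(data)


def _search(name):
    # Names reach search functions lower-cased; since Python 3.9 hyphens are
    # also turned into underscores, so accept both spellings.
    if name.replace("-", "_") == "utf8_bmp":
        return codecs.CodecInfo(_encode, _decode, name=CODEC_NAME)
    return None


codecs.register(_search)

if __name__ == "__main__":
    sample = "a\U0001F600b"
    # Escaping, e.g. for displayhook:
    print(sample.encode(CODEC_NAME, "backslashreplace"))  # b'a\\U0001f600b'
    # Replacement, e.g. for regular print or _tkinter conversions:
    print(sample.encode(CODEC_NAME, "replace"))  # b'a?b'
    print(ascii(b"a\xf0\x9f\x98\x80b".decode(CODEC_NAME, "replace")))  # 'a\ufffdb'
    try:
        sample.encode(CODEC_NAME)  # 'strict' raises
    except UnicodeEncodeError as exc:
        print("strict:", exc.reason)  # strict: character outside the BMP
```

Routing the non-BMP case through `codecs.lookup_error` lets each call site pick its own policy. That per-call-site choice is the flexibility the IDLE bullet relies on: 'strict' to keep the old behavior, escaping for displayhook, 'replace' for `print`.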

I experimented on an Ubuntu box, but I am fairly sure the same results can be reproduced on OS X and Windows as well. We also need Tk behavior to be consistent across platforms. Replacing non-BMP characters is therefore a good solution until Tcl/Tk handles them natively.
History
Date                 User      Action  Args
2012-03-25 20:07:11  asvetlov  set     recipients: + asvetlov, loewis, terry.reedy, vstinner, vbr, ned.deily, ezio.melotti, roger.serwy, python-dev
2012-03-25 20:07:10  asvetlov  set     messageid: <1332706030.95.0.200107208467.issue14200@psf.upfronthosting.co.za>
2012-03-25 20:07:10  asvetlov  link    issue14200 messages
2012-03-25 20:07:09  asvetlov  create