Author serhiy.storchaka
Recipients JBernardo, Ramchandra Apte, Rosuav, William.Schwartz, asvetlov, ezio.melotti, loewis, ned.deily, python-dev, roger.serwy, serhiy.storchaka, terry.reedy
Date 2014-01-05.15:13:56
Message-id <1388934837.13.0.747874201547.issue13153@psf.upfronthosting.co.za>
Content
Yes, it is still the same issue. The root of the issue is the conversion of strings passed to Python-implemented callbacks. When text is pasted into an IDLE window, a callback is called (for highlighting). The callback is a command created by Tcl_CreateCommand from PythonCmd. PythonCmd is a wrapper that converts its arguments (char*) to Python strings and then passes them to the Python command. The arguments are encoded in "modified UTF-8" [1], i.e. the NUL character is represented as \xc0\x80, and they can contain other invalid UTF-8 sequences (such as encoded surrogates). When decoding an argument to a Python string fails, the main Tcl loop is broken and IDLE silently closes.
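The decoding failure can be reproduced in plain Python, without Tcl. The sketch below shows that Python's strict UTF-8 codec rejects the modified UTF-8 form of NUL, and pairs it with a tolerant decoder that a wrapper like PythonCmd could apply so that a bad argument does not abort the callback. `strict_decode` and `decode_tcl_arg` are hypothetical names for illustration; this is not the actual `_tkinter` code or the patch attached to this issue.

```python
def strict_decode(arg: bytes) -> str:
    # A naive wrapper: plain UTF-8, so any invalid sequence raises UnicodeDecodeError.
    return arg.decode("utf-8")


def decode_tcl_arg(arg: bytes) -> str:
    """Hypothetical tolerant decoder for a Tcl (modified UTF-8) argument."""
    # Modified UTF-8 writes U+0000 as the overlong pair C0 80. The byte C0 is
    # never a continuation byte, so this replacement cannot split another sequence.
    arg = arg.replace(b"\xc0\x80", b"\x00")
    # Any remaining invalid bytes become lone surrogates (U+DC80..U+DCFF)
    # instead of raising, so the callback always receives a str.
    return arg.decode("utf-8", "surrogateescape")


tcl_arg = b"a\xc0\x80b"  # "a", NUL, "b" as Tcl encodes them
try:
    strict_decode(tcl_arg)
except UnicodeDecodeError as exc:
    print("strict decoding fails:", exc.reason)
print(repr(decode_tcl_arg(tcl_arg)))  # 'a\x00b'
```

Using `surrogateescape` here is one possible design choice: it never raises, and the original bytes can be recovered with `.encode("utf-8", "surrogateescape")`.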

When an astral character (one outside the Basic Multilingual Plane) is pasted on Windows, it is first encoded to UTF-16 by Windows, and then Tcl encodes each 16-bit surrogate separately to modified UTF-8. The result is not valid UTF-8. On X Window systems the X selection value is usually UTF-8 encoded (its type is UTF8_STRING), but it can contain invalid UTF-8 sequences.
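The Windows path can be modelled in a few lines. The sketch below is a hypothetical simulation (the function names `tcl_encode_windows` and `tcl_decode` are illustrative, not Tcl or `_tkinter` APIs): it encodes text to UTF-16, writes each 16-bit code unit separately as UTF-8, and then shows one way to recover the original text by decoding with `surrogatepass` and re-joining the surrogate pairs through UTF-16.

```python
def tcl_encode_windows(text: str) -> bytes:
    """Hypothetical model: UTF-16 first, then each 16-bit unit as (modified) UTF-8."""
    data = text.encode("utf-16-le")
    out = bytearray()
    for i in range(0, len(data), 2):
        unit = int.from_bytes(data[i:i + 2], "little")
        if unit == 0:
            out += b"\xc0\x80"  # modified UTF-8 form of NUL
        else:
            # "surrogatepass" lets a lone surrogate code unit be written as 3 bytes
            out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


def tcl_decode(data: bytes) -> str:
    """Recover text from the encoding above, re-joining surrogate pairs."""
    text = data.replace(b"\xc0\x80", b"\x00").decode("utf-8", "surrogatepass")
    # Round-trip through UTF-16 so each high/low surrogate pair
    # becomes a single astral code point.
    return text.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


encoded = tcl_encode_windows("\U0001F600")
print(encoded)  # b'\xed\xa0\xbd\xed\xb8\x80'
try:
    encoded.decode("utf-8")  # strict UTF-8 rejects encoded surrogates
except UnicodeDecodeError:
    print("not valid UTF-8")
print(tcl_decode(encoded) == "\U0001F600")  # True
```

This sketch only covers encoded surrogates and NUL; arbitrary invalid bytes, such as those that can arrive in an X selection, still raise `UnicodeDecodeError` under `surrogatepass`.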

I will open a separate issue to fix other bugs related to Tcl <-> Python string conversions. The latest patch fixes only the initial issue, which is the most important one.

[1] http://en.wikipedia.org/wiki/UTF-8#Modified_UTF-8