Author serhiy.storchaka
Recipients JBernardo, Ramchandra Apte, Rosuav, William.Schwartz, asvetlov, ezio.melotti, ned.deily, python-dev, roger.serwy, serhiy.storchaka, terry.reedy
Date 2013-08-07.17:47:37
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1375897658.96.0.957123107415.issue13153@psf.upfronthosting.co.za>
In-reply-to
Content
It seems that Tk stores the pasted "\U000104a2" as the surrogate pair "\ud801\udca2". Each surrogate is then encoded separately in UTF-8, giving "\xed\xa0\x81\xed\xb2\xa2", and the result is passed to Python. Python converts the char* to a Unicode object with PyUnicode_FromString(), which rejects invalid UTF-8, including encoded surrogates.
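
The behaviour described above can be reproduced from pure Python. This is only an illustrative sketch, not the proposed patch: the helper names strict_decode_fails and decode_surrogate_utf8 are hypothetical, and the sketch assumes the bytes arrive exactly as quoted in this message.

```python
# Bytes as quoted in this message: each UTF-16 surrogate of U+104A2
# encoded separately as a 3-byte UTF-8 sequence.
raw = b"\xed\xa0\x81\xed\xb2\xa2"


def strict_decode_fails(data):
    """Return True if strict UTF-8 decoding rejects the data."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


def decode_surrogate_utf8(data):
    """Decode UTF-8 that may contain encoded surrogate pairs (hypothetical helper).

    The "surrogatepass" error handler lets the encoded surrogates through
    as individual code points; the UTF-16 round trip then joins each
    high/low pair into a single non-BMP character. A lone (unpaired)
    surrogate would still raise UnicodeDecodeError in the last step.
    """
    text = data.decode("utf-8", "surrogatepass")
    return text.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


if __name__ == "__main__":
    print(strict_decode_fails(raw))
    print(ascii(decode_surrogate_utf8(raw)))
```

For comparison, the well-formed UTF-8 encoding of the same character is the 4-byte sequence "\xf0\x90\x92\xa2".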

Please test the proposed patch on Windows.
History
Date User Action Args
2013-08-07 17:47:39 serhiy.storchaka set recipients: + serhiy.storchaka, terry.reedy, ned.deily, ezio.melotti, roger.serwy, asvetlov, python-dev, JBernardo, Rosuav, Ramchandra Apte, William.Schwartz
2013-08-07 17:47:38 serhiy.storchaka set messageid: <1375897658.96.0.957123107415.issue13153@psf.upfronthosting.co.za>
2013-08-07 17:47:38 serhiy.storchaka link issue13153 messages
2013-08-07 17:47:38 serhiy.storchaka create