
Author serhiy.storchaka
Recipients asvetlov, ezio.melotti, loewis, pitrou, roger.serwy, serhiy.storchaka, vstinner
Date 2012-04-16.14:56:47
Message-id <1334588207.85.0.0319557013719.issue14304@psf.upfronthosting.co.za>
Content
Example:

>>> '\u0100'
'Ā'
>>> '\u0100\U00010000'
'\u0100\U00010000'
>>> print('\u0100')
Ā
>>> print('\u0100\U00010000')
Traceback (most recent call last):
  File "<pyshell#33>", line 1, in <module>
    print('\u0100\U00010000')
UnicodeEncodeError: 'UCS-2' codec can't encode characters in position 1-1: Non-BMP character not supported in Tk
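The traceback reports position 1 because only the second character, U+10000, lies outside the Basic Multilingual Plane (above U+FFFF), which the error message says Tk does not support. Below is a minimal sketch for locating such characters. It assumes Python 3.3 or later, where each code point is one element of a str; on older narrow builds a non-BMP character is stored as a surrogate pair. The helper name `non_bmp_positions` is hypothetical.

```python
def non_bmp_positions(s):
    """Return (index, code point) pairs for characters above U+FFFF."""
    # Assumes Python 3.3+ (PEP 393): one str element per code point.
    return [(i, ord(c)) for i, c in enumerate(s) if ord(c) > 0xFFFF]


print(non_bmp_positions('\u0100\U00010000'))  # [(1, 65536)]
```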

But I think that this is too specific a problem and too specific a solution. It would be better if IDLE itself escaped such strings in the most appropriate way, for example by replacing each non-BMP character with a \UXXXXXXXX escape:

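def utf8bmp_encode(s):
    # Replace each character above U+FFFF with a \UXXXXXXXX escape,
    # then encode the resulting BMP-only text as UTF-8.
    return ''.join(c if ord(c) <= 0xffff else '\\U%08x' % ord(c) for c in s).encode('utf-8')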

or

import re

def utf8bmp_encode(s):
    return re.sub('[^\x00-\uffff]', lambda m: '\\U%08x' % ord(m.group()), s).encode('utf-8')
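A quick self-contained check that the two variants above produce identical bytes. This is a sketch that assumes Python 3.3 or later, where a non-BMP character is a single str element. The class `BMPEscapingWriter` and the helper `escape_non_bmp` are hypothetical names, not IDLE's actual API. They only illustrate how the same escaping could be applied once, to everything written to the shell, instead of by each caller.

```python
import io
import re


def escape_non_bmp(s):
    """Replace every character above U+FFFF with a \\UXXXXXXXX escape."""
    return re.sub('[^\x00-\uffff]', lambda m: '\\U%08x' % ord(m.group()), s)


def utf8bmp_encode_join(s):
    # First variant from the message: per-character join.
    return ''.join(c if ord(c) <= 0xffff else '\\U%08x' % ord(c) for c in s).encode('utf-8')


def utf8bmp_encode_re(s):
    # Second variant from the message: regex substitution.
    return escape_non_bmp(s).encode('utf-8')


class BMPEscapingWriter:
    """Hypothetical text-stream wrapper: escapes non-BMP characters before writing."""

    def __init__(self, stream):
        self._stream = stream

    def write(self, s):
        return self._stream.write(escape_non_bmp(s))

    def flush(self):
        self._stream.flush()


sample = '\u0100\U00010000'
assert utf8bmp_encode_join(sample) == utf8bmp_encode_re(sample)
print(utf8bmp_encode_re(sample))  # b'\xc4\x80\\U00010000'

# print() only needs a write() method on the file object.
buf = io.StringIO()
print(sample, file=BMPEscapingWriter(buf))
assert buf.getvalue() == '\u0100\\U00010000\n'
```

Wrapping the output stream keeps the escaping in one place, so `print('\u0100\U00010000')` would display an escape instead of raising, while BMP characters such as 'Ā' are passed through unchanged.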