Author eryksun
Recipients David E. Franco G., eryksun, terry.reedy
Date 2017-04-08.04:23:47
Message-id <1491625427.57.0.233862121021.issue30019@psf.upfronthosting.co.za>
In-reply-to
Content
In Windows IDLE 3.x, you should still be able to print non-BMP text by transcoding it to surrogate pairs, which sneaks the native UTF-16LE encoding around tkinter:

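    def transurrogate(s):
        # Decode each 2-byte UTF-16LE code unit separately, so a non-BMP
        # character becomes two lone surrogate code points.
        b = s.encode('utf-16le')
        return ''.join(b[i:i+2].decode('utf-16le', 'surrogatepass')
                       for i in range(0, len(b), 2))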

    def print_surrogate(*args, **kwds):
        new_args = []
        for arg in args:
            if isinstance(arg, str):
                new_args.append(transurrogate(arg))
            else:
                new_args.append(arg)
        return print(*new_args, **kwds)


    >>> s = '\U0001f52b \U0001f52a'
    >>> print_surrogate(s)
    🔫 🔪
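
To make the transcoding visible, here is a minimal sketch that does the same per-code-unit decode and inspects the result. The helper name `to_surrogate_pairs` is made up for this illustration, and passing 'surrogatepass' to the encode step is my addition so that already-transcoded text passes through unchanged:

```python
def to_surrogate_pairs(s):
    # Encode to UTF-16LE, then decode every 2-byte code unit on its own.
    # A non-BMP character therefore turns into two lone surrogates.
    b = s.encode('utf-16le', 'surrogatepass')
    return ''.join(b[i:i+2].decode('utf-16le', 'surrogatepass')
                   for i in range(0, len(b), 2))

s = '\U0001f52b'
t = to_surrogate_pairs(s)
print(len(s), len(t))            # 1 2
print([hex(ord(c)) for c in t])  # ['0xd83d', '0xdd2b']
```

BMP-only strings come back unchanged, since each of their characters already fits in one UTF-16 code unit.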

Pasting non-BMP text into IDLE fails on Windows for a similar reason. Tk naively encodes the surrogate codes in the native Windows UTF-16 text as invalid UTF-8, which I've seen referred to as WTF-8 (Wobbly). I see the following error when I run IDLE using python.exe (i.e. with a console) and paste "🔫 🔪" into the window:

    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 1: invalid continuation byte

Byte 0xed is the second byte of the WTF-8 encoding, right after the opening quote. It is the lead byte of the encoded high surrogate:

    >>> transurrogate('"\U0001f52b').encode('utf-8', 'surrogatepass')
    b'"\xed\xa0\xbd\xed\xb4\xab'
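
The failure can be reproduced without Tk. The sketch below starts from the bytes shown above, triggers the same strict-decode error, and then reverses the transcoding. The helper name `from_surrogate_pairs` is hypothetical, and the repair step is only an illustration of the round trip, not something IDLE or tkinter does:

```python
# '"' followed by each surrogate of U+1F52B encoded as its own 3-byte sequence.
data = b'"\xed\xa0\xbd\xed\xb4\xab'

try:
    data.decode('utf-8')          # strict UTF-8 rejects encoded surrogates
except UnicodeDecodeError as e:
    err = e
    print(e)

def from_surrogate_pairs(raw):
    # Let the lone surrogates through, then have the UTF-16 codec
    # recombine each high/low pair into one non-BMP code point.
    s = raw.decode('utf-8', 'surrogatepass')
    return s.encode('utf-16le', 'surrogatepass').decode('utf-16le')

fixed = from_surrogate_pairs(data)
print(ascii(fixed))
```

The final strict 'utf-16le' decode will still raise if the input contains an unpaired surrogate, so this only recovers text that was valid before transcoding.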

Hackiness aside, I don't think it's worth supporting this just for Windows.