
Author vbr
Recipients ezio.melotti, loewis, ned.deily, roger.serwy, terry.reedy, vbr, vstinner
Date 2012-03-06.00:39:10
Message-id <1330994351.32.0.963677093559.issue14200@psf.upfronthosting.co.za>
In-reply-to
Content
I'd like to add some further observations to the mentioned issue;
it seems that the crash is indeed not specific to IDLE.
In a sample tkinter app, where I just display e.g. chr(66352) in an Entry widget, I also get the same immediate crash via pythonw.exe, and the previously mentioned "proper" ValueError without a crash via python.exe.

I also tried to explicitly display a surrogate pair, which was used automatically up to Python 3.2; surrogate pairs can be used in tkinter in 3.3, but there are limitations and discrepancies:

>>> 
>>> got_ahsa = "\N{GOTHIC LETTER AHSA}"
>>> def wide_char_to_surrog_pair(char):
    code_point = ord(char)
    if code_point <= 0xFFFF:
        return char
    else:
        high_surr = (code_point - 0x10000) // 0x400 + 0xD800
        low_surr = (code_point - 0x10000) % 0x400 + 0xDC00
        return chr(high_surr)+chr(low_surr)

>>> ahsa_surrog = wide_char_to_surrog_pair(got_ahsa)
>>> print(ahsa_surrog)
𐌰
>>> repr(ahsa_surrog)
"'_ud800\x00udf30'"
>>> ahsa_surrog
'Pud800 udf30'

[the space in the middle of the last item might be \x00, as it terminates the clipboard content, the rest is copied separately]

The printed square corresponds to the given character and can be used in other programs etc. (Whereas in Python 3.2 the same value was used for the repr and for a direct "display" of the string in the interpreter, there are three different formats in Python 3.3.)
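
As a cross-check of the arithmetic in wide_char_to_surrog_pair, the following sketch compares the computed code units for chr(66352) with what Python's own UTF-16 codec produces. The helper name surrogate_code_units is made up for this illustration and is not part of any library.

```python
import struct

def surrogate_code_units(char):
    """Return the UTF-16 code units of a single character as a tuple of ints.

    Hypothetical helper; same arithmetic as wide_char_to_surrog_pair,
    written with shifts and masks instead of // and %.
    """
    code_point = ord(char)
    if code_point <= 0xFFFF:
        return (code_point,)
    offset = code_point - 0x10000
    return (0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF))

got_ahsa = chr(66352)  # U+10330 GOTHIC LETTER AHSA
units = surrogate_code_units(got_ahsa)

# Reference value: big-endian UTF-16 bytes from the codec, split into 16-bit units.
encoded = got_ahsa.encode("utf-16-be")
reference = struct.unpack(">%dH" % (len(encoded) // 2), encoded)

print([hex(u) for u in units])  # ['0xd800', '0xdf30']
print(units == reference)       # True
```

So the pair for this character is U+D800 followed by U+DF30, which is consistent with the "ud800 ... udf30" fragments that survive in the pasted output above.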

I also noticed that a surrogate pair is no longer supported as input for unicodedata.name(...):
 
>>> import unicodedata
>>> unicodedata.name(ahsa_surrog)
Traceback (most recent call last):
  File "<pyshell#60>", line 1, in <module>
    unicodedata.name(ahsa_surrog)
TypeError: need a single Unicode character as parameter
>>> 

(in 3.2 and probably others it returns the expected 'GOTHIC LETTER AHSA')
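
A possible workaround, sketched under the assumption that the caller holds exactly one high/low surrogate pair: combine the pair back into a single character before handing it to unicodedata.name. The function name surrog_pair_to_wide_char is hypothetical, chosen to mirror the helper above.

```python
import unicodedata

def surrog_pair_to_wide_char(text):
    """Combine a high/low surrogate pair into the single character it encodes.

    Hypothetical helper; any input that is not exactly one valid pair
    is returned unchanged.
    """
    if len(text) == 2:
        high, low = ord(text[0]), ord(text[1])
        if 0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF:
            return chr(0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00))
    return text

ahsa_surrog = "\ud800\udf30"
print(unicodedata.name(surrog_pair_to_wide_char(ahsa_surrog)))  # GOTHIC LETTER AHSA
```

This only illustrates the conversion on the caller's side; it says nothing about whether unicodedata itself should accept pairs again.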

(I for my part would think that keeping somewhat liberal (but still unambiguous) input possibilities for unicodedata wouldn't hurt. Also, if tkinter is not going to support wide Unicode natively any time soon, output conversion using surrogates, which other programs also understand, seems the most usable option in this regard.)
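
The kind of output conversion meant here could look roughly like the following sketch, which replaces every character above U+FFFF in a string with its surrogate pair before the string is passed on for display. The name to_surrogates is hypothetical, and whether a given Tk build actually renders the result is an assumption based only on the behaviour reported in this message.

```python
def to_surrogates(text):
    """Replace each character above U+FFFF with its UTF-16 surrogate pair.

    Hypothetical helper; characters in the BMP are passed through unchanged.
    """
    out = []
    for char in text:
        code_point = ord(char)
        if code_point > 0xFFFF:
            offset = code_point - 0x10000
            out.append(chr(0xD800 + (offset >> 10)))   # high surrogate
            out.append(chr(0xDC00 + (offset & 0x3FF)))  # low surrogate
        else:
            out.append(char)
    return "".join(out)

converted = to_surrogates("a" + chr(66352) + "b")
print(ascii(converted))  # 'a\ud800\udf30b'
```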

Hopefully, this is somehow relevant to the original issue;
I am not sure whether some parts would be better posted as separate issues, or whether this is the planned and expected behaviour anyway.

regards,
   vbr
History
Date                 User  Action  Args
2012-03-06 00:39:11  vbr   set     recipients: + vbr, loewis, terry.reedy, vstinner, ned.deily, ezio.melotti, roger.serwy
2012-03-06 00:39:11  vbr   set     messageid: <1330994351.32.0.963677093559.issue14200@psf.upfronthosting.co.za>
2012-03-06 00:39:10  vbr   link    issue14200 messages
2012-03-06 00:39:10  vbr   create