classification
Title: characters with ord above 65535 fail to display in IDLE
Type: behavior Stage: resolved
Components: Tkinter Versions: Python 3.3
process
Status: closed Resolution: duplicate
Dependencies: Superseder: Idle shell crash on printing non-BMP unicode character
View: 14200
Assigned To: asvetlov Nosy List: Ramchandra Apte, asvetlov, eric.smith, ezio.melotti, flox, kbk, loewis, ned.deily, python-dev, r.david.murray, roger.serwy, terry.reedy, vstinner, wujek.srujek
Priority: normal Keywords: patch

Created on 2011-06-15 20:59 by wujek.srujek, last changed 2012-03-14 22:15 by roger.serwy. This issue is now closed.

Files
File name Uploaded Description Edit
tcl_unicode_range.patch vstinner, 2011-11-03 19:54
Messages (22)
msg138389 - (view) Author: wujek (wujek.srujek) Date: 2011-06-15 20:59
The following code produces an exception:

print('{:c}'.format(65536))

when executed in Idle 3.2. The stack trace:

>>> print('{:c}'.format(65536))
Traceback (most recent call last):
  File "<pyshell#149>", line 1, in <module>
    print('{:c}'.format(65536))
  File "/usr/lib/python3.2/idlelib/PyShell.py", line 1231, in write
    self.shell.write(s, self.tags)
  File "/usr/lib/python3.2/idlelib/PyShell.py", line 1213, in write
    OutputWindow.write(self, s, tags, "iomark")
  File "/usr/lib/python3.2/idlelib/OutputWindow.py", line 40, in write
    self.text.insert(mark, s, tags)
  File "/usr/lib/python3.2/idlelib/Percolator.py", line 25, in insert
    self.top.insert(index, chars, tags)
  File "/usr/lib/python3.2/idlelib/ColorDelegator.py", line 79, in insert
    self.delegate.insert(index, chars, tags)
  File "/usr/lib/python3.2/idlelib/PyShell.py", line 316, in insert
    UndoDelegator.insert(self, index, chars, tags)
  File "/usr/lib/python3.2/idlelib/UndoDelegator.py", line 81, in insert
    self.addcmd(InsertCommand(index, chars, tags))
  File "/usr/lib/python3.2/idlelib/UndoDelegator.py", line 116, in addcmd
    cmd.do(self.delegate)
  File "/usr/lib/python3.2/idlelib/UndoDelegator.py", line 219, in do
    text.insert(self.index1, self.chars, self.tags)
  File "/usr/lib/python3.2/idlelib/ColorDelegator.py", line 79, in insert
    self.delegate.insert(index, chars, tags)
  File "/usr/lib/python3.2/idlelib/WidgetRedirector.py", line 104, in __call__
    return self.tk_call(self.orig_and_operation + args)
ValueError: unsupported character

Seems to work fine in a terminal (Gnome-terminal in this case):

>>> print('{:c}'.format(0x10000))
𐀀

(my font doesn't have the glyph, but otherwise it works)



Python version:
>>> print(sys.version)
3.2 (r32:88445, Mar 25 2011, 19:56:22) 
[GCC 4.5.2]

OS:
wujek@home:~$ uname -a
Linux studio 2.6.38-8-generic #42-Ubuntu SMP Mon Apr 11 03:31:24 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

wujek@home:~$ cat /etc/issue
Ubuntu 11.04
msg138390 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-06-15 21:10
Judging from the stack trace, it isn't str.format that's failing, it's tk failing to display it.
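
A minimal sketch of that point, assuming Python 3.3 or later (where every code point is a single string element): the formatting step can be checked without printing anything, so no display layer is involved. The helper name `describe` is made up for illustration.

```python
# str.format builds the string without involving Tk or any display code.
s = '{:c}'.format(65536)


def describe(text):
    """Return the length and the hex code points of a string (no printing of the text itself)."""
    return len(text), [hex(ord(ch)) for ch in text]


# The string is valid: one character, code point U+10000.
print(describe(s))  # (1, ['0x10000'])
```

If this succeeds while printing in IDLE fails, the failure is in the layer that inserts the text into the Tk widget, as the traceback suggests.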
msg138392 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-06-15 21:47
U+10000 is not the most common character in fonts. You should try another character in the U+10000-U+10FFFF range (non-BMP characters). The new funny emoticons are in this range, but I don't know if your Ubuntu setup includes a font supporting this range.
http://www.unicode.org/charts/PDF/Unicode-6.0/U60-1F600.pdf
msg138395 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2011-06-15 21:59
From the discussions here, http://wiki.tcl.tk/1364, it appears that Tcl 8.5 (and earlier) does not support Unicode code points outside the BMP range as in this example. I don't think there is anything practical IDLE or tkinter can do about that.
msg138397 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-06-15 22:01
> From the discussions here, http://wiki.tcl.tk/1364, it appears that Tcl
> 8.5 (and earlier) does not support Unicode code points outside
> the BMP range as in this example.

Extract of http://wiki.tcl.tk/1364 :

"RS 2008-07-09: Unicode out of BMP (> U+FFFF) requires a deeper rework of Tcl and Tk: we'd need 32 bit chars and/or surrogate pairs. UTF-8 at least can deal with 31-bit Unicodes by principle."

> I don't think there is anything practical IDLE
> or tkinter can do about that.

We might raise an error with better error message than ValueError('unsupported character'), but it's maybe overkill.
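
For reference, the "surrogate pairs" mentioned in the wiki extract are the standard UTF-16 way to represent a code point above U+FFFF as two 16-bit units. The following is a sketch of that arithmetic only; the function name `to_surrogate_pair` is hypothetical, and nothing here reflects how Tcl or _tkinter is actually implemented.

```python
def to_surrogate_pair(code_point):
    """Split a non-BMP code point into its UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a non-BMP code point: U+%04X" % code_point)
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x1F600)
print('%04X %04X' % (high, low))  # D83D DE00
```

A 16-bit character type can hold each half of the pair, but not the original code point in a single unit.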
msg138402 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2011-06-15 22:17
It looks like that error message has been in _tkinter.c since 2002: http://svn.python.org/view/python/trunk/Modules/_tkinter.c?r1=28989&r2=28990&;

I suppose it could be slightly more informative but it seems pretty unambiguous to me.  Martin, any opinions?
msg138497 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-06-17 10:54
Instead of
  ValueError: unsupported character
I suggest:
  ValueError: unsupported character (U+10000): Tcl doesn't support characters outside U+0000-U+FFFF range

What do you think?
msg138541 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-06-17 18:31
>ValueError: unsupported character (U+10000): Tcl doesn't support characters outside U+0000-U+FFFF range

Slightly shorter and without the double :s.

ValueError: character U+10000 is above the range (U+0000-U+FFFF) allowed by Tcl/Tk.

I agree with a change like this. People are going to increasingly use non-BMP chars and need to find out that the problem is not our fault.
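
A Python-level sketch of the proposed behavior, for illustration only: the real check lives in C in _tkinter.c, and the names `TCL_MAX_CODE_POINT` and `check_tcl_range` are assumptions invented here.

```python
TCL_MAX_CODE_POINT = 0xFFFF  # upper limit discussed in this issue


def check_tcl_range(text):
    """Raise ValueError with the more informative message for non-BMP characters."""
    for ch in text:
        if ord(ch) > TCL_MAX_CODE_POINT:
            raise ValueError(
                "character U+%X is above the range (U+0000-U+FFFF) "
                "allowed by Tcl/Tk" % ord(ch))
    return text


try:
    check_tcl_range('{:c}'.format(65536))
except ValueError as exc:
    print(exc)
```

The message names both the offending code point and the component that imposes the limit.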
msg146663 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2011-10-30 21:46
(Merging CC list from duplicate Issue13265.)
msg146665 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2011-10-30 22:33
Changing the error message sounds fine to me.

People in need of the feature should lobby their system vendors to provide a Tcl build that uses a 32-bit Tcl_UniChar. Not sure whether it would actually render the string correctly, but at least it would be able to represent it correctly internally.
msg146965 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-11-03 19:54
Here is the patch as a .patch file.
msg146983 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2011-11-03 21:39
I'm not sure whether the wording is good English, but apart from that, the patch looks fine.
msg146984 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-11-03 21:49
The patch implements my suggestion. Looking again, I think the English is fine ;-).
msg146987 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-11-03 22:14
You could say "Unicode character ..." in the error to make clear what kind of range is U+0000-U+FFFF (people that are not familiar with Unicode and BMP chars might wonder if that's some tcl/tk thing).
msg146991 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-11-03 23:42
New changeset 9a07b73abdb1 by Victor Stinner in branch '3.2':
Issue #12342: Improve _tkinter error message on unencodable character
http://hg.python.org/cpython/rev/9a07b73abdb1

New changeset 5aea95d41ad2 by Victor Stinner in branch 'default':
(Merge 3.2) Issue #12342: Improve _tkinter error message on unencodable character
http://hg.python.org/cpython/rev/5aea95d41ad2
msg146992 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-11-03 23:49
_tkinter now raises ValueError("character U+10ffff is above the range (U+0000-U+FFFF) allowed by Tcl").

> You could say "Unicode character ..." in the error to make clear
> what kind of range is U+0000-U+FFFF (people that are not familiar
> with Unicode and BMP chars might wonder if that's some tcl/tk thing).

I consider that U+10ffff in "character U+10ffff" is enough to specify that it is a Unicode character. Even if you don't understand Unicode, you can at least compare numbers (0x10ffff is not in range [0x0000; 0xFFFF]) ;-)
msg146994 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2011-11-04 00:27
Failed to build these modules: (3.3 on Snow Leopard)
_tkinter


./cpython/Modules/_tkinter.c: In function ‘AsObj’:
./cpython/Modules/_tkinter.c:996: warning: dereferencing ‘void *’ pointer
./cpython/Modules/_tkinter.c:996: error: invalid use of void expression
msg146999 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-11-04 08:49
New changeset 5f49b496d161 by Victor Stinner in branch 'default':
Issue #12342: Fix compilation on Mac OS X
http://hg.python.org/cpython/rev/5f49b496d161
msg154966 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-03-05 17:59
In responding to #14200, it occurred to me that better than an exception would be doing what the interpreter does in a Command Prompt window, which is to expand high chars to the '\U0001xxxx' escaped form.
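
One way such an expansion could look, as a rough sketch: the helper `escape_non_bmp` is hypothetical and is not the code that was committed for #14200.

```python
def escape_non_bmp(text):
    """Replace each character above U+FFFF with its '\\UXXXXXXXX' escape."""
    return ''.join(
        ch if ord(ch) <= 0xFFFF else '\\U%08x' % ord(ch)
        for ch in text)


print(escape_non_bmp('smile: ' + chr(0x1F600)))  # smile: \U0001f600
```

BMP characters pass through unchanged, so ordinary output is unaffected.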
msg155414 - (view) Author: Roger Serwy (roger.serwy) * (Python committer) Date: 2012-03-11 22:11
I agree with Terry. The current behavior of raising ValueError will lead to problems in application code in the future if Tkinter gets fixed such that it can render Unicode properly beyond 0xFFFF.
msg155804 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2012-03-14 21:48
Fixed in #14200
msg155809 - (view) Author: Roger Serwy (roger.serwy) * (Python committer) Date: 2012-03-14 22:15
Rather than raising a ValueError, would UnicodeEncodeError be more appropriate? I admit that this suggestion may be bike shedding.
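
On compatibility, UnicodeEncodeError is a subclass of ValueError, so code that already catches ValueError would keep working if the exception type were changed. A small sketch, where the encoding label 'tcl' and the helper `raise_unencodable` are made up for illustration:

```python
def raise_unencodable(text, index):
    """Raise UnicodeEncodeError for the character at the given index."""
    raise UnicodeEncodeError(
        'tcl',                      # hypothetical encoding label
        text, index, index + 1,
        'character above U+FFFF')


try:
    raise_unencodable('a' + chr(0x10000), 1)
except ValueError as exc:           # still caught by existing handlers
    caught = exc

print(type(caught).__name__)  # UnicodeEncodeError
```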
History
Date User Action Args
2012-03-14 22:15:48 roger.serwy set messages: + msg155809
2012-03-14 21:48:11 asvetlov set status: open -> closed
    assignee: asvetlov
    versions: - Python 2.7, Python 3.2
    messages: + msg155804
    superseder: Idle shell crash on printing non-BMP unicode character
    resolution: fixed -> duplicate
    stage: commit review -> resolved
2012-03-12 18:52:35 asvetlov set nosy: + asvetlov
2012-03-11 22:11:41 roger.serwy set messages: + msg155414
2012-03-05 17:59:56 terry.reedy set messages: + msg154966
2011-11-04 08:49:30 python-dev set messages: + msg146999
2011-11-04 00:27:38 flox set status: closed -> open
    nosy: + flox
    messages: + msg146994
2011-11-03 23:49:35 vstinner set status: open -> closed
    resolution: fixed
    messages: + msg146992
2011-11-03 23:42:25 python-dev set nosy: + python-dev
    messages: + msg146991
2011-11-03 22:14:50 ezio.melotti set messages: + msg146987
2011-11-03 21:49:02 terry.reedy set messages: + msg146984
    stage: commit review
2011-11-03 21:39:33 loewis set messages: + msg146983
2011-11-03 19:54:37 vstinner set files: + tcl_unicode_range.patch
    keywords: + patch
    messages: + msg146965
2011-10-30 22:33:31 loewis set messages: + msg146665
2011-10-30 21:46:55 ned.deily set nosy: + kbk, ezio.melotti, roger.serwy, Ramchandra Apte
    messages: + msg146663
2011-10-30 21:45:57 ned.deily link issue13265 superseder
2011-06-17 18:31:10 terry.reedy set messages: + msg138541
    components: + Tkinter, - IDLE, IO
    versions: + Python 2.7, Python 3.3
2011-06-17 10:54:44 vstinner set messages: + msg138497
2011-06-16 02:20:54 eric.smith set nosy: + eric.smith
2011-06-15 22:17:20 ned.deily set nosy: + loewis
    messages: + msg138402
2011-06-15 22:01:49 vstinner set messages: + msg138397
2011-06-15 21:59:06 ned.deily set nosy: + ned.deily
    messages: + msg138395
2011-06-15 21:47:17 vstinner set nosy: + vstinner
    messages: + msg138392
2011-06-15 21:10:07 r.david.murray set nosy: + r.david.murray, terry.reedy
    messages: + msg138390
    title: characters with ord above 65535 fail conversion with str.format for '{:c}' in IDLE -> characters with ord above 65535 fail to display in IDLE
2011-06-15 20:59:59 wujek.srujek create