
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: IDLE 3.x on Windows exits when pasting non-BMP unicode
Type: behavior
Stage: resolved
Components: IDLE, Tkinter, Unicode
Versions: Python 3.9, Python 3.8, Python 3.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: serhiy.storchaka
Nosy List: Aivar.Annamaa, ezio.melotti, malin, miss-islington, ned.deily, serhiy.storchaka, taleinat, terry.reedy, vstinner
Priority: high
Keywords: patch

Created on 2011-10-11 20:01 by JBernardo, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
tkinter_nobmp_error.patch serhiy.storchaka, 2012-12-20 12:57 review
tkinter_string_conv_3.patch serhiy.storchaka, 2013-09-05 12:25 review
tkinter_pythoncmd_args.patch serhiy.storchaka, 2014-01-05 11:31 review
tkinter_pythoncmd_args_2.patch serhiy.storchaka, 2014-02-09 20:30 review
Pull Requests
URL Status Linked Edit
PR 16363 closed terry.reedy, 2019-09-24 21:48
PR 16365 closed terry.reedy, 2019-09-24 22:08
PR 16545 merged serhiy.storchaka, 2019-10-02 17:56
PR 16580 merged miss-islington, 2019-10-04 10:10
PR 16581 merged miss-islington, 2019-10-04 10:10
Messages (77)
msg145363 - (view) Author: João Bernardo (JBernardo) * Date: 2011-10-11 20:01
I was playing with some unicode chars on Python 3.2 (x64 on Windows 7), but when I pasted a char above 0xFFFF, IDLE crashed without any error message.

Example (works fine):
>>> '\U000104a2'
'𐒢'

But, if I try to paste the above char, the window will instantly close.

The interpreter uses 2 bytes per char (UTF-16) and I don't know if that's causing the problem (as a side note, why doesn't the default Windows build use 4-byte chars?).

I can't check now with my Ubuntu install (UTF-32) if the problem persists.
msg145366 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2011-10-11 21:06
This is related to Issue12342.  The problem is that Tcl/Tk 8.5 (and earlier) do not support Unicode code points outside the BMP range as in this example.  So IDLE will be unable to display such characters but it should not crash either.
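
To make the BMP boundary concrete, here is a minimal Python 3 sketch (not part of the original report) that checks that the character from the example lies above U+FFFF and therefore needs two UTF-16 code units. The helper name `is_non_bmp` is made up for illustration.

```python
def is_non_bmp(ch: str) -> bool:
    """Return True if the single character ch lies outside the BMP."""
    return ord(ch) > 0xFFFF

ch = '\U000104a2'
utf16 = ch.encode('utf-16-le')

# Each UTF-16 code unit is 2 bytes, so a non-BMP character takes 4 bytes.
units = [int.from_bytes(utf16[i:i + 2], 'little')
         for i in range(0, len(utf16), 2)]

print(is_non_bmp(ch), [hex(u) for u in units])
```

The two code units are the surrogate pair that the later messages refer to.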
msg145369 - (view) Author: João Bernardo (JBernardo) * Date: 2011-10-11 21:54
@Ned

That looks like a slightly different case. IDLE *can* print the char after you enter the '\Uxxxxxxxx' version of it.

It just doesn't let you paste those characters...
msg145573 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-10-15 00:01
The current Windows build uses 2-byte unicode chars because that is what Windows does. In 3.3, all builds will use a new unicode implementation that uses 1, 2, or 4 bytes as needed. But I suspect we will still have the paste problem unless we can somehow bypass the tk limitation.
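
As a rough illustration of the 1/2/4-byte storage described above, the following sketch assumes CPython 3.3 or later. The exact byte counts are an implementation detail and differ between versions, so only their ordering is meaningful here.

```python
import sys

n = 1000
ascii_s = 'a' * n            # fits in 1 byte per character
bmp_s = '\u0101' * n         # needs 2 bytes per character
astral_s = '\U000104a2' * n  # needs 4 bytes per character

sizes = [sys.getsizeof(s) for s in (ascii_s, bmp_s, astral_s)]
print(sizes)

# Length and indexing are by code point in every case,
# regardless of the storage width chosen internally.
print(len(astral_s), astral_s[0] == '\U000104a2')
```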

Printing a Python string to the screen does not seem to involve conversion to a tk string. Or else tk blindly copies surrogate pairs to Windows even though it cannot create them.

In any case, true window-closing crashes (as opposed to an error traceback) are obnoxious bugs that we try to fix if possible. I verified this on my 64-bit Win 7 system. Thanks for the report. Feel free to look into the code if you can.
msg145580 - (view) Author: João Bernardo (JBernardo) * Date: 2011-10-15 04:03
Just for comparison, on Python 2.7.1 (x32 on Windows 7) it's possible to paste the char (but can't use it) and a nice error is given. 

>>> u'𐒢'
Unsupported characters in input

So the problem was partially solved but something might have happened with the 3.x port...

Searching both source trees, I can see that the following block was commented out in Python 3.2 but not in Python 2.7 (maybe someone removed someone else's bug fix?) and an `assert` was added.

#--- Lines 605 to 613 of PyShell.py

assert isinstance(source, str)
#                       v-- on Python2.7 it is types.UnicodeType instead
#if isinstance(source, str):
#    from idlelib import IOBinding
#    try:
#        source = source.encode(IOBinding.encoding)
#    except UnicodeError:
#        self.tkconsole.resetoutput()
#        self.write("Unsupported characters in input\n")
#        return

I uncommented those lines, removed the `assert` and deleted __pycache__ for fresh bytecode but the error keeps happening.

This function `runsource()` is only called after the return key is pressed so the bug was introduced on another part of the program.

I'll search further but it's hard to do that without traceback of the error.

(Maybe `runit()` is the problem because it seems to build the line and call `runsource(line)`)

------
PS: @Terry Reedy
That looks nice to have different lengths for chars but what will be the impact on performance? Indexing will still be in constant time?
msg145581 - (view) Author: João Bernardo (JBernardo) * Date: 2011-10-15 04:33
Just to complete my monologue:
Here's the traceback from running IDLE in cmd line.


C:\Python32\Lib\idlelib>python -i idle.py
Traceback (most recent call last):
  File "idle.py", line 11, in <module>
    idlelib.PyShell.main()
  File "C:\Python32\Lib\idlelib\PyShell.py", line 1429, in main
    root.mainloop()
  File "C:\Python32\Lib\tkinter\__init__.py", line 1009, in mainloop
    self.tk.mainloop(n)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-2: invalid continuation byte


Not very meaningful, but better than nothing... It looks like some of the traceback is missing, and this one points to tkinter.
msg145584 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-10-15 05:48
[Yes, indexing will still be O(1), though I personally consider that less important than most make it to be. Consistency across platforms and total time and space performance of typical apps should be the concern. There is ongoing work on improving the new implementation. Some operations already take less space and run faster.]

The traceback may very well be helpful. It implies that copying a supplemental char does not produce proper utf-8 encoded bytes. Or if it does, tkinter (or tk underneath it) does not recognize them. But then the problem should be the initial byte, not the continuation bytes, which are the same for all chars and which all have 10 for their two high order bits. See
https://secure.wikimedia.org/wikipedia/en/wiki/Utf-8
for a fuller explanation.

Line 1009 is the definition of Misc.mainloop(). I believe self.tk represents the embedded tcl interpreter, which is a black box from Python's viewpoint. Perhaps we should wrap the call with

try:
  self.tk.mainloop(n)
except Exception as e:
  <print error message with all info attached to e before exiting>

This should catch any miscellaneous crashes which are not otherwise caught and maybe turn the crash issues into bug reports -- the same way that running from the command line did. (It will still be good to catch what we can at error sites and give better, more specific messages.) (What I am not familiar with is how the command line interpreter might turn a tcl error into a python exception and why IDLE does not.)
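
The wrapper idea above can be sketched without a GUI as follows. This is only an illustration under stated assumptions: `run_mainloop_reporting` and its `report` parameter are hypothetical names, and `failing_loop` is a stand-in for a Tk main loop that fails with the same kind of exception as in the traceback.

```python
import traceback

def run_mainloop_reporting(mainloop, report=print):
    """Run mainloop(); on an unexpected exception, report a full traceback.

    `mainloop` is any zero-argument callable, e.g. `root.mainloop` for a
    tkinter root. Returns True if the loop ended normally.
    """
    try:
        mainloop()
    except Exception:
        report("Unexpected error in main loop:\n" + traceback.format_exc())
        return False
    return True

# Stand-in for a Tk main loop that fails while decoding pasted text.
def failing_loop():
    b'\xed\xa0\x81'.decode('utf-8')

messages = []
ok = run_mainloop_reporting(failing_loop, report=messages.append)
print(ok)
print(messages[0].splitlines()[-1])
```

In a GUI-only launch, `report` would have to write somewhere the user can see (a dialog or a log file), since stderr is not visible there.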

When I copy '𐒢' and paste into the command line interpreter or Notepad++, I get '??'. I am guessing that ?? represents a surrogate pair and that Windows separately encodes each. The result would be 'illegal' utf-8 with illegal continuation chars. An application can choose to decode the 'illegal' utf-8 -- or not. Python can when errors='surrogateescape' (or something like that) is specified. It might be possible to access the raw undecoded bytes of the clipboard with the third party pythonwin module. I do not know if there is any way to do so with tk.

I wonder if tcl is calling back to Python for decoding and whether there was a change in the default for errors or the callback specification that would explain a change from 2.7 to 3.2.

Ezio, do you know anything about these speculations?
msg145585 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2011-10-15 10:08
Thanks for the additional investigation.  You don't see more in the traceback because the exception is occurring in the _tkinter C glue layer.  I am able to reproduce the problem on some other platforms as well (e.g. Python 3.x on OS X with Carbon Tk 8.4).  More later.
msg145605 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-10-15 20:25
> Ezio, do you know anything about these speculations?

Assuming that the non-BMP character is represented with two surrogates (\ud801\udca2) and that _tkinter tries to decode them independently, the error message ("invalid continuation byte") would be correct.

Python 2 UTF-8 codec is more permissive and allows encoding/decoding of surrogates (this might also explain why it works on Python 2): 
>>> u'\ud801'.encode('utf-8')
'\xed\xa0\x81'
>>> '\xed\xa0\x81'.decode('utf-8')
u'\ud801'

But on Python 3, trying to decode that results in an error:
>>> b'\xed\xa0\x81'.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1: invalid continuation byte

> But then the problem should be the initial byte, not the continuation
> bytes, which are the same for all chars and which all have 10 for
> their two high order bits.

While it's true that all continuation bytes have the first two bits equal to '10', the opposite is not always true.  Some start bytes have additional restrictions on the continuation bytes.  For example, even if the first two bits of 0xA0 (0b10100000) are '10', the valid continuation bytes for a sequence starting with 0xED are restricted to the range 80..9F.
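
The restriction on what may follow 0xED can be checked directly in Python 3. This is a small sketch, not taken from the thread; the `'surrogatepass'` error handler is the standard way to opt back in to the permissive behaviour.

```python
encoded_surrogate = b'\xed\xa0\x81'   # U+D801 encoded like an ordinary char

# Strict UTF-8 rejects it: after 0xED only 0x80..0x9F may follow.
try:
    encoded_surrogate.decode('utf-8')
    strict_ok = True
except UnicodeDecodeError as exc:
    strict_ok = False
    print(exc.reason)

# 0xED followed by a byte in 0x80..0x9F is fine: this is U+D7FF,
# the last code point before the surrogate range.
last_before_surrogates = b'\xed\x9f\xbf'.decode('utf-8')

# 'surrogatepass' accepts encoded surrogates, as Python 2 did by default.
lone = encoded_surrogate.decode('utf-8', 'surrogatepass')

print(strict_ok, hex(ord(last_before_surrogates)), hex(ord(lone)))
```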

The fact that
>>> '\U000104a2'
'𐒢'
works is because the input is all ASCII, so the decoding doesn't fail.


> [...]
> This should catch any miscellaneous crashes which are not otherwise
> caught and maybe turn the crash issues into bug reports -- the same
> way that running from the command line did.

Having some "safety net" to catch all the unhandled exceptions seems like a good idea.  This won't work in case of segfaults, but it's still better than nothing.  I'm not sure what you mean with "turn them into bug reports" though.
msg145607 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-10-15 20:48
This can also be reproduced by doing:
>>> print('\U000104a2'[0])
í 
and then copy/pasting the lone surrogate.
The traceback is:
  [...]
  File "C:\Programs\Python32\Lib\tkinter\__init__.py", line 1009, in mainloop
    self.tk.mainloop(n)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1: invalid continuation byte
msg145611 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-10-16 03:45
> I'm not sure what you mean with "turn them into bug reports" though.

In about the last month, there have been, I think, 4 reports about IDLE crashing (quitting unexpectedly with no error traceback). I would consider it preferable if it quit with an error traceback that gave as much info as available, or if there is none, just said "IDLE has met an unexpected problem.", perhaps followed by something like "Please note the circumstances and make a report on the tracker if there is none already."
msg145616 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-10-16 07:18
> I would consider it preferable if it quit 

Note that if we catch the error there might be no reason for IDLE to quit (unless the error left IDLE in some invalid state).

> with an error traceback that gave as much info as available,

That might scare newbies away.

> or if there is none, just said "IDLE has met an unexpected problem.", 

So this might be better for all the cases.

> perhaps followed by something like "Please note the circumstances and
> make a report on the tracker if there is none already."

The first message could offer a "Report the problem" option that links to the tracker.  In theory we could also have a way to auto-fill the tracker issue, but that might lead to duplicates.
msg145635 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2011-10-16 19:40
Just to be sure we're talking about the same thing here, my understanding is that the "missing traceback" issues referred to here are only an issue when IDLE is run as a stand-alone GUI program, such as can be done on Windows and with the OS X IDLE.app.  In that case, the standard Python tracebacks from the interpreter written to stderr are not readily visible to the user.  In the OS X IDLE.app case it does get captured in a system log. I'm not sure if that happens anywhere in the Windows cases.  If IDLE is started from a terminal window or console window where stderr is displayed, this is not an issue.

Further discussion about proposed improvements to IDLE diagnostics could be useful, but it is not germane to the specific bug here.  It should be carried out elsewhere, possibly resulting in a feature request.
msg155799 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2012-03-14 21:42
Reassigning to Andrew to investigate solution similar to the one used in Issue14200.
msg155801 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2012-03-14 21:43
(Oops, wrong assignment!)
msg155810 - (view) Author: Roger Serwy (roger.serwy) * (Python committer) Date: 2012-03-14 22:21
Issue13582 deals with the IDLE error feedback.
msg155814 - (view) Author: Roger Serwy (roger.serwy) * (Python committer) Date: 2012-03-14 22:37
Issue14200 has a patch to fix this problem.
msg155857 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2012-03-15 04:46
Not sure. Let me investigate the problem more deeply.
msg177750 - (view) Author: Chris Angelico (Rosuav) * Date: 2012-12-19 15:59
I'm experiencing a similar issue. Fresh install of 3.3 today from the .msi installer on the web site, identifies itself as:
Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:55:48) [MSC v.1600 32 bit (Intel)] on win32

To reproduce: Copy and paste this character into IDLE. ð‡

C:\Python33>.\python -m idlelib.idle
Traceback (most recent call last):
  File "C:\Python33\lib\runpy.py", line 160, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "C:\Python33\lib\runpy.py", line 73, in _run_code
    exec(code, run_globals)
  File "C:\Python33\lib\idlelib\idle.py", line 11, in <module>
    idlelib.PyShell.main()
  File "C:\Python33\lib\idlelib\PyShell.py", line 1477, in main
    root.mainloop()
  File "C:\Python33\lib\tkinter\__init__.py", line 1038, in mainloop
    self.tk.mainloop(n)
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-2: invalid continuation byte


(Incidentally, there appears to be a slight difference depending on whether I copy the character in Chrome or Firefox. IDLE terminates the same way, but a Latin-1 app sees the character from Firefox as a letter, but the same thing from Chrome is two question marks (presumably the surrogates).)
msg177778 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-12-19 20:43
Same on 64-bit 3.3. Changed title since 3.3 is no longer '16-bit build'.
msg177812 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-12-20 12:57
The simplest solution is to raise a TclError instead of ValueError for non-BMP characters. This should not break any existing code, because user code should be ready to catch a TclError in any case. Here is a patch.

A more complicated solution is to add ValueError to every place that catches TclError. This would fix only IDLE; user programs would each have to be fixed separately.

We could also silently encode non-BMP characters for Tcl with UTF-16 (and decode the result back). This can cause some subtle errors with shifted indices, however.
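
To show what "shifted indices" means for the UTF-16 option, here is a rough sketch in pure Python. The helper names `to_surrogate_pairs` and `from_surrogate_pairs` are hypothetical; nothing here is taken from the attached patch.

```python
def to_surrogate_pairs(s: str) -> str:
    """Replace each non-BMP character with its UTF-16 surrogate pair."""
    out = []
    for ch in s:
        cp = ord(ch)
        if cp > 0xFFFF:
            cp -= 0x10000
            out.append(chr(0xD800 + (cp >> 10)))    # high surrogate
            out.append(chr(0xDC00 + (cp & 0x3FF)))  # low surrogate
        else:
            out.append(ch)
    return ''.join(out)

def from_surrogate_pairs(s: str) -> str:
    """Rejoin surrogate pairs into single non-BMP characters."""
    raw = s.encode('utf-16-le', 'surrogatepass')
    return raw.decode('utf-16-le', 'surrogatepass')

text = 'a\U000104a2b'
wide = to_surrogate_pairs(text)

# The same character 'b' sits at a different index in each form.
print(len(text), text.index('b'))
print(len(wide), wide.index('b'))
print(from_surrogate_pairs(wide) == text)
```

Any index that Tk reports for the surrogate form would have to be mapped back before being used on the Python string, which is where the subtle errors could come from.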
msg179276 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-07 19:11
Ping.
msg182106 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-02-14 16:06
Ping.
msg182130 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2013-02-15 08:10
LGTM.  The patch does prevent the crash in IDLE which is certainly an improvement until such time as someone investigates having Tk/tkinter fully support non-BMP characters.
msg182141 - (view) Author: Ramchandra Apte (Ramchandra Apte) * Date: 2013-02-15 13:21
@Ned Deily
Tk, at least on my system, doesn't render Unicode characters correctly, even within the BMP, but the characters are kept (cut-and-paste works correctly).
What do you mean by "support"?
msg182166 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2013-02-15 20:27
The characters tk can render depends on the font you tell it to use. On my Windows IDLE, I have Options Font Face set to Lucida Sans Unicode, though I am not sure what has the widest coverage. This page
https://www.microsoft.com/typography/fonts/font.aspx?FMID=1263
only mentions West Asian, but I seem to get more than that.
msg182172 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2013-02-15 20:48
The font used shouldn't affect the errors.  Usually if a glyph is missing in the current font, either a placeholder (usually a box) is shown instead or the missing glyph is taken from another font (if possible).

If you still want to do some tests, you can take a look at http://en.wikipedia.org/wiki/List_of_Unicode_fonts#Unicode_fonts
msg182180 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2013-02-15 21:56
Also, there are differences in behavior among the various flavors of Tk.  I know of at least four main flavors in use by current Python builds:  Unix X11-based Tk 8.5, Windows Tk 8.5, OS X Cocoa Tk 8.5, OS X Carbon Tk 8.4.  Some third-party distributors are starting to supply Tk 8.6, in its various flavors, now that 8.6 has been released. Each flavor has various build options and features to fit in with its host o/s environment.  This makes testing tkinter issues *interesting*.
msg182181 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-02-15 22:03
Tkinter is not compatible with Tcl/Tk 8.6 yet (issue16809).
msg182184 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2013-02-15 22:11
Serhiy, I'm aware of that; regardless, Tk 8.6 is starting to be used out in the field with tkinter.
msg182207 - (view) Author: Ramchandra Apte (Ramchandra Apte) * Date: 2013-02-16 03:21
I have set it to "Ubuntu", which supports the Unicode characters. Maybe
Tkinter doesn't work properly with all the fonts.

msg182305 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2013-02-18 08:30
Serhiy, I think your patch is ready to commit and close this issue as it prevents the crash.  A test would be nice if a reliable test could be devised without too much effort but it's not mandatory, IMO.  Any tangential issues or more complex solutions can be pursued in other issues.
msg182312 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-02-18 11:07
New changeset bb5a8564e186 by Serhiy Storchaka in branch '2.7':
Issue #13153: Tkinter functions now raise TclError instead of ValueError when
http://hg.python.org/cpython/rev/bb5a8564e186

New changeset 9904f245c3f0 by Serhiy Storchaka in branch '3.2':
Issue #13153: Tkinter functions now raise TclError instead of ValueError when
http://hg.python.org/cpython/rev/9904f245c3f0

New changeset 38bb2a46692e by Serhiy Storchaka in branch '3.3':
Issue #13153: Tkinter functions now raise TclError instead of ValueError when
http://hg.python.org/cpython/rev/38bb2a46692e

New changeset 61993bb9ab0e by Serhiy Storchaka in branch 'default':
Issue #13153: Tkinter functions now raise TclError instead of ValueError when
http://hg.python.org/cpython/rev/61993bb9ab0e
msg192240 - (view) Author: William Schwartz (William.Schwartz) * Date: 2013-07-03 14:43
Looks like this issue is closed, but I got IDLE to crash.

On Python 3.3.2, Windows 7, and Tk version 8.5, IDLE crashes when pasting \U0001F382 (Unicode birthday cake character). Below is the version string for the Python I'm running.

Python 3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 00:06:53) [MSC v.1600 64 bit (AMD64)] on win32.

According to http://docs.python.org/3.3/whatsnew/changelog.html this issue was fixed in Python 3.3.1 RC 1. Indeed the patch discussed above exists in the cpython 3.3 branch: http://hg.python.org/cpython/file/910ec3471d55/Modules/_tkinter.c. But IDLE still crashes, at least for me.
msg192243 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-07-03 15:08
Unfortunately I can't reproduce a crash on Linux. Perhaps this is a Windows-only issue.
msg192260 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2013-07-03 22:27
I verified that the problem continues on Windows. There is only a 'crash' when running under pythonw, which does not happen on Linux, as far as I know. When running on a console, the error traceback is the same as in msg145581 (with line numbers altered). Changing some ValueErrors to TclErrors did not affect either the UnicodeDecodeError or the problem of error reports crashing IDLE because there is no stderr to send them to.

This type of crash is a generic problem addressed by #13582. The solution to that would solve the crash part of this issue, but not the message. I would like to see the 2.x message, possibly improved. Perhaps we should resurrect the 2.x codec for processing text pasted into the shell.

As for text windows,
# -*- coding: utf-8 -*-
print('𐒢')
# ðﾐﾒﾢ
which is not good either.
msg193566 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2013-07-22 21:58
In 3.3.2 and 3.4.0 the traceback says that the invalid continuation byte (immediately, when '𐒢' is pasted) is ED. Snipped version is
  File "F:\Python\dev\py33\lib\tkinter\__init__.py", line 1071, 
    self.tk.mainloop(n)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 1:

As I understand utf-8 from, for instance, the wikipedia article, continuation bytes are 0b10xxxxxx, or 80 to BF, and definitely not ED.
msg194564 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2013-08-06 16:43
0xed is the start byte of a 3-byte sequence (i.e. a BMP char), and it should be followed by two continuation bytes.

For some reason the traceback you pasted is missing the last part, that might provide some insight.  It could be one of these:
>>> b'\xed'.decode('utf-8') # not enough continuation bytes
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 0: unexpected end of data
>>> b'\xed\x7f'.decode('utf-8') # not a valid continuation byte
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 0: invalid continuation byte
msg194586 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2013-08-06 21:53
Byte 0, not byte 1, is the start byte, and it should be F0, as in the output below. However, I now see "invalid continuation byte".
In 2.7.5,
# -*- coding: utf-8 -*-
s = b'𐒢'  # output same if uncomment following lines
#s = u'𐒢'.encode('utf-8')  # '𐒢' pasted in from 1st post
#s = u'\U000104a2'.encode('utf-8')  
print(len(s))
for c in s: print(ord(c), hex(ord(c)))
>>> 
4
(240, '0xf0')
(144, '0x90')
(146, '0x92')
(162, '0xa2')

I have no idea how the second pasted byte becomes ED in 3.x.

Attempting to open the file in 3.x results in a broken* 'Untitled' edit window and the following error message in the console.
_tkinter.TclError: character U+104a2 is above the range (U+0000-U+FFFF) allowed by Tcl

* Attempting to close the window either immediately or after entering text results in
AttributeError: 'PyShellEditorWindow' object has no attribute 'extensions'
I have to close the initial python process to get rid of it.
msg194603 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-08-07 10:01
u'\U000104a2' == u'\ud801\udca2' on narrow build.

u'\ud801'.encode('utf-8', 'surrogatepass') == b'\xed\xa0\x81'
u'\udca2'.encode('utf-8', 'surrogatepass') == b'\xed\xb2\xa2'

Hope it will help.
msg194623 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-08-07 17:47
It seems that Tk stores the pasted "\U000104a2" as the surrogate pair "\ud801\udca2". It is then encoded in UTF-8 as "\xed\xa0\x81\xed\xb2\xa2" and passed to Python. Python converts the char* to a Unicode object with PyUnicode_FromString(), which rejects invalid UTF-8, including encoded surrogates.
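
The byte sequence described above can be reproduced step by step in pure Python. This is only a sketch of the data flow, not of the C code in _tkinter; the lenient decoding at the end is one possible recovery, shown for illustration.

```python
ch = '\U000104a2'

# Step 1: the character as a UTF-16 surrogate pair (two 16-bit code units).
raw = ch.encode('utf-16-le')
pair = [chr(int.from_bytes(raw[i:i + 2], 'little')) for i in (0, 2)]

# Step 2: each surrogate encoded separately, as if it were a BMP character.
tcl_bytes = b''.join(u.encode('utf-8', 'surrogatepass') for u in pair)
print(tcl_bytes)

# Step 3: a strict UTF-8 decode of those bytes fails.
try:
    tcl_bytes.decode('utf-8')
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# A lenient decoder can keep the surrogates and then rejoin the pair.
lenient = tcl_bytes.decode('utf-8', 'surrogatepass')
rejoined = lenient.encode('utf-16-le', 'surrogatepass').decode('utf-16-le')
print(strict_ok, rejoined == ch)
```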

Please test proposed patch on Windows.
msg196958 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-09-04 21:12
This bug can be reproduced on Linux too. Just copy and paste an illegal UTF-8 sequence, e.g. b'\xed\xb2\x80' or b'\xc0\x80'. My patch works with the first example but fails with the second. When the error handler in fromTclStringAndSize() is changed to "replace", it works with all illegal sequences.
msg196993 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-09-05 12:25
The previous patch has a regression: it breaks decoding of NUL, which Tcl encodes in "modified" UTF-8 as \xc0\x80. However, this part of the code was already broken, because it handles only a single NUL and not a NUL embedded in a larger string.

Here is a patch which also fixes decoding NULs from "modified" UTF-8.
msg201457 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-10-27 13:22
Martin, could you please review this patch? This bug affects not only IDLE, but any Tkinter application which uses callbacks with arguments. An encoding/decoding error while converting arguments from Tcl to Python immediately ends the Tcl main loop (and normally closes the application).
msg207370 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-01-05 11:31
Since the patch has become too complicated, I propose a minimalist patch which fixes only this issue, i.e. IDLE will no longer silently close when some unusual text is pasted (non-BMP on Windows or illegal UTF-8 on Linux). It fixes the conversion of Tcl strings to Python strings for arguments of Python callbacks. \xc0\x80 is translated to the NUL character (U+0000) because Tcl uses "modified" UTF-8. All other illegal UTF-8 sequences are replaced by the replacement character (U+FFFD).
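
The decoding rule described above can be approximated in a few lines of Python. This is a hedged sketch of the behaviour only: the actual patch is C code in _tkinter, and the function name `decode_tcl_arg` is invented for this example.

```python
def decode_tcl_arg(data: bytes) -> str:
    """Decode Tcl's "modified" UTF-8 without ever raising."""
    # Modified UTF-8 spells NUL as the overlong pair C0 80.
    # 0xC0 never occurs in valid UTF-8, so this cannot hit real text.
    data = data.replace(b'\xc0\x80', b'\x00')
    # Anything else that is not valid UTF-8 becomes U+FFFD.
    return data.decode('utf-8', 'replace')

print(ascii(decode_tcl_arg(b'a\xc0\x80b')))
print(ascii(decode_tcl_arg(b'\xed\xa0\x81\xed\xb2\xa2')))
print(ascii(decode_tcl_arg('caf\u00e9'.encode('utf-8'))))
```

The pasted astral character is lost (it turns into replacement characters), but the callback receives a valid string and the main loop keeps running.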
msg207371 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2014-01-05 12:06
I completely lost track of which problem is being solved here. Is it still "IDLE crashes when pasting non-BMP unicode char on Py3"? If so, how is this patch solving it?

IMO, the issue shouldn't have been reopened. Instead, a new issue should have been started.
msg207381 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-01-05 15:13
Yes, it is still the same issue. The root of the issue is in converting strings when they are passed to Python-implemented callbacks. When text is pasted into an IDLE window, a callback is called (for highlighting). The callback is a command created by Tcl_CreateCommand from PythonCmd. PythonCmd is a wrapper which converts arguments (char*) to Python strings and then passes them to the Python command. Arguments are encoded in "modified UTF-8" [1], i.e. the NUL character is represented as \xc0\x80, and they can contain other invalid UTF-8 sequences (such as encoded surrogates). When decoding the arguments to Python strings fails, the main Tcl loop is broken and IDLE silently closes.

When an astral character is pasted on Windows, it is first encoded to UTF-16 by Windows, then Tcl encodes each 16-bit surrogate to modified UTF-8. The result is not valid UTF-8. On X Window systems the X selection value is usually UTF-8 encoded (the type is UTF8_STRING), but can contain invalid UTF-8 sequences.

I will open a separate issue to fix other bugs related to Tcl <-> Python string conversions. The last patch fixes only the initial issue, which is the most important.

[1] http://en.wikipedia.org/wiki/UTF-8#Modified_UTF-8
msg210786 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-02-09 20:30
Here is an updated patch (after committing issue20368, which did the main refactoring). It makes PythonCmd never fail due to an argument decoding error.
msg254165 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2015-11-06 03:16
I am puzzled by the following.  Someone reported today that IDLE crashed when pasting the Snake emoji U+1F40D 🐍.  I copied from Thunderbird and pasted in IDLE on Win10, with the same UnicodeDecodeError as before.  I then ran this simple code

from tkinter import *  # 3.4, 3.5, 2.7 with Tkinter
root = Tk()
text = Text(root)
text.pack()
text.focus_set()  # required to work
root.mainloop()

pasted the char there, and to my surprise, a black & white version of the snake appeared. How?  I thought tk does not support astral chars? I copied from the Text window to paste above, where it is green for me.
msg254170 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-06 07:15
There is no Snake emoji in my font, so I use the Cat Face emoji U+1F431 🐱 (\xf0\x9f\x90\xb1 in UTF-8, \x3d\xd8\x31\xdc in UTF-16LE).

Move the cursor or press Backspace. I needed to press Left 2 times to move the cursor to the beginning of the line, press Right 4 times to move the cursor back to the end of the line, and press Backspace 4 times to remove everything. This is what "Tk doesn't support astral characters" means.

Get the text programmatically.

>>> text.get('1.0', '1.end')
'ðﾟﾐﾱ'
>>> print(ascii(text.get('1.0', '1.end')))
'\xf0\uff9f\uff90\uffb1'

On Linux the clipboard uses UTF-8, and this symbol is represented by the 4-byte bytestring b'\xf0\x9f\x90\xb1' (that is why Tk sometimes interprets it as 4 characters). When you request the text content as Unicode, Tcl fails to decode the string from UTF-8 and falls back to Latin1. Due to another bug it extends the sign of some bytes. When you programmatically insert the same string back, it will be encoded to b'\xc3\xb0\xef\xbe\x9f\xef\xbe\x90\xef\xbe\xb1' and displayed as 'ðﾟﾐﾱ'.
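
The byte values quoted in this message can be cross-checked in plain Python. This is a small verification sketch; it only re-derives the numbers given above and does not involve Tk.

```python
cat = '\U0001F431'
utf8 = cat.encode('utf-8')
utf16le = cat.encode('utf-16-le')
print(utf8, utf16le)

# What text.get() returned in the session above: the first byte read as
# Latin-1 (U+00F0), the other three sign-extended into U+FFxx.
garbled = '\xf0\uff9f\uff90\uffb1'

# The low byte of each garbled character is the original UTF-8 byte.
low_bytes = bytes(ord(c) & 0xFF for c in garbled)

# Re-encoding the garbled string gives the 10-byte sequence quoted above.
reencoded = garbled.encode('utf-8')
print(low_bytes == utf8, reencoded)
```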

On Windows the clipboard uses UTF-16LE and you can see different results.

The underlying graphical system can support astral characters, but Tk fails to handle them correctly.
msg318862 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2018-06-06 22:12
I closed #33785 as a duplicate of this.

When IDLE is opened with idlelib/idlew.py from an icon, pasting ðŒˆ or ðŒ†ðŒ€ðŒ‹ silently closes IDLE.  When idle is opened from a console with 'python -m idlelib', IDLE closes and a traceback essentially identical to the one in msg145581 is printed in the console.  When IDLE is opened with 'import idlelib.idle' in interactive python in a console, the traceback is printed and IDLE remains but becomes non-responsive.  Clicking on IDLE turns the window gray.  Clicking on [X] brings up the Windows "Python is not working.  Close?" box.

This issue was opened almost 8 years ago.  After I submit this, I will reduce the nosy list from 13 to the 4 I think most likely to still be interested. If I am wrong, delete or re-add yourself.  I will then discuss patches.

Remove: JBernardo, RamchandraApte, Rosuav, THRlWiTi, William.Schwartz, asvetlov, loewis, python-dev, roger.serwy

Keep: Ezio, Ned, Serhiy, me.
msg319247 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2018-06-10 21:10
AFAIK, the big new feature of tcl/tk 9.0 is intended to be full unicode support.  We can hope that 9.0 appears in time to be included in the 3.8 installers.

Until then, I think filenames, user program output, and clipboard content should be checked for the presence of astral characters before being sent to a tk widget. For this issue, that means replacing the built-in <<Paste>> handler.  Replace astral chars with \U000nnnnn escapes.  If the widget is a Text, tag the escape as 'Astral' and color it with the code context colors to distinguish it from escapes originally in the string.

Strings know their kind, but a request to expose that has been rejected.  Pyshell currently compares the max codepoint to 'ffff'.  But it appears that we can detect kind with an O(1) expression.  For 3.6 and 3.7, "sys.getsizeof(s) == 76 + len(s)".  For 3.8, "sys.getsizeof(s) == 48 + len(s)".  Does anyone know why the difference?
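
A minimal sketch of the check-and-replace idea, in case it helps the discussion. The names has_astral and escape_astral are made up for illustration and are not idlelib functions; the check is the O(n) max-code-point comparison, not the getsizeof trick.

```python
import re

# Matches any single code point outside the BMP.
ASTRAL = re.compile('[\U00010000-\U0010ffff]')

def has_astral(s):
    # O(n) check: same idea as comparing the max code point to U+FFFF.
    return bool(s) and max(s) > '\uffff'

def escape_astral(s):
    # Replace each astral char with the text of its 8-digit \U escape.
    return ASTRAL.sub(lambda m: '\\U%08x' % ord(m.group()), s)

print(has_astral('caf\xe9'), has_astral('x\U000104a2y'))
print(escape_astral('x\U000104a2y'))
```

Tagging the replaced span in a Text widget would need the match positions as well, which ASTRAL.finditer could supply.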
msg348153 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2019-07-19 08:52
Closed #37614 in favor of this.  

We now have only Python with FSR and mostly only tcl 8.6 to worry about.  But I presume the Windows clipboard still uses utf-16le.  Experimenting with pasting 𐒢 or '𐒢', I usually get the 'ed' message as before, but with the quoted astral, IDLE sometimes hangs.  If I wait before trying to close, I get a message from Windows about waiting or closing.

Currently, an attempt to print an astral char, as opposed to paste, results in
>>> print('\U00011111')
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    print('\U00011111')
UnicodeEncodeError: 'UCS-2' codec can't encode character '\U00011111' in position 0: Non-BMP character not supported in Tk
Improving this is a separate issue, as is editing a .py file with an astral char in the name or text.
msg352661 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2019-09-17 19:59
Another report today on idle-dev that pasting emoji exits IDLE.

Serhiy, I applied the _tkinter part of your...args_2.patch to a branch of current master -- see serhiy_tkinter.patch.  (Could push branch if helpful.)

After recompiling _tkinter.c, pasting 🐱 still gives the same error.
msg353083 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2019-09-24 12:23
I can confirm that the crash from pasting these characters happens when trying to fetch the clipboard contents.  We can override the built-in <<Paste>> event, but then we have to get the clipboard's contents directly, and the only portable way to do that in the stdlib is via Tkinter's clipboard_get(). (For a non-stdlib solution, check out pyperclip on PyPI.)

clipboard_get(), which I assume calls what Tk uses internally to handle the <<Paste>> event, crashes in the C code with a UnicodeDecodeError.  Here's a traceback from calling clipboard_get() with 🐱 in the clipboard (Windows 10, recent master branch, i.e. to be 3.9):

Exception in Tkinter callback
Traceback (most recent call last):
  File "C:\Users\Tal\dev\cpython\lib\tkinter\__init__.py", line 1885, in __call__
    return self.func(*args)
  File "C:\Users\Tal\dev\cpython\lib\idlelib\multicall.py", line 176, in handler
    r = l[i](event)
  File "C:\Users\Tal\dev\cpython\lib\idlelib\editor.py", line 618, in paste
    print(self.text.clipboard_get())
  File "C:\Users\Tal\dev\cpython\lib\tkinter\__init__.py", line 867, in clipboard_get
    return self.tk.call(('clipboard', 'get') + self._options(kw))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 0: invalid continuation byte

From a quick look, this appears to be happening in _tkinter.c, here:

static PyObject *
unicodeFromTclStringAndSize(const char *s, Py_ssize_t size)
{
    PyObject *r = PyUnicode_DecodeUTF8(s, size, NULL);
    ...

My guess is that Tk is passing the clipboard contents as-is, and we're simply not decoding it with the proper encoding (i.e. utf-16le on Windows).

Is this something worth fixing / working around in Tkinter, e.g. by using a proper encoding depending on the platform for fetching clipboard contents? Or are we content to continue waiting for Tk to fix this?
msg353123 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2019-09-24 23:15
Recap: IDLE 3.x on Windows exits with UnicodeDecodeError when pasting into editor, grep, or shell window a non-BMP astral character such as
𐒢 '\U000104a2', ð‡, 🐍 '\U0001F40D', or
🐱 '\U0001F431' UTF-8 b'\xf0\x9f\x90\xb1', UTF-16LE b'\x3d\xd8\x31\xdc'.  Display issues are not directly part of this issue.

The exact error message has varied with the python version, but all likely result from the same error.

3.2 msg145581: traceback PyShell.main(), root.mainloop(), tk.mainloop().

  'utf8' codec can't decode bytes in position 1-2: invalid continuation byte

3.3 msg177750: traceback starts with two calls in new runpy module.
  'utf-8' codec can't decode bytes in position 0-2: invalid continuation byte

3.6 to now: same traceback
  'utf-8' codec can't decode byte 0xed in position 0: invalid continuation byte

The initial byte is 0xed regardless of which astral char above is pasted.  Tal, if the problem were utf-8 decoding utf-16le bytes, the initial byte in the error message for astral chars would (usually) be 0xd8, and there would be problems with BMP chars also.

In msg145584, I speculated that the problem might be trying to decode a now illegal utf-8 encoding of a surrogate character.  In msg145605, Ezio said that the first surrogate would be '\ud801' and showed that the 2.7 utf-8 'encoding' of that is b'\xed\xa0\x81' and that trying to decode that gives the 3.2 error above, but with '0-1' instead of '1-2'.  (0xed is the utf-8 start byte for the BMP code points U+D000 through U+DFFF; continuation bytes that map to the surrogate blocks, and some others, are now invalid.)  Today, b'\xed\xa0\x81'.decode('utf-8') gives exactly the current message above.
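
For reference, the surrogate explanation can be checked without Tk at all. A small sketch, assuming the clipboard text is handed over as UTF-8-style bytes of each UTF-16 surrogate (that part is the hypothesis under discussion, not something this code proves):

```python
astral = '\U000104a2'

# UTF-16 splits the astral char into a surrogate pair.
utf16 = astral.encode('utf-16-le')
high = int.from_bytes(utf16[:2], 'little')
low = int.from_bytes(utf16[2:], 'little')

# UTF-8-style bytes of each lone surrogate; 'surrogatepass' is needed
# because strict UTF-8 refuses to encode surrogates.
encoded = (chr(high) + chr(low)).encode('utf-8', 'surrogatepass')

error = None
try:
    encoded.decode('utf-8')  # strict decoding, as _tkinter does
except UnicodeDecodeError as exc:
    error = exc

print(hex(high), hex(low), encoded)
print(error)
```

Both surrogates start with 0xed in this form, which would explain why the first offending byte does not depend on which astral char is pasted.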

In msg254165, I noted that pasting copied astral chars into a plain Text widget works in the sense that there is no error.  (For me, 𐒢 is replaced by two replacement chars and the others are shown without colors, but this depends on OS and font.) I just verified the same for Entry widgets in IDLE dialogs and the Font settings sample text.  As Serhiy said in msg254170, Left x 2 is needed to move back past the char and Backspace x 2 to delete it.  (For me, only 1 Right is needed to move forward past the char.)  But Serhiy also showed that once an astral char *is* displayed, it cannot be properly retrieved.

So the question is: if Windows puts utf-16le surrogates on the clipboard, and they can be pasted and displayed to some extent in a Text, why is something trying to utf-8 decode the utf-8 encoding of each surrogate when pasting into IDLE's augmented text?

In msg207381, Serhiy claimed "The root of issue is in converting strings when passed to Python-implemented callbacks. When a text is pasted in IDLE window, the callback is called (for highlighting). ...".  He goes on to explain that tcl *does* encode surrogates to modified utf-8 before passing them to callbacks and claimed that tkinter_pythoncmd_args_2.patch should fix this.

Disabling Colorizer is not enough to allow astral pasting.  See PR 16365. Whatever Serhiy's patch did 5 years ago, my copy does not work now.  See PR 16365. 

Tal, we augment the x11 paste callback in pyshell.fix_x11_paste.  There is no unittest and we would have to not break this with further change.

I have thought about replacing the paste callback with clipboard_get, but worried that we might not be able to replicate what the system-specific tcl/tk/C code does.  That sometimes includes displaying the actual astral character. I presume that tcl just passes the clipboard bytes to the graphics system, which we cannot do from python.

Anyway, you have shown that clipboard_get does not currently work as we might want.  From what Serhiy has said, char *s points to invalid utf-8 bytes.
msg353152 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-09-25 07:01
I now have access to Windows (I did not have it 5 years ago), so I'm going to finish this issue if I have time.
msg353761 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-10-02 17:56
PR 16545 solves the problem by using OS-specific methods for converting between Python and Tcl strings. It is not ideal, but is good enough for most real cases.

Now you can paste, copy and print non-BMP characters. Code containing them may be displayed oddly, but the result of print looks OK.

>>> '\N{PERSONAL COMPUTER}'
'💻'
>>> print('💻')
💻

As a side effect, printing '\udcf0\udc9f\udc92\udcbb' on Linux and '\ud83d\udcbb' on Windows should have the same effect as printing '\U0001f4bb'.
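
To make the two alternative spellings of U+1F4BB concrete, here is a plain-Python sketch that derives both and maps them back. It only illustrates the string forms; it is not the conversion code in the PR.

```python
target = '\U0001f4bb'

# Linux-style form: each UTF-8 byte carried as a lone low surrogate,
# which is what the 'surrogateescape' error handler produces.
linux_form = target.encode('utf-8').decode('ascii', 'surrogateescape')

# Windows-style form: the UTF-16 surrogate pair as two code points.
utf16 = target.encode('utf-16-le')
windows_form = ''.join(
    chr(int.from_bytes(utf16[i:i + 2], 'little')) for i in (0, 2))

# Each form can be turned back into the single astral character.
linux_back = linux_form.encode('utf-8', 'surrogateescape').decode('utf-8')
windows_back = (windows_form.encode('utf-16-le', 'surrogatepass')
                .decode('utf-16-le'))

print(ascii(linux_form), ascii(windows_form))
print(linux_back == windows_back == target)
```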

I do not know about macOS, but expect the same behavior as on Linux. Could anybody test please?
msg353765 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2019-10-02 18:52
Serhiy, this looks like a great step in the right direction!

Tested on Win10 with PR GH-16545 (commit f4db0e7e00). Here is a copy/paste from an IDLE shell session:

>>> '\N{PERSONAL COMPUTER}'
'💻'
>>> print('💻')
SyntaxError: 'utf-8' codec can't encode characters in position 7-12: surrogates not allowed

Note that in the first output, the second and third chars in the string aren't visible in IDLE; i.e. what is actually displayed is 'ð»'.
msg353766 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2019-10-02 20:04
Not sure if this helps, but a bit of experimentation brought this up:

>>> '\N{PERSONAL COMPUTER}'.encode('utf-8')
b'\xf0\x9f\x92\xbb'
>>> '💻'.encode('utf-16le')
b'\xf0\x00\x9f\x00\x92\x00\xbb\x00'
>>> '💻'.encode('utf-16')
b'\xff\xfe\xf0\x00\x9f\x00\x92\x00\xbb\x00'
msg353767 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2019-10-02 20:18
More info:

>>> '\N{PERSONAL COMPUTER}'.encode('utf-8').decode('latin-1') == '💻'
True
msg353769 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-10-02 20:51
Sorry, I did not test the last version on Windows. There was a bug which caused using the Linux version on Windows. Now it should be fixed.
msg353812 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2019-10-03 05:03
The revised PR appears to fix this and other issues, although the presence of astral chars in code being edited messes up tk's cursor positioning.  Assuming that this cannot be changed, we could add the ability to replace astral chars with \U escapes.
msg353833 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-10-03 09:10
From the point of view of Tk, the astral character "💻" looks like either two invisible characters "\ud83d\udcbb" or four characters "ð\x9f\x92»" (two of them are invisible). Thus this breaks editing the physical line past the astral character. We cannot do anything with this.

It also breaks syntax highlighting up to 100 lines past the astral character. We can add a workaround for this, but I am not sure it is worth it. The solution could be complex and slow down the common case. In any case it is a different issue.

File names with astral characters are now shown correctly in most cases. Astral characters are not shown in the title of the window; perhaps it is font-dependent.

Opening a file name with astral characters works in the command line, but it does not work via the file open dialog. This looks like a bug in Tk; we cannot work around it (or at least the possible workaround would be ugly and partial).
msg353848 - (view) Author: Ma Lin (malin) * Date: 2019-10-03 13:21
> Thus this breaks editing the physical line past the astral character. We cannot do anything with this.

I tried it; sadly, the experience is not very good.
msg353897 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2019-10-04 05:02
A week ago, I thought that the astral solution was to always replace with the \U escape.  With this patch, we can and should send them to read-only text windows, and let the OS and font display it or a substitute.  On Windows, at least, the emoji which beginners most often want to use get displayed.

Elsewhere, we will have to check and do some follow-up patches.  Using file names with astral chars results, on Windows, in six large boxes, and when the file is saved, it is saved in a new file with the boxes, not the original file.  Such file names are not added to the recent files list, or maybe list boxes cannot handle them.

Code is another issue.  Astral chars in files could be replaced when read.  Unfortunately, I believe some are legal identifier chars.  On the clipboard, on Windows, astral chars become sequences of 6 surrogates.

>>> r.clipboard_clear()
>>> r.clipboard_append('🚀')
>>> r.clipboard_get()
'\udced\udca0\udcbd\udced\udcba\udc80'

Perhaps we should try to intercept paste and replace such sequences with the \U escape.
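
A rough sketch of what such a replacement could look like for the exact string above. The function names are hypothetical, and the approach assumes the six surrogates are escaped bytes of the UTF-8-style encoding of a UTF-16 surrogate pair; a string that still contains an unpaired surrogate would raise in the last step.

```python
def repair_clipboard_text(s):
    # Hypothetical helper, not part of tkinter or idlelib.
    # 1. Turn the escaped bytes (U+DC80..U+DCFF) back into raw bytes.
    raw = s.encode('utf-8', 'surrogateescape')
    # 2. Decode them while letting encoded surrogates through.
    pair = raw.decode('utf-8', 'surrogatepass')
    # 3. Merge each surrogate pair by a round trip through UTF-16.
    return pair.encode('utf-16-le', 'surrogatepass').decode('utf-16-le')

def as_escape(ch):
    # Text of the 8-digit \U escape for one character.
    return '\\U%08x' % ord(ch)

fixed = repair_clipboard_text('\udced\udca0\udcbd\udced\udcba\udc80')
print(ascii(fixed), as_escape(fixed))
```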
msg353898 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-10-04 05:50
Thank you for your example Terry. There was one dubious place which I did not change because I did not know how to trigger the execution of it. Now the clipboard is fixed.
msg353901 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2019-10-04 07:42
What do you mean by fixed?  After deleting and remaking a pr_16545 branch, I see the same result for clipboard_get.
msg353903 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-10-04 08:08
What is the result of new tests?

    python.bat -m test -v -uall test_tk -m test_clipboard*
msg353910 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2019-10-04 09:00
After remembering to recompile (sorry), the test passes and clipboard_get returns the rocket.  Very nice, thank you.
msg353916 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-10-04 10:09
New changeset 06cb94bc8419b9a24df6b0d724fcd8e40c6971d6 by Serhiy Storchaka in branch 'master':
bpo-13153: Use OS native encoding for converting between Python and Tcl. (GH-16545)
https://github.com/python/cpython/commit/06cb94bc8419b9a24df6b0d724fcd8e40c6971d6
msg353918 - (view) Author: miss-islington (miss-islington) Date: 2019-10-04 10:28
New changeset 6c3fbbc177f5e1867ab09a315dbf58554a80accd by Miss Islington (bot) in branch '3.7':
bpo-13153: Use OS native encoding for converting between Python and Tcl. (GH-16545)
https://github.com/python/cpython/commit/6c3fbbc177f5e1867ab09a315dbf58554a80accd
msg353919 - (view) Author: miss-islington (miss-islington) Date: 2019-10-04 10:28
New changeset dc191245d8f63f5ab41afff0468b7463a07e7b00 by Miss Islington (bot) in branch '3.8':
bpo-13153: Use OS native encoding for converting between Python and Tcl. (GH-16545)
https://github.com/python/cpython/commit/dc191245d8f63f5ab41afff0468b7463a07e7b00
msg353937 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-04 12:26
> bpo-13153: Use OS native encoding for converting between Python and Tcl. (GH-16545)

WOW. That's huge. The issue with non-BMP characters has been fixed? Finally? The issue was haunting the bug tracker for at least 8 years!!!
msg353945 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2019-10-04 13:22
Indeed, Serhiy, you've done an amazing job with this change and it will greatly benefit many people.
msg358701 - (view) Author: Aivar Annamaa (Aivar.Annamaa) * Date: 2019-12-20 10:35
>>> '\N{PERSONAL COMPUTER}'

freezes IDLE 3.7.6 (64-bit, downloaded from python.org) on macOS 10.15

Can it be because Tk 8.6.8 is still used there?
msg358704 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2019-12-20 12:13
On Windows with 8.6.9, I see '\U0001f4bb' on 3.7.5 and '💻' on 3.8.0 and 3.9.0a0.  I don't know why the difference, as Serhiy's patch was backported. I will upgrade 3.7 and 3.8 and try again.
History
Date User Action Args
2022-04-11 14:57:22 admin set github: 57362
2019-12-20 12:13:44 terry.reedy set messages: + msg358704
2019-12-20 10:35:40 Aivar.Annamaa set nosy: + Aivar.Annamaa; messages: + msg358701
2019-10-04 13:22:43 taleinat set messages: + msg353945
2019-10-04 12:26:06 vstinner set nosy: + vstinner; messages: + msg353937
2019-10-04 11:44:01 serhiy.storchaka set status: open -> closed; resolution: fixed; stage: patch review -> resolved
2019-10-04 10:28:59 miss-islington set messages: + msg353919
2019-10-04 10:28:56 miss-islington set nosy: + miss-islington; messages: + msg353918
2019-10-04 10:10:14 miss-islington set pull_requests: + pull_request16172
2019-10-04 10:10:07 miss-islington set pull_requests: + pull_request16171
2019-10-04 10:09:55 serhiy.storchaka set messages: + msg353916
2019-10-04 09:00:35 terry.reedy set messages: + msg353910
2019-10-04 08:08:55 serhiy.storchaka set messages: + msg353903
2019-10-04 07:42:28 terry.reedy set messages: + msg353901
2019-10-04 05:50:36 serhiy.storchaka set messages: + msg353898
2019-10-04 05:02:50 terry.reedy set messages: + msg353897
2019-10-03 13:21:58 malin set nosy: + malin; messages: + msg353848
2019-10-03 09:10:09 serhiy.storchaka set messages: + msg353833
2019-10-03 05:03:30 terry.reedy set messages: + msg353812
2019-10-02 20:51:07 serhiy.storchaka set messages: + msg353769
2019-10-02 20:18:16 taleinat set messages: + msg353767
2019-10-02 20:04:04 taleinat set messages: + msg353766
2019-10-02 18:52:54 taleinat set messages: + msg353765
2019-10-02 17:56:20 serhiy.storchaka set pull_requests: + pull_request16135
2019-10-02 17:56:02 serhiy.storchaka set messages: + msg353761
2019-09-25 07:01:00 serhiy.storchaka set messages: + msg353152
2019-09-24 23:15:00 terry.reedy set messages: + msg353123
2019-09-24 22:08:50 terry.reedy set pull_requests: + pull_request15945
2019-09-24 21:48:05 terry.reedy set pull_requests: + pull_request15943
2019-09-24 12:23:06 taleinat set nosy: + taleinat; messages: + msg353083
2019-09-17 19:59:32 terry.reedy set messages: + msg352661
2019-07-19 08:52:12 terry.reedy set title: IDLE 3.x on Windows crashes when pasting non-BMP unicode -> IDLE 3.x on Windows exits when pasting non-BMP unicode; messages: + msg348153; versions: + Python 3.9, - Python 3.6
2019-07-19 07:20:29 terry.reedy link issue37614 superseder
2019-06-14 20:49:37 terry.reedy link issue37286 superseder
2018-06-10 21:10:36 terry.reedy set messages: + msg319247
2018-06-06 22:16:12 terry.reedy set nosy: - loewis, roger.serwy, asvetlov, THRlWiTi, python-dev, JBernardo, Rosuav, Ramchandra Apte, William.Schwartz
2018-06-06 22:14:40 terry.reedy set messages: - msg318863
2018-06-06 22:14:04 terry.reedy set messages: + msg318863
2018-06-06 22:12:34 terry.reedy set title: IDLE crashes when pasting non-BMP unicode char on Py3 -> IDLE 3.x on Windows crashes when pasting non-BMP unicode; messages: + msg318862; versions: + Python 3.8
2018-06-06 21:38:43 terry.reedy link issue33785 superseder
2017-11-26 07:20:20 terry.reedy set versions: + Python 3.6, Python 3.7, - Python 2.7, Python 3.4, Python 3.5
2016-05-24 08:55:05 eryksun link issue27091 superseder
2016-02-27 10:05:25 terry.reedy link issue26420 superseder
2015-12-06 12:58:17 THRlWiTi set nosy: + THRlWiTi
2015-11-06 07:15:52 serhiy.storchaka set messages: + msg254170
2015-11-06 03:16:17 terry.reedy set messages: + msg254165
2014-12-09 23:01:07 terry.reedy set type: crash -> behavior; versions: + Python 3.5, - Python 3.3
2014-02-09 20:30:45 serhiy.storchaka set files: + tkinter_pythoncmd_args_2.patch; messages: + msg210786
2014-01-05 15:13:57 serhiy.storchaka set messages: + msg207381
2014-01-05 12:06:39 loewis set messages: + msg207371
2014-01-05 11:31:30 serhiy.storchaka set files: + tkinter_pythoncmd_args.patch; messages: + msg207370
2013-10-27 13:23:11 serhiy.storchaka set priority: normal -> high
2013-10-27 13:22:37 serhiy.storchaka set messages: + msg201457
2013-10-01 20:22:26 serhiy.storchaka set nosy: + loewis
2013-10-01 20:19:57 serhiy.storchaka set components: + Unicode
2013-09-05 12:26:34 serhiy.storchaka set files: - tkinter_string_conv_2.patch
2013-09-05 12:26:15 serhiy.storchaka set files: - tkinter_string_conv.patch
2013-09-05 12:25:37 serhiy.storchaka set files: + tkinter_string_conv_3.patch; messages: + msg196993
2013-09-04 21:12:39 serhiy.storchaka set files: + tkinter_string_conv_2.patch; assignee: serhiy.storchaka; messages: + msg196958; stage: patch review
2013-08-07 19:48:49 serhiy.storchaka set files: + tkinter_string_conv.patch
2013-08-07 19:47:00 serhiy.storchaka set files: - tkinter_string_conv.patch
2013-08-07 17:47:38 serhiy.storchaka set files: + tkinter_string_conv.patch; messages: + msg194623
2013-08-07 10:01:05 serhiy.storchaka set messages: + msg194603
2013-08-06 21:53:58 terry.reedy set messages: + msg194586
2013-08-06 16:43:32 ezio.melotti set messages: + msg194564
2013-07-22 21:58:09 terry.reedy set messages: + msg193566
2013-07-22 17:33:29 serhiy.storchaka set assignee: serhiy.storchaka -> (no value); stage: resolved -> (no value)
2013-07-03 22:27:54 terry.reedy set resolution: fixed -> (no value); messages: + msg192260; versions: - Python 3.2
2013-07-03 15:08:35 serhiy.storchaka set status: closed -> open; messages: + msg192243
2013-07-03 14:43:28 William.Schwartz set nosy: + William.Schwartz; messages: + msg192240
2013-02-18 11:08:48 serhiy.storchaka set status: open -> closed; assignee: asvetlov -> serhiy.storchaka; resolution: fixed; stage: commit review -> resolved
2013-02-18 11:07:07 python-dev set nosy: + python-dev; messages: + msg182312
2013-02-18 08:30:08 ned.deily set messages: + msg182305; stage: test needed -> commit review
2013-02-16 03:21:49 Ramchandra Apte set messages: + msg182207
2013-02-15 22:11:14 ned.deily set messages: + msg182184
2013-02-15 22:03:52 serhiy.storchaka set messages: + msg182181
2013-02-15 21:57:00 ned.deily set messages: + msg182180
2013-02-15 20:48:42 ezio.melotti set messages: + msg182172
2013-02-15 20:27:08 terry.reedy set messages: + msg182166
2013-02-15 13:21:08 Ramchandra Apte set nosy: + Ramchandra Apte; messages: + msg182141
2013-02-15 08:10:33 ned.deily set messages: + msg182130
2013-02-14 16:06:34 serhiy.storchaka set messages: + msg182106
2013-01-07 19:11:26 serhiy.storchaka set messages: + msg179276
2012-12-20 12:57:02 serhiy.storchaka set files: + tkinter_nobmp_error.patch; versions: + Python 2.7; nosy: + serhiy.storchaka; messages: + msg177812; keywords: + patch
2012-12-19 20:43:49 terry.reedy set title: IDLE crashes when pasting non-BMP unicode char on UCS-16 build -> IDLE crashes when pasting non-BMP unicode char on Py3; messages: + msg177778; versions: + Python 3.4
2012-12-19 16:01:59 Rosuav set nosy: + Rosuav; messages: + msg177750
2012-03-15 04:46:44 asvetlov set messages: + msg155857
2012-03-14 22:37:27 roger.serwy set messages: + msg155814
2012-03-14 22:21:19 roger.serwy set nosy: + roger.serwy; messages: + msg155810
2012-03-14 21:44:37 ned.deily set nosy: - astrand
2012-03-14 21:43:59 ned.deily set assignee: astrand -> asvetlov; messages: + msg155801; nosy: + asvetlov
2012-03-14 21:42:40 ned.deily set title: IDLE crash with unicode bigger than 0xFFFF -> IDLE crashes when pasting non-BMP unicode char on UCS-16 build; nosy: + astrand; messages: + msg155799; assignee: ned.deily -> astrand; versions: + Python 3.3
2011-10-16 19:40:29 ned.deily set messages: + msg145635
2011-10-16 07:18:44 ezio.melotti set messages: + msg145616
2011-10-16 03:45:05 terry.reedy set messages: + msg145611
2011-10-15 20:48:11 ezio.melotti set messages: + msg145607
2011-10-15 20:25:20 ezio.melotti set messages: + msg145605
2011-10-15 10:08:01 ned.deily set messages: + msg145585; components: + Tkinter, - Unicode, Windows
2011-10-15 05:48:28 terry.reedy set messages: + msg145584
2011-10-15 04:33:10 JBernardo set messages: + msg145581
2011-10-15 04:03:45 JBernardo set messages: + msg145580
2011-10-15 00:01:59 terry.reedy set type: behavior -> crash; messages: + msg145573; nosy: + terry.reedy
2011-10-14 23:42:20 ezio.melotti set nosy: + ezio.melotti; type: behavior
2011-10-11 21:54:21 JBernardo set messages: + msg145369
2011-10-11 21:06:15 ned.deily set nosy: + ned.deily; messages: + msg145366; assignee: ned.deily; stage: test needed
2011-10-11 20:01:32 JBernardo create