Message 380393 - Python tracker

Author	terry.reedy
Recipients	epaine, ezio.melotti, ned.deily, ronaldoussoren, serhiy.storchaka, terry.reedy, vstinner, wordtech
Date	2020-11-05.02:28:49
Message-id	<1604543329.93.0.296485999422.issue42225@roundup.psfhosted.org>

Content

Kevin, Serhiy tried to report this upstream but failed (msg380143).
Perhaps you could.

One person running my test program reported
"""
Fedora 32 x86-64
Cinnamon 4.6.7
Linux 5.8.16-200.fc32.x86_64
Python 3.8.6 (default, Sep 25 2020, 00:00:00)
[GCC 10.2.1 20200723 (Red Hat 10.2.1-1)] on linux

Running line-by-line in terminal, the for-loop crashes with:
<<<
X Error of failed request: BadLength (poly request too large or internal Xlib length error)
Major opcode of failed request: 138 (RENDER)
Minor opcode of failed request: 20 (RenderAddGlyphs)
Serial number of failed request: 3925
Current serial number in output stream: 4865
"""

Another reported "Seems to produce garbage on my system:
[ads@ADS4 x]$ uname -a
Linux ADS4 5.8.17-100.fc31.x86_64 #1 SMP Thu Oct 29 18:58:48 UTC 2020
x86_64 x86_64 x86_64 GNU/Linux"

But the program ran to completion without errors. A copy of the output from the window was attached. I have asked for the tcl/tk version. My response included:
"""
On *nix, Python (unicode) chars are utf-8 encoded by _tkinter for tk. The encoding of astral non-BMP chars uses 4 bytes. Perhaps tk on your ADS Linux (new to me) displays the 4 bytes as 4 chars instead of 1. For each block of 32, the first 3 are the same. This is true in this file, but easily seeing this depends on the display software.

I don't know what you saw, but Notepad++ displays control chars with the high bit set (C1 controls) as their 3-char acronyms in reversed type (white on black), as defined in the character table at
https://en.wikipedia.org/wiki/Latin-1_Supplement_(Unicode_block)

Thus the first astral, U+10000, is encoded as b"\xF0\x90\x80\x80". In Notepad++, what is in the file appears as 4 characters, not 1, displayed 'ðDCSPADPAD', with the part after ð being the correct white-on-black triplets for code points U+90 and U+80. The first char '\xf0' == 'ð' is the same for all quadruples shown by Notepad++. The next 3 vary as appropriate. In some cases, all 4 are normal printable chars, such as U+29AA0, a CJK char, showing as "ð©ª "

If I cut the first 4 chars from Notepad++ and paste them into Thunderbird, the result is "ð". I see only ð, but the presence of 3 zero-width chars is revealed by moving through the string with arrow keys.
"""
Here on Firefox the C1 controls, invisible in Thunderbird, display as squares with digits 0090, 0080 in two rows. Serhiy probably understands these reports better than I do. The tcl/tk on ADS4 Linux seems to be doing something like what Serhiy described as "Tcl fails to decode the string from UTF-8 and falls back to Latin1" before his _tkinter fix.
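
The byte-level effect described above can be reproduced in plain Python, without tk. This is only a sketch of the suspected mechanism (UTF-8 bytes read back one character per byte as Latin-1), not a claim about what the tcl/tk on that system actually does. The helper name misdecode_as_latin1 is made up for illustration.

```python
# Sketch: what an astral character looks like when its UTF-8 bytes
# are misread as Latin-1, one character per byte.
# misdecode_as_latin1 is a hypothetical helper, not part of tkinter.

def misdecode_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")

first_astral = "\U00010000"
utf8_bytes = first_astral.encode("utf-8")
garbled = misdecode_as_latin1(first_astral)

print(utf8_bytes)                       # b'\xf0\x90\x80\x80'
print(len(garbled))                     # 4 chars instead of 1
print([hex(ord(c)) for c in garbled])   # ['0xf0', '0x90', '0x80', '0x80']

# The CJK example: all 4 resulting chars are printable Latin-1
# (the last one is a no-break space, U+00A0).
cjk = "\U00029aa0"
print(ascii(misdecode_as_latin1(cjk)))  # '\xf0\xa9\xaa\xa0'
```

The first character is always '\xf0' == 'ð' for code points below U+40000, which matches the constant leading ð in the Notepad++ display.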

As far as IDLE and Linux are concerned, I am just going to consider what to change or add in "User output in Shell" in the IDLE doc.

History
Date	User	Action	Args
2020-11-05 02:28:49	terry.reedy	set	recipients: + terry.reedy, ronaldoussoren, vstinner, wordtech, ned.deily, ezio.melotti, serhiy.storchaka, epaine
2020-11-05 02:28:49	terry.reedy	set	messageid: <1604543329.93.0.296485999422.issue42225@roundup.psfhosted.org>
2020-11-05 02:28:49	terry.reedy	link	issue42225 messages
2020-11-05 02:28:49	terry.reedy	create