Issue 18118: curses utf8 output broken in Python2

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/62318

classification

Title:	curses utf8 output broken in Python2
Type:	behavior	Stage:
Components:	Library (Lib)	Versions:	Python 2.7

process

Status:	closed	Resolution:	wont fix
Dependencies:		Superseder:
Assigned To:		Nosy List:	helmut, r.david.murray, vstinner
Priority:	normal	Keywords:

Created on 2013-06-02 10:14 by helmut, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (7)
msg190479 - (view)	Author: (helmut)	Date: 2013-06-02 10:14
Consider the test case below.

<<<
#!/usr/bin/python
# -*- encoding: utf8 -*-

import curses

def wrapped(screen):
    screen.addstr(0, 0, "ä")
    screen.addstr(0, 1, "ö")
    screen.addstr(0, 2, "ü")
    screen.getch()

if __name__ == "__main__":
    curses.wrapper(wrapped)
>>>

Expected output: "äöü"
Output on py3.3: as expected
Output on py2.7.3: "?ü"

The actual bytes (as determined by strace) were "\303\303\303\274". Observe the inclusion of broken utf8 sequences.

This issue was initially discovered on Debian sid, but independently confirmed on Arch Linux and two more unknown.
msg190483 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2013-06-02 11:08
I believe this is one of a class of bugs that are fixed in Python3, and that are unlikely to be fixed in Python2. I'll defer to Victor, though, who made a number of curses unicode fixes in Python3.
msg190500 - (view)	Author: STINNER Victor (vstinner) *	Date: 2013-06-02 20:12
Is your Python curses module linked to libncurses.so.5 or libncursesw.so.5?

Example:

$ ldd /usr/lib/python2.7/lib-dynload/_cursesmodule.so |grep curses
        libncursesw.so.5 => /lib/libncursesw.so.5 (0x00375000)

libncursesw has much better Unicode support than libncurses. Since Python 3.3, the Python curses.window.addstr() method uses waddwstr() when the module is linked to libncursesw, which also improves Unicode support.
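
The ldd check above can also be scripted. The following is a minimal sketch, assuming `ldd` output in the usual `name => path (address)` form; the helper name `linked_curses_libs` and the `sample` string (copied from the example output above) are illustrative only, not part of any Python or ncurses API.

```python
import re

def linked_curses_libs(ldd_output):
    """Return the curses library names found in ldd-style output."""
    pattern = r"^\s*(libn?cursesw?\.so[.\d]*)\s*=>"
    return re.findall(pattern, ldd_output, re.MULTILINE)

# Sample line taken from the ldd example in this message.
sample = "\tlibncursesw.so.5 => /lib/libncursesw.so.5 (0x00375000)\n"

libs = linked_curses_libs(sample)
# True when a wide-character build (libncursesw) is linked.
is_wide = any(name.startswith("libncursesw") for name in libs)
print(libs, is_wide)
```

To check a real installation, the text passed in would be the output of running `ldd` on the curses extension module, whose path varies between distributions.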
msg190501 - (view)	Author: (helmut)	Date: 2013-06-02 20:22
All reproducers confirmed that their _cursessomething.so is linked against libncursesw.so.5.
msg190503 - (view)	Author: STINNER Victor (vstinner) *	Date: 2013-06-02 20:45
u"äöü" encoded to "utf-8" gives '\xc3\xa4\xc3\xb6\xc3\xbc'

"\303\303\303\274" is '\xc3\xc3\xc3\xbc'.

I guess that curses considers that '\xc3\xa4' is a string of 2 characters: screen.addstr(0, 1, "ö") replaces the second "character", '\xa4'.

I suppose that screen.addstr(0, 0, u"äöü".encode("utf-8")) works.

If "_cursessomething.so" is already linked against libncursesw.so.5, the fix is to use waddwstr(), but such change cannot be done in a minor release like Python 2.7.6. So I'm closing this issue as wont fix => you have to move to Python 3.3.
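
The byte-level guess above can be reproduced with a small model. This is only a sketch, assuming each screen cell holds exactly one byte; the helper `addstr_bytes` is hypothetical and does not reflect how ncurses stores cells internally. It is written for Python 3 so that the encoding step is explicit.

```python
def addstr_bytes(row, x, text):
    """Write the UTF-8 bytes of `text` into `row`, one byte per cell, from cell x."""
    data = text.encode("utf-8")
    row[x:x + len(data)] = data

row = bytearray()
addstr_bytes(row, 0, "ä")  # cells 0-1 hold c3 a4
addstr_bytes(row, 1, "ö")  # cell 1 (a4) is overwritten: c3 c3 b6
addstr_bytes(row, 2, "ü")  # cell 2 (b6) is overwritten: c3 c3 c3 bc
print(bytes(row))  # b'\xc3\xc3\xc3\xbc'
```

Under that assumption the model ends with the same four bytes that strace reported, "\303\303\303\274".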
msg190519 - (view)	Author: (helmut)	Date: 2013-06-03 06:03
> I suppose that screen.addstr(0, 0, u"äöü".encode("utf-8")) works.

It works as in "the output looks as the one expected". Long lines with utf8 characters will make it break again though.

screen.addstr(0, 0, "äöü" * 20) # assuming COLUMNS=80

Will give two rows of characters of which the first row is 40 characters long.

> If "_cursessomething.so" is already linked against libncursesw.so.5, the fix is to use waddwstr(), but such change cannot be done in a minor release like Python 2.7.6. So I'm closing this issue as wont fix => you have to move to Python 3.3.

Sounds sensible. Are you aware of a workaround for this issue? I.e. is there any way to force Python2.7 to use the wide mode for outputting characters?
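
One plausible reading of the wrapping behaviour described above is that each UTF-8 byte is counted as one screen column. The sketch below works through that arithmetic; `wrap_rows` is a hypothetical helper and the one-byte-per-column rule is an assumption, not confirmed ncurses behaviour.

```python
def wrap_rows(text, columns, cell="byte"):
    """Return the number of cells used on each row when `text` wraps at `columns`.

    cell="byte": every UTF-8 byte occupies one cell (assumed narrow behaviour).
    cell="char": every code point occupies one cell.
    """
    remaining = len(text.encode("utf-8")) if cell == "byte" else len(text)
    rows = []
    while remaining > 0:
        rows.append(min(remaining, columns))
        remaining -= columns
    return rows

line = "äöü" * 20  # 60 characters, 120 bytes in UTF-8
print(wrap_rows(line, 80, cell="byte"))  # [80, 40]
print(wrap_rows(line, 80, cell="char"))  # [60]
```

With two bytes per character, the 80 cells of the first row correspond to 40 visible characters, which matches the reported output; counting code points instead would fit all 60 characters on one row.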
msg190521 - (view)	Author: STINNER Victor (vstinner) *	Date: 2013-06-03 07:40
"Sounds sensible. Are you aware of a workaround for this issue? I.e. is there any way to force Python2.7 to use the wide mode for outputting characters?"

I don't think that it is possible to work around this issue; it is a bug in the design of curses, related to Unicode. I suppose that libncursesw uses an array of wchar_t characters when the *_wch() and *wstr() functions are used, whereas your version appears to use an array of char* characters and so is unable to understand that a character is composed of two bytes (ex: b"\xc3\xa4" for u"ä").

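For contrast with byte-sized cells, the wide-character idea from the last message can be sketched by storing one code point per cell. The helper `addstr_cells` is hypothetical and only models the supposition made in the thread; it is not the libncursesw implementation.

```python
def addstr_cells(row, x, text):
    """Write one code point per cell, starting at cell x."""
    row[x:x + len(text)] = list(text)

row = []
addstr_cells(row, 0, "ä")
addstr_cells(row, 1, "ö")
addstr_cells(row, 2, "ü")

print("".join(row))              # äöü
print(len("ä"))                  # 1 code point
print(len("ä".encode("utf-8")))  # 2 bytes
```

Because each cell holds a whole character, writing at column 1 replaces nothing that belongs to the character at column 0.
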
History
Date	User	Action	Args
2022-04-11 14:57:46	admin	set	github: 62318
2013-06-03 07:40:39	vstinner	set	messages: + msg190521
2013-06-03 06:03:34	helmut	set	messages: + msg190519
2013-06-02 20:45:42	vstinner	set	status: open -> closed resolution: wont fix messages: + msg190503
2013-06-02 20:22:27	helmut	set	messages: + msg190501
2013-06-02 20:12:17	vstinner	set	messages: + msg190500
2013-06-02 11:08:10	r.david.murray	set	nosy: + vstinner, r.david.murray messages: + msg190483 title: curses utf8 output broken -> curses utf8 output broken in Python2
2013-06-02 10:14:02	helmut	create