classification
Title: curses utf8 output broken in Python2
Type: behavior
Stage:
Components: Library (Lib)
Versions: Python 2.7

process
Status: closed
Resolution: wont fix
Dependencies:
Superseder:
Assigned To:
Nosy List: helmut, r.david.murray, vstinner
Priority: normal
Keywords:

Created on 2013-06-02 10:14 by helmut, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (7)
msg190479 - Author: (helmut) Date: 2013-06-02 10:14
Consider the test case below.

<<<
#!/usr/bin/python
# -*- encoding: utf8 -*-

import curses

def wrapped(screen):
    screen.addstr(0, 0, "ä")
    screen.addstr(0, 1, "ö")
    screen.addstr(0, 2, "ü")
    screen.getch()

if __name__ == "__main__":
    curses.wrapper(wrapped)
>>>

Expected output: "äöü"
Output on py3.3: as expected
Output on py2.7.3: "?ü"
The actual bytes (as determined by strace) were "\303\303\303\274". Observe the inclusion of broken utf8 sequences.

This issue was initially discovered on Debian sid, and was independently confirmed on Arch Linux and on two other systems whose distribution is unknown.
msg190483 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-06-02 11:08
I believe this is one of a class of bugs that are fixed in Python3, and that are unlikely to be fixed in Python2.  I'll defer to Victor, though, who made a number of curses unicode fixes in Python3.
msg190500 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-06-02 20:12
Is your Python curses module linked to libncurses.so.5 or libncursesw.so.5? Example:

$ ldd /usr/lib/python2.7/lib-dynload/_cursesmodule.so |grep curses
	libncursesw.so.5 => /lib/libncursesw.so.5 (0x00375000)

libncursesw has much better Unicode support than libncurses.

Since Python 3.3, the Python curses.window.addstr() method uses waddwstr() when the module is linked to libncursesw, which also improves the Unicode support.
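
The ldd check above can also be scripted. The following is a minimal sketch, not part of the original report: the helper names `linked_curses_libs` and `uses_wide_ncurses` are hypothetical, and the code only parses `ldd` output that has already been captured as text (here, the sample line from this message).

```python
import re


def linked_curses_libs(ldd_output):
    """Return the curses library names found in `ldd` output text.

    Hypothetical helper: it only inspects text, it does not run ldd itself.
    """
    # Matches names such as "libncurses.so.5" or "libncursesw.so.5"
    # at the start of an ldd output line.
    pattern = re.compile(r"^\s*(lib\w*curses\w*\.so[.\d]*)", re.MULTILINE)
    return pattern.findall(ldd_output)


def uses_wide_ncurses(ldd_output):
    """True if any linked curses library is the wide-character build."""
    return any(name.startswith("libncursesw")
               for name in linked_curses_libs(ldd_output))


# Sample line copied from the ldd output quoted in this message.
sample = "\tlibncursesw.so.5 => /lib/libncursesw.so.5 (0x00375000)\n"

print(linked_curses_libs(sample))  # ['libncursesw.so.5']
print(uses_wide_ncurses(sample))   # True
```

As the rest of this thread shows, being linked against libncursesw is necessary but not sufficient: Python 2.7 still does not call the wide-character functions.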
msg190501 - Author: (helmut) Date: 2013-06-02 20:22
All reproducers confirmed that their _cursessomething.so is linked against libncursesw.so.5.
msg190503 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-06-02 20:45
u"äöü" encoded to "utf-8" gives '\xc3\xa4\xc3\xb6\xc3\xbc'

"\303\303\303\274" is '\xc3\xc3\xc3\xbc'.

I guess that curses considers that '\xc3\xa4' is a string of 2 characters: screen.addstr(0, 1, "ö") replaces the second "character", '\xa4'.
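
This guess can be checked with a small model. The sketch below is written for Python 3 and is only a simulation of the hypothesis (one byte per screen cell). It is not how ncurses is implemented, and the function name `simulate_byte_cells` is made up for illustration.

```python
def simulate_byte_cells(writes):
    """Model a screen row where every byte occupies one cell.

    `writes` is a list of (column, text) pairs, applied in order,
    mimicking screen.addstr(0, column, text) under the one-byte-per-cell
    assumption described above.
    """
    row = bytearray()
    for col, text in writes:
        data = text.encode("utf-8")
        # Pad with spaces if the write starts beyond the current row end.
        if len(row) < col:
            row.extend(b" " * (col - len(row)))
        # Overwrite cells starting at `col`, one cell per byte.
        row[col:col + len(data)] = data
    return bytes(row)


# The three calls from the test case in the first message.
result = simulate_byte_cells([(0, "ä"), (1, "ö"), (2, "ü")])

print(result)                         # b'\xc3\xc3\xc3\xbc'
print(result == b"\303\303\303\274")  # True: same bytes as seen in strace
```

Under this model each write at column n clobbers the second byte of the previous character, which reproduces exactly the byte sequence reported by strace.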

I suppose that screen.addstr(0, 0, u"äöü".encode("utf-8")) works.

If "_cursessomething.so" is already linked against libncursesw.so.5, the fix is to use waddwstr(), but such a change cannot be made in a minor release like Python 2.7.6. So I'm closing this issue as wont fix => you have to move to Python 3.3.
msg190519 - Author: (helmut) Date: 2013-06-03 06:03
> I suppose that screen.addstr(0, 0, u"äöü".encode("utf-8")) works.

It works in the sense that the output looks as expected. Long lines with UTF-8 characters make it break again, though.

screen.addstr(0, 0, "äöü" * 20) # assuming COLUMNS=80

This gives two rows of characters, the first of which is 40 characters long.
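
The 40-character figure is consistent with every UTF-8 byte being counted as one column. A short Python 3 sketch of that arithmetic follows; the one-byte-per-column rule is an assumption used to explain the observation, not something read from the curses source.

```python
# Assumption: each UTF-8 byte is counted as one screen column.
COLUMNS = 80  # terminal width assumed in the example above

text = "äöü" * 20
data = text.encode("utf-8")

# Split the byte string where the first row would be considered full.
first_row = data[:COLUMNS]
second_row = data[COLUMNS:]

print(len(text), len(data))             # 60 120
print(len(first_row.decode("utf-8")))   # 40
print(len(second_row.decode("utf-8")))  # 20
```

The 60 characters would fit on one 80-column row if they were counted as characters, but their 120 bytes wrap after 80 bytes, that is after 40 two-byte characters.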

> If "_cursessomething.so" is already linked against libncursesw.so.5, the fix is to use waddwstr(), but such change cannot be done in a minor release like Python 2.7.6. So I'm closing this issue as wont fix => you have to move to Python 3.3.

Sounds sensible. Are you aware of a workaround for this issue? I.e. is there any way to force Python2.7 to use the wide mode for outputting characters?
msg190521 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-06-03 07:40
"Sounds sensible. Are you aware of a workaround for this issue? I.e.
is there any way to force Python2.7 to use the wide mode for
outputting characters?"

I don't think that it is possible to work around this issue; it is a
bug in the design of curses, related to Unicode. I suppose that
libncursesw uses an array of wchar_t characters when the *_wch() and
*wstr() functions are used, whereas your version appears to use an array
of char and so is unable to understand that a character is
composed of two bytes (e.g. b"\xc3\xa4" for u"ä").

History
Date                 User            Action  Args
2022-04-11 14:57:46  admin           set     github: 62318
2013-06-03 07:40:39  vstinner        set     messages: + msg190521
2013-06-03 06:03:34  helmut          set     messages: + msg190519
2013-06-02 20:45:42  vstinner        set     status: open -> closed; resolution: wont fix; messages: + msg190503
2013-06-02 20:22:27  helmut          set     messages: + msg190501
2013-06-02 20:12:17  vstinner        set     messages: + msg190500
2013-06-02 11:08:10  r.david.murray  set     nosy: + vstinner, r.david.murray; messages: + msg190483; title: curses utf8 output broken -> curses utf8 output broken in Python2
2013-06-02 10:14:02  helmut          create