classification
Title: curses implementation of Unicode is wrong in Python 3
Type:
Stage: resolved
Components: Library (Lib)
Versions: Python 3.3
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: Arfrever, Nicholas.Cole, akuchling, cben, eric.araujo, gpolo, haypo, inigoserna, jcea, john.feuerstein, nadeem.vawda, ned.deily, petri.lehtinen, pitrou, python-dev, r.david.murray, schodet, zeha
Priority: normal
Keywords: patch

Created on 2011-07-14 22:33 by haypo, last changed 2012-06-21 20:13 by ned.deily. This issue is now closed.

Files
File name Uploaded Description
getkey.patch haypo, 2011-07-14 23:09
curses_unicode.patch haypo, 2011-07-19 00:19
Messages (37)
msg140375 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-07-14 22:33
curses functions that accept strings implicitly encode character strings to UTF-8. This is wrong. We should add a function to set the encoding (see issue #6745) or use the wide character C functions. I don't think that UTF-8 is the right default encoding; I suppose that the locale encoding is a better choice.

Accepting characters (and character strings) but calling byte functions is wrong. For example, addch('é') doesn't work with a UTF-8 locale encoding: it calls waddch(0xE9) (é is U+00E9), whereas waddch(0xC3) followed by waddch(0xA9) should be called. Workaround in Python:

    for byte in 'é'.encode('utf-8'):
        win.addch(byte)

I see two possible solutions:

A) Add new functions that only accept characters, and stop accepting characters in the existing functions

B) The function should be fixed to call the right C function depending on the input type. For example, Python addch(10) and addch(b'\n') would call waddch(10), whereas addch('é') would call wadd_wch(233).

I prefer solution (B) because addch('é') would just work as expected.
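
As a rough illustration of solution (B), the type-based dispatch could look like the sketch below. The helper name select_add_function is hypothetical; it only returns the name of the C function that would be called together with its argument, and it is not the actual _curses code.

```python
def select_add_function(ch):
    """Return (C function name, argument) chosen by input type (illustrative only)."""
    if isinstance(ch, int):
        # Integers are passed through to the byte-oriented function.
        return ("waddch", ch)
    if isinstance(ch, bytes):
        if len(ch) != 1:
            raise TypeError("expected a bytes object of length 1")
        return ("waddch", ch[0])
    if isinstance(ch, str):
        if len(ch) != 1:
            raise TypeError("expected a string of length 1")
        # Characters go to the wide character function with their code point.
        return ("wadd_wch", ord(ch))
    raise TypeError("expected int, bytes or str")
```

With this sketch, 10 and b'\n' both map to waddch(10), while 'é' maps to wadd_wch(233), matching the examples given for solution (B).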
msg140379 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-07-14 23:09
getkey.patch fixes window.getkey(): it uses get_wch() instead of getch() to correctly handle non-ASCII characters. I tested with the key é (U+00E9) with ISO-8859-1 and UTF-8 locale encodings: getkey() gives the expected result (but addstr is unable to display it, because addstr encodes the string to UTF-8 instead of the locale encoding).
msg140405 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-07-15 13:33
Oh, by the way: do all platforms have wide character functions? I don't see any failure on our Python 3.x buildbots, but test_curses is skipped on many buildbots.
msg140406 - (view) Author: Nicholas Cole (Nicholas.Cole) Date: 2011-07-15 13:56
I think that some platforms do not have wide character support, though I could be wrong.  The FAQ here: http://invisible-island.net/ncurses/ncurses.faq.html  has a list of those that do and those that don't, but I don't know how up to date it is.
msg140411 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-07-15 14:39
> by the way: do all platforms have wide character functions?

See msg140408 and msg140409: Antoine Pitrou (OS=Mageia 1) and some buildbots don't have get_wch().
msg140637 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-07-19 00:19
Patch the _curses module to improve Unicode support:

 - add an encoding attribute to a window (only visible in C): read the locale encoding
 - encode a character and a character string to the window encoding if the ncursesw library is NOT used
 - addch(), addstr(), addnstr(), insstr() and insnstr() use the wide character functions if the ncursesw library is used
 - PyCurses_ConvertToChtype() checks for integer overflow and rejects values outside [0; 255]

The check on the ncursesw library availability is done in setup.py because the library linked to _curses depends on the readline library (see issues #7384 and #9408).

I don't know whether wide character functions can be available in the curses or ncurses library.

Details:

 - locale encoding: use GetConsoleOutputCP() on Windows, nl_langinfo(CODESET) if available, or "utf-8"
 - don't encode a character to the window encoding if its code is in [0; 127] (use the Unicode code point): all encodings are compatible with ASCII... except some encodings like JIS X 0201. In JIS, 0x5C is decoded to the yen sign (U+00A5) instead of a backslash (U+005C).
 - if an encoded character is longer than 1 byte, raise an OverflowError. For example, U+00E9 (é) encoded to UTF-8 gives b'\xC3\xA9' (two bytes).
 - copy the encoding when creating a subwindow.
 - use a global variable, screen_encoding, in PyCurses_UnCtrl() and PyCurses_UngetCh()

It's not possible to specify an encoding.

GetConsoleOutputCP() may not be the right choice on Windows if a text application doesn't run in a Windows console (e.g. if it uses its own terminal emulator). GetOEMCP() may be a better choice, or a function should be added to specify the encoding used by the _curses module (overriding the "locale encoding").

If a function is added to specify the encoding, I think that it is better to add a global function instead of adding an argument to functions creating a new window object (initscr(), getwin(), subwin(), derwin(), newpad()).
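
The single-byte rule described in the details above can be sketched in pure Python. The name encode_to_chtype is hypothetical; the patch implements this logic in C, and the sketch only mirrors the behavior described in this message for builds without ncursesw.

```python
def encode_to_chtype(ch, encoding):
    """Convert one character to a byte value in [0; 255] (illustrative only)."""
    if len(ch) != 1:
        raise TypeError("expected a single character")
    code = ord(ch)
    # ASCII range: use the code point directly, without encoding.
    if code < 128:
        return code
    encoded = ch.encode(encoding)
    # Only one byte fits; a multibyte sequence is rejected.
    if len(encoded) != 1:
        raise OverflowError("character does not fit in a single byte")
    return encoded[0]
```

For example, 'é' with "iso-8859-1" gives 0xE9, while 'é' with "utf-8" raises OverflowError because the encoded form is two bytes long.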
msg140638 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-07-19 00:26
Using curses_unicode.patch:

 - without ncursesw: addch('é') raises an OverflowError because 'é'.encode('UTF-8') is 2 bytes and not 1 byte
 - with ncursesw: the set of displayable characters depends on the locale encoding (e.g. € cannot be printed with ISO-8859-1 locale encoding)
 - with ncursesw: any character can be printed with a UTF-8 locale encoding (including non-BMP characters: U+10000..U+10FFFF)

It would be possible to support multibyte encoded characters (like é in UTF-8) for addch() by calling addch() multiple times, one per byte, but I would prefer to keep _curses simple and not work around libncurses limitations (bugs).
msg140639 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-07-19 00:28
See also #6755 (curses.get_wch).
msg141462 - (view) Author: Roundup Robot (python-dev) Date: 2011-07-31 13:06
New changeset d98b5e0f0862 by Nadeem Vawda in branch 'default':
Fix build error in _curses module when not using libncursesw.
http://hg.python.org/cpython/rev/d98b5e0f0862
msg141465 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2011-07-31 13:18
Following d98b5e0f0862, I have been able to successfully build the curses
module with curses_unicode.patch applied.
msg141466 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2011-07-31 13:19
Ack sorry, forgot to give context - my machine doesn't have libncursesw,
so the curses module failed to build before that commit (with or without
the patch applied).
msg141771 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-08-08 11:22
See also #10570.
msg142283 - (view) Author: Nicholas Cole (Nicholas.Cole) Date: 2011-08-17 15:49
There are now several bugs dealing with related issues here.  Are we any closer to a solution to any of them?  The suggested patches look like a good idea - what needs to happen for them to move forward?
msg142289 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-08-17 18:35
> what needs to happen for them to move forward?

I would like a review of curses_unicode.patch.
msg143574 - (view) Author: Roundup Robot (python-dev) Date: 2011-09-05 23:53
New changeset b1e03d10391e by Victor Stinner in branch 'default':
Issue #12567: Add curses.unget_wch() function
http://hg.python.org/cpython/rev/b1e03d10391e
msg143575 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-09-06 00:05
I'm not sure that it is correct to call nl_langinfo(CODESET) to get the locale encoding. The LC_CTYPE locale should maybe be set temporarily to the current locale (""), as locale.getpreferredencoding() does. Or maybe better, locale.getpreferredencoding() should be called.
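
A small sketch of the two lookups being compared, using only the standard locale module. The function names ctype_codeset and preferred_encoding are made up for illustration; the two results may differ when LC_CTYPE has not been set to the user's locale in the running process.

```python
import locale

def ctype_codeset():
    """Codeset of the LC_CTYPE locale as currently set in this process."""
    if hasattr(locale, "nl_langinfo") and hasattr(locale, "CODESET"):
        return locale.nl_langinfo(locale.CODESET)
    return None  # nl_langinfo() is not available on this platform

def preferred_encoding():
    """Encoding according to the user's preferences."""
    return locale.getpreferredencoding()
```

Printing both values in the same process is a quick way to see whether they agree on a given platform.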
msg143576 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-09-06 00:06
> The LC_CTYPE locale should maybe be set temporarily to
> the current locale (""), as locale.getpreferredencoding() does.

See also issue #6203.
msg143589 - (view) Author: Roundup Robot (python-dev) Date: 2011-09-06 08:08
New changeset 786668a4fb6b by Victor Stinner in branch 'default':
Issue #12567: Fix curses.unget_wch() tests
http://hg.python.org/cpython/rev/786668a4fb6b
msg148361 - (view) Author: Roundup Robot (python-dev) Date: 2011-11-25 21:08
New changeset c3581ca21a57 by Victor Stinner in branch 'default':
Issue #12567: The curses module uses Unicode functions for Unicode arguments
http://hg.python.org/cpython/rev/c3581ca21a57
msg148365 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-11-25 22:38
This broke several Gentoo buildbots.
msg148429 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-11-26 23:21
New changeset 919259054621 by Victor Stinner in branch 'default':
Issue #13415: Help to locate curses.h when _curses module is linked to ncursesw
http://hg.python.org/cpython/rev/919259054621

(Oops, wrong issue number, again)
msg148430 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-11-26 23:26
> This broke several Gentoo buildbots.

setup.py is unable to correctly locate curses.h. I added a hack to always search in /usr/include/ncursesw/. The hack is needed on Ubuntu 11.10 if you only have libncursesw5-dev but not libncursesw-dev, for example.
msg148452 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2011-11-27 15:04
I am still concerned about the compilation warning in OpenIndiana buildbots :-(
msg148468 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-11-28 06:31
Compile output on OpenSolaris:

Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
ld: fatal: file /usr/local/lib/libncursesw.so: wrong ELF class: ELFCLASS32
ld: fatal: file processing errors. No output written to build/lib.solaris-2.11-i86pc-3.3-pydebug/readline.so
collect2: ld returned 1 exit status
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:279: error: expected declaration specifiers or '...' before 'cchar_t'
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c: In function 'PyCurses_ConvertToCchar_t':
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:298: error: 'wch' undeclared (first use in this function)
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:298: error: (Each undeclared identifier is reported only once
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:298: error: for each function it appears in.)
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c: In function 'PyCursesWindow_AddCh':
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:584: error: 'cchar_t' undeclared (first use in this function)
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:584: error: expected ';' before 'wch'
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:618: error: 'wch' undeclared (first use in this function)
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:618: error: too many arguments to function 'PyCurses_ConvertToCchar_t'
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:623: warning: implicit declaration of function 'mvwadd_wch'
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:625: warning: implicit declaration of function 'wadd_wch'
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c: In function 'PyCursesWindow_AddStr':
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:702: warning: implicit declaration of function 'mvwaddwstr'
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:704: warning: implicit declaration of function 'waddwstr'
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c: In function 'PyCursesWindow_AddNStr':
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:779: warning: implicit declaration of function 'mvwaddnwstr'
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:781: warning: implicit declaration of function 'waddnwstr'
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c: In function 'PyCursesWindow_Get_WCh':
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:1187: warning: implicit declaration of function 'wget_wch'
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:1194: warning: implicit declaration of function 'mvwget_wch'
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c: In function 'PyCursesWindow_InsStr':
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:1468: warning: implicit declaration of function 'mvwins_wstr'
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:1470: warning: implicit declaration of function 'wins_wstr'
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c: In function 'PyCursesWindow_InsNStr':
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:1546: warning: implicit declaration of function 'mvwins_nwstr'
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:1548: warning: implicit declaration of function 'wins_nwstr'
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c: In function 'PyCurses_Unget_Wch':
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:3130: warning: implicit declaration of function 'unget_wch'
ld: fatal: file /usr/local/lib/libpanelw.so: wrong ELF class: ELFCLASS32
ld: fatal: file /usr/local/lib/libncursesw.so: wrong ELF class: ELFCLASS32
ld: fatal: file processing errors. No output written to build/lib.solaris-2.11-i86pc-3.3-pydebug/_curses_panel.so
collect2: ld returned 1 exit status
msg148469 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-11-28 06:33
New changeset bf51e32b2a81 by Victor Stinner in branch 'default':
Issue #13415: test_curses skips unencodable characters
http://hg.python.org/cpython/rev/bf51e32b2a81

(Oops, I copy-pasted the issue number from my previous commit, and the issue number was wrong...)
msg149008 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-12-08 00:53
> I am still concerned about the compilation warning in OpenIndiana buildbots :-(

I'm unable to reproduce the issue in my OpenIndiana VM: the compilation of the _curses module fails, not because of Unicode, but because the mvwchgat() function is missing (see issue #3786). I don't know how to install ncursesw on OpenIndiana; I didn't find an official package using pkg search.

curses issues on OpenIndiana are serious enough to have their own issue: I opened the issue #13552.
msg149012 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-12-08 01:07
The code has been committed. The remaining task is to fix OpenIndiana issues: see #13552.
msg149110 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2011-12-09 17:16
Victor, I have these notes I wrote down when I set up the OpenIndiana buildbots. Maybe they can be useful to you: (compiling from source)

"""
  * ncurses 5.7: Standard installation with "./configure --with-shared --without-normal --enable-widec --without-cxx-binding". The curses that ships with OpenIndiana is missing a couple of functions: "mvwchgat" and "wchgat".

"""

I installed ncurses because of the lack of "mvwchgat" and "wchgat".

When compiling Python, I add export "CFLAGS=-I/usr/local/include/ncursesw" to help it to find the right lib.

Hope this is useful.
msg149111 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-12-09 17:31
> I wrote down when I set up the OpenIndiana buildbots

Hum, please use the issue #13552 for curses issues on OpenIndiana/Solaris.

> ... of functions: "mvwchgat" and "wchgat"

See issues #3786 and  #13552 for this problem.

> I installed ncurses ... I add export "CFLAGS=-I/usr/local/include/ncursesw"

The curses module is compiled by setup.py, not by the Makefile. It looks like setup.py ignores CFLAGS. I don't know whether setup.py allows specifying such an option.
msg154477 - (view) Author: Nicholas Cole (Nicholas.Cole) Date: 2012-02-27 13:04
It looks to me as if the documentation in the release candidates for 2.7.3 and 3.2.3 hasn't been updated to include the new curses fixes.  Is that correct?
msg154478 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-02-27 13:13
Yes, it was only fixed for 3.3.
msg157627 - (view) Author: Nicholas Cole (Nicholas.Cole) Date: 2012-04-05 21:45
Testing the Python3.3a2 build on OS X - the exception 


AttributeError: '_curses.curses window' object has no attribute 'get_wch'

is still being raised. I don't have a Linux build I can easily test with. Is this a particular problem with the OS X build?
msg157628 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-04-05 21:48
> AttributeError: '_curses.curses window' object has no attribute 'get_wch'

> is still being raised.

"still"? Did it work before my last changes?

Unicode functions of the (n)curses library are only available if the Python curses module is linked to libncursesw.

Is libncursesw available? Is libreadline linked to libncurses or libncursesw? If libreadline is linked to libncurses, the Python curses module is also linked to libncurses.
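
Since the wide character methods exist only when the module is linked to libncursesw, calling code can check for them at run time. The sketch below assumes a hypothetical helper read_key that takes a curses window object; it falls back to getch() when get_wch() is missing.

```python
def read_key(win):
    """Read one key, preferring the wide character API when it exists."""
    if hasattr(win, "get_wch"):
        # Available only if _curses is linked to libncursesw.
        return win.get_wch()
    # Byte-oriented fallback: returns an integer key code.
    return win.getch()
```

A typical call would be curses.wrapper(read_key), which passes the main window as the argument. Note that the two branches return different types (a str or int from get_wch(), an int from getch()), so callers need to handle both.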
msg157636 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2012-04-06 02:28
Nicholas, please open a new issue documenting which Python 3.3 you are using, from which python.org installer or the ./configure parameters if you built it yourself (and whether you supplied a version of GNU readline or used the Apple default of BSD libedit) and an example of how to reproduce the error.  Please don't add to closed issues.  Note also there is a known open issue with the 32-bit-only OS X installer for 3.3 where the _curses module does not build (Issue14225) with an older version of GNU readline.
msg163306 - (view) Author: Roundup Robot (python-dev) Date: 2012-06-21 06:48
New changeset 2035c5ad4239 by Ned Deily in branch 'default':
Issue #14225: Fix Unicode support for curses (#12567) on OS X:
http://hg.python.org/cpython/rev/2035c5ad4239
msg163308 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2012-06-21 07:10
It turns out that the Unicode support introduced by this issue didn't build correctly on OS X, either silently failing to build (explaining the problem seen by Nicholas) or causing a compile error (as seen in Issue14225).  This should be working OK (as of 3.3.0b1).

BTW, a test of the wide char functions would be nice and might have caught this.
msg163362 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2012-06-21 20:13
See also Issue15037 which documents a broken curses.unget_wch and, hence, test_curses when Python is built with ncurses 5.7 or earlier.
History
Date User Action Args
2012-06-21 20:13:09 ned.deily set messages: + msg163362
2012-06-21 07:10:52 ned.deily set messages: + msg163308
2012-06-21 06:48:40 python-dev set messages: + msg163306
2012-04-06 02:28:27 ned.deily set nosy: + ned.deily; messages: + msg157636
2012-04-05 21:48:23 haypo set messages: + msg157628
2012-04-05 21:45:44 Nicholas.Cole set messages: + msg157627
2012-02-27 13:13:24 eric.araujo set nosy: + eric.araujo; messages: + msg154478; stage: resolved
2012-02-27 13:04:57 Nicholas.Cole set messages: + msg154477
2011-12-09 17:31:45 haypo set messages: + msg149111
2011-12-09 17:16:19 jcea set messages: + msg149110
2011-12-08 01:07:21 haypo set status: open -> closed; resolution: fixed
2011-12-08 01:07:13 haypo set messages: + msg149012
2011-12-08 00:53:27 haypo set messages: + msg149008
2011-11-28 06:33:47 haypo set messages: + msg148469
2011-11-28 06:31:05 haypo set messages: + msg148468
2011-11-27 15:04:29 jcea set messages: + msg148452
2011-11-26 23:26:08 haypo set messages: + msg148430
2011-11-26 23:21:19 haypo set messages: + msg148429
2011-11-25 22:38:22 pitrou set nosy: + pitrou; messages: + msg148365
2011-11-25 21:08:33 python-dev set messages: + msg148361
2011-11-07 13:01:33 john.feuerstein set nosy: + john.feuerstein
2011-10-28 08:17:40 petri.lehtinen set nosy: + petri.lehtinen
2011-09-09 19:57:35 jcea set nosy: + jcea
2011-09-06 08:08:41 python-dev set messages: + msg143589
2011-09-06 00:06:34 haypo set messages: + msg143576
2011-09-06 00:05:08 haypo set messages: + msg143575
2011-09-05 23:53:32 python-dev set messages: + msg143574
2011-08-17 18:35:00 haypo set messages: + msg142289
2011-08-17 15:49:38 Nicholas.Cole set messages: + msg142283
2011-08-08 11:22:09 haypo set messages: + msg141771
2011-07-31 13:19:27 nadeem.vawda set messages: + msg141466
2011-07-31 13:18:34 nadeem.vawda set nosy: + nadeem.vawda; messages: + msg141465
2011-07-31 13:06:29 python-dev set messages: + msg141462
2011-07-19 00:28:16 haypo set messages: + msg140639
2011-07-19 00:26:28 haypo set messages: + msg140638
2011-07-19 00:19:51 haypo set files: + curses_unicode.patch; messages: + msg140637
2011-07-15 14:39:08 haypo set messages: + msg140411
2011-07-15 13:56:47 Nicholas.Cole set messages: + msg140406
2011-07-15 13:33:55 haypo set messages: + msg140405
2011-07-15 04:21:16 Arfrever set nosy: + Arfrever
2011-07-14 23:09:33 haypo set files: + getkey.patch; keywords: + patch; messages: + msg140379
2011-07-14 22:33:36 haypo create