curses implementation of Unicode is wrong in Python 3 #56776

Closed
vstinner opened this issue Jul 14, 2011 · 37 comments
Labels
stdlib Python modules in the Lib dir

Comments

@vstinner
Member

BPO 12567
Nosy @akuchling, @jcea, @cben, @pitrou, @vstinner, @ned-deily, @merwok, @bitdancer, @akheron
Files
  • getkey.patch
  • curses_unicode.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2011-12-08.01:07:21.972>
    created_at = <Date 2011-07-14.22:33:36.493>
    labels = ['library']
    title = 'curses implementation of Unicode is wrong in Python 3'
    updated_at = <Date 2012-06-21.20:13:09.522>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2012-06-21.20:13:09.522>
    actor = 'ned.deily'
    assignee = 'none'
    closed = True
    closed_date = <Date 2011-12-08.01:07:21.972>
    closer = 'vstinner'
    components = ['Library (Lib)']
    creation = <Date 2011-07-14.22:33:36.493>
    creator = 'vstinner'
    dependencies = []
    files = ['22662', '22692']
    hgrepos = []
    issue_num = 12567
    keywords = ['patch']
    message_count = 37.0
    messages = ['140375', '140379', '140405', '140406', '140411', '140637', '140638', '140639', '141462', '141465', '141466', '141771', '142283', '142289', '143574', '143575', '143576', '143589', '148361', '148365', '148429', '148430', '148452', '148468', '148469', '149008', '149012', '149110', '149111', '154477', '154478', '157627', '157628', '157636', '163306', '163308', '163362']
    nosy_count = 18.0
    nosy_names = ['akuchling', 'jcea', 'cben', 'pitrou', 'vstinner', 'nadeem.vawda', 'gpolo', 'ned.deily', 'eric.araujo', 'Arfrever', 'r.david.murray', 'inigoserna', 'zeha', 'schodet', 'python-dev', 'petri.lehtinen', 'Nicholas.Cole', 'john.feuerstein']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue12567'
    versions = ['Python 3.3']

    @vstinner
    Member Author

    curses functions accepting strings implicitly encode character strings to UTF-8. This is wrong. We should add a function to set the encoding (see issue bpo-6745) or use the wide character C functions. I don't think that UTF-8 is the right default encoding; I suppose that the locale encoding is a better choice.

    Accepting characters (and character strings) but calling byte functions is wrong. For example, addch('é') doesn't work with UTF-8 locale encoding. It calls waddch(0xE9) (é is U+00E9), whereas waddch(0xC3)+waddch(0xA9) should be called. Workaround in Python:

        for byte in 'é'.encode('utf-8'):
            win.addch(byte)
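
    To make the mismatch concrete, here is a minimal sketch in plain Python (no curses needed) that compares the code point of é with the bytes a byte-oriented terminal would need. The helper name bytes_for_terminal is made up for illustration and is not part of the curses module.

```python
def bytes_for_terminal(ch, encoding):
    """Return the byte values a byte-oriented waddch() would need for ch.

    Hypothetical helper, for illustration only.
    """
    return list(ch.encode(encoding))


E_ACUTE = '\u00e9'  # the character é

# The code point is a single number, 0xE9 ...
code_point = ord(E_ACUTE)

# ... but a UTF-8 terminal needs two bytes (0xC3, 0xA9),
# while an ISO-8859-1 terminal needs the single byte 0xE9.
utf8_bytes = bytes_for_terminal(E_ACUTE, 'utf-8')
latin1_bytes = bytes_for_terminal(E_ACUTE, 'iso-8859-1')
```

    Passing the code point 0xE9 straight to waddch() therefore only happens to work when the terminal uses ISO-8859-1.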

    I see two possible solutions:

    A) Add new functions that accept only characters, and stop accepting characters in the existing functions

    B) The function should be fixed to call the right C function depending on the input type. For example, Python addch(10) and addch(b'\n') would call waddch(10), whereas addch('é') would call wadd_wch(233).

    I prefer solution (B) because addch('é') would just work as expected.
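
    A rough sketch of the dispatch that solution (B) implies, written in Python only to show the idea. The real dispatch would live in the C code of _curses; the function name choose_c_function and its return format are assumptions made for this example.

```python
def choose_c_function(ch):
    """Return (C function name, argument) that solution (B) would use.

    Hypothetical helper: it only models the type-based dispatch and
    does not call into curses.
    """
    if isinstance(ch, int):
        # Integers are passed through as a byte/chtype value.
        return ('waddch', ch)
    if isinstance(ch, bytes):
        if len(ch) != 1:
            raise TypeError('expected a single byte')
        return ('waddch', ch[0])
    if isinstance(ch, str):
        if len(ch) != 1:
            raise TypeError('expected a single character')
        # Characters go to the wide character function as a code point.
        return ('wadd_wch', ord(ch))
    raise TypeError('expected int, bytes or str, got %s' % type(ch).__name__)
```

    With this mapping, choose_c_function(10) and choose_c_function(b'\n') both select waddch with the value 10, while a str argument selects wadd_wch.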

    @vstinner vstinner added the stdlib Python modules in the Lib dir label Jul 14, 2011
    @vstinner
    Member Author

    getkey.patch fixes window.getkey(): use get_wch() instead of getch() to correctly handle non-ASCII characters. I tested with the key é (U+00E9) with ISO-8859-1 and UTF-8 locale encodings: getkey() gives the expected result (but addstr is unable to display it, because addstr encodes the string to UTF-8 instead of the locale encoding).

    @vstinner
    Member Author

    Oh, by the way: do all platforms have wide character functions? I don't see any failure on our Python 3.x buildbots, but test_curses is skipped on many buildbots.

    @NicholasCole
    Mannequin

    NicholasCole mannequin commented Jul 15, 2011

    I think that some platforms do not have wide character support, though I could be wrong. The FAQ here: http://invisible-island.net/ncurses/ncurses.faq.html has a list of those that do and those that don't, but I don't know how up to date it is.

    @vstinner
    Member Author

    by the way: do all platforms have wide character functions?

    See msg140408 and msg140409: Antoine Pitrou (OS=Mageia 1) and some buildbots don't have get_wch().

    @vstinner
    Member Author

    Patch the _curses module to improve Unicode support:

    • add an encoding attribute to a window (only visible in C): read the locale encoding
    • encode a character and a character string to the window encoding if the ncursesw library is NOT used
    • addch(), addstr(), addnstr(), insstr() and insnstr() use the wide character functions if the ncursesw library is used
    • PyCurses_ConvertToChtype() checks for integer overflow and rejects values outside [0; 255]

    The check on the ncursesw library availability is done in setup.py because the library linked to _curses depends on the readline library (see issues bpo-7384 and bpo-9408).

    I don't know whether the wide character functions can be available in the curses or ncurses libraries.

    Details:

    • locale encoding: use GetConsoleOutputCP() on Windows, nl_langinfo(CODESET) if available, or "utf-8"
    • don't encode a character to the window encoding if its code is in [0; 127] (use the Unicode code point): all encodings are compatible with ASCII... except some encodings like JIS X 0201. In JIS, 0x5C is decoded to the yen sign (U+00A5) instead of a backslash (U+005C).
    • if an encoded character is longer than 1 byte, raise an OverflowError. For example, U+00E9 (é) encoded to UTF-8 gives b'\xC3\xA9' (two bytes).
    • copy the encoding when creating a subwindow.
    • use a global variable, screen_encoding, in PyCurses_UnCtrl() and PyCurses_UngetCh()
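
    The encoding rules listed above can be sketched in Python as follows. This is only an approximation of the behaviour described for builds without ncursesw, not the C code from the patch; the name encode_char is an assumption made for the example.

```python
def encode_char(ch, encoding):
    """Convert a one-character str to a single byte value.

    Hypothetical model of the rules above for a curses build without
    wide character support.
    """
    code = ord(ch)
    if code < 128:
        # ASCII range: use the code point directly, skip the codec.
        return code
    # May raise UnicodeEncodeError if the encoding lacks the character.
    encoded = ch.encode(encoding)
    if len(encoded) != 1:
        raise OverflowError('character %r needs %d bytes in %s'
                            % (ch, len(encoded), encoding))
    return encoded[0]
```

    For example, encode_char('\u00e9', 'iso-8859-1') gives 0xE9, while the same call with 'utf-8' raises OverflowError because the encoded form is two bytes long.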

    It's not possible to specify an encoding.

    GetConsoleOutputCP() may not be the right choice on Windows if a text application doesn't run in a Windows console (e.g. if it uses its own terminal emulator). GetOEMCP() is maybe a better choice, or a function should be added to specify the encoding used by the _curses module (override the "locale encoding").

    If a function is added to specify the encoding, I think that it is better to add a global function instead of adding an argument to functions creating a new window object (initscr(), getwin(), subwin(), derwin(), newpad()).

    @vstinner
    Member Author

    Using curses_unicode.patch:

    • without ncursesw: addch('é') raises an OverflowError because 'é'.encode('UTF-8') is 2 bytes and not 1 byte
    • with ncursesw: the set of displayable characters depends on the locale encoding (e.g. € cannot be printed with ISO-8859-1 locale encoding)
    • with ncursesw: any character can be printed with a UTF-8 locale encoding (including non-BMP characters: U+10000..U+10FFFF)
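
    Whether a character fits a given locale encoding can be checked without curses. The sketch below uses only str.encode(); the helper name encoded_length is made up for illustration.

```python
def encoded_length(ch, encoding):
    """Return the number of bytes ch takes in encoding, or None if the
    encoding cannot represent it. Hypothetical helper for illustration.
    """
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None


EURO = '\u20ac'         # the euro sign
NON_BMP = '\U00010000'  # first code point outside the BMP

euro_latin1 = encoded_length(EURO, 'iso-8859-1')   # None: not in ISO-8859-1
euro_utf8 = encoded_length(EURO, 'utf-8')          # 3 bytes
non_bmp_utf8 = encoded_length(NON_BMP, 'utf-8')    # 4 bytes
```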

    It would be possible to support multibyte encoded characters (like é in UTF-8) for addch() by calling addch() multiple times, one per byte, but I would prefer to keep _curses simple and not work around libncurses limitations (bugs).

    @vstinner
    Member Author

    See also bpo-6755 (curses.get_wch).

    @python-dev
    Mannequin

    python-dev mannequin commented Jul 31, 2011

    New changeset d98b5e0f0862 by Nadeem Vawda in branch 'default':
    Fix build error in _curses module when not using libncursesw.
    http://hg.python.org/cpython/rev/d98b5e0f0862

    @nadeemvawda
    Mannequin

    nadeemvawda mannequin commented Jul 31, 2011

    Following d98b5e0f0862, I have been able to successfully build the curses
    module with curses_unicode.patch applied.

    @nadeemvawda
    Mannequin

    nadeemvawda mannequin commented Jul 31, 2011

    Ack sorry, forgot to give context - my machine doesn't have libncursesw,
    so the curses module failed to build before that commit (with or without
    the patch applied).

    @vstinner
    Member Author

    vstinner commented Aug 8, 2011

    See also bpo-10570.

    @NicholasCole
    Mannequin

    NicholasCole mannequin commented Aug 17, 2011

    There are now several bugs dealing with related issues here. Are we any closer to a solution to any of them? The suggested patches look like a good idea - what needs to happen for them to move forward?

    @vstinner
    Member Author

    what needs to happen for them to move forward?

    I would like a review of curses_unicode.patch.

    @python-dev
    Mannequin

    python-dev mannequin commented Sep 5, 2011

    New changeset b1e03d10391e by Victor Stinner in branch 'default':
    Issue bpo-12567: Add curses.unget_wch() function
    http://hg.python.org/cpython/rev/b1e03d10391e

    @vstinner
    Member Author

    vstinner commented Sep 6, 2011

    I'm not sure that it is correct to call nl_langinfo(CODESET) to get the locale encoding. The LC_CTYPE locale should maybe be set temporarily to the current locale (""), as locale.getpreferredencoding() does. Or maybe better, locale.getpreferredencoding() should be called.
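
    From Python code, the second option looks roughly like this. It is a sketch of the idea, not what _curses does internally; the function name current_locale_encoding and the UTF-8 fallback are assumptions made for the example.

```python
import codecs
import locale


def current_locale_encoding():
    """Return the encoding of the user's locale, falling back to UTF-8.

    Hypothetical helper for illustration.
    """
    # With do_setlocale=True, getpreferredencoding() may temporarily set
    # LC_CTYPE to the user's locale ("") while it queries the encoding.
    name = locale.getpreferredencoding(True) or 'utf-8'
    try:
        # Make sure Python actually knows a codec with this name.
        codecs.lookup(name)
    except LookupError:
        name = 'utf-8'
    return name
```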

    @vstinner
    Member Author

    vstinner commented Sep 6, 2011

    The LC_CTYPE locale should maybe be set temporarily to
    the current locale (""), as does locale.getpreferredencoding().

    See also issue bpo-6203.

    @python-dev
    Mannequin

    python-dev mannequin commented Sep 6, 2011

    New changeset 786668a4fb6b by Victor Stinner in branch 'default':
    Issue bpo-12567: Fix curses.unget_wch() tests
    http://hg.python.org/cpython/rev/786668a4fb6b

    @python-dev
    Mannequin

    python-dev mannequin commented Nov 25, 2011

    New changeset c3581ca21a57 by Victor Stinner in branch 'default':
    Issue bpo-12567: The curses module uses Unicode functions for Unicode arguments
    http://hg.python.org/cpython/rev/c3581ca21a57

    @pitrou
    Member

    pitrou commented Nov 25, 2011

    This broke several Gentoo buildbots.

    @vstinner
    Member Author

    New changeset 919259054621 by Victor Stinner in branch 'default':
    Issue bpo-13415: Help to locate curses.h when _curses module is linked to ncursesw
    http://hg.python.org/cpython/rev/919259054621

    (Oops, wrong issue number, again)

    @vstinner
    Member Author

    This broke several Gentoo buildbots.

    setup.py is unable to correctly locate curses.h. I added a hack to always search in /usr/include/ncursesw/. The hack is needed on Ubuntu 11.10 if you only have libncursesw5-dev but not libncursesw-dev, for example.
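
    The idea of the hack can be sketched as a header search that always includes the ncursesw directory. This is not the code from setup.py: the function find_curses_header, its parameters and the search order are assumptions made for illustration.

```python
import os


def find_curses_header(include_dirs, extra_dir='/usr/include/ncursesw',
                       isfile=os.path.isfile):
    """Return the first directory containing curses.h, or None.

    Hypothetical helper: extra_dir is always searched after the
    configured include_dirs. isfile is injectable so the search can be
    exercised without touching the real filesystem.
    """
    for directory in list(include_dirs) + [extra_dir]:
        if isfile(os.path.join(directory, 'curses.h')):
            return directory
    return None
```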

    @jcea
    Member

    jcea commented Nov 27, 2011

    I am still concerned about the compilation warning in OpenIndiana buildbots :-(

    @vstinner
    Member Author

    Compile output on OpenSolaris:

    Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
    ld: fatal: file /usr/local/lib/libncursesw.so: wrong ELF class: ELFCLASS32
    ld: fatal: file processing errors. No output written to build/lib.solaris-2.11-i86pc-3.3-pydebug/readline.so
    collect2: ld returned 1 exit status
    /export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:279: error: expected declaration specifiers or '...' before 'cchar_t'
    /export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c: In function 'PyCurses_ConvertToCchar_t':
    /export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:298: error: 'wch' undeclared (first use in this function)
    /export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:298: error: (Each undeclared identifier is reported only once
    /export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:298: error: for each function it appears in.)
    /export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c: In function 'PyCursesWindow_AddCh':
    /export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:584: error: 'cchar_t' undeclared (first use in this function)
    /export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:584: error: expected ';' before 'wch'
    /export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:618: error: 'wch' undeclared (first use in this function)
    /export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:618: error: too many arguments to function 'PyCurses_ConvertToCchar_t'
    /export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:623: warning: implicit declaration of function 'mvwadd_wch'
    /export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:625: warning: implicit declaration of function 'wadd_wch'
    /export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c: In function 'PyCursesWindow_AddStr':
    /export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:702: warning: implicit declaration of function 'mvwaddwstr'
    /export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:704: warning: implicit declaration of function 'waddwstr'
    /export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c: In function 'PyCursesWindow_AddNStr':
    /export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:779: warning: implicit declaration of function 'mvwaddnwstr'
    /export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:781: warning: implicit declaration of function 'waddnwstr'
    /export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c: In function 'PyCursesWindow_Get_WCh':
    /export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:1187: warning: implicit declaration of function 'wget_wch'
    /export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:1194: warning: implicit declaration of function 'mvwget_wch'
    /export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c: In function 'PyCursesWindow_InsStr':
    /export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:1468: warning: implicit declaration of function 'mvwins_wstr'
    /export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:1470: warning: implicit declaration of function 'wins_wstr'
    /export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c: In function 'PyCursesWindow_InsNStr':
    /export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:1546: warning: implicit declaration of function 'mvwins_nwstr'
    /export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:1548: warning: implicit declaration of function 'wins_nwstr'
    /export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c: In function 'PyCurses_Unget_Wch':
    /export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:3130: warning: implicit declaration of function 'unget_wch'
    ld: fatal: file /usr/local/lib/libpanelw.so: wrong ELF class: ELFCLASS32
    ld: fatal: file /usr/local/lib/libncursesw.so: wrong ELF class: ELFCLASS32
    ld: fatal: file processing errors. No output written to build/lib.solaris-2.11-i86pc-3.3-pydebug/_curses_panel.so
    collect2: ld returned 1 exit status

    @vstinner
    Member Author

    New changeset bf51e32b2a81 by Victor Stinner in branch 'default':
    Issue bpo-13415: test_curses skips unencodable characters
    http://hg.python.org/cpython/rev/bf51e32b2a81

    (Oops, I copy-pasted the issue number from my previous commit, and the issue number was wrong...)

    @vstinner
    Member Author

    vstinner commented Dec 8, 2011

    I am still concerned about the compilation warning in OpenIndiana buildbots :-(

    I'm unable to reproduce the issue in my OpenIndiana VM: the compilation of the _curses module fails, not because of Unicode, but because the mvwchgat() function is missing (see issue bpo-3786). I don't know how to install ncursesw on OpenIndiana; I didn't find an official package using pkg search.

    curses issues on OpenIndiana are serious enough to have their own issue: I opened the issue bpo-13552.

    @vstinner
    Member Author

    vstinner commented Dec 8, 2011

    The code has been committed. The remaining task is to fix OpenIndiana issues: see bpo-13552.

    @vstinner vstinner closed this as completed Dec 8, 2011
    @jcea
    Member

    jcea commented Dec 9, 2011

    Victor, I have these notes I wrote down when I set up the OpenIndiana buildbots. Maybe they can be useful to you: (compiling from source)

    """

    • ncurses 5.7: Standard installation "./configure --with-shared --without-normal --enable-widec --without-cxx-binding". The curses that ships with OpenIndiana is missing a couple of functions: "mvwchgat" and "wchgat".

    """

    I installed ncurses because of the lack of "mvwchgat" and "wchgat".

    When compiling Python, I add export "CFLAGS=-I/usr/local/include/ncursesw" to help it to find the right lib.

    Hope this is useful.

    @vstinner
    Member Author

    vstinner commented Dec 9, 2011

    I wrote down when I set up the OpenIndiana buildbots

    Hum, please use the issue bpo-13552 for curses issues on OpenIndiana/Solaris.

    ... of functions: "mvwchgat" and "wchgat"

    See issues bpo-3786 and bpo-13552 for this problem.

    I installed ncurses ... I add export "CFLAGS=-I/usr/local/include/ncursesw"

    The curses module is compiled by setup.py, not by the Makefile. It looks like setup.py ignores CFLAGS. I don't know if setup.py allows specifying such an option.

    @NicholasCole
    Mannequin

    NicholasCole mannequin commented Feb 27, 2012

    It looks to me as if the documentation in the release candidates for 2.7.3 and 3.2.3 hasn't been updated to include the new curses fixes. Is that correct?

    @merwok
    Member

    merwok commented Feb 27, 2012

    Yes, it was only fixed for 3.3.

    @NicholasCole
    Mannequin

    NicholasCole mannequin commented Apr 5, 2012

    Testing the Python3.3a2 build on OS X - the exception

    AttributeError: '_curses.curses window' object has no attribute 'get_wch'

    is still being raised. I don't have a Linux build I can easily test with. Is this a particular problem with the OS X build?

    @vstinner
    Member Author

    vstinner commented Apr 5, 2012

    AttributeError: '_curses.curses window' object has no attribute 'get_wch'

    is still being raised.

    "still"? Did it work before my last changes?

    Unicode functions of the (n)curses library are only available if the Python curses module is linked to libncursesw.

    Is libncursesw available? Is libreadline linked to libncurses or libncursesw? If libreadline is linked to libncurses, the Python curses module is also linked to libncurses.
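
    Application code that has to run on both kinds of builds can test for the method before using it. A minimal sketch, assuming win is a curses window object; the helper name read_character is made up for this example.

```python
def read_character(win):
    """Read one key press, preferring the wide character API if present.

    Hypothetical helper for illustration.
    """
    if hasattr(win, 'get_wch'):
        # Present only when _curses is linked against libncursesw.
        return win.get_wch()
    # Fallback: the byte-oriented getkey().
    return win.getkey()
```

    Called as read_character(stdscr) inside curses.wrapper(), this avoids the AttributeError on builds without libncursesw, at the cost of weaker non-ASCII input handling on those builds.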

    @ned-deily
    Member

    Nicholas, please open a new issue documenting which Python 3.3 you are using, from which python.org installer or the ./configure parameters if you built it yourself (and whether you supplied a version of GNU readline or used the Apple default of BSD libedit) and an example of how to reproduce the error. Please don't add to closed issues. Note also there is a known open issue with the 32-bit-only OS X installer for 3.3 where the _curses module does not build (bpo-14225) with an older version of GNU readline.

    @python-dev
    Mannequin

    python-dev mannequin commented Jun 21, 2012

    New changeset 2035c5ad4239 by Ned Deily in branch 'default':
    Issue bpo-14225: Fix Unicode support for curses (bpo-12567) on OS X:
    http://hg.python.org/cpython/rev/2035c5ad4239

    @ned-deily
    Member

    It turns out that the Unicode support introduced by this issue didn't build correctly on OS X, either silently failing to build (explaining the problem seen by Nicholas) or causing a compile error (as seen in bpo-14225). This should be working OK (as of 3.3.0b1).

    BTW, a test of the wide char functions would be nice and might have caught this.

    @ned-deily
    Member

    See also bpo-15037 which documents a broken curses.unget_wch and, hence, test_curses when Python is built with ncurses 5.7 or earlier.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022