Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Patch: new method get_wch for ncurses bindings: accept wide characters (unicode) #51004

Closed
inigoserna mannequin opened this issue Aug 21, 2009 · 30 comments
Closed

Patch: new method get_wch for ncurses bindings: accept wide characters (unicode) #51004

inigoserna mannequin opened this issue Aug 21, 2009 · 30 comments
Labels
extension-modules C modules in the Modules dir type-feature A feature request or enhancement

Comments

@inigoserna
Copy link
Mannequin

inigoserna mannequin commented Aug 21, 2009

BPO 6755
Nosy @akuchling, @jcea, @cben, @pitrou, @vstinner, @bitdancer
Files
  • curses.get_wch.patch: Patch for the documentation
  • test_get_wch.py: Test example
  • _cursesmodule.get_wch.patch: Patch against Python 2.6.2 _cursesmodule.c
  • _cursesmodule.311.get_wch.patch: Patch against Python 3.1.1 _cursesmodule.c
  • test_ucs2w.py: Several implementations of wcwidth() and wcswidth()
  • ucs2w.c: C extension implementation of wcwidth() and wcswidth()
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2012-03-08.01:18:55.099>
    created_at = <Date 2009-08-21.11:43:30.490>
    labels = ['extension-modules', 'type-feature']
    title = 'Patch: new method get_wch for ncurses bindings: accept wide characters (unicode)'
    updated_at = <Date 2012-03-08.01:18:55.098>
    user = 'https://bugs.python.org/inigoserna'

    bugs.python.org fields:

    activity = <Date 2012-03-08.01:18:55.098>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2012-03-08.01:18:55.099>
    closer = 'vstinner'
    components = ['Extension Modules']
    creation = <Date 2009-08-21.11:43:30.490>
    creator = 'inigoserna'
    dependencies = []
    files = ['14753', '14754', '14755', '14756', '14782', '14783']
    hgrepos = []
    issue_num = 6755
    keywords = ['patch']
    message_count = 30.0
    messages = ['91816', '91817', '91818', '91819', '91821', '91867', '91868', '91956', '91973', '91974', '91975', '91982', '108249', '139333', '139335', '140371', '140374', '140378', '140403', '140404', '140408', '140409', '140410', '140412', '140640', '143798', '155052', '155054', '155119', '155146']
    nosy_count = 13.0
    nosy_names = ['akuchling', 'jcea', 'cben', 'pitrou', 'vstinner', 'gpolo', 'r.david.murray', 'inigoserna', 'phep', 'zeha', 'schodet', 'python-dev', 'Nicholas.Cole']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue6755'
    versions = ['Python 3.3']

    @inigoserna
    Copy link
    Mannequin Author

    inigoserna mannequin commented Aug 21, 2009

    Currently,there is no a simple way in curses bindings to get the code
    associated with a key press of non ascii keystroke (f.e. ç) in terminals
    configured with UTF-8 encoding.

    getch returns the code for a wide character byte a byte.
    But ncurses library has a proper function to do it: get_wch.

    Patch against Python v2.6.2 to provide this missing get_wch method
    in the ncurses bindings.

    Include a test example and a patch to the documentation as well.

    More info and a partial solution without patching python curses module
    on this thread:
    http://groups.google.com/group/comp.lang.python/browse_thread/thread/67dce30f0a2742a6?fwc=2

    @inigoserna inigoserna mannequin added extension-modules C modules in the Modules dir type-feature A feature request or enhancement labels Aug 21, 2009
    @inigoserna
    Copy link
    Mannequin Author

    inigoserna mannequin commented Aug 21, 2009

    Added patch for the documentation

    @inigoserna
    Copy link
    Mannequin Author

    inigoserna mannequin commented Aug 21, 2009

    Added test example

    @inigoserna
    Copy link
    Mannequin Author

    inigoserna mannequin commented Aug 21, 2009

    Added missing file: patch against Python v2.6.2

    @inigoserna
    Copy link
    Mannequin Author

    inigoserna mannequin commented Aug 21, 2009

    Added patch against Python v3.1.1.
    NOT TESTED!

    @gpolo
    Copy link
    Mannequin

    gpolo mannequin commented Aug 22, 2009

    Have you looked into bpo-700921 already ? It seems a lot of discussion
    was generated there, but no patches.

    @inigoserna
    Copy link
    Mannequin Author

    inigoserna mannequin commented Aug 22, 2009

    Thanks for the pointer, haven't seen anything when I searched for get_wch.

    The patch provided here only adds this get_wch function, because as A.M.
    Kuchling explained in bpo-700921, it's possible to use wide chars now,
    the only feature missing is get_wch.

    @cben
    Copy link
    Mannequin

    cben mannequin commented Aug 25, 2009

    Nice. 2 questions:

    1. Why not change getch() to always use get_wch()?
    2. I think you also want fix getkey() / introduce get_wkey().

    @inigoserna
    Copy link
    Mannequin Author

    inigoserna mannequin commented Aug 26, 2009

    Q. Why not change getch() to always use get_wch()?

    This could break backwards compatibility.
    There are some code out there that may use getch() to get the bytes
    stream one by one and build the wide char.
    In fact I'm using this trick to get unicode chars by now.
    Look the thread link in first comment to find the implementation I've
    developed for my app. Other people are using similar approaches too.

    Q. I think you also want fix getkey() / introduce get_wkey().

    In my own experience get_wkey isn't be as useful when dealing with wide
    chars. But, of course, that's only my use cases.

    @inigoserna
    Copy link
    Mannequin Author

    inigoserna mannequin commented Aug 26, 2009

    Btw, I don't know if this is the best place to comment it but as it is
    somehow related with ncurses...

    Other functions I miss a lot are wcwidth() and wcswidth().

    These functions return the real width (read, cells length in screen) for
    unicode strings.

    An example to clarify the issue: one simple Chinese character could need
    2 cells on screen, thus len(chinese_unicode_string) won't return the
    real screen width needed to show the string.

    i.e., len(chinese_unicode_string) != wcswidth(chinese_unicode_string)

    Those functions are included into not so old glibc versions (2.2+?), at
    least on my Linux systems.

    Sadly enough, python doesn't bind them, afaik.
    I've tried ctypes but don't work for me (don't know the reason), so I've
    written some replacements.

    Please look at these files:

    • test_ucs2w.py: benchmarks to different implementations. Most of them,
      pure python. Please consider only ucs2w_1x, other are only experiments.

    • ucs2w.c: C extension implementation

    I think Python could benefit from having these functions in the standard
    library. Surely, most simple way should be to bind glibc functions, but
    don't know if they exist on other platforms such as MacOS X or Windows.

    Neither do I know where they fit... perhaps in unicodedata module.

    What do you think? who is the person to convince? (please, don't ask me
    to write a PEP, my English is not good enough).

    @bitdancer
    Copy link
    Member

    For the title concern of this patch I'm adding akuchling as nosy.

    Judging by your post your English probably is good enough to write a PEP
    (the PEP editors should help with fine tuning it, at least in theory).
    However, I doubt a PEP would be necessary.

    As for where to raise the (new) issue...given that these functions are
    independent of curses, I'd say open a new issue. If you do that, I'd
    suggest making lemburg nosy on the issue, as he is the original author
    of the unicodedata module and many other unicode things in python.

    I'm setting the stage to test needed since your test case isn't a unit
    test. I have no idea (never having worked with curses) how hard a unit
    test would be to write. Nor have I reviewed the patch, for the same reason.

    @vstinner
    Copy link
    Member

    inigoserna>> Other functions I miss a lot are wcwidth() and wcswidth()

    I wrote a patch to implement unicode.width() method:
    http://bugs.python.org/file13357/unicode_width.patch

    It's part of the issue bpo-2382 (SyntaxError cursor shifted if multibyte
    character is in line)

    @zeha
    Copy link
    Mannequin

    zeha mannequin commented Jun 20, 2010

    Will this be part of 3.2 and possibly 2.7?

    Without these patches wide character input using curses is basically impossible (on at least some platforms).

    @NicholasCole
    Copy link
    Mannequin

    NicholasCole mannequin commented Jun 27, 2011

    Is there any hope that something like this patch will make it into a future version? As far as I can see, entering accented characters is currently impossible on the latest release versions of python...or am I missing something?

    @vstinner
    Copy link
    Member

    Can someone update the patch for Python 3.3? Python 2.7 and 3.2 don't accept new features.

    @NicholasCole
    Copy link
    Mannequin

    NicholasCole mannequin commented Jul 14, 2011

    The bug is marked "Test Needed".

    I am very keen to see this issue fixed, and would be very willing to help, but I don't really know what is still required. As far as I can see there is a patch waiting - what is the hold up?

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jul 14, 2011

    New changeset dec10ad41b2a by Victor Stinner in branch 'default':
    Close bpo-6755: Add get_wch() method to curses.window class
    http://hg.python.org/cpython/rev/dec10ad41b2a

    @python-dev python-dev mannequin closed this as completed Jul 14, 2011
    @vstinner
    Copy link
    Member

    I don't really know what is still required

    _cursesmodule.311.get_wch.patch doesn't apply correctly on Python 3.3 and use PyInt_FromLong() function, function removed from Python 3.0. Indeed, Iñigo wrote that the patch was not tested.

    what is the hold up?

    Nobody wanted to take the responsability of the choice for get_wch(): add a new method or patch getch() ;-)

    --

    I commited Iñigo's patch to add window.get_wch() method with minor changes:

    • add :versionadded: 3.3 in the doc
    • document the new method What's new in Python 3.3 document
    • fix an error message: getch => get_wch
    • change error message (if ch==ERROR): "get_wch failed" => "no input" (message copied from the getch function)

    --

    I think that the Unicode support of curses in Python 3 is just completly broken: I opened a new issue for that, issue bpo-12567.

    I also create the issue bpo-12568 to add a function to get the width of a character.

    @NicholasCole
    Copy link
    Mannequin

    NicholasCole mannequin commented Jul 15, 2011

    Nobody wanted to take the responsability of the choice for get_wch(): add a new method or patch getch() ;-)

    I suspect that a new method is the right way to go, here.

    I see it has been moved to "committed/rejected" status - does that mean that it might still go in, or that it is rejected?

    I think that the Unicode support of curses in Python 3 is just completly broken

    It certainly is less than ideal. ;-)

    @vstinner
    Copy link
    Member

    I see it has been moved to "committed/rejected"
    status - does that mean that it might still go in, or that
    it is rejected?

    I commited the new method, did you see my commit dec10ad41b2a?

    I propose to continue the discussion on issue bpo-12567 (for example, to decide if we need unget_wch or not).

    @pitrou
    Copy link
    Member

    pitrou commented Jul 15, 2011

    /home/antoine/cpython/default/Modules/_cursesmodule.c: In function ‘PyCursesWindow_Get_WCh’:
    /home/antoine/cpython/default/Modules/_cursesmodule.c:919:9: attention : implicit declaration of function ‘wget_wch’
    /home/antoine/cpython/default/Modules/_cursesmodule.c:926:9: attention : implicit declaration of function ‘mvwget_wch’
    gcc -pthread -shared build/temp.linux-x86_64-3.3-pydebug/home/antoine/cpython/default/Modules/_cursesmodule.o -L/usr/local/lib -lncurses -o build/lib.linux-x86_64-3.3-pydebug/_curses.cpython-33dm.so
    *** WARNING: renaming "_curses" since importing it failed: build/lib.linux-x86_64-3.3-pydebug/_curses.cpython-33dm.so: undefined symbol: mvwget_wch
    *** WARNING: importing extension "_curses_panel" failed with <class 'SystemError'>: initialization of _curses_panel raised unreported exception

    Failed to build these modules:
    _curses _curses_panel

    @pitrou pitrou reopened this Jul 15, 2011
    @pitrou
    Copy link
    Member

    pitrou commented Jul 15, 2011

    Also compilation warnings on some buildbots:

    /var/lib/buildslave/3.x.murray-gentoo-wide/build/Modules/_cursesmodule.c: In function 'PyCursesWindow_Get_WCh':
    /var/lib/buildslave/3.x.murray-gentoo-wide/build/Modules/_cursesmodule.c:919: warning: implicit declaration of function 'wget_wch'
    /var/lib/buildslave/3.x.murray-gentoo-wide/build/Modules/_cursesmodule.c:926: warning: implicit declaration of function 'mvwget_wch'

    @vstinner
    Copy link
    Member

    implicit declaration of function ‘wget_wch’

    Oh oh, I expected such error: it means that your ncurses library don't have the wide character API. The compiler command confirm that: "gcc ... -lncurses ...". You use libncurses and not libncursesw.

    Antoine told me that libncursesw is available on its OS, but Python chose libncurses. I suppose that it's because readline is linked to libncurses (and not libncursesw) => see issue bpo-7384.

    Antoine setup is not rare: many Linux distro link readline to libncurses, and so Python cannot use libncursesw.

    For this issue, it's not a problem: we can just add a test to check if get_wch is available or not, and only define the Python function if the C function does exist. But for bpo-12567, it's a bigger problem because it means that we cannot always use the wide character functions if the argument is Unicode (character/string).

    @vstinner
    Copy link
    Member

    ... I suppose that it's because readline is linked to libncurses
    (and not libncursesw) => see issue bpo-7384.

    See also the issue bpo-9408.

    @vstinner
    Copy link
    Member

    implicit declaration of function ‘wget_wch’

    curses_unicode.patch of issue bpo-12567 adds a HAVE_NCURSESW define to only use wide character functions if _curses is linked to libncursesw.

    This define can be used to fix this bug (use wget_ch whereas it is not available).

    @jcea
    Copy link
    Member

    jcea commented Sep 9, 2011

    I have compiled ncurses myself, supporting wide characters. I get this warnings in the buildbots:

    """
    /export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Modules/_cursesmodule.c:920: warning: implicit declaration of function 'wget_wch'
    /export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Modules/_cursesmodule.c:927: warning: implicit declaration of function 'mvwget_wch'
    /export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Modules/_cursesmodule.c:2760: warning: implicit declaration of function 'unget_wch'
    """

    Studying the "ncurses.h", I see the definition of "wget_wch" and others. But these definitions are created only if "_XOPEN_SOURCE_EXTENDED" is defined.

    Something to be explored?.

    @NicholasCole
    Copy link
    Mannequin

    NicholasCole mannequin commented Mar 7, 2012

    I hope that this is the right bug to file this on (I'm getting lost in all of the curses bugs!).

    I'm testing out the 3.3a1, and I've run into the following issue. On previous releases addch() could accept curses.ACS_HLINE and similar.

    Attempting to use the same code now raises the exception:

    OverflowError: byte doesn't fit in chtype.

    I'm sure this is related to the new code that uses addwstr(), but currently code that used to work will crash.

    I can't work out a fix myself, because I don't fully understand the problem, but I'm happy to provide sample code if it will help.

    Nicholas

    @bitdancer
    Copy link
    Member

    Since this bug is about adding a new feature, it is unlikely to be the correct bug for this to be against.

    Given that you've identified a regression, I suggest you open a new bug with a reproducer, and we'll set it to release blocker.

    @NicholasCole
    Copy link
    Mannequin

    NicholasCole mannequin commented Mar 7, 2012

    On Wed, Mar 7, 2012 at 7:40 AM, R. David Murray <report@bugs.python.org> wrote:

    R. David Murray <rdmurray@bitdance.com> added the comment:

    Since this bug is about adding a new feature, it is unlikely to be the correct bug for this to be against.

    Given that you've identified a regression, I suggest you open a new bug with a reproducer, and we'll set it to release blocker.

    I've created bpo-14223. I hope I've done so correctly.

    Best wishes,

    Nicholas

    @vstinner
    Copy link
    Member

    vstinner commented Mar 8, 2012

    Antoine's issue has been fixed:
    "Modules/_cursesmodule.c:919:9: attention : implicit declaration of function ‘wget_wch’"

    It looks like Jesús's issue is specific to Solaris (or is already fixed?), and so I added a comment to the issue bpo-13552: "Modules/_cursesmodule.c:920: warning: implicit declaration of function 'wget_wch'".

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    extension-modules C modules in the Modules dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants