classification
Title: Patch: new method get_wch for ncurses bindings: accept wide characters (unicode)
Type: enhancement Stage: resolved
Components: Extension Modules Versions: Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Nicholas.Cole, akuchling, cben, gpolo, haypo, inigoserna, jcea, phep, pitrou, python-dev, r.david.murray, schodet, zeha
Priority: normal Keywords: patch

Created on 2009-08-21 11:43 by inigoserna, last changed 2012-03-08 01:18 by haypo. This issue is now closed.

Files
File name Uploaded Description
curses.get_wch.patch inigoserna, 2009-08-21 11:44 Patch for the documentation
test_get_wch.py inigoserna, 2009-08-21 11:46 Test example
_cursesmodule.get_wch.patch inigoserna, 2009-08-21 11:53 Patch against Python 2.6.2 _cursesmodule.c
_cursesmodule.311.get_wch.patch inigoserna, 2009-08-21 12:14 Patch against Python 3.1.1 _cursesmodule.c
test_ucs2w.py inigoserna, 2009-08-26 16:14 Several implementations of wcwidth() and wcswidth()
ucs2w.c inigoserna, 2009-08-26 16:14 C extension implementation of wcwidth() and wcswidth()
Messages (30)
msg91816 - (view) Author: Iñigo Serna (inigoserna) Date: 2009-08-21 11:43
Currently, there is no simple way in the curses bindings to get the code
associated with a non-ASCII keystroke (e.g. ç) in terminals
configured with UTF-8 encoding.

getch returns the code for a wide character one byte at a time.
But the ncurses library has a proper function for this: get_wch.

The attached patch against Python v2.6.2 provides this missing get_wch
method in the ncurses bindings.

A test example and a patch to the documentation are included as well.

More information, and a partial solution that does not require patching
the Python curses module, can be found in this thread:
http://groups.google.com/group/comp.lang.python/browse_thread/thread/67dce30f0a2742a6?fwc=2
msg91817 - (view) Author: Iñigo Serna (inigoserna) Date: 2009-08-21 11:44
Added patch for the documentation
msg91818 - (view) Author: Iñigo Serna (inigoserna) Date: 2009-08-21 11:46
Added test example
msg91819 - (view) Author: Iñigo Serna (inigoserna) Date: 2009-08-21 11:53
Added missing file: patch against Python v2.6.2
msg91821 - (view) Author: Iñigo Serna (inigoserna) Date: 2009-08-21 12:14
Added patch against Python v3.1.1. 
NOT TESTED!
msg91867 - (view) Author: Guilherme Polo (gpolo) * (Python committer) Date: 2009-08-22 18:13
Have you looked into issue700921 already? It seems a lot of discussion
was generated there, but no patches.
msg91868 - (view) Author: Iñigo Serna (inigoserna) Date: 2009-08-22 18:23
Thanks for the pointer; I hadn't seen anything when I searched for get_wch.

The patch provided here only adds this get_wch function, because as A.M.
Kuchling explained in issue700921, it's possible to use wide chars now,
the only feature missing is get_wch.
msg91956 - (view) Author: Cherniavsky Beni (cben) * Date: 2009-08-25 19:26
Nice.  2 questions:

1. Why not change getch() to always use get_wch()?
2. I think you also want to fix getkey() / introduce get_wkey().
msg91973 - (view) Author: Iñigo Serna (inigoserna) Date: 2009-08-26 15:47
Q. Why not change getch() to always use get_wch()?

This could break backwards compatibility.
There is some code out there that may use getch() to read the byte
stream one byte at a time and build the wide char from it.
In fact, I'm using this trick to get unicode chars for now.
See the thread linked in the first comment for the implementation I've
developed for my app. Other people are using similar approaches too.
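
The byte-by-byte trick mentioned above can be sketched roughly as follows. This is only an illustration, not the implementation from the linked thread: it is written for Python 3, assumes the terminal sends UTF-8, and the function name assemble_wide_chars and the plain list of integers standing in for successive getch() results are hypothetical.

```python
import codecs


def assemble_wide_chars(key_codes):
    """Yield characters rebuilt from getch()-style integer key codes.

    key_codes is a hypothetical stand-in for successive window.getch()
    results. Codes above 255 (function and keypad keys) are passed
    through unchanged as integers.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    for code in key_codes:
        if code > 0xFF:
            # Not a byte: a special key such as an arrow key.
            yield code
            continue
        # The decoder buffers incomplete sequences and returns "" until
        # the last byte of a multibyte character arrives.
        text = decoder.decode(bytes([code]))
        if text:
            yield text
```

For example, feeding the two UTF-8 bytes of ç (0xC3, 0xA7) followed by 0x61 gives "ç" and then "a". Code that works this way depends on getch() continuing to return individual bytes, which is the compatibility concern raised above.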


Q. I think you also want to fix getkey() / introduce get_wkey().

In my own experience, get_wkey wouldn't be as useful when dealing with wide
chars. But, of course, that only reflects my own use cases.
msg91974 - (view) Author: Iñigo Serna (inigoserna) Date: 2009-08-26 16:13
By the way, I don't know if this is the best place to mention it, but
since it is somewhat related to ncurses...

Other functions I miss a lot are wcwidth() and wcswidth(). 

These functions return the real width (that is, the number of cells
occupied on screen) of unicode strings.

An example to clarify the issue: a single Chinese character may need
2 cells on screen, so len(chinese_unicode_string) won't return the
real screen width needed to show the string.

i.e., len(chinese_unicode_string) != wcswidth(chinese_unicode_string)
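
To make the difference concrete, here is a rough pure-Python approximation of a wcswidth()-like function. It is a sketch only, not the code from the attached files and not equivalent to glibc's wcswidth(): it assumes Python 3, relies on unicodedata.east_asian_width() and unicodedata.combining(), ignores control characters, and the name display_width is made up for this example.

```python
import unicodedata


def display_width(text):
    """Approximate the number of terminal cells needed to show text."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Combining marks are drawn over the previous character.
            continue
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            # Wide and fullwidth characters take two cells.
            width += 2
        else:
            width += 1
    return width
```

With this sketch, a string of two Chinese characters has len() 2 but a display width of 4, while plain ASCII text has the same value for both.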


Those functions are included in reasonably recent glibc versions (2.2+?),
at least on my Linux systems.

Sadly enough, Python doesn't bind them, afaik.
I've tried ctypes, but it doesn't work for me (I don't know why), so I've
written some replacements.

Please look at these files: 

* test_ucs2w.py: benchmarks of different implementations, most of them
pure Python. Please consider only ucs2w_1x; the others are only experiments.

* ucs2w.c: C extension implementation


I think Python could benefit from having these functions in the standard
library. The simplest way would surely be to bind the glibc functions, but
I don't know if they exist on other platforms such as MacOS X or Windows.

Nor do I know where they would fit... perhaps in the unicodedata module.


What do you think? Who is the person to convince? (Please don't ask me
to write a PEP; my English is not good enough.)
msg91975 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-08-26 17:03
For the title concern of this patch I'm adding akuchling as nosy.

Judging by your post your English probably is good enough to write a PEP
(the PEP editors should help with fine tuning it, at least in theory).
However, I doubt a PEP would be necessary.

As for where to raise the (new) issue...given that these functions are
independent of curses, I'd say open a new issue.  If you do that, I'd
suggest making lemburg nosy on the issue, as he is the original author
of the unicodedata module and many other unicode things in python.

I'm setting the stage to test needed since your test case isn't a unit
test.  I have no idea (never having worked with curses) how hard a unit
test would be to write.  Nor have I reviewed the patch, for the same reason.
msg91982 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2009-08-26 22:14
inigoserna>> Other functions I miss a lot are wcwidth() and wcswidth()

I wrote a patch to implement unicode.width() method:
http://bugs.python.org/file13357/unicode_width.patch

It's part of the issue #2382 (SyntaxError cursor shifted if multibyte
character is in line)
msg108249 - (view) Author: Christian Hofstaedtler (zeha) Date: 2010-06-20 21:22
Will this be part of 3.2 and possibly 2.7?

Without these patches wide character input using curses is basically impossible (on at least some platforms).
msg139333 - (view) Author: Nicholas Cole (Nicholas.Cole) Date: 2011-06-27 22:52
Is there any hope that something like this patch will make it into a future version?  As far as I can see, entering accented characters is currently impossible on the latest release versions of python...or am I missing something?
msg139335 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-06-27 23:03
Can someone update the patch for Python 3.3? Python 2.7 and 3.2 don't accept new features.
msg140371 - (view) Author: Nicholas Cole (Nicholas.Cole) Date: 2011-07-14 20:10
The bug is marked "Test Needed".

I am very keen to see this issue fixed, and would be very willing to help, but I don't really know what is still required. As far as I can see there is a patch waiting - what is the hold up?
msg140374 - (view) Author: Roundup Robot (python-dev) Date: 2011-07-14 21:08
New changeset dec10ad41b2a by Victor Stinner in branch 'default':
Close #6755: Add get_wch() method to curses.window class
http://hg.python.org/cpython/rev/dec10ad41b2a
msg140378 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-07-14 22:45
> I don't really know what is still required

_cursesmodule.311.get_wch.patch doesn't apply cleanly to Python 3.3 and uses the PyInt_FromLong() function, which was removed in Python 3.0. Indeed, Iñigo wrote that the patch was not tested.

> what is the hold up?

Nobody wanted to take responsibility for the choice about get_wch(): add a new method or patch getch() ;-)

--

I committed Iñigo's patch to add the window.get_wch() method with minor changes:

 - add :versionadded: 3.3 in the doc
 - document the new method in the What's New in Python 3.3 document
 - fix an error message: getch => get_wch
 - change error message (if ch==ERROR): "get_wch failed" => "no input" (message copied from the getch function)

--

I think that the Unicode support of curses in Python 3 is just completely broken: I opened a new issue for that, issue #12567.

I also created issue #12568 to add a function to get the width of a character.
msg140403 - (view) Author: Nicholas Cole (Nicholas.Cole) Date: 2011-07-15 13:21
> Nobody wanted to take responsibility for the choice about get_wch(): add a new method or patch getch() ;-)

I suspect that a new method is the right way to go, here.  

I see it has been moved to "committed/rejected" status - does that mean that it might still go in, or that it is rejected?

> I think that the Unicode support of curses in Python 3 is just completely broken

It certainly is less than ideal. ;-)
msg140404 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-07-15 13:28
> I see it has been moved to "committed/rejected"
> status - does that mean that it might still go in, or that
> it is rejected?

I committed the new method, did you see my commit dec10ad41b2a?

I propose to continue the discussion on issue #12567 (for example, to decide if we need unget_wch or not).
msg140408 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-07-15 13:58
/home/antoine/cpython/default/Modules/_cursesmodule.c: In function ‘PyCursesWindow_Get_WCh’:
/home/antoine/cpython/default/Modules/_cursesmodule.c:919:9: attention : implicit declaration of function ‘wget_wch’
/home/antoine/cpython/default/Modules/_cursesmodule.c:926:9: attention : implicit declaration of function ‘mvwget_wch’
gcc -pthread -shared build/temp.linux-x86_64-3.3-pydebug/home/antoine/cpython/default/Modules/_cursesmodule.o -L/usr/local/lib -lncurses -o build/lib.linux-x86_64-3.3-pydebug/_curses.cpython-33dm.so
*** WARNING: renaming "_curses" since importing it failed: build/lib.linux-x86_64-3.3-pydebug/_curses.cpython-33dm.so: undefined symbol: mvwget_wch
*** WARNING: importing extension "_curses_panel" failed with <class 'SystemError'>: initialization of _curses_panel raised unreported exception

Failed to build these modules:
_curses            _curses_panel
msg140409 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-07-15 14:15
Also compilation warnings on some buildbots:

/var/lib/buildslave/3.x.murray-gentoo-wide/build/Modules/_cursesmodule.c: In function 'PyCursesWindow_Get_WCh':
/var/lib/buildslave/3.x.murray-gentoo-wide/build/Modules/_cursesmodule.c:919: warning: implicit declaration of function 'wget_wch'
/var/lib/buildslave/3.x.murray-gentoo-wide/build/Modules/_cursesmodule.c:926: warning: implicit declaration of function 'mvwget_wch'
msg140410 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-07-15 14:36
> implicit declaration of function ‘wget_wch’

Uh-oh, I expected such an error: it means that your ncurses library doesn't have the wide character API. The compiler command confirms that: "gcc ... -lncurses ...". You are using libncurses and not libncursesw.

Antoine told me that libncursesw is available on his OS, but Python chose libncurses. I suppose that it's because readline is linked to libncurses (and not libncursesw) => see issue #7384.

Antoine's setup is not rare: many Linux distros link readline to libncurses, and so Python cannot use libncursesw.

For this issue, it's not a problem: we can just add a test to check if get_wch is available or not, and only define the Python function if the C function does exist. But for #12567, it's a bigger problem because it means that we cannot always use the wide character functions if the argument is Unicode (character/string).
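
If the method is only defined when the C function exists, as proposed above, calling code would need to cope with its absence. A minimal sketch of that from the Python side follows; the helper name read_key is hypothetical, and the only assumption about the window object is that it offers getch() and, possibly, get_wch().

```python
def read_key(window):
    """Read one key, preferring get_wch() when the binding provides it."""
    get_wch = getattr(window, "get_wch", None)
    if get_wch is not None:
        # Wide character API available: a str for ordinary characters,
        # an int for function keys and other special keys.
        return get_wch()
    # Fallback: an int; multibyte characters arrive one byte per call.
    return window.getch()
```

Callers still have to handle both return types, since the fallback path only ever yields integers.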
msg140412 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-07-15 14:43
> ... I suppose that it's because readline is linked to libncurses
> (and not libncursesw) => see issue #7384.

See also the issue #9408.
msg140640 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-07-19 00:30
> implicit declaration of function ‘wget_wch’

curses_unicode.patch of issue #12567 adds a HAVE_NCURSESW define to only use wide character functions if _curses is linked to libncursesw.

This define can be used to fix this bug (wget_wch is used even though it is not available).
msg143798 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2011-09-09 19:55
I have compiled ncurses myself, with wide character support. I get these warnings on the buildbots:

"""
/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Modules/_cursesmodule.c:920: warning: implicit declaration of function 'wget_wch'
/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Modules/_cursesmodule.c:927: warning: implicit declaration of function 'mvwget_wch'
/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Modules/_cursesmodule.c:2760: warning: implicit declaration of function 'unget_wch'
"""

Studying the "ncurses.h", I see the definition of "wget_wch" and others. But these definitions are created only if "_XOPEN_SOURCE_EXTENDED" is defined.

Something to be explored?
msg155052 - (view) Author: Nicholas Cole (Nicholas.Cole) Date: 2012-03-07 07:22
I hope that this is the right bug to file this on (I'm getting lost in all of the curses bugs!).

I'm testing out the 3.3a1, and I've run into the following issue.  On previous releases addch() could accept curses.ACS_HLINE and similar.

Attempting to use the same code now raises the exception:

OverflowError: byte doesn't fit in chtype.

I'm sure this is related to the new code that uses addwstr(), but currently code that used to work will crash.

I can't work out a fix myself, because I don't fully understand the problem, but I'm happy to provide sample code if it will help.

Nicholas
msg155054 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-03-07 07:40
Since this bug is about adding a new feature, it is unlikely to be the correct bug for this to be against.

Given that you've identified a regression, I suggest you open a new bug with a reproducer, and we'll set it to release blocker.
msg155119 - (view) Author: Nicholas Cole (Nicholas.Cole) Date: 2012-03-07 20:47
I've created issue 14223.  I hope I've done so correctly.

Best wishes,

Nicholas
msg155146 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-03-08 01:18
Antoine's issue has been fixed:
"Modules/_cursesmodule.c:919:9: attention : implicit declaration of function ‘wget_wch’"

It looks like Jesús's issue is specific to Solaris (or is already fixed?), and so I added a comment to the issue #13552: "Modules/_cursesmodule.c:920: warning: implicit declaration of function 'wget_wch'".
History
Date User Action Args
2012-03-08 01:18:55 haypo set status: open -> closed
2012-03-08 01:18:42 haypo set messages: + msg155146
2012-03-07 20:47:08 Nicholas.Cole set messages: + msg155119
2012-03-07 07:40:27 r.david.murray set messages: + msg155054
2012-03-07 07:22:20 Nicholas.Cole set messages: + msg155052
2011-09-09 19:55:15 jcea set nosy: + jcea; messages: + msg143798
2011-07-29 18:43:16 phep set nosy: + phep
2011-07-19 00:30:10 haypo set messages: + msg140640
2011-07-15 14:43:40 haypo set messages: + msg140412
2011-07-15 14:36:35 haypo set messages: + msg140410
2011-07-15 14:15:01 pitrou set messages: + msg140409
2011-07-15 13:58:29 pitrou set status: closed -> open; nosy: + pitrou; messages: + msg140408
2011-07-15 13:28:50 haypo set messages: + msg140404
2011-07-15 13:21:57 Nicholas.Cole set messages: + msg140403
2011-07-14 22:45:56 haypo set messages: + msg140378
2011-07-14 21:08:42 python-dev set status: open -> closed; nosy: + python-dev; messages: + msg140374; resolution: fixed; stage: test needed -> resolved
2011-07-14 20:10:31 Nicholas.Cole set messages: + msg140371
2011-06-27 23:04:02 haypo set messages: - msg139334
2011-06-27 23:03:55 haypo set messages: + msg139335
2011-06-27 23:02:21 haypo set messages: + msg139334; versions: + Python 3.3, - Python 2.7, Python 3.2
2011-06-27 22:52:10 Nicholas.Cole set nosy: + Nicholas.Cole; messages: + msg139333
2010-10-13 12:30:05 schodet set nosy: + schodet
2010-06-20 21:22:35 zeha set nosy: + zeha; messages: + msg108249
2009-08-26 22:14:23 haypo set nosy: + haypo; messages: + msg91982
2009-08-26 17:03:43 r.david.murray set priority: normal; nosy: + akuchling, r.david.murray; messages: + msg91975; stage: test needed
2009-08-26 16:14:45 inigoserna set files: + ucs2w.c
2009-08-26 16:14:17 inigoserna set files: + test_ucs2w.py
2009-08-26 16:13:21 inigoserna set messages: + msg91974
2009-08-26 15:47:54 inigoserna set messages: + msg91973
2009-08-25 19:26:31 cben set nosy: + cben; messages: + msg91956
2009-08-22 18:23:23 inigoserna set messages: + msg91868
2009-08-22 18:13:31 gpolo set messages: + msg91867
2009-08-22 17:49:31 gpolo set nosy: + gpolo; versions: - Python 2.6, Python 3.1
2009-08-21 12:14:02 inigoserna set files: + _cursesmodule.311.get_wch.patch; messages: + msg91821; versions: + Python 3.1, Python 3.2
2009-08-21 11:53:26 inigoserna set files: + _cursesmodule.get_wch.patch; messages: + msg91819
2009-08-21 11:46:19 inigoserna set files: + test_get_wch.py; messages: + msg91818
2009-08-21 11:44:33 inigoserna set files: + curses.get_wch.patch; keywords: + patch; messages: + msg91817
2009-08-21 11:43:30 inigoserna create