classification
Title: (curses) addstr() takes str in Python 3
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.2
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: lukasz.langa Nosy List: Trundle, lukasz.langa, petri.lehtinen, vstinner
Priority: high Keywords: patch

Created on 2009-08-20 21:54 by Trundle, last changed 2012-06-04 23:37 by vstinner. This issue is now closed.

Files
File name | Uploaded | Description
umlaut2x.py | Trundle, 2009-08-20 21:54 | Umlauts working in Python 2.x
umlaut3x.py | Trundle, 2009-08-20 21:55 | Umlauts not working in Python 3.x
curses_charset.patch | vstinner, 2009-08-27 23:59 |
getkey_sample.py | Trundle, 2010-11-17 11:27 |
Messages (11)
msg91786 - Author: Andreas Stührk (Trundle) Date: 2009-08-20 21:54
In Python 3, curses requires a str for addstr() where I think it should
take bytes instead. Otherwise it is impossible to output anything other
than ASCII (which is even more or less stated on top of curses'
documentation).

See the attached script "umlaut2x.py" for Python 2.6: Outputting
umlauts works fine, both in single-byte and multi-byte environments.

The attached script "umlaut3x.py" is the same script translated to
Python 3. Note that the output here always seems to be utf-8, which is
plain wrong.

A quick test where I changed addstr() to take bytes instead of str
confirmed that outputting other characters than ASCII would work then
in Python 3, too. There are perhaps more places where the types are
wrong. If someone confirms this issue and it is desired, I could
provide a patch.
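
To illustrate the mismatch described above, here is a minimal sketch. It is not one of the attached scripts: it assumes a terminal that uses a single-byte encoding (ISO-8859-15 is picked only as an example), and the helper name as_seen_by_terminal is made up for this illustration.

```python
# Illustration only: the same text encoded two ways, then interpreted by a
# terminal that expects a single-byte encoding (assumed: ISO-8859-15).
text = "\u00e4\u00f6\u00fc"  # the umlauts a, o, u

utf8_bytes = text.encode("utf-8")          # two bytes per umlaut
latin9_bytes = text.encode("iso-8859-15")  # one byte per umlaut


def as_seen_by_terminal(data, terminal_encoding):
    """Hypothetical helper: decode bytes the way the terminal would."""
    return data.decode(terminal_encoding, errors="replace")


# UTF-8 bytes sent to an ISO-8859-15 terminal show up as six wrong characters.
garbled = as_seen_by_terminal(utf8_bytes, "iso-8859-15")
# Bytes in the terminal's own encoding show up as the three umlauts.
correct = as_seen_by_terminal(latin9_bytes, "iso-8859-15")
```

This is why output that is always UTF-8 only looks right when the locale happens to be UTF-8.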
msg92019 - Author: STINNER Victor (vstinner) (Python committer) Date: 2009-08-27 22:31
First, make sure that your Python3 build uses libncursesw and not
libncurses, because libncursesw supports unicode, whereas libncurses
doesn't... On UNIX, use the following command to check this:

ldd $(./python -c "import _curses; print(_curses.__file__)")|grep curses

> Note that the output here always seems to be utf-8, 
> which is plain wrong.

Yes, addstr() always uses utf8 to convert unicode to bytes. It's wrong
if the terminal uses a different charset. But I'm not sure that using
bytes is a better idea: since you would like to print characters,
unicode is the right type.

One idea would be to use a configurable charset, e.g. add a 'charset'
attribute to a window (or to the module).
msg92020 - Author: STINNER Victor (vstinner) (Python committer) Date: 2009-08-27 22:34
See also issue #4787
msg92021 - Author: Andreas Stührk (Trundle) Date: 2009-08-27 22:59
Yes, it uses a version of ncurses which supports wide characters, I
checked that.

I agree that using bytes instead may not be the preferred solution in
Python 3. The point is, currently, it is broken if the user does not
use a UTF-8 environment.
msg92023 - Author: STINNER Victor (vstinner) (Python committer) Date: 2009-08-27 23:15
I don't really understand, because your example, umlaut3x.py, works
correctly on my computer (py3k, Ubuntu Jaunty).

> The point is, currently, it is broken if the user 
> does not use an utf-8 environment.

So the problem is that the charset is hardcoded to utf8. You would like
to be able to change that. Or better, that Python guesses your terminal
charset. Right?
msg92024 - Author: Andreas Stührk (Trundle) Date: 2009-08-27 23:46
Of course it works for you. As you stated in issue #4787, your locale
is 'fr_FR.UTF-8'.

And I don't want Python to guess my terminal's encoding. I want Python
to respect my locale. Which is 'de_DE@euro', and not utf-8.
msg92025 - Author: STINNER Victor (vstinner) (Python committer) Date: 2009-08-27 23:59
Here is a first patch to add a method setcharset() to the window class.

Using my patch, you can fix your example by adding the line:

   screen.setcharset(<your charset>)

before addstr().

It's an initial hack to fix the issue. Next steps are:
 - use something better than utf8 as the default charset, maybe
locale.getpreferredencoding()
 - copy the charset on new window creation?
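
The first of these next steps could look roughly like the sketch below. This is a pure-Python illustration, not the attached patch: the function encode_for_terminal and its charset parameter are hypothetical names, and the only real API used is locale.getpreferredencoding().

```python
import locale


def encode_for_terminal(text, charset=None):
    """Hypothetical helper: encode text before handing it to curses.

    If no charset is given, fall back to the locale's preferred encoding
    instead of a hardcoded utf8.
    """
    if charset is None:
        # False: do not call setlocale() as a side effect.
        charset = locale.getpreferredencoding(False)
    # Characters the charset cannot represent are replaced with '?'.
    return text.encode(charset, errors="replace")
```

With an explicit charset, encode_for_terminal("\u00e4", "iso-8859-15") yields the single byte b"\xe4", whereas "utf-8" yields b"\xc3\xa4".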
msg121318 - Author: Łukasz Langa (lukasz.langa) (Python committer) Date: 2010-11-16 21:05
We'll try to solve this for 3.2.
msg121346 - Author: Andreas Stührk (Trundle) Date: 2010-11-17 11:27
Note that getkey() is broken, too. I attached a simple script to demonstrate that. If you run it and enter some non-ASCII input, you can see that getkey() returns a UTF-8 encoded str (in my UTF-8 environment at least; I haven't checked whether it's always UTF-8 or whether it depends on the locale).
msg140380 - Author: STINNER Victor (vstinner) (Python committer) Date: 2011-07-14 23:21
I created issue #12567 to fix the Unicode support of the curses module in Python 3.
msg162307 - Author: STINNER Victor (vstinner) (Python committer) Date: 2012-06-04 23:37
The issue #12567 fixed this one:

- umlaut3x.py now works in Python 3.3 with an encoding different than UTF-8: Python automatically detects (and uses) the locale encoding
- getkey_sample.py can be patched to handle Unicode correctly using get_wch() instead of getkey()
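
A minimal sketch of the get_wch() approach follows. It is not the attached getkey_sample.py; the names describe_key, main and run are made up for this illustration. It relies on window.get_wch() (available since Python 3.3) returning a str for ordinary characters and an int for function keys and other special keys.

```python
def describe_key(key):
    """Return a short description of a value returned by get_wch()."""
    if isinstance(key, str):
        # Ordinary character, already decoded to a one-character str.
        return "character %r (U+%04X)" % (key, ord(key))
    # Function keys, keypad keys and other special keys arrive as int.
    return "special key code %d" % key


def main(stdscr):
    """Echo a description of each key until 'q' is pressed."""
    stdscr.addstr("Press keys, q to quit\n")
    while True:
        key = stdscr.get_wch()
        if key == "q":
            break
        stdscr.addstr(describe_key(key) + "\n")


def run():
    """Start the demo in a real terminal."""
    import curses
    import locale

    # Adopt the user's locale so curses uses the locale encoding.
    locale.setlocale(locale.LC_ALL, "")
    curses.wrapper(main)
```

Call run() from a terminal to try it; main() only needs an object with get_wch() and addstr(), so it can also be exercised without a terminal.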
History
Date | User | Action | Args
2012-06-04 23:37:38 | vstinner | set | status: open -> closed; resolution: fixed; messages: + msg162307
2011-10-28 08:17:35 | petri.lehtinen | set | nosy: + petri.lehtinen
2011-07-14 23:21:51 | vstinner | set | messages: + msg140380
2010-11-17 11:27:32 | Trundle | set | files: + getkey_sample.py; messages: + msg121346
2010-11-16 21:05:46 | lukasz.langa | set | priority: normal -> high; nosy: + lukasz.langa; versions: + Python 3.2, - Python 3.1; messages: + msg121318; assignee: lukasz.langa
2009-08-27 23:59:58 | vstinner | set | files: + curses_charset.patch; keywords: + patch; messages: + msg92025
2009-08-27 23:46:22 | Trundle | set | messages: + msg92024
2009-08-27 23:15:37 | vstinner | set | messages: + msg92023
2009-08-27 22:59:26 | Trundle | set | messages: + msg92021
2009-08-27 22:34:04 | vstinner | set | messages: + msg92020
2009-08-27 22:31:55 | vstinner | set | nosy: + vstinner; messages: + msg92019
2009-08-20 21:55:45 | Trundle | set | files: + umlaut3x.py
2009-08-20 21:54:52 | Trundle | create |