Message 140375 - Python tracker


Author	vstinner
Recipients	Nicholas.Cole, akuchling, cben, gpolo, inigoserna, python-dev, r.david.murray, schodet, vstinner, zeha
Date	2011-07-14.22:33:36
Message-id	<1310682817.11.0.980195084883.issue12567@psf.upfronthosting.co.za>
In-reply-to

Content
curses functions that accept strings implicitly encode character strings to UTF-8. This is wrong. We should either add a function to set the encoding (see issue #6745) or use the wide character C functions. I don't think UTF-8 is the right default encoding; I suppose the locale encoding is a better choice.
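To illustrate the difference between a hard-coded UTF-8 default and the locale encoding, here is a minimal sketch. The helper name encode_for_curses is hypothetical and is not part of the curses module; it only assumes the standard locale.getpreferredencoding() call.

```python
import locale


def encode_for_curses(text, encoding=None):
    """Hypothetical helper: encode text for a byte-oriented curses call.

    If no encoding is given, fall back to the locale's preferred
    encoding instead of assuming UTF-8.
    """
    if encoding is None:
        # False: query the locale without modifying the process locale.
        encoding = locale.getpreferredencoding(False)
    return text.encode(encoding)


# '\u00e9' is 'é'. The resulting bytes depend on the chosen encoding.
print(encode_for_curses('\u00e9', 'utf-8'))
print(encode_for_curses('\u00e9', 'latin-1'))
```

With an explicit encoding the result is deterministic; without one it depends on the environment the program runs in, which is the point of preferring the locale encoding.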

Accepting characters (and character strings) but calling byte functions is wrong. For example, addch('é') doesn't work with a UTF-8 locale encoding: it calls waddch(0xE9) (é is U+00E9), whereas waddch(0xC3) followed by waddch(0xA9) should be called. Workaround in Python:

    for byte in 'é'.encode('utf-8'):
        win.addch(byte)
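The mismatch described above can be checked without curses at all. This small sketch only compares the code point of the character with its UTF-8 byte sequence; the variable names are illustrative.

```python
ch = '\u00e9'  # 'é', written as an escape to avoid source-encoding ambiguity

# The code point: the single value passed to the byte function today.
code_point = ord(ch)

# The UTF-8 encoding: the two bytes a UTF-8 terminal actually expects.
utf8_bytes = ch.encode('utf-8')

print(hex(code_point))
print([hex(b) for b in utf8_bytes])
```

The code point is one value (0xE9) while the UTF-8 form is two bytes (0xC3, 0xA9), which is why the workaround iterates over the encoded bytes.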

I see two possible solutions:

A) Add new functions that accept only characters, and stop accepting characters in the existing functions

B) Fix the existing functions to call the right C function depending on the input type. For example, Python addch(10) and addch(b'\n') would call waddch(10), whereas addch('é') would call wadd_wch(233).

I prefer solution (B) because addch('é') would just work as expected.
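The dispatch proposed in solution (B) could look roughly like the following pure-Python sketch. The function select_c_call is hypothetical: it only returns the name of the C function that would be chosen and the integer value, mirroring the notation used above. It does not call curses, and the real wadd_wch takes a cchar_t structure rather than a bare integer.

```python
def select_c_call(ch):
    """Hypothetical sketch: pick the C function based on the input type.

    Returns a (function_name, integer_value) tuple.
    """
    if isinstance(ch, int):
        # An integer is already a byte value (possibly with attributes).
        return ('waddch', ch)
    if isinstance(ch, bytes):
        if len(ch) != 1:
            raise ValueError('expected a single byte')
        # Indexing bytes yields the integer value of that byte.
        return ('waddch', ch[0])
    if isinstance(ch, str):
        if len(ch) != 1:
            raise ValueError('expected a single character')
        # A character goes to the wide character function.
        return ('wadd_wch', ord(ch))
    raise TypeError('expected int, bytes or str')


print(select_c_call(10))
print(select_c_call(b'\n'))
print(select_c_call('\u00e9'))
```

Integers and bytes keep their current behaviour, so existing callers would be unaffected; only character input changes path.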
