Message 61105 - Python tracker


Author	cben
Date	2003-03-10.20:55:33

Content

Good question :-).
Here are the basic additions of the wide curses interface:

* `chtype` (which must be an integral type) doesn't have
enough room to hold a character OR-ed with the attributes,
nor would that be useful enough, since combining characters
must be handled.  Therefore two types are introduced:

** attr_t - an integral type used to hold an OR-ed set of
attributes that begin with the prefix ``WA_``.  These
attributes are semantically a superset of the ``A_`` ones
and can have different values (although in ncurses they are
the same).

** cchar_t - a type representing one character cell: at most
one spacing character, an implementation-defined number of
combining characters, attributes and a color pair.

* A whole lot of new functions are provided using these new
types.  The distinguishing naming style is the separation of
words with underscores.

** Functions that work on single chars have counterparts
(``_wch``) that receive/return cchar_t (except for get_wch
which is a bogus mutation).

** Functions that work on strings have counterparts
(``_wstr``) that receive/return (wchar_t *); many are also
duplicated with a (cchar_t *) interface (``_wchstr``).

** All old functions having to do with characters are
semantically just degenerate compatibility interfaces to the
new ones.

* Semantics are defined for adding combining characters: if
only non-spacing characters are given, they are added to the
existing complex character; if a spacing character is
present, the whole cell is replaced.

* Semantics are defined for double-width characters  (what
happens when you break them in various ways).
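
The combining-character rule above can be modelled in a few lines of
Python.  This is only a sketch of the semantics, not of any curses
implementation: `put_cell` and `is_combining` are hypothetical names, and
"non-spacing" is approximated with the `unicodedata` module rather than
the platform's wcwidth().

```python
import unicodedata


def is_combining(ch):
    """Approximate 'non-spacing': a nonzero combining class or a mark category."""
    return unicodedata.combining(ch) != 0 or unicodedata.category(ch) in ("Mn", "Me")


def put_cell(existing, incoming):
    """Return the cell contents after writing `incoming` over `existing`.

    Both arguments are strings holding one character cell.
    """
    if not incoming:
        # Nothing written: the cell is unchanged.
        return existing
    if all(is_combining(c) for c in incoming):
        # Only non-spacing characters: add them to the existing complex character.
        return existing + incoming
    # A spacing character is present: the whole cell is replaced.
    return incoming
```

Writing U+0301 (combining acute) over a cell holding "e" extends the cell,
while writing "a" over it replaces the cell.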

The simplest thing is just to wrap all the extra functions,
exposing two APIs in Python, with the latter only available
when the platform supports it.  This would be painful to
work with and I'd rather avoid it.

A better approach is just to overload the old names to work
with unicode strings.  For single-character methods (e.g.
`addch`), it's harder.  The (character ordinal | attributes)
interface should be deprecated and only work for ASCII
chars, in a backwards-compatible way.  The interface where
the character and attributes are given as separate arguments
can be cleanly extended to accept unicode characters/ordinals.
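
A rough sketch of how such an overloaded `addch` might sort out its
arguments before calling into C.  Everything here is hypothetical:
`split_addch_args` is an invented helper, and the `0xFF` character mask
and the bold bit are assumptions modelled on ncurses, not portable
values.

```python
A_CHARTEXT = 0xFF   # assumption: the low 8 bits of a chtype hold the character
BOLD = 1 << 21      # assumption: stand-in attribute bit for the examples


def split_addch_args(ch, attr=None):
    """Normalize addch-style arguments to a (text, attributes) pair."""
    if isinstance(ch, str):
        # Separate-argument form with a (possibly multi-char) string.
        return ch, (attr or 0)
    if attr is None:
        # Legacy packed form: (character ordinal | attributes), ASCII only.
        code = ch & A_CHARTEXT
        if code > 0x7F:
            raise ValueError("packed ordinal|attributes form is ASCII-only")
        return chr(code), ch & ~A_CHARTEXT
    # Separate-argument form with a Unicode ordinal.
    return chr(ch), attr
```

Under this scheme `split_addch_args(ord("x") | BOLD)` still works as
before, while a non-ASCII ordinal is only accepted when the attributes
are passed separately, e.g. `split_addch_args(0x20AC, BOLD)`.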

The behaviour w.r.t. combining and double-width characters
should be defined.  Complex chars should be represented as
multi-char unicode strings (therefore unicode ordinals are a
limited representation).  I don't think anything special is
needed for sensible double-width handling?
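
To make the "multi-char string per cell" idea concrete, here is a minimal
sketch that splits a string into cells and estimates how many columns
each one takes.  `split_cells` is a hypothetical helper; the width guess
uses `unicodedata.east_asian_width` as a stand-in for wcwidth(), which is
an assumption and will not match every terminal.

```python
import unicodedata


def split_cells(text):
    """Split `text` into (cell_string, columns) pairs.

    A cell is one spacing character followed by any combining characters.
    A combining character with no preceding cell gets a cell of its own.
    """
    cells = []
    for ch in text:
        if cells and unicodedata.combining(ch):
            # Attach the combining character to the previous cell.
            base, width = cells[-1]
            cells[-1] = (base + ch, width)
        else:
            # Wide and fullwidth characters are assumed to take two columns.
            wide = unicodedata.east_asian_width(ch) in ("W", "F")
            cells.append((ch, 2 if wide else 1))
    return cells
```

For example, "e" + U+0301 followed by U+4E2D comes out as two cells, the
second one two columns wide.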

The (cchar_t *) interfaces (``_wchstr``) are convenient for
storing many characters with individual attributes; I'm not
sure how to expose them (a list of (char, attr) tuples?).
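
One possible shape for the list-of-tuples idea, as a sketch only: each
cell is a `(text, attr)` pair, and a hypothetical helper `to_runs` merges
neighbouring cells with equal attributes so they could be written with
one string call per run.  Nothing here is an existing curses API.

```python
from itertools import groupby


def to_runs(cells):
    """Merge adjacent (text, attr) cells that share the same attr.

    Returns a list of (string, attr) runs in the original order.
    """
    runs = []
    for attr, group in groupby(cells, key=lambda cell: cell[1]):
        runs.append(("".join(text for text, _ in group), attr))
    return runs
```

A row such as `[("a", 0), ("b", 0), ("c", 1)]` collapses to
`[("ab", 0), ("c", 1)]`; multi-char cells (base plus combining
characters) pass through unchanged.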

There is the question of what to do in the absence of wide
curses in the platform, when the unicode interface will be
called.  I think that some settable "curses default
encoding" should be used as a fallback, so that people can
keep their sanity.  This should be specific to curses, or
maybe even settable per-window, so that some basic
input/output methods can be implemented as a codec (this is
suboptimal but I think could be useful as a quick solution).
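
A minimal sketch of that fallback, assuming a module-level default with
an optional per-window override.  The names `set_default_encoding` and
`encode_for_window` and the "ascii" starting value are made up for
illustration; they are not part of the curses module.

```python
import codecs

_default_encoding = "ascii"  # assumption: conservative starting default


def set_default_encoding(name):
    """Set the curses-specific fallback encoding (validated via codecs)."""
    global _default_encoding
    codecs.lookup(name)  # raises LookupError for an unknown encoding
    _default_encoding = name


def encode_for_window(text, window_encoding=None, errors="replace"):
    """Encode `text` for a narrow-curses call.

    A per-window encoding, when given, takes precedence over the default.
    """
    return text.encode(window_encoding or _default_encoding, errors)
```

With the default left at "ascii", unencodable characters degrade to "?"
instead of raising, which is one way to read "keep their sanity"; a
stricter `errors="strict"` would be an equally defensible choice.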

I can write an initial patch but don't expect it quickly.
This could use the counsel of somebody with wide-curses
experience (I'm a newbie to this, I want to start
experimenting in Python rather than C :-).

History
Date	User	Action	Args
2008-01-20 09:59:23	admin	link	issue700921 messages
2008-01-20 09:59:23	admin	create