
Author: doerwalter
Date: 2002-04-20 15:34:56

A new idea for the interface between the
codec and the callback:

Maybe we could have new exception classes
UnicodeEncodeError, UnicodeDecodeError
and UnicodeTranslateError derived from
UnicodeError. They would have all the attributes
that are passed as an argument
tuple in the current version:

  string: the original string
  start:  the start position of the
          unencodable characters/undecodable bytes
  end:    the end position + 1 of the unencodable
          characters/undecodable bytes (i.e. the
          index one past the last offending one)
  reason: a string that explains why
          the encoding/decoding doesn't work.
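For comparison, the built-in exceptions in current Python 3 expose very similar attributes (there the original string is called object, and there is an additional encoding attribute). The following sketch is not part of the proposal; it simply triggers a real encoding error and inspects those attributes:

```python
def first_encode_error(text, encoding="ascii"):
    """Return the UnicodeEncodeError raised by a strict encode, or None."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc
    return None


err = first_encode_error("abc\u20acdef")
print(err.encoding)            # ascii
print(ascii(err.object))       # 'abc\u20acdef' (the original string)
print(err.start, err.end)      # 3 4 (end is one past the last bad character)
print(err.reason)              # ordinal not in range(128)

# All three exception classes derive from UnicodeError.
print(all(issubclass(cls, UnicodeError)
          for cls in (UnicodeEncodeError, UnicodeDecodeError, UnicodeTranslateError)))
```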

There is no data object, because when a codec
wants to pass extended information to the
callback, it can do so via a derived
class.
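A minimal sketch of the derived-class idea, written against Python 3's UnicodeEncodeError. HintedEncodeError, its hint attribute and hint_aware_callback are hypothetical names used only for illustration; the callback returns a (replacement, resume position) tuple:

```python
class HintedEncodeError(UnicodeEncodeError):
    """Hypothetical codec-specific subclass that carries one extra field."""

    def __init__(self, encoding, obj, start, end, reason, hint):
        super().__init__(encoding, obj, start, end, reason)
        self.hint = hint  # extended information only this codec knows about


def hint_aware_callback(exc):
    """Use the codec's hint when present, otherwise one '?' per bad character."""
    if isinstance(exc, HintedEncodeError):
        return (exc.hint, exc.end)
    return ("?" * (exc.end - exc.start), exc.end)


err = HintedEncodeError("ascii", "a\u20acb", 1, 2, "ordinal not in range(128)", "EUR")
print(hint_aware_callback(err))  # ('EUR', 2)
```

A callback that knows nothing about the subclass still works, because the extra information is purely additive.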

It might be better to move these attributes
to the base class UnicodeError, but this
might have backwards compatibility
problems.

With this approach we really can have one global
registry for all callbacks: for callback names that
must work with encoding *and* decoding *and*
translating (i.e. "strict", "replace" and "ignore"),
the callback can check which type of exception
was passed. "replace" could, for example, look
like this:

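def replace(exc):
    if isinstance(exc, UnicodeDecodeError):
        # decoding: a single "?" for the whole undecodable byte range
        return ("?", exc.end)
    else:
        # encoding/translating: one "?" per unencodable character
        return (u"?" * (exc.end - exc.start), exc.end)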
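In Python 3 this return-tuple convention is what codecs.register_error expects, so the replace callback above can be registered almost unchanged. The following is a sketch assuming Python 3, where u"?" and "?" are the same str type. "qmark" is an arbitrary handler name chosen here because "replace" is already a built-in handler:

```python
import codecs


def replace(exc):
    """The 'replace' callback from the message, using the return-tuple protocol."""
    if isinstance(exc, UnicodeDecodeError):
        # One "?" for the whole undecodable byte range.
        return ("?", exc.end)
    if isinstance(exc, (UnicodeEncodeError, UnicodeTranslateError)):
        # One "?" per unencodable character.
        return ("?" * (exc.end - exc.start), exc.end)
    raise TypeError(f"don't know how to handle {type(exc).__name__}")


# One global registry: the same callback serves both encoding and decoding.
codecs.register_error("qmark", replace)

print("a\u20ac\u20acb".encode("ascii", "qmark"))  # b'a??b'
print(b"a\xffb".decode("ascii", "qmark"))          # a?b
```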

Another possibility would be to do the communication
callback->codec by assigning to attributes
of the exception object. The resynchronisation
position could even be preassigned to end, so
the callback would only need to specify the
replacement in most cases:

def replace(exc):
    # the codec preassigns the resync position (end),
    # so only the replacement has to be set here
    if isinstance(exc, UnicodeDecodeError):
        exc.replacement = "?"
    else:
        exc.replacement = u"?" * (exc.end - exc.start)
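The attribute-based variant can be tried out with a toy encoder. This is only a sketch of the protocol described above, not an existing API: toy_ascii_encode and the replacement and resume attributes are hypothetical, and the sketch relies on Python 3 exception instances accepting extra attributes:

```python
def toy_ascii_encode(text, callback):
    """Hypothetical ASCII encoder using the attribute-based callback protocol."""
    out = bytearray()
    pos = 0
    while pos < len(text):
        code = ord(text[pos])
        if code < 128:
            out.append(code)
            pos += 1
            continue
        # Collect the whole run of unencodable characters.
        end = pos + 1
        while end < len(text) and ord(text[end]) >= 128:
            end += 1
        exc = UnicodeEncodeError("ascii", text, pos, end, "ordinal not in range(128)")
        exc.replacement = None  # hypothetical: filled in by the callback
        exc.resume = end        # hypothetical: resync position, preassigned to end
        callback(exc)
        if exc.replacement is None:
            raise exc           # nothing assigned: behave like "strict"
        out += exc.replacement.encode("ascii")
        pos = exc.resume
    return bytes(out)


def replace(exc):
    """The attribute-assigning 'replace' callback from the message."""
    if isinstance(exc, UnicodeDecodeError):
        exc.replacement = "?"
    else:
        exc.replacement = "?" * (exc.end - exc.start)


def strict(exc):
    """Assigns nothing, so the encoder raises the exception."""


print(toy_ascii_encode("a\u20ac\u20acb", replace))  # b'a??b'
```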

As many of the assignments can then be done at
the C level without having to allocate Python
objects (except for the replacement string
and the reason), this version might even be
faster, especially if we allow the codec to
reuse the exception object for the next call
to the callback.
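To illustrate the reuse idea, here is a hypothetical toy ASCII encoder following the attribute protocol (the replacement and resume attributes are assumptions, not an existing API) that allocates the exception object once per call and afterwards only updates start and end. In pure Python this shows the reuse pattern, not the speed-up, which would only materialise in C:

```python
def toy_ascii_encode_reusing(text, callback):
    """Hypothetical encoder that allocates at most one exception object per call."""
    out = bytearray()
    exc = None
    pos = 0
    while pos < len(text):
        code = ord(text[pos])
        if code < 128:
            out.append(code)
            pos += 1
            continue
        end = pos + 1
        while end < len(text) and ord(text[end]) >= 128:
            end += 1
        if exc is None:
            # First error: allocate the exception object once.
            exc = UnicodeEncodeError("ascii", text, pos, end, "ordinal not in range(128)")
        else:
            # Later errors: only update the positions, no new allocation.
            exc.start, exc.end = pos, end
        exc.replacement = None  # hypothetical attribute, reset before every call
        exc.resume = end        # hypothetical resync position
        callback(exc)
        if exc.replacement is None:
            raise exc
        out += exc.replacement.encode("ascii")
        pos = exc.resume
    return bytes(out)


seen = []


def recording_replace(exc):
    """Remember which exception object the codec passed in, then replace."""
    seen.append(exc)
    exc.replacement = "?" * (exc.end - exc.start)


print(toy_ascii_encode_reusing("\u20aca\u20ac\u20acb\u20ac", recording_replace))  # b'?a??b?'
print(len(seen), all(e is seen[0] for e in seen))  # 3 True
```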

Does this make sense, or is it too fancy?
History
Date                 User   Action  Args
2007-08-23 15:06:08  admin  link    issue432401 messages
2007-08-23 15:06:08  admin  create