Author doerwalter
Date 2001-06-12.18:59:24

How the callbacks work:

A PyObject * named errors is passed in. This may be NULL,
Py_None, 'strict', u'strict', 'ignore', u'ignore',
'replace', u'replace' or a callable object.
PyCodec_EncodeHandlerForObject maps all of these objects to
one of the three builtin error callbacks, or directly
returns errors if it is a callable object. The builtin
callbacks are PyCodec_RaiseEncodeErrors (raises an
exception), PyCodec_IgnoreEncodeErrors (returns an empty
replacement string, in effect ignoring the error) and
PyCodec_ReplaceEncodeErrors (returns U+FFFD, the Unicode
replacement character, to signify to the encoder that it
should choose a suitable replacement character).

When an unencodable character is encountered, the error
handling callback is called with the encoding name, the
original unicode object and the error position. It must
return a unicode object that will be encoded instead of the
offending character (or the callback may of course raise an
exception). U+FFFD characters in the replacement string will
be replaced with a character that the encoder chooses ('?'
in all cases).
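
As an illustration, the mapping described above can be modelled in
Python. This is only a sketch of the proposal in this message, not the
C implementation: the function names are hypothetical stand-ins for the
PyCodec_* functions, and treating None like 'strict' is an assumption.
It uses the three-argument callback described here, not the
exception-based handlers of codecs.register_error.

```python
# Hypothetical Python models of the three builtin callbacks.
# Each takes (encoding name, original string, error position) and
# returns the replacement string to encode instead.

def raise_encode_errors(encoding, text, pos):
    raise UnicodeError(
        "%s: can't encode character %r at position %d"
        % (encoding, text[pos], pos)
    )

def ignore_encode_errors(encoding, text, pos):
    return ""          # empty replacement: the character is dropped

def replace_encode_errors(encoding, text, pos):
    return "\ufffd"    # the encoder later picks the real character ('?')

# Assumption: None (NULL / Py_None in C) behaves like 'strict'.
_BUILTIN_HANDLERS = {
    None: raise_encode_errors,
    "strict": raise_encode_errors,
    "ignore": ignore_encode_errors,
    "replace": replace_encode_errors,
}

def encode_handler_for_object(errors):
    """Model of PyCodec_EncodeHandlerForObject."""
    if callable(errors):
        return errors  # a callable is returned unchanged
    if errors is None or isinstance(errors, str):
        if errors in _BUILTIN_HANDLERS:
            return _BUILTIN_HANDLERS[errors]
    raise ValueError("unknown error handling: %r" % (errors,))
```

A custom callback then has the same shape as the builtin ones, e.g.
lambda encoding, text, pos: "&#%d;" % ord(text[pos]).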

The loop through the string is implemented in the following
way. A stack with at most two strings is kept and the loop
always encodes a character from the string at the top of the
stack. If an error is encountered and the stack has only one
entry (i.e. during encoding of the original string), the
callback is called and the unicode object it returns is
pushed on the stack, so encoding continues with the
replacement string. If the stack has two entries when an
error is encountered, the replacement string itself contains
an unencodable character and a normal exception is raised.
When the encoder has reached the end of its current string
there are two possibilities: if the stack contains two
entries, this was the replacement string, so it is popped
from the stack and encoding continues with the next
character from the original string. If the stack has only
one entry, encoding is finished.
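
The two-entry stack can be sketched in Python for an ASCII encoder.
This is a simplified model of the loop described above, not the code in
unicodeobject.c; encode_ascii and its parameters are hypothetical
names, and the callback takes the three arguments described in this
message.

```python
def encode_ascii(text, callback, encoding="ascii"):
    """Encode text to ASCII, asking callback for replacement strings."""
    out = bytearray()
    # Each stack entry is [string, next position]. Entry 0 is the
    # original string; entry 1, if present, is a replacement string.
    stack = [[text, 0]]
    while stack:
        current, pos = stack[-1]
        if pos == len(current):
            # End of the current string: either the replacement is done
            # (continue with the original) or encoding is finished.
            stack.pop()
            continue
        stack[-1][1] = pos + 1
        ch = current[pos]
        if ord(ch) < 128:
            out.append(ord(ch))
        elif len(stack) == 2 and ch == "\ufffd":
            # U+FFFD in a replacement string: the encoder chooses '?'.
            out.append(ord("?"))
        elif len(stack) == 1:
            # Error in the original string: push the replacement.
            stack.append([callback(encoding, text, pos), 0])
        else:
            # Error inside the replacement string itself.
            raise UnicodeError(
                "%s: replacement contains unencodable character %r"
                % (encoding, ch)
            )
    return bytes(out)
```

For example, a callback that returns "\ufffd" turns "a\xe4b" into
b"a?b", and one that returns "" turns it into b"ab".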

(I hope that's enough explanation of the API and implementation)

I have renamed the static ...121 function to all lowercase
names.

BTW, I guess PyUnicode_EncodeUnicodeEscape could be
reimplemented as PyUnicode_EncodeASCII with a \uxxxx
replacement callback.
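
A rough Python sketch of that idea follows. The names are hypothetical,
and the small helper only stands in for PyUnicode_EncodeASCII with a
callback. It approximates the unicode-escape codec only for characters
outside ASCII: the real codec also escapes backslashes and
non-printable characters and uses \xNN for Latin-1, which this sketch
does not attempt.

```python
def unicode_escape_callback(encoding, text, pos):
    """Return a \\uxxxx (or \\Uxxxxxxxx) escape for the offending character."""
    code = ord(text[pos])
    if code > 0xFFFF:
        return "\\U%08x" % code
    return "\\u%04x" % code

def encode_ascii_with_callback(text, callback):
    """Stand-in for an ASCII encoder that consults an error callback."""
    pieces = []
    for pos, ch in enumerate(text):
        if ord(ch) < 128:
            pieces.append(ch)
        else:
            pieces.append(callback("ascii", text, pos))
    # Raises UnicodeEncodeError if a replacement is itself not ASCII.
    return "".join(pieces).encode("ascii")
```

With this callback, encode_ascii_with_callback("a\u20acb",
unicode_escape_callback) yields the bytes a\u20acb.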

PyCodec_RaiseEncodeErrors, PyCodec_IgnoreEncodeErrors and
PyCodec_ReplaceEncodeErrors are globally visible because
_codecsmodule.c has to wrap them as Python function objects.
They can't be implemented in _codecsmodule itself, because
the encoders in unicodeobject.c need them too (through
PyCodec_EncodeHandlerForObject), and importing the codecs
module from there might result in an endless recursion:
importing a module requires unmarshalling of the bytecode,
which might require decoding utf8, which ... (but this will
only happen if we implement the same mechanism for the
decoding API).

I have not touched PyUnicode_TranslateCharmap yet.
Should this function also support error callbacks? Why would
one want to insert None into the mapping to call the callback?

A remaining problem is how to implement decoding error
callbacks. In Python 2.1 encoding and decoding errors are
handled in the same way with a string value. But with
callbacks it doesn't make sense to use the same callback for
encoding and decoding (like codecs.StreamReaderWriter and
codecs.StreamRecoder do). Decoding callbacks have a
different API. Which arguments should be passed to the
decoding callback, and what is the decoding callback
supposed to do?
History
Date User Action Args
2007-08-23 15:06:03 admin link issue432401 messages
2007-08-23 15:06:03 admin create