Title: Incremental codecs for CJKCodecs
Type:
Stage:
Components: Extension Modules
Versions: Python 2.5
Status: closed
Resolution: accepted
Dependencies:
Superseder:
Assigned To: hyeshik.chang
Nosy List: doerwalter, hyeshik.chang
Priority: normal
Keywords: patch

Created on 2006-03-04 18:45 by hyeshik.chang, last changed 2022-04-11 14:56 by admin. This issue is now closed.

File name            Uploaded                         Description
cjkcodecs-inc1.diff  hyeshik.chang, 2006-03-04 18:45  incremental codecs patch for cjkcodecs
cjkcodecs-inc2.diff  hyeshik.chang, 2006-03-18 18:32  revised patch
Messages (8)
msg49644 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2006-03-04 18:45
Here's a supplemental patch for SF #1436130 that
implements the CJKCodecs part of the incremental codec
specification. It is written against the interface in
Walter's fourth patch on #1436130. Please test whether
it agrees with the design.
msg49645 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2006-03-15 12:08
The patch doesn't apply cleanly (conflicts in
Lib/test/ and Tools/unicode/Makefile).
Could you update the patch?

I haven't looked at the C code too closely yet.

Two notes: 1) The tests often call incencoder.encode() or
incdecoder.decode() again after the method has been called
with final=True before. I'm not sure that this should be
allowed. If we allow it, it should be documented in what
state the codec is after calling with final=True (probably
it should be back to the initial state (i.e. like calling
reset())). 2) It seems to me that it isn't possible to
change the error handling during the lifetime of a codec.
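
The two notes can be made concrete with a minimal sketch against the incremental decoder API as it exists in current CPython (an assumption: the 2006 patch under review may have behaved differently). It calls decode() again after final=True, compares that with an explicit reset(), and then changes the error handling on a live codec through the .errors attribute. The sample text and variable names are illustrative only.

```python
import codecs

text = "한글"
data = text.encode("euc-kr")

dec = codecs.getincrementaldecoder("euc-kr")()

# Note 1: feed a complete input and mark it as the final chunk.
first = dec.decode(data, final=True)

# Call decode() again on the same object, as the tests in the patch do.
second = dec.decode(data, final=True)

# reset() explicitly returns the codec to its initial state.
dec.reset()
third = dec.decode(data, final=True)

# Note 2: the error handling scheme is an ordinary attribute and can be
# reassigned during the lifetime of the codec.
dec.errors = "replace"

print(first == second == third == text, dec.errors)
```

For a complete input like this one, the decoder gives the same result whether or not reset() is called in between, which is the state question raised in note 1.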

Anyway, thanks for the quick patch.
msg49646 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2006-03-16 12:32
1) Because CJKCodecs already had an internal stateful framework, I
implemented IncrementalCodec as just an interface on top of it.
It treats final=True as a simple `flush' message (which
doesn't reset or terminate the codec). This behavior is quite
useful for real-time stream processing such as sockets and
tail-style log watchers. If we disallow it, such uses may
require their own sequence detectors.

For the "to reset or not" issue, I think we can simply follow
what iconv does.  iconv doesn't reset the internal state for
iconv(ic, NULL, NULL, ..).

2) Aah.  I didn't notice that .errors is part of the public
API.  The current CJKCodecs can't support it easily yet.
I'll fix it and upload an updated patch soon.  Thank you for
your review!
msg49647 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2006-03-18 15:24
What other interpretation of the final parameter can we use
that doesn't make it completely useless? What about the
following: "If final is true the codec must encode/decode
the input completely and must flush all buffers. If this
isn't possible (e.g. because of incomplete byte sequences on
decoding) it must raise an exception (unless prevented by an
error handler)"?
msg49648 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2006-03-18 18:32
I updated the patch for the .errors visibility.

I like the statement.  But the current implementation of the
patch differs slightly in one corner case: it keeps the
buffers if an error occurs.  That is somewhat natural,
because error cases usually don't cause side effects.  On
the other hand, I agree with your statement as a direct
interpretation of final=True. What do you think about this?
msg49649 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2006-03-18 20:47
Maybe the statement is a bit misleading in its current form.
I didn't mean that error handling prevents the flushing of
the buffers, just that error handling prevents raising an
error. I hope the following is clearer: "If final is true
the codec must encode/decode the input completely and must
flush all buffers. If this isn't possible (e.g. because of
incomplete byte sequences still remaining in the buffer on
decoding) it must initiate error handling just like in the
stateless case (which might raise an exception)."
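
The wording above can be checked with a small sketch against the EUC-KR incremental decoder in current CPython (assumed here; the names and sample data are illustrative). A truncated input is buffered when final is false, triggers error handling when final is true, and with the "replace" handler produces the same result as the stateless bytes.decode().

```python
import codecs

data = "한글".encode("euc-kr")   # two 2-byte sequences
truncated = data[:3]             # one full character plus a dangling lead byte

# final=False: the dangling byte is kept in the buffer for the next call.
dec = codecs.getincrementaldecoder("euc-kr")()
partial = dec.decode(truncated)
rest = dec.decode(data[3:], final=True)

# final=True with the strict handler: the leftover byte is an error.
strict = codecs.getincrementaldecoder("euc-kr")(errors="strict")
try:
    strict.decode(truncated, final=True)
    raised = False
except UnicodeDecodeError:
    raised = True

# final=True with "replace": error handling runs but nothing is raised.
lenient = codecs.getincrementaldecoder("euc-kr")(errors="replace")
replaced = lenient.decode(truncated, final=True)
stateless = truncated.decode("euc-kr", "replace")

print(partial + rest, raised, replaced == stateless)
```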

I'll take a look at the patch in the next few days.
msg49650 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2006-03-25 19:56
OK, a few notes on the patch:

In the test "c = codecs.lookup('utf-8')[3](s)" should be
written as "c = codecs.getwriter('utf-8')(s)". Someday in
the future CodecInfo may no longer be a tuple.
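
A short sketch of the suggested spelling, written for current Python 3 with an in-memory byte stream standing in for the test's `s` (an assumption, since the original test is not shown here):

```python
import codecs
import io

sink = io.BytesIO()

# Named accessor: no dependence on the position of the writer in a tuple.
writer = codecs.getwriter("utf-8")(sink)
writer.write("한글")

# The same class is also reachable by attribute on the CodecInfo object.
info = codecs.lookup("utf-8")
same_class = info.streamwriter is codecs.getwriter("utf-8")

print(sink.getvalue() == "한글".encode("utf-8"), same_class)
```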

Instead of htmlentitydefs.entitydefs, the test could use

I'd like to see a few tests for error callbacks that return
the wrong objects and for callbacks that return an offset
that is not exc.end.
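
The kind of test meant here might look like the following sketch, which runs against the EUC-KR codec in current CPython. The handler names "demo.skipnext" and "demo.badreturn" and both handler functions are hypothetical, invented for this example; they are not taken from the patch or its test suite.

```python
import codecs

def skip_next(exc):
    # Hypothetical handler: resumes one character past exc.end, so the
    # character right after the failing one is dropped as well.
    return ("?", exc.end + 1)

def bad_return(exc):
    # Hypothetical handler: returns an object that is not a (str, int) tuple.
    return 123

codecs.register_error("demo.skipnext", skip_next)
codecs.register_error("demo.badreturn", bad_return)

text = "a\u0e01bc"  # U+0E01 (a Thai letter) cannot be encoded in EUC-KR

# Offset other than exc.end: encoding resumes at "c", skipping "b".
skipped = text.encode("euc-kr", "demo.skipnext")

# Wrong return object: the codec should reject it with TypeError.
try:
    text.encode("euc-kr", "demo.badreturn")
    rejected = False
except TypeError:
    rejected = True

print(skipped, rejected)
```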

The ERROR_ISCUSTOM() macro looks wrong to me: smaller than 1
*and* greater than 3?

In mbiencoder_encode()

r = encoder_encode_stateful(STATEFUL_ECTX(self), data, final);
if (r == NULL)
    return NULL;
return r;

can be simplified to:

return encoder_encode_stateful(STATEFUL_ECTX(self), data, final);

Apart from that the patch looks good to me, so go ahead and
check it in.
msg49651 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2006-03-26 02:35
Committed as r43320. Thank you for the kind reviews! :)
Date                 User           Action  Args
2022-04-11 14:56:15  admin          set     github: 42979
2006-03-04 18:45:01  hyeshik.chang  create