classification
Title: Incremental codecs
Type: Stage:
Components: Library (Lib) Versions: Python 2.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: doerwalter Nosy List: doerwalter, hyeshik.chang, lemburg, nnorwitz
Priority: normal Keywords: patch

Created on 2006-02-21 19:32 by doerwalter, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
codecs.diff doerwalter, 2006-02-21 19:32
codecs2.diff doerwalter, 2006-02-28 12:08
codecs3.diff doerwalter, 2006-03-01 14:47
codecs4.diff doerwalter, 2006-03-03 17:39
Messages (13)
msg49559 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2006-02-21 19:32
This patch extends the codec machinery to add
incremental codecs: stateful codecs that don't use a
stream API. It adds the following stuff: a class
codecs.CodecInfo (a subclass of tuple), that is used as
the return value of codecs.lookup();
codecs.IncrementalEncoder and codecs.IncrementalDecoder
(the basic interface classes),
codecs.BufferedIncrementalDecoder (a class that can be
used to implement decoders that must handle incomplete
input); codecs.iterencode() and codecs.iterdecode()
(generators that use the incremental codecs for
encoding/decoding an input iterable). On the C level
PyCodec_IncrementalEncoder() and
PyCodec_IncrementalDecoder() are added.
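
For readers who want to see these pieces together, here is a minimal sketch. It assumes the API as it exists in current Python 3 (the patch itself targeted Python 2.5), and the sample string and the way it is chunked are made up for illustration.

```python
import codecs

# codecs.lookup() returns a CodecInfo, which is still a tuple underneath.
info = codecs.lookup("utf-8")
is_tuple = isinstance(info, tuple)

# "é€" encodes to five bytes; split them so both characters straddle chunks.
data = "é€".encode("utf-8")
chunks = [data[:1], data[1:3], data[3:]]

# Direct use: call decode() once per chunk, then flush with final=True.
decoder = info.incrementaldecoder()
pieces = [decoder.decode(chunk) for chunk in chunks]
pieces.append(decoder.decode(b"", final=True))

# Generator helpers: the same codec, driven by an input iterable.
decoded = list(codecs.iterdecode(chunks, "utf-8"))
encoded = b"".join(codecs.iterencode(["é", "€"], "utf-8"))
```

The decoder keeps the incomplete byte sequence between calls, so the caller never has to align chunk boundaries with character boundaries.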
msg49560 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2006-02-28 12:08
This second version of the patch enhances
codecs.iterencode() and codecs.iterdecode(), so that
additional keyword arguments are passed through to the
Incremental(De|En)coder constructor.
msg49561 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2006-03-01 14:47
This third version of the patch fixes the bug when the
iterator in iterencode() or iterdecode() is empty and
updates the docstring in encodings/__init__.py.
msg49562 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2006-03-02 23:03
Very nice ! 

This is a much better approach than the feed-style path you
wanted to take previously.

Minor nits:

Please separate out the non-related changes to the IDNA
codec into a new patch and assign that to Martin for review.

Is it possible to make IncrementalEncoder/Decoder instances
iterable per-se (without the need to go through the helper
functions iterencode/iterdecode) ?

Thanks.
msg49563 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2006-03-03 17:39
This fourth version of the patch removes the changes to 
Lib/encodings/idna.py (only the addition of the 
IncrementalEncoder/IncrementalDecoder and the changed 
getregentry() remain). This patch to idna.py probably only 
makes sense once this patch is in.

> Is it possible to make IncrementalEncoder/Decoder
> instances iterable per-se (without the need to go
> through the helper functions iterencode/iterdecode) ?

For IncrementalEncoder/Decoder to be iterable it would have 
to have some iterable from which it gets the input. But 
this has the same limitation as the stream API: The user is 
forced to provide the input as a service that the 
encoder/decoder uses, which requires support for a certain 
API. The only change would be that now it's an iterator API 
instead of a stream API.

The incremental codecs invert the call logic: The user no 
longer has to provide a callback service to the codec, but 
calls the codec directly. This gives much more flexibility.
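
A rough sketch of what this inversion looks like in practice, assuming the current Python 3 API. The ChunkCollector class and its feed()/close() methods are hypothetical names invented for this example; they are not part of the patch.

```python
import codecs


class ChunkCollector:
    """Hypothetical consumer: the caller pushes bytes in whenever it has some."""

    def __init__(self, encoding="utf-8"):
        # The codec needs no stream or iterator; it only keeps decoding state.
        self._decoder = codecs.getincrementaldecoder(encoding)()
        self._parts = []

    def feed(self, chunk):
        # Called by the user, e.g. from a socket callback or an event loop.
        self._parts.append(self._decoder.decode(chunk))

    def close(self):
        # final=True makes the decoder report any truncated trailing input.
        self._parts.append(self._decoder.decode(b"", final=True))
        return "".join(self._parts)


collector = ChunkCollector()
for chunk in (b"gr\xc3", b"\xbc\xc3\x9f", b"e"):
    collector.feed(chunk)
result = collector.close()
```

Because the caller drives each decode() call, the input can come from anything: a callback, a queue, or a protocol parser that has no read() method and is not an iterator.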
msg49564 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2006-03-15 08:01
MAL, do you have any more issues with this patch?  Should it
be assigned to Martin?

MAL, Walter, can you review these patches 1443155 1449471
which I think are related?  Should they go in?

The first alpha is coming up soon and I'd like to get these
patches in ASAP.
msg49565 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2006-03-15 11:13
The patch looks OK, except for some minor glitches such as
this mess :-) ...

+    if not isinstance(entry, codecs.CodecInfo):
+        if not 4 <= len(entry) <= 7:
+             raise CodecRegistryError,\
+                  'module "%s" (%s) failed to register' % \
+                   (mod.__name__, mod.__file__)
+        if not callable(entry[0]) or \
+           not callable(entry[1]) or \
+           (entry[2] is not None and not callable(entry[2])) or \
+           (entry[3] is not None and not callable(entry[3])) or \
+           (len(entry) > 4 and entry[4] is not None and not callable(entry[4])) or \
+           (len(entry) > 5 and entry[5] is not None and not callable(entry[5])):
             raise CodecRegistryError,\
-                  'incompatible codecs in module "%s" (%s)' % \
-                  (mod.__name__, mod.__file__)
+                'incompatible codecs in module "%s" (%s)' % \
+                (mod.__name__, mod.__file__)
+        if len(entry)<7 or entry[6] is None:
+            entry += (None,)*(6-len(entry)) + (mod.__name__.split(".", 1)[1],)
+        entry = codecs.CodecInfo(*entry)

Nevertheless, it can be cleaned up after checkin, so please
go ahead with it.
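
One possible shape for such a cleanup is sketched below. It is not the code that was checked in. It is written for current Python 3, normalize_entry() and its default_name parameter are hypothetical, and LookupError stands in for CodecRegistryError.

```python
import codecs


def normalize_entry(entry, default_name):
    """Hypothetical helper: turn a legacy 4- to 7-item tuple into a CodecInfo."""
    if isinstance(entry, codecs.CodecInfo):
        return entry
    if not 4 <= len(entry) <= 7:
        raise LookupError("codec entry must have 4 to 7 items")

    # Pad to seven slots: encode, decode, streamreader, streamwriter,
    # incrementalencoder, incrementaldecoder, name.
    padded = tuple(entry) + (None,) * (7 - len(entry))

    # encode and decode are mandatory; the other four may be None.
    if not callable(padded[0]) or not callable(padded[1]):
        raise LookupError("encode and decode must be callable")
    for item in padded[2:6]:
        if item is not None and not callable(item):
            raise LookupError("optional codec entries must be callable or None")

    name = padded[6] if padded[6] is not None else default_name
    return codecs.CodecInfo(*padded[:6], name=name)


utf8 = codecs.lookup("utf-8")
legacy = (utf8.encode, utf8.decode, utf8.streamreader, utf8.streamwriter)
normalized = normalize_entry(legacy, "utf-8")
```

Padding once up front replaces the repeated len(entry) > N guards, so every optional slot can go through the same check.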

Regarding the idna.py patch, I think you should create a new
patch item for it and assign it to Martin.

Thanks.

Neal, I don't have time to review the two CJK patches.
msg49566 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2006-03-15 11:28
1449471 isn't related to incremental codecs.  It includes a
simple patch to the Visual Studio project file.

I think Walter is the right person to review whether 1443155
conforms to his interface design. :-) (Thank you in advance!)
msg49567 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2006-03-15 11:43
Checked in as r43045.

Now what do we do with the funny code in
encodings.search_function()? Of course we could always
*require* the search function to return a CodecInfo object
(but only after the CJK codecs are updated, and even then we
should have some form of backwards compatibility).
msg49568 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2006-03-15 12:14
It's only the coding style that looks a bit funny. 

Requiring CodecInfo objects is not a good idea: that way
you'd make it impossible to write codecs that work in both
Python 2.5 and 2.4.
msg49569 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2006-03-18 15:26
MAL, do you have any suggestions on improving the code in
encodings.search_function()?
msg49570 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2006-03-18 15:53
OK, I've submitted a new patch (#1453235) for the idna
simplification.
msg49571 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2006-04-15 15:12
Closing the patch.
History
Date User Action Args
2022-04-11 14:56:15 admin set github: 42929
2006-02-21 19:32:00 doerwalter create