classification
Title: Fix codecs.iterencode/decode() by allowing data parameter to be omitted
Type: behavior Stage: resolved
Components: Documentation, Unicode Versions: Python 3.7, Python 3.6, Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: martin.panter Nosy List: doerwalter, ezio.melotti, lemburg, martin.panter, python-dev, r.david.murray, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2015-01-13 12:48 by martin.panter, last changed 2016-10-15 01:37 by martin.panter. This issue is now closed.

Files
File name | Uploaded | Description
final-no-object.patch | martin.panter, 2015-01-13 12:48 |
final-no-object.ignore-space.diff | martin.panter, 2015-01-13 12:50 | diff --ignore-all-space
iter-unsupported.patch | martin.panter, 2016-08-20 09:22 |
Messages (8)
msg233932 - Author: Martin Panter (martin.panter) (Python committer) Date: 2015-01-13 12:48
As mentioned in Issue 20132, iterencode() and iterdecode() only work with text-to-bytes codecs, because they assume particular data types when finalizing the incremental codecs: iterencode() passes an empty str and iterdecode() passes empty bytes. This patch changes the signature of the IncrementalEncoder and IncrementalDecoder methods from

IncrementalEncoder.encode(object[, final])
IncrementalDecoder.decode(object[, final])

to

IncrementalEncoder.encode([object,] [final])
IncrementalDecoder.decode([object,] [final])

so that iterencode() and iterdecode(), and perhaps in future StreamWriter and StreamReader, can drive the incremental codec without knowing what type of data it processes.
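A minimal sketch of how the proposed signature would let an iterencode()-style loop finish without knowing the data type. `OptionalInputEncoder` and `iterencode_proposed` are hypothetical names, not part of the patch or of the codecs module; the wrapper stands in for a codec that knows its own empty input value, and the hex codec is used only because it encodes bytes rather than str.

```python
import codecs


class OptionalInputEncoder:
    """Hypothetical encoder whose encode() accepts an omitted data object,
    mimicking the proposed IncrementalEncoder.encode([object,] [final])."""

    def __init__(self, encoding, empty, errors="strict"):
        self._encoder = codecs.getincrementalencoder(encoding)(errors)
        # The codec side knows its own input type, so it supplies the empty value
        self._empty = empty

    def encode(self, input=None, final=False):
        if input is None:
            input = self._empty
        return self._encoder.encode(input, final)


def iterencode_proposed(iterator, encoder):
    """Sketch of iterencode() if the data object could be omitted."""
    for chunk in iterator:
        output = encoder.encode(chunk)
        if output:
            yield output
    # Finalize without knowing whether the codec consumes str or bytes
    output = encoder.encode(final=True)
    if output:
        yield output


print(list(iterencode_proposed([b"ab", b"c"], OptionalInputEncoder("hex", b""))))
# [b'6162', b'63']
print(list(iterencode_proposed([], OptionalInputEncoder("utf-8-sig", ""))))
# [b'\xef\xbb\xbf']
```

The second call shows that finalizing still works when the iterator yields nothing.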
msg233933 - Author: Martin Panter (martin.panter) (Python committer) Date: 2015-01-13 12:50
The original patch has lots of whitespace changes, probably because the generated codec modules had not been regenerated for a long time. This diff ignores the whitespace changes, so it should be easier to review.
msg234206 - Author: Martin Panter (martin.panter) (Python committer) Date: 2015-01-18 00:19
Another idea, which doesn’t involve changing the incremental codec APIs, is roughly described in <https://bugs.python.org/issue7475#msg145986>: add format parameters to iterencode() and iterdecode(), so that they can determine the right data type with which to finalize the codecs.
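One possible reading of this idea, sketched under assumptions: the hypothetical `input_type` parameter (not an existing codecs argument) tells the function which empty value to finalize the encoder with. Unlike codecs.iterencode(), such a function could then drive a bytes-to-bytes codec such as hex.

```python
import codecs


def iterencode_typed(iterator, encoding, errors="strict", input_type=str):
    """Hypothetical iterencode() variant: the caller names the input type."""
    encoder = codecs.getincrementalencoder(encoding)(errors)
    for chunk in iterator:
        output = encoder.encode(chunk)
        if output:
            yield output
    # input_type() is "" for str or b"" for bytes, matching the codec's input
    output = encoder.encode(input_type(), True)
    if output:
        yield output


print(list(iterencode_typed([b"ab", b"c"], "hex", input_type=bytes)))  # [b'6162', b'63']
print(list(iterencode_typed([], "utf-8-sig")))  # [b'\xef\xbb\xbf']
```

Because the final call no longer depends on having seen any input, an empty iterator still produces the utf-8-sig BOM.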
msg256746 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2015-12-19 23:50
The patch changes a public interface, which breaks compatibility with third-party codecs that implement it.
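To illustrate the compatibility concern, here is a hypothetical third-party encoder (`LegacyEncoder` is an invented name) written against the current API, where the data argument of encode() is required. Code relying on the proposed `encode(final=True)` call would fail with it; `supports_omitted_input` is likewise a hypothetical helper.

```python
import codecs


class LegacyEncoder(codecs.IncrementalEncoder):
    """Hypothetical third-party encoder written against the current API,
    where the input argument is required."""

    def encode(self, input, final=False):
        return input.upper().encode("ascii")


def supports_omitted_input(encoder):
    """Check whether encoder.encode(final=True) works without a data object."""
    try:
        encoder.encode(final=True)
    except TypeError:
        # Raised because the required 'input' argument is missing
        return False
    return True


print(LegacyEncoder().encode("abc", True))      # b'ABC'
print(supports_omitted_input(LegacyEncoder()))  # False
```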

There are other solutions to the iterencode/iterdecode problem. For example, we can buffer the iterated values and encode with a one-step delay:

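    import codecs

    def iterencode(iterator, encoding, errors='strict'):
        encoder = codecs.getincrementalencoder(encoding)(errors)
        # Hold back each input for one step, so that the last one
        # can be passed to the encoder together with final=True
        prev = sentinel = object()
        for input in iterator:
            if prev is not sentinel:
                output = encoder.encode(prev)
                if output:
                    yield output
            prev = input
        if prev is not sentinel:
            output = encoder.encode(prev, True)
            if output:
                yield output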

Or remember the previous value and slice it to get an empty value of the same type at the end (this works only if the input type supports slicing):

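    import codecs

    def iterencode(iterator, encoding, errors='strict'):
        encoder = codecs.getincrementalencoder(encoding)(errors)
        prev = sentinel = object()
        for input in iterator:
            output = encoder.encode(input)
            if output:
                yield output
            prev = input
        if prev is not sentinel:
            # prev[:0] is an empty value of the input's own type (str or bytes)
            output = encoder.encode(prev[:0], True)
            if output:
                yield output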
msg273101 - Author: Martin Panter (martin.panter) (Python committer) Date: 2016-08-19 09:06
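Serhiy’s two proposals won’t work for codecs that produce non-empty output for empty input: if the iterator yields nothing, neither approach ever calls encode() with final=True, so that output is lost. For example: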

>>> from codecs import encode, iterencode
>>> tuple(iterencode((), "utf-8-sig"))
(b'\xef\xbb\xbf',)
>>> encode(b"", "uu")
b'begin 666 <data>\n \nend\n'
>>> encode(b"", "zlib")
b'x\x9c\x03\x00\x00\x00\x00\x01'
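The same effect can be seen directly on the incremental encoders: finalizing a fresh encoder with empty input already produces output. The variable names below are illustrative only.

```python
import codecs

# Finalizing a fresh utf-8-sig encoder with an empty str still emits the BOM
bom = codecs.getincrementalencoder("utf-8-sig")().encode("", True)
print(bom)  # b'\xef\xbb\xbf'

# A zlib stream for empty input still needs a header and a checksum
empty_stream = codecs.getincrementalencoder("zlib")().encode(b"", True)
print(len(empty_stream) > 0)  # True
```

With an empty iterator, both one-step-delay approaches leave prev equal to the sentinel and never make this final call, so output like this is never produced.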

However, I agree that changing the incremental codec APIs is not ideal. Since nobody seems to care that much, it might be simpler to document that:

* iterencode() only works with codecs that can encode text (str) objects, because it finalizes the encoder with an empty str; so base64-codec is not supported, but rot13-codec is
* iterdecode() only works with codecs that can decode bytes objects, because it finalizes the decoder with empty bytes; so rot13-codec is not supported, but base64-codec should be (pending other aspects of Issue 20132)
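A quick check of these documented limitations, assuming the built-in "rot13" and "base64" codec aliases. `outcome` is a hypothetical helper that reports either the output chunks or the name of the exception raised while finalizing.

```python
import codecs


def outcome(func, chunks, encoding):
    """Run codecs.iterencode/iterdecode to completion; report result or error."""
    try:
        return list(func(chunks, encoding))
    except TypeError as exc:
        # Raised when the codec rejects the empty str/bytes used to finalize it
        return type(exc).__name__


print(outcome(codecs.iterencode, ["hello"], "rot13"))         # ['uryyb']
print(outcome(codecs.iterencode, [b"hello"], "base64"))       # TypeError
print(outcome(codecs.iterdecode, [b"aGVsbG8=\n"], "base64"))  # [b'hello']
print(outcome(codecs.iterdecode, ["uryyb"], "rot13"))         # TypeError
```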
msg273198 - Author: Martin Panter (martin.panter) (Python committer) Date: 2016-08-20 09:22
Here is my documentation proposal.
msg273203 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2016-08-20 10:25
> it might be simpler to document that

Agreed.
msg278678 - Author: Roundup Robot (python-dev) (Python triager) Date: 2016-10-15 01:05
New changeset 402eba63650c by Martin Panter in branch '3.5':
Issue #23231: Document codecs.iterencode(), iterdecode() shortcomings
https://hg.python.org/cpython/rev/402eba63650c

New changeset 0837940bcb9f by Martin Panter in branch '3.6':
Issue #23231: Merge codecs doc from 3.5 into 3.6
https://hg.python.org/cpython/rev/0837940bcb9f

New changeset 1955dcc27332 by Martin Panter in branch 'default':
Issue #23231: Merge codecs doc from 3.6
https://hg.python.org/cpython/rev/1955dcc27332
History
Date | User | Action | Args
2016-10-15 01:37:36 | martin.panter | set | status: open -> closed; stage: patch review -> resolved; resolution: fixed; versions: + Python 3.7
2016-10-15 01:05:11 | python-dev | set | nosy: + python-dev; messages: + msg278678
2016-08-20 10:25:41 | serhiy.storchaka | set | assignee: serhiy.storchaka -> martin.panter; messages: + msg273203; nosy: + r.david.murray
2016-08-20 09:22:54 | martin.panter | set | files: + iter-unsupported.patch; versions: + Python 3.5; messages: + msg273198; components: + Documentation, - Library (Lib); stage: patch review
2016-08-19 09:06:30 | martin.panter | set | messages: + msg273101
2015-12-20 05:30:29 | r.david.murray | set | nosy: - Ruel Net1400
2015-12-20 05:30:11 | r.david.murray | set | messages: - msg256747
2015-12-20 01:03:26 | Ruel Net1400 | set | nosy: + Ruel Net1400; messages: + msg256747
2015-12-19 23:50:07 | serhiy.storchaka | set | nosy: + lemburg, doerwalter; messages: + msg256746; versions: + Python 3.6, - Python 3.5
2015-07-23 01:54:38 | martin.panter | link | issue20132 dependencies
2015-07-16 02:08:14 | martin.panter | link | issue13881 dependencies
2015-02-28 13:37:26 | serhiy.storchaka | set | assignee: serhiy.storchaka; nosy: + serhiy.storchaka
2015-01-18 00:19:04 | martin.panter | set | messages: + msg234206
2015-01-13 12:50:06 | martin.panter | set | files: + final-no-object.ignore-space.diff; messages: + msg233933
2015-01-13 12:48:19 | martin.panter | create |