Fix codecs.iterencode/decode() by allowing data parameter to be omitted #67420

vadmium · 2015-01-13T12:48:19Z

BPO	23231
Nosy	@malemburg, @doerwalter, @vstinner, @ezio-melotti, @bitdancer, @vadmium, @serhiy-storchaka
Files	final-no-object.patch, final-no-object.ignore-space.diff (diff --ignore-all-space), iter-unsupported.patch

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = 'https://github.com/vadmium'
closed_at = <Date 2016-10-15.01:37:36.979>
created_at = <Date 2015-01-13.12:48:19.265>
labels = ['type-bug', '3.7', 'expert-unicode', 'docs']
title = 'Fix codecs.iterencode/decode() by allowing data parameter to be omitted'
updated_at = <Date 2016-10-15.01:37:36.977>
user = 'https://github.com/vadmium'

bugs.python.org fields:

activity = <Date 2016-10-15.01:37:36.977>
actor = 'martin.panter'
assignee = 'martin.panter'
closed = True
closed_date = <Date 2016-10-15.01:37:36.979>
closer = 'martin.panter'
components = ['Documentation', 'Unicode']
creation = <Date 2015-01-13.12:48:19.265>
creator = 'martin.panter'
dependencies = []
files = ['37691', '37692', '44164']
hgrepos = []
issue_num = 23231
keywords = ['patch']
message_count = 8.0
messages = ['233932', '233933', '234206', '256746', '273101', '273198', '273203', '278678']
nosy_count = 8.0
nosy_names = ['lemburg', 'doerwalter', 'vstinner', 'ezio.melotti', 'r.david.murray', 'python-dev', 'martin.panter', 'serhiy.storchaka']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue23231'
versions = ['Python 3.5', 'Python 3.6', 'Python 3.7']

vadmium · 2015-01-13T12:48:02Z

As mentioned in bpo-20132, iterencode() and iterdecode() only work on text-to-byte codecs, because they assume particular data types when finalizing the incremental codecs. This patch changes the signature of the IncrementalEncoder and IncrementalDecoder methods from

IncrementalEncoder.encode(object[, final])
IncrementalDecoder.decode(object[, final])

to

IncrementalEncoder.encode([object,] [final])
IncrementalDecoder.decode([object,] [final])

so that iterencode() and iterdecode(), and perhaps StreamWriter and StreamReader in the future, can operate the incremental codec without knowing what type of data it processes.
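A minimal sketch of what the proposed signature would allow, under stated assumptions: `HexEncoder`, `_MISSING`, and `iterencode_without_type` are hypothetical names made up for illustration and are not taken from the patch or from the `codecs` module.

```python
import binascii
import codecs

_MISSING = object()  # hypothetical marker meaning "no data was passed"


class HexEncoder(codecs.IncrementalEncoder):
    """Toy bytes-to-bytes encoder using the proposed optional-data signature."""

    def encode(self, input=_MISSING, final=False):
        if input is _MISSING:
            # Nothing to encode; a stateful codec would flush its buffer here.
            return b""
        return binascii.hexlify(input)


def iterencode_without_type(iterator, encoder):
    """Drive an incremental encoder without knowing its input type."""
    for chunk in iterator:
        output = encoder.encode(chunk)
        if output:
            yield output
    # Finalize by omitting the data argument instead of passing "".
    output = encoder.encode(final=True)
    if output:
        yield output


result = list(iterencode_without_type([b"ab", b"c"], HexEncoder()))
print(result)
```

The point of the sketch is the final call: because no data is passed, the driver never has to guess whether the codec expects `str` or `bytes`.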

vadmium · 2015-01-13T12:50:06Z

The original patch has lots of whitespace changes, probably because the generated codec code has not been regenerated for a long time. This diff ignores the whitespace changes, so it should be easier to review.

vadmium · 2015-01-18T00:19:05Z

Another idea that doesn’t involve changing the incremental codec APIs is described in <https://bugs.python.org/issue7475#msg145986>: add format parameters to iterencode() and iterdecode(), which would allow them to determine the right data type to finalize the codecs with.

serhiy-storchaka · 2015-12-19T23:50:07Z

The patch changes a public interface. This breaks compatibility with third-party codecs that implement it.

There are other solutions to the iterencode/iterdecode problem. For example, we can buffer the iterated values and encode with a one-step delay:

    prev = sentinel = object()
    for input in iterator:
        if prev is not sentinel:
            output = encoder.encode(prev)
            if output:
                yield output
        prev = input
    if prev is not sentinel:
        output = encoder.encode(prev, True)
        if output:
            yield output

Or remember the previous value and use it to compute the empty value at the end (this works only if the input type supports slicing):

    prev = sentinel = object()
    for input in iterator:
        output = encoder.encode(input)
        if output:
            yield output
        prev = input
    if prev is not sentinel:
        output = encoder.encode(prev[:0], True)
        if output:
            yield output
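As a hedged illustration, the first fragment (the one-step delay) can be wrapped in a generator function so that it runs on its own. The name `iterencode_delayed` is hypothetical; the only library call assumed is `codecs.getincrementalencoder()`.

```python
import codecs


def iterencode_delayed(iterator, encoding, errors="strict"):
    """Encode with a one-step delay so the last real chunk carries final=True."""
    encoder = codecs.getincrementalencoder(encoding)(errors)
    prev = sentinel = object()
    for chunk in iterator:
        if prev is not sentinel:
            # Encode the previous chunk now that we know it is not the last.
            output = encoder.encode(prev)
            if output:
                yield output
        prev = chunk
    if prev is not sentinel:
        # The last chunk is passed together with final=True, so no empty
        # value of a guessed type is needed.
        output = encoder.encode(prev, True)
        if output:
            yield output


text_chunks = list(iterencode_delayed(["a", "b"], "utf-8"))
bytes_chunks = list(iterencode_delayed([b"abc"], "base64"))
print(text_chunks, bytes_chunks)
```

In this sketch the same driver handles a text encoding and a bytes-to-bytes codec, because it only ever passes values that came from the iterator.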

vadmium · 2016-08-19T09:06:30Z

Serhiy’s two proposals won’t work for codecs that produce non-empty output for empty input:

>>> from codecs import encode, iterencode
>>> tuple(iterencode((), "utf-8-sig"))
(b'\xef\xbb\xbf',)
>>> encode(b"", "uu")
b'begin 666 <data>\n \nend\n'
>>> encode(b"", "zlib")
b'x\x9c\x03\x00\x00\x00\x00\x01'
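The difference for empty input can be sketched by comparing `codecs.iterencode()` with a function built around the slicing approach. `iterencode_sliced` is a hypothetical name used only for this comparison.

```python
import codecs


def iterencode_sliced(iterator, encoding, errors="strict"):
    """Finalize with an empty slice of the last chunk seen, if there was one."""
    encoder = codecs.getincrementalencoder(encoding)(errors)
    prev = sentinel = object()
    for chunk in iterator:
        output = encoder.encode(chunk)
        if output:
            yield output
        prev = chunk
    if prev is not sentinel:
        # prev[:0] is an empty value of the same type as the input chunks.
        output = encoder.encode(prev[:0], True)
        if output:
            yield output


# With no chunks at all, the encoder is never called, so the BOM is lost.
stdlib_result = list(codecs.iterencode((), "utf-8-sig"))
sliced_result = list(iterencode_sliced((), "utf-8-sig"))
print(stdlib_result, sliced_result)
```

With an empty iterator there is no previous chunk to slice, so the sketch never finalizes the encoder and emits nothing, whereas `codecs.iterencode()` still emits the signature.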

However I agree that changing the incremental codec APIs is not ideal. Since nobody seems to care that much, it might be simpler to document that:

* iterencode() only works where text str objects can be encoded, so base64-codec is not supported, but rot13-codec is supported
* iterdecode() only works where bytes objects can be decoded, so rot13-codec is not supported, but base64-codec should be supported (pending other aspects of bpo-20132)
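A short sketch of these limitations using only `codecs.iterencode()` and `codecs.iterdecode()`. The base64 decoding case feeds a single complete chunk on purpose, since chunk-boundary handling is one of the other aspects mentioned above. The `TypeError` is what CPython's base64 codec is expected to raise when it is finalized with a str; treat the exact exception as an implementation detail.

```python
import codecs

# Supported: rot13 is str-to-str, so finalizing with "" is accepted.
rot13_chunks = list(codecs.iterencode(["abc"], "rot13"))

# Supported for one complete chunk: base64 decoding takes bytes, so
# finalizing with b"" is accepted.
base64_chunks = list(codecs.iterdecode([b"YWJj\n"], "base64"))

# Not supported: the base64 encoder needs bytes, but iterencode()
# finalizes it with a str, so exhausting the generator fails.
try:
    list(codecs.iterencode([b"abc"], "base64"))
    base64_encode_failed = False
except TypeError:
    base64_encode_failed = True

print(rot13_chunks, base64_chunks, base64_encode_failed)
```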

vadmium · 2016-08-20T09:22:54Z

Here is my documentation proposal.

serhiy-storchaka · 2016-08-20T10:25:41Z

> it might be simpler to document that

Agreed.

python-dev · 2016-10-15T01:05:11Z

New changeset 402eba63650c by Martin Panter in branch '3.5':
Issue bpo-23231: Document codecs.iterencode(), iterdecode() shortcomings
https://hg.python.org/cpython/rev/402eba63650c

New changeset 0837940bcb9f by Martin Panter in branch '3.6':
Issue bpo-23231: Merge codecs doc from 3.5 into 3.6
https://hg.python.org/cpython/rev/0837940bcb9f

New changeset 1955dcc27332 by Martin Panter in branch 'default':
Issue bpo-23231: Merge codecs doc from 3.6
https://hg.python.org/cpython/rev/1955dcc27332

vadmium added the stdlib, topic-unicode, and type-bug labels Jan 13, 2015

serhiy-storchaka self-assigned this Feb 28, 2015

vadmium added the docs label and removed the stdlib label Aug 20, 2016

serhiy-storchaka assigned vadmium and unassigned serhiy-storchaka Aug 20, 2016

vadmium added the 3.7 (EOL) end of life label Oct 15, 2016

vadmium closed this as completed Oct 15, 2016

ezio-melotti transferred this issue from another repository Apr 10, 2022
