Issue 20420: BufferedIncrementalEncoder violates IncrementalEncoder interface


This issue has been migrated to GitHub: https://github.com/python/cpython/issues/64619

classification

Title:	BufferedIncrementalEncoder violates IncrementalEncoder interface
Type:	behavior
Stage:
Components:	Library (Lib)
Versions:	Python 3.11, Python 3.10, Python 3.9

process

Status:	open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List:	doerwalter, lemburg, loewis, martin.panter, serhiy.storchaka
Priority:	normal
Keywords:

Created on 2014-01-28 16:34 by serhiy.storchaka, last changed 2022-04-11 14:57 by admin.

Messages (4)
msg209563	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2014-01-28 16:34
The documentation of IncrementalEncoder.getstate() says:

"""
Return the current state of the encoder which must be an integer. The implementation should make sure that 0 is the most common state. (States that are more complicated than integers can be converted into an integer by marshaling/pickling the state and encoding the bytes of the resulting string into an integer).
"""

But the implementation of BufferedIncrementalEncoder.getstate() is

    def getstate(self):
        return self.buffer or 0

self.buffer is "unencoded input that is kept between calls to encode()", e.g. a string.
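The docstring's suggestion of "encoding the bytes of the resulting string into an integer" could look roughly like the sketch below. The helper names buffer_to_state and state_to_buffer are hypothetical and are not part of the codecs module; the sketch only assumes that the pending buffer is a str and that an empty buffer should map to state 0.

```python
def buffer_to_state(buffer: str) -> int:
    """Pack pending (unencoded) text into a non-negative int.

    Hypothetical helper: an empty buffer maps to 0, so 0 stays the
    most common state, as the documentation asks.
    """
    if not buffer:
        return 0
    # The leading 0x01 sentinel keeps leading zero bytes from being lost
    # when the bytes are read back out of the integer.
    payload = b"\x01" + buffer.encode("utf-8", "surrogatepass")
    return int.from_bytes(payload, "big")


def state_to_buffer(state: int) -> str:
    """Inverse of buffer_to_state (also a hypothetical helper)."""
    if state == 0:
        return ""
    payload = state.to_bytes((state.bit_length() + 7) // 8, "big")
    # Drop the sentinel byte before decoding.
    return payload[1:].decode("utf-8", "surrogatepass")


state = buffer_to_state("x")
print(type(state).__name__, state_to_buffer(state))
```

Note that an integer built this way grows by about 8 bits per buffered byte, so it is only a fit where the consumer accepts arbitrarily large integers.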
msg209791	Author: Walter Dörwald (doerwalter) *	Date: 2014-01-31 14:18
I dug up an ancient email about that subject:

>>> However, I've discovered that BufferedIncrementalEncoder.getstate()
>>> doesn't match the specification (i.e. it returns the buffer, not an
>>> int). However this class is unused (and probably useless, because it
>>> doesn't make sense to delay encoding the input). The simplest solution
>>> would be to simply drop the class.
>>
>> Sounds like a plan; go right ahead!
>
> Oops, there is one codec that uses it: The idna encoder. It buffers
> the input until a '.' is encountered (or encode() is called with
> final==True) and then encodes this part.
>
> Either the idna encoder encodes the unencoded input as a int, or we drop
> the specification that encoder.getstate() must return an int, or we
> change it to mirror the decoder specification (i.e. return a
> (buffered_input, additional_state_info) tuple.
>
> (A more radical solution would be to completely drop the incremental
> codecs for idna).
>
> Maybe we should wait and see how the implementation of writing turns out?

And indeed the incremental encoder for idna behaves strangely:

>>> import io
>>> b = io.BytesIO()
>>> s = io.TextIOWrapper(b, 'idna')
>>> s.write('x')
1
>>> s.tell()
0
>>> b.getvalue()
b''
>>> s.write('.')
1
>>> s.tell()
2
>>> b.getvalue()
b'x.'

>>> b = io.BytesIO()
>>> s = io.TextIOWrapper(b, 'idna')
>>> s.write('x')
1
>>> s.seek(s.tell())
0
>>> s.write('.')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/walter/.local/lib/python3.3/codecs.py", line 218, in encode
    (result, consumed) = self._buffer_encode(data, self.errors, final)
  File "/Users/walter/.local/lib/python3.3/encodings/idna.py", line 246, in _buffer_encode
    result.extend(ToASCII(label))
  File "/Users/walter/.local/lib/python3.3/encodings/idna.py", line 73, in ToASCII
    raise UnicodeError("label empty or too long")
UnicodeError: label empty or too long

The cleanest solution would probably be to switch to a (buffered_input, additional_state_info) state. However, I don't know what changes this would require in the seek/tell implementations.
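As a rough illustration of the (buffered_input, additional_state_info) idea, here is a minimal sketch using a toy encoder. DotBufferedEncoder is hypothetical and is not the idna codec: it only imitates the "hold input back until a '.' arrives" behaviour, and the second tuple element is a placeholder 0. It says nothing about what seek/tell would need.

```python
import codecs


class DotBufferedEncoder(codecs.BufferedIncrementalEncoder):
    """Hypothetical toy encoder: buffers input until a '.' is seen."""

    def _buffer_encode(self, input, errors, final):
        if final:
            consumed = len(input)
        else:
            # Consume up to and including the last '.'; 0 if there is none.
            consumed = input.rfind(".") + 1
        return input[:consumed].encode("ascii", errors), consumed

    def getstate(self):
        # Mirror the decoder convention: (buffered_input, extra_state).
        return (self.buffer, 0)

    def setstate(self, state):
        buffered_input, _extra = state
        self.buffer = buffered_input


enc = DotBufferedEncoder()
first = enc.encode("x")        # nothing emitted yet, 'x' is buffered
saved = enc.getstate()

restored = DotBufferedEncoder()
restored.setstate(saved)       # the buffered 'x' survives the round trip
second = restored.encode(".")
print(first, saved, second)
```

With a state like this, restoring the encoder keeps the pending input instead of silently dropping it.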
msg222473	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2014-07-07 16:47
IncrementalNewlineDecoder requires that the decoder state be an integer (the C implementation requires at most a 63-bit unsigned integer). TextIOWrapper requires that the decoder state be at most a 64-bit unsigned integer (only 63-bit if universal newlines is enabled).
msg234164	Author: Martin Panter (martin.panter) *	Date: 2015-01-17 11:12
For what it’s worth, both io.TextIOWrapper and _pyio.TextIOWrapper appear to only ever call IncrementalEncoder.setstate(0). And the newline _decoder_ is not relevant because it doesn’t use any _encoder_.
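A small sketch of why resetting the encoder to state 0 matters for a buffering codec. It drives the stdlib idna incremental encoder directly and assumes, as a simplification, that the wrapper's seek path clears the encoder the way setstate(0) does. The buffered 'x' is discarded, so the following '.' is encoded as an empty label.

```python
import codecs

enc = codecs.getincrementalencoder("idna")()

first = enc.encode("x")   # no '.' yet, so the label is held in the buffer
enc.setstate(0)           # resets the encoder; the pending 'x' is dropped

raised = False
try:
    enc.encode(".")       # now the encoder sees an empty label before '.'
except UnicodeError as exc:
    raised = True
    print("encode failed:", exc)

print(first, raised)
```

This is the same failure mode as the traceback in msg209791, reproduced without TextIOWrapper.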

History
Date	User	Action	Args
2022-04-11 14:57:57	admin	set	github: 64619
2021-12-09 22:10:24	iritkatriel	set	components: + Library (Lib) versions: + Python 3.9, Python 3.10, Python 3.11, - Python 2.7, Python 3.3, Python 3.4
2015-01-17 11:12:55	martin.panter	set	nosy: + martin.panter messages: + msg234164
2014-07-07 16:47:41	serhiy.storchaka	set	messages: + msg222473
2014-01-31 14:18:42	doerwalter	set	messages: + msg209791
2014-01-28 16:34:45	serhiy.storchaka	create