Message 209791 - Python tracker


Author	doerwalter
Recipients	doerwalter, lemburg, loewis, serhiy.storchaka
Date	2014-01-31.14:18:42
Message-id	<1391177923.01.0.293405371011.issue20420@psf.upfronthosting.co.za>

Content

I dug up an ancient email about that subject:

>>> However, I've discovered that BufferedIncrementalEncoder.getstate()
>>> doesn't match the specification (i.e. it returns the buffer, not an
>>> int). However this class is unused (and probably useless, because it
>>> doesn't make sense to delay encoding the input). The simplest solution
>>> would be to simply drop the class.
>>
>> Sounds like a plan; go right ahead!
>
> Oops, there *is* one codec that uses it: The idna encoder. It buffers
> the input until a '.' is encountered (or encode() is called with
> final==True) and then encodes this part.
>
> Either the idna encoder encodes the unencoded input as an int, or we drop
> the specification that encoder.getstate() must return an int, or we
> change it to mirror the decoder specification (i.e. return a
> (buffered_input, additional_state_info) tuple).
>
> (A more radical solution would be to completely drop the incremental
> codecs for idna).
>
> Maybe we should wait and see how the implementation of writing turns out?
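
The mismatch described in the quoted email can be observed directly on the encoder, without any file object involved. This is only a minimal sketch; it assumes the stdlib 'idna' incremental encoder is the one meant above, and the variable names are mine:

```python
import codecs

# Assumption: the stdlib 'idna' incremental encoder is the codec under discussion.
encoder = codecs.getincrementalencoder('idna')()

# No '.' has been seen and final is False, so nothing is emitted yet.
first_output = encoder.encode('x')

# The pending label is what getstate() hands back.
pending_state = encoder.getstate()

print(repr(first_output))
print(repr(pending_state), type(pending_state).__name__)
```

With input still pending, the state is the buffered text itself rather than an int.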

And indeed the incremental encoder for idna behaves strangely:

>>> import io
>>> b = io.BytesIO()
>>> s = io.TextIOWrapper(b, 'idna')
>>> s.write('x')
1
>>> s.tell()
0
>>> b.getvalue()
b''
>>> s.write('.')
1
>>> s.tell()
2
>>> b.getvalue()
b'x.'
>>> b = io.BytesIO()
>>> s = io.TextIOWrapper(b, 'idna')
>>> s.write('x')
1
>>> s.seek(s.tell())
0
>>> s.write('.')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/walter/.local/lib/python3.3/codecs.py", line 218, in encode
    (result, consumed) = self._buffer_encode(data, self.errors, final)
  File "/Users/walter/.local/lib/python3.3/encodings/idna.py", line 246, in _buffer_encode
    result.extend(ToASCII(label))
  File "/Users/walter/.local/lib/python3.3/encodings/idna.py", line 73, in ToASCII
    raise UnicodeError("label empty or too long")
UnicodeError: label empty or too long
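
The same failure can probably be reproduced on the encoder alone. The sketch below assumes that the seek() in the session ends up resetting the encoder, which throws away the buffered 'x'; the helper encode_in_chunks and its reset_after_first switch are hypothetical names for illustration only:

```python
import codecs


def encode_in_chunks(chunks, reset_after_first=False):
    """Feed chunks to a fresh idna incremental encoder and collect the outputs.

    reset_after_first is a hypothetical switch for this sketch: it calls
    reset() after the first chunk, standing in for the seek() in the session.
    """
    encoder = codecs.getincrementalencoder('idna')()
    outputs = []
    for index, chunk in enumerate(chunks):
        outputs.append(encoder.encode(chunk))
        if reset_after_first and index == 0:
            encoder.reset()  # discards any buffered, not yet encoded input
    return outputs


# Without a reset the label is emitted once the '.' arrives.
print(encode_in_chunks(['x', '.']))

# With a reset the 'x' is gone, so '.' is encoded as an empty label.
try:
    encode_in_chunks(['x', '.'], reset_after_first=True)
except UnicodeError as exc:
    print('failed after reset:', exc)
```

If that assumption holds, the traceback is a consequence of the lost buffer: the encoder sees a lone '.' and tries to encode the empty label in front of it.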

The cleanest solution is probably to switch to a (buffered_input, additional_state_info) state.

However, I don't know what changes this would require in the seek/tell implementations.
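
To make the tuple idea concrete, here is a rough sketch of what such a state could look like on the encoder side. TupleStateIdnaEncoder is a hypothetical subclass, not anything in the stdlib, and the 0 used for additional_state_info is just a placeholder, since idna has nothing else to record:

```python
import codecs

IdnaIncrementalEncoder = codecs.getincrementalencoder('idna')


class TupleStateIdnaEncoder(IdnaIncrementalEncoder):
    """Hypothetical variant that mirrors the decoder-style state tuple."""

    def getstate(self):
        # (buffered_input, additional_state_info); the second item is a
        # placeholder because idna keeps no state besides the buffer.
        return (self.buffer, 0)

    def setstate(self, state):
        buffered_input, _additional_state_info = state
        self.buffer = buffered_input


# Save the state with a label still pending ...
source = TupleStateIdnaEncoder()
source.encode('x')
saved_state = source.getstate()

# ... and restore it into a fresh encoder before the '.' arrives.
restored = TupleStateIdnaEncoder()
restored.setstate(saved_state)
restored_output = restored.encode('.')

print(saved_state, restored_output)
```

This only covers the round trip between getstate() and setstate(). It does not touch TextIOWrapper, so the question about seek/tell stays open.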
