
Author doerwalter
Recipients doerwalter, lemburg, loewis, serhiy.storchaka
Date 2014-01-31.14:18:42
Message-id <1391177923.01.0.293405371011.issue20420@psf.upfronthosting.co.za>
Content
I dug up an ancient email about that subject:

>>> However, I've discovered that BufferedIncrementalEncoder.getstate()
>>> doesn't match the specification (i.e. it returns the buffer, not an
>>> int). However this class is unused (and probably useless, because it
>>> doesn't make sense to delay encoding the input). The simplest solution
>>> would be to simply drop the class.
>>
>> Sounds like a plan; go right ahead!
>
> Oops, there *is* one codec that uses it: The idna encoder. It buffers
> the input until a '.' is encountered (or encode() is called with
> final==True) and then encodes this part.
>
> Either the idna encoder encodes the unencoded input as an int, or we drop
> the specification that encoder.getstate() must return an int, or we
> change it to mirror the decoder specification (i.e. return a
> (buffered_input, additional_state_info) tuple).
>
> (A more radical solution would be to completely drop the incremental
> codecs for idna).
>
> Maybe we should wait and see how the implementation of writing turns out?
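
The buffering described in the quoted mail can be observed directly on the incremental encoder, without going through TextIOWrapper. The following is a minimal sketch; the helper name trace_idna_encoder is made up for illustration, and the exact value returned by getstate() is an implementation detail of codecs.BufferedIncrementalEncoder that may differ between Python versions.

```python
import codecs


def trace_idna_encoder(chunks):
    """Feed chunks to the idna incremental encoder and record each step.

    Hypothetical helper: returns a list of (chunk, output, state) tuples,
    where state is whatever encoder.getstate() reports after the call.
    """
    enc = codecs.getincrementalencoder('idna')()
    steps = []
    for chunk in chunks:
        out = enc.encode(chunk)  # final=False, so an unfinished label is buffered
        steps.append((chunk, bytes(out), enc.getstate()))
    return steps


if __name__ == '__main__':
    for chunk, out, state in trace_idna_encoder(['x', '.']):
        print(repr(chunk), repr(out), repr(state))
```

On the interpreters I would expect this to run on, nothing is emitted for 'x' until the '.' arrives, and the state reported while 'x' is pending is the buffered text itself rather than an int.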

And indeed the incremental encoder for idna behaves strangely:

>>> import io
>>> b = io.BytesIO()
>>> s = io.TextIOWrapper(b, 'idna')
>>> s.write('x')
1
>>> s.tell()
0
>>> b.getvalue()
b''
>>> s.write('.')
1
>>> s.tell()
2
>>> b.getvalue()
b'x.'
>>> b = io.BytesIO()
>>> s = io.TextIOWrapper(b, 'idna')
>>> s.write('x')
1
>>> s.seek(s.tell())
0
>>> s.write('.')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/walter/.local/lib/python3.3/codecs.py", line 218, in encode
    (result, consumed) = self._buffer_encode(data, self.errors, final)
  File "/Users/walter/.local/lib/python3.3/encodings/idna.py", line 246, in _buffer_encode
    result.extend(ToASCII(label))
  File "/Users/walter/.local/lib/python3.3/encodings/idna.py", line 73, in ToASCII
    raise UnicodeError("label empty or too long")
UnicodeError: label empty or too long
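
The traceback is consistent with the buffered 'x' being thrown away before the '.' is written. A small sketch that reproduces the same error on the encoder alone, under the assumption (not verified here) that the seek() call ends up resetting the encoder; write_after_reset is a made-up name:

```python
import codecs


def write_after_reset():
    """Return the encoder output for '.', or the exception it raises."""
    enc = codecs.getincrementalencoder('idna')()
    enc.encode('x')  # 'x' is kept in the encoder's buffer, nothing is emitted
    enc.reset()      # assumption: this is what the seek effectively does
    try:
        # The encoder now sees a lone '.', i.e. an empty label before the dot.
        return enc.encode('.')
    except UnicodeError as exc:
        return exc


if __name__ == '__main__':
    print(repr(write_after_reset()))
```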

The cleanest solution would probably be to switch to a (buffered_input, additional_state_info) state.

However, I don't know what changes this would require in the seek/tell implementations.
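
To make the proposal concrete, here is a rough sketch of what a tuple-shaped encoder state could look like. TupleStateIdnaEncoder is a hypothetical subclass written for this message, not something in the standard library, and it says nothing about the changes TextIOWrapper would need in order to store such a state in a tell() cookie.

```python
import encodings.idna


class TupleStateIdnaEncoder(encodings.idna.IncrementalEncoder):
    """Hypothetical idna encoder whose state mirrors the decoder convention."""

    def getstate(self):
        # (buffered_input, additional_state_info); idna has no extra state,
        # so the second slot is always 0 in this sketch.
        return (self.buffer, 0)

    def setstate(self, state):
        buffered_input, _additional = state
        self.buffer = buffered_input


def save_and_restore():
    """Save the state with 'x' pending, reset, restore, then write '.'."""
    enc = TupleStateIdnaEncoder()
    enc.encode('x')
    saved = enc.getstate()
    enc.reset()          # buffer is dropped here ...
    enc.setstate(saved)  # ... and put back from the saved tuple
    return saved, bytes(enc.encode('.'))


if __name__ == '__main__':
    print(save_and_restore())
```

With the pending input carried in the state, restoring after a reset lets the following '.' complete the label instead of raising.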
History
Date                 User        Action  Args
2014-01-31 14:18:43  doerwalter  set     recipients: + doerwalter, lemburg, loewis, serhiy.storchaka
2014-01-31 14:18:43  doerwalter  set     messageid: <1391177923.01.0.293405371011.issue20420@psf.upfronthosting.co.za>
2014-01-31 14:18:42  doerwalter  link    issue20420 messages
2014-01-31 14:18:42  doerwalter  create