Author ncoghlan
Recipients Arfrever, inada.naoki, ishimoto, loewis, mrabarnett, ncoghlan, pitrou, rurpy2, serhiy.storchaka, vstinner
Date 2012-08-09.02:08:17
Message-id <1344478099.91.0.0886645207953.issue15216@psf.upfronthosting.co.za>
In-reply-to
Content
To bring back Victor's comments from the list:

- stdout/stderr are fairly easy to handle, since the underlying buffers can be flushed before switching the encoding and error settings. Yes, there's a risk of creating mojibake, but that's unavoidable and, for this use case, outweighed by the pragmatic need to support overriding the output encoding robustly (i.e. without breaking sys.__stdout__ or sys.__stderr__, and without crashing if something else displays output during startup, for example when running under "python -v").

- stdin is more challenging, since it isn't yet clear how to handle the case where data is already buffered internally. Victor proposes that it's acceptable to simply disallow changing the encoding of a stream that isn't seekable. My feeling is that such a restriction would largely miss the point, since the original use case that prompted this issue was shell pipeline processing, where stdin will often be a pipe.
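
As a rough illustration of the stdout/stderr point above (flush first, then switch), here is a minimal sketch on a private in-memory stream. The helper name switch_output_encoding is hypothetical, and it works by detaching and rewrapping the binary buffer, which is precisely what would break sys.__stdout__ if applied to the real sys.stdout. It is only meant to show why flushing first is sufficient for output, and how the resulting byte stream ends up with mixed encodings:

```python
import io

def switch_output_encoding(stream, encoding, errors="strict"):
    """Hypothetical helper: flush, then rewrap the same binary buffer."""
    stream.flush()            # push text encoded with the old settings down first
    binary = stream.detach()  # the old wrapper is unusable after this call
    return io.TextIOWrapper(binary, encoding=encoding, errors=errors)

# Demonstrate on a private in-memory stream rather than sys.stdout.
sink = io.BytesIO()
out = io.TextIOWrapper(sink, encoding="utf-8")
out.write("é")
out = switch_output_encoding(out, "latin-1")
out.write("é")
out.flush()
mixed = sink.getvalue()  # the same character, encoded two different ways
```

The value of mixed contains the UTF-8 bytes for the first write followed by the Latin-1 byte for the second, which is the mojibake risk mentioned above.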

I think the guiding use case here really needs to be this one: "How do I implement the equivalent of 'iconv' as a Python 3 script, without breaking internal interpreter state invariants?"
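
For reference, a minimal sketch of what an 'iconv' equivalent looks like today when it bypasses the text layer entirely and works on binary streams with incremental codecs. The function name iconv and the chunk_size parameter are illustrative assumptions; the point of this issue is to make something comparable possible through sys.stdin and sys.stdout themselves:

```python
import codecs
import io

def iconv(src, dst, from_encoding, to_encoding, chunk_size=4096):
    """Hypothetical helper: transcode binary stream src into binary stream dst."""
    decoder = codecs.getincrementaldecoder(from_encoding)()
    encoder = codecs.getincrementalencoder(to_encoding)()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # The incremental decoder holds back bytes of a split character
        dst.write(encoder.encode(decoder.decode(chunk)))
    # Flush both codecs; an incomplete trailing character raises here
    dst.write(encoder.encode(decoder.decode(b"", final=True), final=True))

# In-memory demonstration; a script would pass the binary buffers of
# stdin and stdout instead of BytesIO objects.
source = io.BytesIO("héllo €".encode("utf-8"))
target = io.BytesIO()
iconv(source, target, "utf-8", "utf-16-le", chunk_size=1)
```

A chunk size of 1 is used in the demonstration only to show that characters split across reads are handled by the incremental decoder.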

My current thought is that, instead of seeking, the input case can better be handled by manipulating the read-ahead buffer directly. Something like (for the pure Python version):

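   # Remember the old encoding before overwriting it
   old_encoding = self._encoding
   self._encoding = new_encoding
   if self._decoder is not None:
       # Recover the bytes behind the decoded but not yet consumed characters
       old_data = self._get_decoded_chars().encode(old_encoding)
       # Append any bytes the old decoder buffered without decoding them
       old_data += self._decoder.getstate()[0]
       # Create a decoder for the new encoding and re-decode the pending data
       decoder = self._get_decoder()
       new_chars = ''
       if old_data:
           new_chars = decoder.decode(old_data)
       self._set_decoded_chars(new_chars)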
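
The same steps can be tried outside TextIOWrapper with plain incremental decoders. This is only a sketch under stated assumptions: redecode_read_ahead is a hypothetical helper, the "stream" is simulated by slicing a bytes object, and it assumes the old encoding round-trips the unread characters exactly (true for latin-1, not guaranteed in general, and not when universal newlines have already translated the data):

```python
import codecs

def redecode_read_ahead(unread_chars, old_decoder, old_encoding, new_encoding):
    """Hypothetical helper mirroring the steps sketched above."""
    # Bytes behind the characters that were decoded but not yet consumed
    old_data = unread_chars.encode(old_encoding)
    # Plus bytes the old decoder is still holding (e.g. half a character)
    old_data += old_decoder.getstate()[0]
    new_decoder = codecs.getincrementaldecoder(new_encoding)()
    return new_decoder.decode(old_data), new_decoder

# Simulate a stream opened as latin-1 whose payload is really UTF-8.
raw = "coding: utf-8\ncafé €".encode("utf-8")
old_decoder = codecs.getincrementaldecoder("latin-1")()
decoded = old_decoder.decode(raw[:-1])  # the read-ahead stops mid-character
header, _, unread = decoded.partition("\n")

# Switch encodings after consuming the header line
new_chars, new_decoder = redecode_read_ahead(
    unread, old_decoder, "latin-1", "utf-8")
# The rest of the stream arrives later and completes the split character
new_chars += new_decoder.decode(raw[-1:], final=True)
```

After the switch, new_chars holds the correctly decoded remainder of the payload, even though part of it had already been misdecoded as latin-1 by the read-ahead.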

(A similar mechanism could actually be used to support an "initial_data" parameter to TextIOWrapper, which would help in general encoding detection situations where changing encoding *in-place* isn't needed, but the application would like an easy way to "put back" the initial data for inclusion in the text stream without making assumptions about the underlying buffer implementation)
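
To make the "put back" idea concrete, here is one way an application can approximate it today, sketched under clearly stated assumptions: PrefixedRaw and wrap_with_initial_data are hypothetical names, not an existing or proposed API, and this version works by layering a raw stream underneath TextIOWrapper rather than by adding an "initial_data" parameter to it:

```python
import io

class PrefixedRaw(io.RawIOBase):
    """Hypothetical raw stream that replays initial_data before stream."""

    def __init__(self, initial_data, stream):
        super().__init__()
        self._prefix = bytes(initial_data)
        self._stream = stream

    def readable(self):
        return True

    def readinto(self, buffer):
        if self._prefix:
            # Serve the put-back bytes first
            count = min(len(buffer), len(self._prefix))
            buffer[:count] = self._prefix[:count]
            self._prefix = self._prefix[count:]
            return count
        # Then fall through to the underlying binary stream
        return self._stream.readinto(buffer)

def wrap_with_initial_data(stream, initial_data, encoding):
    """Hypothetical helper: text stream over initial_data followed by stream."""
    raw = PrefixedRaw(initial_data, stream)
    return io.TextIOWrapper(io.BufferedReader(raw), encoding=encoding)

# Sniff the first line as bytes, pick an encoding, then put the line back.
stream = io.BytesIO("# coding: latin-1\ncafé\n".encode("latin-1"))
sniffed = stream.readline()
text = wrap_with_initial_data(stream, sniffed, "latin-1")
content = text.read()
```

The sketch only requires that the underlying stream supports readinto(), so it also works for non-seekable streams such as pipes.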

Also, StringIO should implement this new API as a no-op.