Message 167754 - Python tracker



Author	ncoghlan
Recipients	Arfrever, ishimoto, loewis, methane, mrabarnett, ncoghlan, pitrou, rurpy2, serhiy.storchaka, vstinner
Date	2012-08-09.02:08:17
Message-id	<1344478099.91.0.0886645207953.issue15216@psf.upfronthosting.co.za>
In-reply-to

Content

To bring back Victor's comments from the list:

- stdout/stderr are fairly easy to handle, since the underlying buffers can be flushed before switching the encoding and error settings. Yes, there's a risk of creating mojibake, but that's unavoidable and, for this use case, trumped by the pragmatic need to support overriding the output encoding in a robust fashion (i.e. not breaking sys.__stdout__ or sys.__stderr__, and not crashing if something else displays output during startup, for example, when running under "python -v").

- stdin is more challenging, since it isn't entirely clear yet how to handle the case where data is already buffered internally. Victor proposes that it's acceptable to simply disallow changing the encoding of a stream that isn't seekable. My feeling is that such a restriction would largely miss the point, since the original use case that prompted the creation of this issue was shell pipeline processing, where stdin will often be a PIPE.
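
As a rough illustration of the output case from the first point (flush, then switch the encoding and error settings on the same wrapper), the sketch below uses TextIOWrapper.reconfigure(), which only exists in Python 3.7 and later and so was not available when this message was written. The in-memory BytesIO standing in for the real stdout buffer is an assumption made purely for the demo.

```python
import io

# In-memory stand-in for the binary buffer underneath sys.stdout
# (an assumption for the demo, not the real stream).
raw = io.BytesIO()
out = io.TextIOWrapper(raw, encoding="utf-8")

out.write("caf\u00e9")   # encoded as UTF-8
out.flush()              # push pending text out under the old encoding

# Switch encoding and error handling on the same wrapper object,
# rather than replacing it with a new one (Python 3.7+ only).
out.reconfigure(encoding="latin-1", errors="replace")
out.write("caf\u00e9")   # encoded as Latin-1 from here on
out.flush()

written = raw.getvalue()
```

The resulting byte stream mixes two encodings, which is exactly the mojibake risk mentioned above; the wrapper object itself is never replaced.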

I think the guiding use case here really needs to be this one: "How do I implement the equivalent of 'iconv' as a Python 3 script, without breaking internal interpreter state invariants?"
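
For comparison, here is a minimal sketch of an iconv-style conversion that sidesteps the text layer entirely by working on binary streams with incremental codecs. It is not the in-place encoding change under discussion, only a baseline for what such a script has to achieve; the transcode name and its parameters are hypothetical.

```python
import codecs
import io


def transcode(src, dst, from_encoding, to_encoding, chunk_size=4096):
    """Copy bytes from src to dst, converting encodings incrementally."""
    decoder = codecs.getincrementaldecoder(from_encoding)()
    encoder = codecs.getincrementalencoder(to_encoding)()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # The incremental decoder keeps any partial multi-byte sequence
        # until the next chunk arrives.
        dst.write(encoder.encode(decoder.decode(chunk)))
    # final=True flushes remaining state and raises on truncated input.
    dst.write(encoder.encode(decoder.decode(b"", final=True), final=True))


# Demo with in-memory streams; a real script would pass
# sys.stdin.buffer and sys.stdout.buffer instead.
source = io.BytesIO("na\u00efve caf\u00e9".encode("utf-8"))
target = io.BytesIO()
transcode(source, target, "utf-8", "latin-1", chunk_size=3)
result = target.getvalue()
```

The small chunk_size in the demo deliberately splits multi-byte UTF-8 sequences across reads, which is the situation a pipe can produce.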

My current thought is that, instead of seeking, the input case can better be handled by manipulating the read-ahead buffer directly. Something like (for the pure Python version):

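   old_encoding = self._encoding
   self._encoding = new_encoding
   if self._decoder is not None:
       # Recover the raw bytes behind the not-yet-consumed decoded text...
       old_data = self._get_decoded_chars().encode(old_encoding)
       # ...plus any bytes still buffered inside the old decoder.
       old_data += self._decoder.getstate()[0]
       # Build a decoder for the new encoding and re-decode those bytes.
       decoder = self._get_decoder()
       new_chars = ''
       if old_data:
           new_chars = decoder.decode(old_data)
       self._set_decoded_chars(new_chars)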
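
The same buffer manipulation can be tried outside TextIOWrapper with the codecs module alone. This is only a sketch under stated assumptions: redecode_pending and its arguments are hypothetical names, and the demo assumes the old encoding (Latin-1 here) round-trips the buffered data losslessly.

```python
import codecs


def redecode_pending(decoded_chars, pending_bytes, old_encoding, new_encoding):
    """Re-interpret buffered text plus undecoded bytes under a new encoding.

    Returns (new_chars, new_decoder). Hypothetical helper for illustration.
    """
    # Recover the original bytes; this only works if old_encoding
    # round-trips the data without loss.
    old_data = decoded_chars.encode(old_encoding) + pending_bytes
    decoder = codecs.getincrementaldecoder(new_encoding)()
    new_chars = decoder.decode(old_data) if old_data else ""
    return new_chars, decoder


# UTF-8 data arriving on a pipe, with the first two bytes already
# read ahead and wrongly decoded as Latin-1.
payload = "h\u00e9llo w\u00f6rld".encode("utf-8")
old_decoder = codecs.getincrementaldecoder("latin-1")()
unread_chars = old_decoder.decode(payload[:2])   # decoded but not consumed
pending = old_decoder.getstate()[0]              # bytes held by the old decoder

new_chars, new_decoder = redecode_pending(unread_chars, pending, "latin-1", "utf-8")
# The new decoder holds the partial UTF-8 sequence and completes it here.
new_chars += new_decoder.decode(payload[2:], final=True)
```

If the old decoder had used a lossy error handler such as "replace", the original bytes could not be recovered by re-encoding, so this approach would not apply.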

(A similar mechanism could actually be used to support an "initial_data" parameter to TextIOWrapper, which would help in general encoding detection situations where changing encoding *in-place* isn't needed, but the application would like an easy way to "put back" the initial data for inclusion in the text stream without making assumptions about the underlying buffer implementation.)

Also, StringIO should implement this new API as a no-op.

History
Date	User	Action	Args
2012-08-09 02:08:20	ncoghlan	set	recipients: + ncoghlan, loewis, ishimoto, pitrou, vstinner, mrabarnett, Arfrever, methane, rurpy2, serhiy.storchaka
2012-08-09 02:08:19	ncoghlan	set	messageid: <1344478099.91.0.0886645207953.issue15216@psf.upfronthosting.co.za>
2012-08-09 02:08:19	ncoghlan	link	issue15216 messages
2012-08-09 02:08:17	ncoghlan	create