Author nikratio
Recipients Arfrever, elixir, inada.naoki, ishimoto, jwilk, loewis, mrabarnett, ncoghlan, nikratio, pitrou, rurpy2, serhiy.storchaka, vstinner
Date 2014-01-29.03:46:52
Message-id <1390967214.06.0.469364259306.issue15216@psf.upfronthosting.co.za>
Content
I'm about 40% done with translating Victor's patch into C. However, in the process I got the feeling that this approach may not be so good after all.

Note that:

 * The only use case for set_encoding() that I have found is changing the encoding of sys.{stdin,stdout,stderr}.

 * When using non-seekable streams, set_encoding() has to be called before anything has been read from the stream. Apart from sys.std*, it is therefore hard to imagine a situation where the stream cannot simply be opened with the right encoding instead (if you can't change the open() call, you probably cannot call set_encoding() early enough either).

 * With seekable streams, set_encoding() breaks seeking, because the position cookie returned by tell() does not record which decoder was in use at that position. Example:

$ cat ~/tmp/test.py
import _pyio as io
data = ('0123456789\r'*5).encode('utf-16_le')
bstream = io.BytesIO(data)
tstream = io.TextIOWrapper(bstream, encoding='latin1')
tstream.readline()
pos = tstream.tell()
tstream.read(6)
tstream.set_encoding('utf-16_le')
tstream.seek(pos)

$ ./python ~/tmp/test.py 
Traceback (most recent call last):
  File "/home/nikratio/tmp/test.py", line 9, in <module>
    tstream.seek(pos)
  File "/home/nikratio/clones/cpython/Lib/_pyio.py", line 1989, in seek
    raise OSError("can't restore logical file position")
OSError: can't restore logical file position
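
To make the failure mode concrete without the proposed set_encoding() (which is not in the stdlib), here is a deliberately simplified model in plain Python. It assumes a position is remembered only as a byte offset; the real TextIOWrapper cookie additionally packs decoder state and counts of bytes to feed and characters to skip, so this illustrates the problem rather than reproducing _pyio's tell()/seek() logic. The helpers byte_offset_after() and resume_at() are hypothetical names.

```python
import codecs


def byte_offset_after(raw: bytes, encoding: str, nchars: int) -> int:
    """Byte offset reached after decoding `nchars` characters of `raw`
    with `encoding`, feeding an incremental decoder one byte at a time."""
    decoder = codecs.getincrementaldecoder(encoding)()
    decoded = 0
    for offset in range(len(raw)):
        decoded += len(decoder.decode(raw[offset:offset + 1]))
        if decoded >= nchars:
            return offset + 1
    return len(raw)


def resume_at(raw: bytes, offset: int, encoding: str) -> str:
    """Decode the rest of `raw` from a bare byte offset (hypothetical 'seek')."""
    return raw[offset:].decode(encoding)


data = "héllo".encode("utf-8")  # b'h\xc3\xa9llo'

# Read two characters with latin-1 and remember only the byte offset:
# latin-1 maps one byte to one character, so the offset is 2.
pos = byte_offset_after(data, "latin-1", 2)

# After switching to UTF-8, offset 2 lies inside the two-byte sequence
# for 'é', so strict decoding from there fails.
try:
    resume_at(data, pos, "utf-8")
except UnicodeDecodeError as exc:
    print("cannot resume:", exc.reason)  # prints "cannot resume: invalid start byte"
```

Had the same codec been used throughout, the remembered offset would be 3 and resuming would yield "llo"; the offset only means something together with the codec that produced it.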


I don't think this can be fixed without making the tell/seek and set_encoding code even more complicated than it already is (it would probably involve keeping track of which encoding was used for each part of the stream).
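
As a rough, hypothetical sketch of the bookkeeping such a fix would need at a minimum: a map from byte offsets to the encoding in force there. EncodingHistory is an illustrative name, not part of any patch. Even this ignores decoder state, switching encodings after seeking backwards, and how the information would be packed into the integer cookie, which is where the real complexity would lie.

```python
import bisect


class EncodingHistory:
    """Hypothetical record of which encoding was active from which
    byte offset onwards in a seekable text stream."""

    def __init__(self, initial_encoding: str) -> None:
        self._starts = [0]                  # strictly increasing byte offsets
        self._encodings = [initial_encoding]

    def switch(self, byte_offset: int, encoding: str) -> None:
        """Record that `encoding` applies from `byte_offset` on."""
        if byte_offset < self._starts[-1]:
            # Switching after a backwards seek would require rewriting
            # history; this sketch simply refuses.
            raise ValueError("switches must be recorded in stream order")
        if byte_offset == self._starts[-1]:
            self._encodings[-1] = encoding  # overwrite switch at same offset
        else:
            self._starts.append(byte_offset)
            self._encodings.append(encoding)

    def encoding_at(self, byte_offset: int) -> str:
        """Return the encoding that was active at `byte_offset`."""
        if byte_offset < 0:
            raise ValueError("negative byte offset")
        index = bisect.bisect_right(self._starts, byte_offset) - 1
        return self._encodings[index]


history = EncodingHistory("latin1")
history.switch(100, "utf-16-le")
print(history.encoding_at(10), history.encoding_at(150))
```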

In summary, using set_encoding() with seekable streams breaks seeking, using it with non-seekable streams requires it to be called right after open(), and the only reported case where one cannot simply change the open call instead is sys.std*.

Given all that, do we really want to add a new public method to the TextIOWrapper class that can only reasonably be used with three specific streams?


Personally, I think it would make much more sense to introduce three new functions in the sys module instead: sys.change_std{out,err,in}_encoding(). That solves the reported use case just as well, without polluting the namespace of every text stream.
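
For comparison, the sys-level approach can already be approximated in user code with existing io APIs (TextIOWrapper.detach() and the TextIOWrapper constructor). This is a hypothetical sketch: rewrap() and change_std_encoding() are illustrative names, not the proposed sys.change_std*_encoding() functions. It only carries over line_buffering (newline handling and other settings revert to defaults), and it shares the non-seekable limitation above: any text the old stdin wrapper has already decoded but not returned is lost.

```python
import io
import sys


def rewrap(stream: io.TextIOWrapper, encoding: str,
           errors: str = "strict") -> io.TextIOWrapper:
    """Return a new TextIOWrapper over `stream`'s binary buffer.

    `stream` is detached and must not be used afterwards.
    """
    stream.flush()                       # detach() flushes too; be explicit
    line_buffering = stream.line_buffering
    buffer = stream.detach()
    return io.TextIOWrapper(buffer, encoding=encoding, errors=errors,
                            line_buffering=line_buffering)


def change_std_encoding(name: str, encoding: str,
                        errors: str = "strict") -> None:
    """Hypothetical stand-in for sys.change_std{out,err,in}_encoding()."""
    if name not in ("stdin", "stdout", "stderr"):
        raise ValueError(f"not a standard stream: {name!r}")
    setattr(sys, name, rewrap(getattr(sys, name), encoding, errors))
```

A call such as change_std_encoding("stdout", "utf-8") replaces sys.stdout with a new wrapper; keeping this in sys confines the feature to the three streams that need it.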



That said, I am happy to complete the C implementation of set_encoding(). However, I'd like a core developer to first reconfirm that this is really the better solution.