This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author martin.panter
Recipients Arfrever, berker.peksag, ishimoto, jwilk, loewis, martin.panter, methane, mrabarnett, ncoghlan, nikratio, pitrou, quad, rurpy2, serhiy.storchaka, vstinner
Date 2017-01-11.10:05:57
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1484129157.94.0.477811292309.issue15216@psf.upfronthosting.co.za>
In-reply-to
Content
Inada, I think you messed up the positioning of bits of the patch. E.g. there are now test methods declared inside a helper function (rather than a test class).

Since it seems other people are in favour of this API, I would like to expand it a bit to cover two uses cases (see set_encoding-newline.patch):

* change the error handler without affecting the main character encoding
* set the newline encoding (also suggested by Serhiy)

Regarding Serhiy’s other suggestion about buffering parameters, perhaps TextIOWrapper.line_buffering could become a writable attribute instead, and the class could grow a similar write_through attribute. I don’t think these affect encoding or decoding, so they could be treated independently.

The algorithm for rewinding unread data is complicated and can fail. What is the advantage of using it? What is the use case for reading from a stream and then changing the encoding, without a guarantee that it will work?

Even if it is enhanced to never “fail”, it will still have strange behaviour, such as data loss when a decoder is fed a single byte and produces multiple characters (e.g. CR newline, backslashreplace, UTF-7).

One step in the right direction IMO would be to only support calling set_encoding() when no extra read data has been buffered (or to explicitly say that any buffered data is silently dropped). So there is no support for changing the encoding halfway through a disk file, but it may be appropriate if you can regulate the bytes being read, e.g. from a terminal (user input), pipe, socket, etc.

But I would be happy enough without set_encoding(), and with something like my rewrap() function at the bottom of <https://github.com/vadmium/data/blob/master/data.py#L526>. It returns a fresh TextIOWrapper, but when you exit the context manager you can continue to reuse the old stream with the old settings.
History
Date User Action Args
2017-01-11 10:05:59martin.pantersetrecipients: + martin.panter, loewis, ishimoto, ncoghlan, pitrou, vstinner, jwilk, mrabarnett, Arfrever, methane, nikratio, rurpy2, berker.peksag, serhiy.storchaka, quad
2017-01-11 10:05:57martin.pantersetmessageid: <1484129157.94.0.477811292309.issue15216@psf.upfronthosting.co.za>
2017-01-11 10:05:57martin.panterlinkissue15216 messages
2017-01-11 10:05:57martin.pantercreate