Message 285210 - Python tracker


Author	methane
Recipients	Arfrever, berker.peksag, ishimoto, jwilk, loewis, martin.panter, methane, mrabarnett, ncoghlan, nikratio, pitrou, quad, rurpy2, serhiy.storchaka, vstinner
Date	2017-01-11.10:35:16
Message-id	<1484130916.32.0.662761708206.issue15216@psf.upfronthosting.co.za>

Content

> Inada, I think you messed up the positioning of bits of the patch. E.g. there are now test methods declared > inside a helper function (rather than a test class).

I'm sorry. `patch -p1` merged the previous patch into the wrong place, and the tests passed accidentally.

> Since it seems other people are in favour of this API, I would like to expand it a bit to cover two uses  cases (see set_encoding-newline.patch):
> 
> * change the error handler without affecting the main character encoding
> * set the newline encoding (also suggested by Serhiy)

+1. Since stdio is configured before the Python program starts running, TextIOWrapper should be as configurable as possible after creation.
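
As a rough illustration of configuring a stream after creation, the sketch below uses `TextIOWrapper.reconfigure()`, the method available in Python 3.7 and later, rather than the `set_encoding()` patch discussed in this message. The in-memory `BytesIO` stream and the sample strings are assumptions made only for the example.

```python
import io

# An in-memory byte stream stands in for an already-created stdio stream.
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="ascii", errors="strict", newline="\n")

# Change only the error handler; the encoding stays "ascii".
stream.reconfigure(errors="backslashreplace")
stream.write("caf\u00e9\n")  # the non-ASCII character is escaped instead of raising

# Change only the newline translation used for later writes.
stream.reconfigure(newline="\r\n")
stream.write("done\n")
stream.flush()

result = raw.getvalue()
print(stream.encoding, stream.errors, result)
```

Passing only `errors` leaves the encoding untouched, which matches the first of the two use cases quoted above.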

> Regarding Serhiy’s other suggestion about buffering parameters, perhaps TextIOWrapper.line_buffering could become a writable attribute instead, and the class could grow a similar write_through attribute. I don’t think these affect encoding or decoding, so they could be treated independently.

Could they go into a separate new issue?
This issue is already too long to read.
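
For reference, in Python 3.7 and later the buffering flags can be changed through `TextIOWrapper.reconfigure()` independently of the encoding. The following is a minimal sketch, assuming an in-memory stream; it is not taken from any patch attached to this issue.

```python
import io

raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")

# Turn on line buffering: a write containing a newline is flushed at once.
stream.reconfigure(line_buffering=True)
stream.write("first\n")
after_line = raw.getvalue()

# Turn on write-through: text is handed to the underlying buffer immediately,
# even without a newline.
stream.reconfigure(write_through=True)
stream.write("partial")
after_partial = raw.getvalue()

print(stream.line_buffering, after_line, after_partial)
```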

> The algorithm for rewinding unread data is complicated and can fail. What is the advantage of using it? What is the use case for reading from a stream and then changing the encoding, without a guarantee that it will work?
>
> Even if it is enhanced to never “fail”, it will still have strange behaviour, such as data loss when a decoder is fed a single byte and produces multiple characters (e.g. CR newline, backslashreplace, UTF-7).

When I posted set_encoding-7.patch, I hadn't read the io module deeply. I just resolved the conflicts and ran the tests.
After that, I read the code and I feel the same way (see msg285111 and msg285112).
Let's drop support for changing the encoding while reading.
Allowing the stdin encoding to be changed only before anything has been read from it is still a significant step.
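
The restriction proposed here can be sketched with `TextIOWrapper.reconfigure()` as it behaves in CPython 3.7 and later, where changing the encoding is allowed before the first read and rejected afterwards. The helper `open_fake_stdin` and the sample bytes are hypothetical stand-ins for a real stdin.

```python
import io


def open_fake_stdin(data: bytes) -> io.TextIOWrapper:
    """Hypothetical helper: wrap bytes as if they were stdin opened as ASCII."""
    return io.TextIOWrapper(io.BytesIO(data), encoding="ascii")


# Case 1: nothing has been read yet, so the encoding can be switched.
fresh = open_fake_stdin("h\u00e9llo\n".encode("utf-8"))
fresh.reconfigure(encoding="utf-8")
line = fresh.readline()

# Case 2: one character has been read, so decoded data is already buffered.
used = open_fake_stdin(b"abc\n")
used.read(1)
try:
    used.reconfigure(encoding="utf-8")
except io.UnsupportedOperation:
    changed_after_read = False
else:
    changed_after_read = True

print(repr(line), changed_after_read)
```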


> One step in the right direction IMO would be to only support calling set_encoding() when no extra read data has been buffered (or to explicitly say that any buffered data is silently dropped). So there is no support for changing the encoding halfway through a disk file, but it may be appropriate if you can regulate the bytes being read, e.g. from a terminal (user input), pipe, socket, etc.

Totally agree.


> But I would be happy enough without set_encoding(), and with something like my rewrap() function at the bottom of <https://github.com/vadmium/data/blob/master/data.py#L526>. It returns a fresh TextIOWrapper, but when you exit the context manager you can continue to reuse the old stream with the old settings.

I want one obvious way to control the encoding and error handler from Python (not from an environment variable).
Rewrapping the stream seems like a hacky way rather than an obvious one.
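
To make the comparison concrete, here is a hypothetical rewrap-style helper. It is not the code behind the link quoted above; the name `rewrapped` and its exact behaviour are assumptions for illustration only.

```python
import contextlib
import io


@contextlib.contextmanager
def rewrapped(stream, **kwargs):
    """Hypothetical helper: temporarily wrap stream.buffer with new text settings."""
    stream.flush()  # push out text written through the old wrapper first
    wrapper = io.TextIOWrapper(stream.buffer, **kwargs)
    try:
        yield wrapper
    finally:
        # detach() flushes the temporary wrapper and releases the shared
        # buffer, so discarding the wrapper does not close it.
        wrapper.detach()


raw = io.BytesIO()
original = io.TextIOWrapper(raw, encoding="ascii")

with rewrapped(original, encoding="utf-8") as temp:
    temp.write("\u00e9")  # not encodable with the original ASCII settings

original.write("x")  # the old stream keeps working with its old settings
original.flush()
output = raw.getvalue()
print(output)
```

Two wrapper objects share one buffer here, and the caller has to flush and detach in the right order, which is the kind of bookkeeping an in-place API would avoid.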

History
Date	User	Action	Args
2017-01-11 10:35:16	methane	set	recipients: + methane, loewis, ishimoto, ncoghlan, pitrou, vstinner, jwilk, mrabarnett, Arfrever, nikratio, rurpy2, berker.peksag, martin.panter, serhiy.storchaka, quad
2017-01-11 10:35:16	methane	set	messageid: <1484130916.32.0.662761708206.issue15216@psf.upfronthosting.co.za>
2017-01-11 10:35:16	methane	link	issue15216 messages
2017-01-11 10:35:16	methane	create