Author ncoghlan
Recipients Arfrever, berker.peksag, inada.naoki, ishimoto, jwilk, loewis, martin.panter, mrabarnett, ncoghlan, nikratio, pitrou, quad, rurpy2, serhiy.storchaka, vstinner
Date 2017-01-08.03:23:27
Message-id <1483845808.33.0.1652145511.issue15216@psf.upfronthosting.co.za>
Content
Reviewing Inada-san's latest version of the patch, we seem to be in a somewhat hybrid state where:

1. The restriction that the API may only be used on seekable() streams when there is unread data in the read buffer is in place

2. We don't actually call seek() anywhere to set the stream back to the beginning of the file. Instead, we try to shuffle data out of the old decoder and into the new one.

I'm starting to wonder if the best option here might be to attempt to make the API work for arbitrary codecs and non-seekable streams, and then simply accept that it may take a few maintenance releases before that's actually true. If we decide to go down that path, then I'd suggest the following stress test:

- make a longish test string out of repeated copies of "ℙƴ☂ℌøἤ"
- pick a few pairs of multibyte non-universal/universal encodings for use with surrogateescape and strict as their respective error handlers (e.g. ascii/utf8, ascii/utf16le, ascii/utf32, ascii/shift_jis, ascii/iso2022_jp, ascii/gb18030, gbk/gb18030)
- for each pair, make the test data by encoding from str to bytes with the relevant universal encoding
- switch the encoding multiple times on the same stream at different points
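The stress test above could be sketched roughly as follows. This is a hedged sketch, not the patch's test suite: it drives incremental decoders from the `codecs` module directly instead of going through a text stream, and `switch_decoder` is a hypothetical stand-in for the proposed `codecs._switch_decoder` helper (no such function exists in the standard library). It covers only the ascii/utf8 and ascii/utf16le pairs and checks one invariant: re-encoding each decoded segment with the codec and error handler that produced it must reproduce the original bytes. Switches into the universal codec are assumed to land on character boundaries, because a stateless ascii decoder using surrogateescape consumes every byte immediately and has nothing to hand over.

```python
import codecs

UNIT = "ℙƴ☂ℌøἤ"
TEXT = UNIT * 20

# Two of the (non-universal, universal) pairs suggested above; the others
# (utf32, shift_jis, iso2022_jp, gb18030, gbk/gb18030) would need extra care,
# e.g. around BOMs and decoder state flags.
PAIRS = [("ascii", "utf-8"), ("ascii", "utf-16-le")]


def switch_decoder(old_decoder, encoding, errors):
    """Hypothetical stand-in for the proposed codecs._switch_decoder helper.

    Bytes the old decoder has buffered but not yet decoded are re-fed to the
    new decoder. The integer state flag from getstate() is discarded.
    """
    pending, _flag = old_decoder.getstate()
    new_decoder = codecs.getincrementaldecoder(encoding)(errors)
    return new_decoder, new_decoder.decode(pending)


def switch_offsets(unit_len, count):
    """Odd switches enter the universal codec on a character boundary;
    even switches return to the non-universal codec, often mid-character."""
    offsets = []
    for i in range(1, count + 1):
        if i % 2:
            offsets.append(unit_len * i)
        else:
            offsets.append(unit_len * i + (i - 1) % unit_len)
    return offsets


def decode_with_switches(data, narrow, universal, offsets):
    """Feed data one byte at a time, switching codecs at each offset."""
    settings = [(narrow, "surrogateescape"), (universal, "strict")]
    which = 0
    decoder = codecs.getincrementaldecoder(narrow)("surrogateescape")
    segments = [[narrow, "surrogateescape", ""]]
    for pos in range(len(data)):
        if pos in offsets:
            which ^= 1
            encoding, errors = settings[which]
            decoder, carried = switch_decoder(decoder, encoding, errors)
            segments.append([encoding, errors, carried])
        segments[-1][2] += decoder.decode(data[pos:pos + 1])
    segments[-1][2] += decoder.decode(b"", final=True)
    return segments


def check_pair(narrow, universal, switches=10):
    """Re-encoding every segment with its own codec must rebuild the input."""
    unit_len = len(UNIT.encode(universal))
    data = TEXT.encode(universal)
    offsets = switch_offsets(unit_len, switches)
    assert offsets[-1] < len(data)
    segments = decode_with_switches(data, narrow, universal, offsets)
    rebuilt = b"".join(text.encode(enc, errs) for enc, errs, text in segments)
    assert rebuilt == data, (narrow, universal)
    return segments


if __name__ == "__main__":
    for narrow, universal in PAIRS:
        check_pair(narrow, universal)
```

Several switches back to the non-universal codec land mid-character, so bytes buffered by the UTF-8 or UTF-16 decoder are carried over and come out as surrogateescape'd code points. Dropping the getstate() flag is one reason codecs with BOM handling or shift states would need more than this sketch.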

Optionally:

- extract "codecs._switch_decoder" and "codecs._switch_encoder" helper functions to make this all a bit easier to test and debug (with a Python version in the codecs module and the C version accessible via the _codecs module)
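One possible shape for the encoder half of those helpers is sketched below, assuming a function named `switch_encoder` (hypothetical, standing in for the proposed `codecs._switch_encoder`). Before the new encoder takes over, the old one is flushed with `encode("", final=True)`, which the `codecs` documentation gives as the way to obtain any pending output from an incremental encoder; `reset()` would discard that output instead. A decoder counterpart would instead carry the bytes reported by the old decoder's `getstate()` into the new decoder.

```python
import codecs


def switch_encoder(old_encoder, encoding, errors="strict"):
    """Hypothetical sketch of a codecs._switch_encoder-style helper.

    Returns (new_encoder, flushed). `flushed` holds any output the old
    encoder was still holding back (for a stateful codec such as
    iso2022_jp this may include a shift sequence) and must be written
    before anything produced by the new encoder.
    """
    flushed = old_encoder.encode("", final=True)
    new_encoder = codecs.getincrementalencoder(encoding)(errors)
    return new_encoder, flushed


if __name__ == "__main__":
    old = codecs.getincrementalencoder("iso2022_jp")("strict")
    first = old.encode("日本")
    new, flushed = switch_encoder(old, "utf-8")
    second = new.encode("ℙƴ☂ℌøἤ", final=True)
    # Each part decodes cleanly with the codec that produced it.
    print((first + flushed).decode("iso2022_jp"), second.decode("utf-8"))
```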

That way, confidence in the reliability of the feature (including across Python implementations) can be based on the strength of the test cases covering it.