Author martin.panter
Recipients doerwalter, lemburg, loewis, martin.panter, ncoghlan, vstinner
Date 2015-01-15.22:46:06
Content
I opened Issue 23231 about fixing iterencode() and iterdecode() in the general case. I added a patch to Issue 13881 to fix StreamWriter for zlib and bz2, and to fix StreamWriter.writelines() in general.

I am adding a patch here to clarify the StreamReader API and fix the StreamReader for the zlib-codec.

Plan of other things to do:

bz2 StreamReader: Should be implemented similarly to the zlib patch above, after Issue 15955 is resolved and we have a max_length parameter to use. Or it could be based on BZ2File now.
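
A rough sketch of how a bounded read might look once max_length is available (BZ2Decompressor.decompress() accepts it from Python 3.5 on). The helper name read_bounded, the chunk size, and the demo payload are made up for illustration, and size is assumed to be positive; this is not the patch itself:

```python
import bz2
import io


def read_bounded(decompressor, source, size, chunk_size=8192):
    """Return at most `size` decompressed bytes, or b"" at end of stream.

    Hypothetical helper: `source` is any binary file object holding
    bz2-compressed data, and `size` is assumed to be greater than zero.
    """
    while not decompressor.eof:
        if decompressor.needs_input:
            chunk = source.read(chunk_size)
            if not chunk:
                raise EOFError("compressed stream ended before end-of-stream marker")
        else:
            # The decompressor still holds buffered data from earlier calls.
            chunk = b""
        data = decompressor.decompress(chunk, size)  # second argument is max_length
        if data:
            return data
    return b""


# Demo: read a compressed payload back in pieces of at most 100 bytes.
payload = b"spam and eggs " * 1000
source = io.BytesIO(bz2.compress(payload))
decompressor = bz2.BZ2Decompressor()
pieces = []
while True:
    piece = read_bounded(decompressor, source, 100)
    if not piece:
        break
    pieces.append(piece)
```

The point of max_length is that a small read request never forces the whole stream to be decompressed into memory at once.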

hex decoder: Shouldn’t be too hard to hack a stateful IncrementalDecoder that saves the leftover digit if given an odd number of digits. Create a generic codecs._IncrementalStreamReader class that uses an IncrementalDecoder and buffers unread decoded data, similar to my _IncrementalStreamWriter for Issue 13881.
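
To illustrate the leftover-digit idea, here is a minimal sketch. The class name HexIncrementalDecoder is hypothetical and this is not the existing hex_codec implementation; it only assumes binascii.unhexlify() for the actual decoding:

```python
import binascii
import codecs


class HexIncrementalDecoder(codecs.IncrementalDecoder):
    """Hypothetical stateful hex decoder that keeps one leftover digit."""

    def __init__(self, errors="strict"):
        super().__init__(errors)
        self._pending = b""  # at most one undecoded hex digit

    def decode(self, input, final=False):
        data = self._pending + bytes(input)
        if len(data) % 2 and not final:
            # Hold back the odd digit until its partner arrives.
            data, self._pending = data[:-1], data[-1:]
        else:
            self._pending = b""
        # With final=True an odd digit stays in data, so unhexlify() raises.
        return binascii.unhexlify(data)

    def reset(self):
        self._pending = b""

    def getstate(self):
        return (self._pending, 0)

    def setstate(self, state):
        self._pending = state[0]


# Demo: the digit pair "69" is split across two calls.
decoder = HexIncrementalDecoder()
parts = [decoder.decode(b"486"), decoder.decode(b"9", final=True)]
```

A generic stream reader could then wrap such a decoder and buffer whatever decoded data the caller has not read yet.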

base64 encoder: IncrementalEncoder could encode in base64.MAXBINSIZE chunks
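
A sketch of the chunking approach, assuming a hypothetical class name Base64IncrementalEncoder and using binascii.b2a_base64() per chunk, which is also how base64.encodebytes() builds its lines:

```python
import base64
import binascii
import codecs


class Base64IncrementalEncoder(codecs.IncrementalEncoder):
    """Hypothetical encoder that emits only whole MAXBINSIZE-byte chunks."""

    def __init__(self, errors="strict"):
        super().__init__(errors)
        self._buffer = b""  # input bytes not yet forming a full chunk

    def encode(self, input, final=False):
        data = self._buffer + bytes(input)
        if final:
            cut = len(data)
        else:
            # Keep the incomplete tail so line breaks stay aligned.
            cut = len(data) - len(data) % base64.MAXBINSIZE
        self._buffer = data[cut:]
        lines = [
            binascii.b2a_base64(data[i:i + base64.MAXBINSIZE])
            for i in range(0, cut, base64.MAXBINSIZE)
        ]
        return b"".join(lines)

    def reset(self):
        self._buffer = b""


# Demo: feed 200 bytes in three unevenly sized pieces.
data = bytes(range(200))
encoder = Base64IncrementalEncoder()
encoded = (
    encoder.encode(data[:10])
    + encoder.encode(data[10:130])
    + encoder.encode(data[130:], final=True)
)
```

Buffering up to a chunk boundary means the incremental output should match what a one-shot encode of the same data produces, regardless of how the input is split.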

base64 decoder: IncrementalDecoder could strip non-alphabet characters using regular expressions, decode in multiples of four characters
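
Roughly, that could look like the following. The class name and the exact regular expression are assumptions for illustration, and the sketch does not try to handle padding that appears in the middle of a stream:

```python
import binascii
import codecs
import re

# Anything outside the standard base64 alphabet and the "=" pad character.
_NON_ALPHABET = re.compile(rb"[^A-Za-z0-9+/=]")


class Base64IncrementalDecoder(codecs.IncrementalDecoder):
    """Hypothetical decoder that decodes whole four-character groups."""

    def __init__(self, errors="strict"):
        super().__init__(errors)
        self._pending = b""  # up to three characters awaiting a full group

    def decode(self, input, final=False):
        data = self._pending + _NON_ALPHABET.sub(b"", bytes(input))
        # Decode only complete groups unless this is the last call.
        cut = len(data) if final else len(data) - len(data) % 4
        data, self._pending = data[:cut], data[cut:]
        return binascii.a2b_base64(data)

    def reset(self):
        self._pending = b""


# Demo: "aGVsbG8gd29ybGQ=" split at awkward places, with a stray newline.
decoder = Base64IncrementalDecoder()
decoded = (
    decoder.decode(b"aGVsb")
    + decoder.decode(b"G8gd2\n9yb")
    + decoder.decode(b"GQ=\n", final=True)
)
```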

quopri encoder: would require new implementation or major refactoring of quopri module

quopri decoder: check for incomplete trailing escape codes (=, =<single hex digit>, =\r) and newlines (\r)
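
The trailing-fragment check could be sketched as a small helper. The name split_incomplete_tail and the pattern are hypothetical; it only covers the cases listed above and does not deal with trailing whitespace or the decoding itself:

```python
import re

# A lone "=", "=" plus one hex digit, "=\r", or a bare "\r" at the very end.
_INCOMPLETE_TAIL = re.compile(rb"(?:=[0-9A-Fa-f]?|=?\r)\Z")


def split_incomplete_tail(data):
    """Split data into (decodable part, tail to keep for the next call)."""
    match = _INCOMPLETE_TAIL.search(data)
    if match is None:
        return data, b""
    return data[:match.start()], data[match.start():]


# Demo: each input ends differently.
examples = [
    split_incomplete_tail(b"abc="),
    split_incomplete_tail(b"abc=4"),
    split_incomplete_tail(b"abc=41"),
    split_incomplete_tail(b"abc=\r"),
    split_incomplete_tail(b"abc\r"),
    split_incomplete_tail(b"abc\r\n"),
]
```

An incremental decoder would prepend the held-back tail to the next chunk before decoding.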

uu encoder: write header and trailer via uu module; encode using b2a_uu()

uu decoder: factor out header parsing from uu module; buffer and decode line by line based on encoded length

unicode-escape, raw-unicode-escape: Stateful decoding would probably require a new -Stateful() function at the C level, though it might be easy to build from the existing implementation. I suggest documenting that stateful decoding is not supported for the time being.

utf-7: As Walter said, proper stateful codec is not supported by the C API, despite PyUnicode_DecodeUTF7Stateful(); doing so would probably require significant changes. I suggest documenting a warning for stateful mode (including with TextIOWrapper) about suboptimal encoding and unlimited data buffering for the time being.

punycode, unicode_internal: According to test_codecs.py, these also don’t work in stateful mode. I am not sure of the details, though.