Message 234099 - Python tracker

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author	martin.panter
Recipients	doerwalter, lemburg, loewis, martin.panter, ncoghlan, vstinner
Date	2015-01-15.22:46:06
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1421361969.03.0.433840890948.issue20132@psf.upfronthosting.co.za>
In-reply-to

Content

I opened Issue 23231 about fixing iterencode() and iterdecode() in the general case. I added a patch to Issue 13881 to fix StreamWriter for zlib and bz2, and to fix StreamWriter.writelines() in general.

I am adding a patch here to clarify the StreamReader API and to fix the StreamReader for the zlib codec.
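
A minimal sketch of the bounded-output idea behind a zlib StreamReader, assuming the reader should never hand back more than a requested number of decompressed bytes per call. It relies on the max_length argument of the decompress() method of zlib.decompressobj() objects and on their unconsumed_tail attribute. The helper name read_zlib_chunks is hypothetical, and this is not the patch attached to this issue.

```python
import io
import zlib
from typing import BinaryIO, Iterator


def read_zlib_chunks(stream: BinaryIO, size: int = 64, read_size: int = 32) -> Iterator[bytes]:
    """Yield decompressed data from *stream*, at most *size* (> 0) bytes per chunk.

    Hypothetical helper illustrating the max_length / unconsumed_tail technique.
    """
    decomp = zlib.decompressobj()
    data = b""    # compressed input the decompressor has not consumed yet
    full = False  # True if the last call hit the size limit (more output may be pending)
    while not decomp.eof:
        if not data and not full:
            data = stream.read(read_size)
            if not data:
                raise EOFError("compressed stream ended before the end-of-stream marker")
        chunk = decomp.decompress(data, size)  # returns at most `size` bytes
        data = decomp.unconsumed_tail          # input held back because of the limit
        full = len(chunk) == size
        if chunk:
            yield chunk


payload = bytes(range(256)) * 20
chunks = list(read_zlib_chunks(io.BytesIO(zlib.compress(payload)), size=64))
print(len(chunks), max(len(c) for c in chunks))
```

The full flag matters because zlib may still hold output after all input has been consumed, so the loop calls decompress() again with empty input before reading more from the stream.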

Plan of other things to do:

bz2 StreamReader: This should be implemented similarly to the zlib patch above, once Issue 15955 is resolved and we have a max_length parameter to use. Alternatively, it could be based on BZ2File now.
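
A sketch of the BZ2File alternative, assuming the reader only needs to hand out decompressed data in bounded pieces: bz2.BZ2File can wrap the underlying byte stream and do all the buffering itself, because its read(size) returns at most size bytes. The generator name read_bz2_chunks is hypothetical.

```python
import bz2
import io
from typing import BinaryIO, Iterator


def read_bz2_chunks(stream: BinaryIO, size: int = 64) -> Iterator[bytes]:
    """Hypothetical helper: yield at most *size* decompressed bytes at a time.

    BZ2File keeps the decompressor state and any surplus output internally,
    so no max_length support is needed at this level.
    """
    with bz2.BZ2File(stream, "rb") as f:
        while True:
            chunk = f.read(size)  # at most `size` bytes; b"" at end of stream
            if not chunk:
                break
            yield chunk


payload = b"spam and eggs " * 100
chunks = list(read_bz2_chunks(io.BytesIO(bz2.compress(payload)), size=50))
```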

hex decoder: It shouldn’t be too hard to write a stateful IncrementalDecoder that saves the leftover digit when given an odd number of digits.

Also, create a generic codecs._IncrementalStreamReader class that uses an IncrementalDecoder and buffers unread decoded data, similar to my _IncrementalStreamWriter for Issue 13881.
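
A rough sketch of both ideas, assuming a bytes-to-bytes hex codec based on binascii: HexIncrementalDecoder carries one odd trailing digit between calls, and IncrementalStreamReader stands in for the proposed codecs._IncrementalStreamReader. Both class names are hypothetical, not existing APIs. The reader keeps decoded but unread data in a buffer so that read(size) can return exactly the requested amount.

```python
import binascii
import codecs
import io


class HexIncrementalDecoder(codecs.IncrementalDecoder):
    """Hypothetical stateful hex decoder that keeps an odd trailing digit."""

    def __init__(self, errors="strict"):
        super().__init__(errors)
        self.pending = b""  # at most one hex digit carried over between calls

    def decode(self, input, final=False):
        data = self.pending + bytes(input)
        # Decode the longest even-length prefix; on the final call decode
        # everything, so a2b_hex() reports a leftover odd digit as an error.
        cut = len(data) if final else len(data) - len(data) % 2
        self.pending = data[cut:]
        return binascii.a2b_hex(data[:cut])

    def reset(self):
        self.pending = b""

    def getstate(self):
        return (self.pending, 0)

    def setstate(self, state):
        self.pending = state[0]


class IncrementalStreamReader:
    """Hypothetical generic reader built on any IncrementalDecoder."""

    def __init__(self, stream, decoder, chunk_size=16):
        self.stream = stream
        self.decoder = decoder
        self.chunk_size = chunk_size
        self.decoded = decoder.decode(b"")  # empty value of the decoder's output type
        self.eof = False

    def read(self, size=-1):
        # Pull raw chunks until enough decoded data is buffered or input ends.
        while not self.eof and (size < 0 or len(self.decoded) < size):
            raw = self.stream.read(self.chunk_size)
            self.eof = not raw
            self.decoded += self.decoder.decode(raw, final=self.eof)
        if size < 0:
            size = len(self.decoded)
        # Return the requested amount; keep the rest for the next read().
        result, self.decoded = self.decoded[:size], self.decoded[size:]
        return result


reader = IncrementalStreamReader(
    io.BytesIO(b"68656c6c6f20776f726c64"), HexIncrementalDecoder(), chunk_size=3
)
print(reader.read(4))  # b'hell'
print(reader.read())   # b'o world'
```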

base64 encoder: The IncrementalEncoder could encode in chunks of base64.MAXBINSIZE bytes.
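
A minimal sketch, assuming the incremental encoder should produce the same output as base64.encodebytes() applied to the whole input. Complete MAXBINSIZE chunks (57 bytes, one 76-character output line each) are encoded as soon as they are available, and only the final call encodes a short remainder with padding. The class name is hypothetical.

```python
import base64
import codecs


class Base64IncrementalEncoder(codecs.IncrementalEncoder):
    """Hypothetical incremental base64 encoder working in MAXBINSIZE chunks."""

    def __init__(self, errors="strict"):
        super().__init__(errors)
        self.pending = b""  # fewer than MAXBINSIZE bytes waiting for more input

    def encode(self, input, final=False):
        data = self.pending + bytes(input)
        size = base64.MAXBINSIZE
        # Only whole chunks are encoded early, so no padding appears mid-stream.
        cut = len(data) if final else len(data) - len(data) % size
        self.pending = data[cut:]
        return base64.encodebytes(data[:cut])

    def reset(self):
        self.pending = b""


payload = bytes(range(200))
enc = Base64IncrementalEncoder()
pieces = [enc.encode(payload[i:i + 10]) for i in range(0, len(payload), 10)]
pieces.append(enc.encode(b"", final=True))
```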

base64 decoder: The IncrementalDecoder could strip non-alphabet characters using regular expressions and decode in multiples of four characters.
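
A sketch of that decoder, assuming the standard base64 alphabet (not the URL-safe variant) and treating every other byte, including newlines, as noise to strip. Characters are filtered with a regular expression, and only whole groups of four are decoded, with any partial group kept for the next call. The class name is hypothetical.

```python
import base64
import codecs
import re

# Anything outside the standard base64 alphabet and padding is discarded.
_NON_ALPHABET = re.compile(rb"[^A-Za-z0-9+/=]")


class Base64IncrementalDecoder(codecs.IncrementalDecoder):
    """Hypothetical incremental base64 decoder: filter, then decode 4-char groups."""

    def __init__(self, errors="strict"):
        super().__init__(errors)
        self.pending = b""  # filtered characters not yet forming a full group

    def decode(self, input, final=False):
        data = self.pending + _NON_ALPHABET.sub(b"", bytes(input))
        # On the final call an incomplete group is passed through, so
        # b64decode() raises binascii.Error for truncated input.
        cut = len(data) if final else len(data) - len(data) % 4
        self.pending = data[cut:]
        return base64.b64decode(data[:cut])

    def reset(self):
        self.pending = b""


payload = bytes(range(100))
encoded = base64.encodebytes(payload)  # includes a newline every 76 characters
dec = Base64IncrementalDecoder()
out = b"".join(dec.decode(encoded[i:i + 7]) for i in range(0, len(encoded), 7))
out += dec.decode(b"", final=True)
```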

quopri encoder: Would require a new implementation or a major refactoring of the quopri module.

quopri decoder: Check for incomplete trailing escape codes (=, =<single hex digit>, =\r) and newlines (\r).
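
A sketch of the trailing-escape check, assuming binascii.a2b_qp() does the actual decoding. The hypothetical helper split_incomplete_qp() holds back a trailing "=", "=" plus one hex digit, "=\r", or a lone "\r". An escape or soft line break that is split across two chunks is then decoded only once it is complete. QuopriIncrementalDecoder is likewise a hypothetical name.

```python
import binascii
import codecs

_HEX_DIGITS = b"0123456789ABCDEFabcdef"


def split_incomplete_qp(data):
    """Hypothetical helper: split *data* into (complete, pending) parts."""
    if data.endswith(b"="):
        cut = len(data) - 1  # "=" with nothing after it yet
    elif data[-2:-1] == b"=" and (data[-1:] in _HEX_DIGITS or data[-1:] == b"\r"):
        cut = len(data) - 2  # "=X" or "=\r": escape or soft line break cut short
    elif data.endswith(b"\r"):
        cut = len(data) - 1  # "\r" whose "\n" may arrive in the next chunk
    else:
        cut = len(data)
    return data[:cut], data[cut:]


class QuopriIncrementalDecoder(codecs.IncrementalDecoder):
    """Hypothetical incremental quoted-printable decoder."""

    def __init__(self, errors="strict"):
        super().__init__(errors)
        self.pending = b""

    def decode(self, input, final=False):
        data = self.pending + bytes(input)
        if final:
            complete, self.pending = data, b""
        else:
            complete, self.pending = split_incomplete_qp(data)
        return binascii.a2b_qp(complete)

    def reset(self):
        self.pending = b""


encoded = b"caf=C3=A9=\r\nbreak=3Dyes"
dec = QuopriIncrementalDecoder()
# Feed one byte at a time to exercise every split point.
out = b"".join(dec.decode(encoded[i:i + 1]) for i in range(len(encoded)))
out += dec.decode(b"", final=True)
print(out)  # b'caf\xc3\xa9break=yes'
```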

uu encoder: Write the header and trailer via the uu module; encode the data using b2a_uu().
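
A sketch of the encoder side, assuming the usual uu framing: a "begin <mode> <name>" header, then a zero-length line followed by "end" as the trailer. Because the uu module has since been removed from the standard library (Python 3.13), this version writes the header and trailer itself rather than going through the uu module. It encodes 45-byte lines with binascii.b2a_uu(). The class name is hypothetical.

```python
import binascii
import codecs


class UUIncrementalEncoder(codecs.IncrementalEncoder):
    """Hypothetical incremental uuencoder: header, 45-byte lines, trailer."""

    LINE = 45  # b2a_uu() accepts at most 45 bytes per call

    def __init__(self, errors="strict", filename="<data>", mode=0o666):
        super().__init__(errors)
        self.header = ("begin %o %s\n" % (mode & 0o777, filename)).encode("ascii")
        self.reset()

    def encode(self, input, final=False):
        data = self.pending + bytes(input)
        out = [] if self.started else [self.header]  # header before any data
        self.started = True
        # Encode only complete lines until the final call.
        cut = len(data) if final else len(data) - len(data) % self.LINE
        for i in range(0, cut, self.LINE):
            out.append(binascii.b2a_uu(data[i:i + self.LINE]))
        self.pending = data[cut:]
        if final:
            out.append(binascii.b2a_uu(b"") + b"end\n")  # zero-length line, then "end"
        return b"".join(out)

    def reset(self):
        self.pending = b""
        self.started = False


payload = bytes(range(256)) * 2
enc = UUIncrementalEncoder()
pieces = [enc.encode(payload[i:i + 100]) for i in range(0, len(payload), 100)]
pieces.append(enc.encode(b"", final=True))
out = b"".join(pieces)
```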

uu decoder: Factor out the header parsing from the uu module; buffer and decode line by line based on the encoded length.
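
A sketch of a line-by-line decoder, with one simplification: it splits the buffered input on newlines rather than using each line's encoded length character, and it lets binascii.a2b_uu() interpret that length. Header parsing is reduced to skipping input up to the "begin" line. The class name is hypothetical.

```python
import binascii
import codecs


class UUIncrementalDecoder(codecs.IncrementalDecoder):
    """Hypothetical incremental uudecoder that works one complete line at a time."""

    def __init__(self, errors="strict"):
        super().__init__(errors)
        self.reset()

    def decode(self, input, final=False):
        self.pending += bytes(input)
        out = []
        # Only complete lines (terminated by b"\n") are processed.
        while not self.done and b"\n" in self.pending:
            line, self.pending = self.pending.split(b"\n", 1)
            if not self.in_body:
                # Minimal header parsing: skip everything up to the "begin" line.
                self.in_body = line.startswith(b"begin")
            elif line.strip() == b"end":
                self.done = True
            else:
                out.append(binascii.a2b_uu(line))
        if final and not self.done:
            raise ValueError("truncated uuencoded data")
        return b"".join(out)

    def reset(self):
        self.pending = b""
        self.in_body = False
        self.done = False


payload = bytes(range(256)) * 2
lines = [binascii.b2a_uu(payload[i:i + 45]) for i in range(0, len(payload), 45)]
encoded = b"begin 644 data.bin\n" + b"".join(lines) + b" \nend\n"
dec = UUIncrementalDecoder()
out = b"".join(dec.decode(encoded[i:i + 7]) for i in range(0, len(encoded), 7))
out += dec.decode(b"", final=True)
```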

unicode-escape, raw-unicode-escape: Stateful decoding would probably require a new -Stateful() variant of the C-level decoding function, although it might be easy to build from the existing implementation. I suggest documenting that stateful decoding is not supported for the time being.

utf-7: As Walter said, a proper stateful codec is not supported by the C API, despite PyUnicode_DecodeUTF7Stateful(); supporting it would probably require significant changes. I suggest documenting a warning for stateful mode (including with TextIOWrapper) about suboptimal encoding and unlimited data buffering for the time being.

punycode, unicode_internal: According to test_codecs.py, these also don’t work in stateful mode. I’m not sure of the details, though.

History
Date	User	Action	Args
2015-01-15 22:46:10	martin.panter	set	recipients: + martin.panter, lemburg, loewis, doerwalter, ncoghlan, vstinner
2015-01-15 22:46:09	martin.panter	set	messageid: <1421361969.03.0.433840890948.issue20132@psf.upfronthosting.co.za>
2015-01-15 22:46:09	martin.panter	link	issue20132 messages
2015-01-15 22:46:08	martin.panter	create