Message 234104 - Python tracker

Author	martin.panter
Recipients	doerwalter, lemburg, loewis, martin.panter, ncoghlan, vstinner
Date	2015-01-16.00:28:31
Message-id	<1421368111.56.0.757494195506.issue20132@psf.upfronthosting.co.za>

Content
My “master plan” is basically to make most of the bytes-to-bytes codecs work as documented in the incremental (stateful) modes. I’m less interested in fixing the text codecs, and the quopri and uu codecs might be too hard, so I was going to propose some documentation warnings for those.

If you have a suggestion on how to go about this better, let me know.

With my doc change to StreamReader, I wanted to document the different modes that I saw in the base codecs.StreamReader.read() implementation:

* read() or read(-1) reads everything
* read(size) returns an arbitrary amount of data
* read(size, chars) returns exactly *chars* characters of decoded data (unless EOF or similar is reached first)

Previously the case of read(-1, chars) was ambiguous. Also I did not find the description “an approximate maximum number of decoded bytes” very helpful, considering more than this maximum was read if necessary to get enough *chars*.
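A minimal sketch of the three modes, assuming the UTF-8 codec over an in-memory io.BytesIO. The sample text and the make_reader() helper are made up for illustration. The amount returned by read(size) alone is left unasserted because it depends on the implementation:

```python
import codecs
import io

text = "héllo wörld"          # illustrative sample; "é" and "ö" take two bytes each
data = text.encode("utf-8")

def make_reader():
    # Hypothetical helper: a fresh UTF-8 StreamReader over the same bytes.
    return codecs.getreader("utf-8")(io.BytesIO(data))

# read() with no arguments (same as read(-1)): decode everything available.
everything = make_reader().read()

# read(size): size is only a hint for how many bytes to pull from the
# underlying stream; how much decoded text comes back is up to the reader.
some = make_reader().read(4)

# read(size, chars): keep pulling size-byte chunks until chars characters
# are decoded. Here 1-byte reads must continue past the two-byte "é",
# so more than size bytes are consumed in total.
exact = make_reader().read(1, 3)

print(repr(everything))
print(repr(some))
print(repr(exact))
```

The last call is the case where the "approximate maximum" wording breaks down: four bytes are consumed from the stream even though size is 1.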

Regarding the end-of-stream behaviour, I made an assumption but I now realize it was wrong. Experimenting with the UTF-8 codec shows that its StreamReader.read() keeps returning "" when the underlying stream returns no data. But if it was in the middle of a multi-byte sequence, no “end of data” error is raised, and the multi-byte sequence can be completed if the underlying stream later returns more data. I think the lack of end-of-data checking should be documented.
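The behaviour described above can be reproduced with a small sketch. ChunkStream is a hypothetical stand-in for an underlying stream that sometimes has no data to return; it is not part of the codecs module. The incremental decoder call at the end is included only as a contrast, to show what an explicit end-of-data check looks like:

```python
import codecs

class ChunkStream:
    """Hypothetical byte stream that hands out queued chunks, then b""."""

    def __init__(self):
        self.pending = []

    def feed(self, chunk):
        self.pending.append(chunk)

    def read(self, size=-1):
        # size is ignored; each call returns one queued chunk or nothing.
        return self.pending.pop(0) if self.pending else b""

stream = ChunkStream()
reader = codecs.getreader("utf-8")(stream)

euro = "€".encode("utf-8")     # three bytes: b"\xe2\x82\xac"

stream.feed(euro[:2])          # only the first two bytes arrive
first = reader.read()          # incomplete sequence is buffered, no error
second = reader.read()         # stream returns no data, still no error

stream.feed(euro[2:])          # the final byte arrives later
third = reader.read()          # the buffered sequence is completed

# Contrast: an incremental decoder told that the input is final does
# report the truncated sequence.
try:
    codecs.getincrementaldecoder("utf-8")().decode(euro[:2], final=True)
    truncated_error = None
except UnicodeDecodeError as exc:
    truncated_error = exc

print(repr(first), repr(second), repr(third))
print(truncated_error)
```

StreamReader.read() has no equivalent of final=True, so a stream that ends in the middle of a character is indistinguishable from one that is merely waiting for more data.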

The different cases of ValueError and UnicodeError that you describe make sense. I think the various references to ValueError and UnicodeError should be updated (or replaced with pointers) to match.

History
Date	User	Action	Args
2015-01-16 00:28:31	martin.panter	set	recipients: + martin.panter, lemburg, loewis, doerwalter, ncoghlan, vstinner
2015-01-16 00:28:31	martin.panter	set	messageid: <1421368111.56.0.757494195506.issue20132@psf.upfronthosting.co.za>
2015-01-16 00:28:31	martin.panter	link	issue20132 messages
2015-01-16 00:28:31	martin.panter	create