Message 207850 - Python tracker



Author	doerwalter
Recipients	doerwalter, lemburg, loewis, martin.panter, ncoghlan, vstinner
Date	2014-01-10.11:26:49
Message-id	<1389353210.68.0.603563067958.issue20132@psf.upfronthosting.co.za>

Content
The best solution IMHO would be to implement real incremental codecs for all of those.

Maybe iterencode() with an empty iterator should never call encode()? (But IMHO it would be better to document that iterencode()/iterdecode() should only be used with "real" codecs.)
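A minimal sketch of the first idea, assuming a hypothetical helper named iterencode_lazy() (not part of the codecs module): it creates the encoder only once the first chunk arrives, so an empty iterator never reaches encode(). The final flush passes an empty slice of the last chunk, so the flush argument has the same type as the data.

```python
import codecs

def iterencode_lazy(iterator, encoding, errors="strict"):
    """Hypothetical variant of codecs.iterencode() that never calls
    encode() when the iterator yields nothing."""
    encoder = None
    chunk = None
    for chunk in iterator:
        if encoder is None:
            # Create the encoder only once there is something to encode.
            encoder = codecs.getincrementalencoder(encoding)(errors)
        output = encoder.encode(chunk)
        if output:
            yield output
    if encoder is not None:
        # Flush with an empty object of the same type as the data.
        output = encoder.encode(chunk[:0], True)
        if output:
            yield output

print(list(iterencode_lazy(iter([]), "utf-8")))
print(b"".join(iterencode_lazy(["a", "\u00e4"], "utf-8")))
```

Whether skipping the final encode() call is acceptable depends on the codec: one that emits a header or trailer even for empty input would produce different output than it does today.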

Note that the comment before PyUnicode_DecodeUTF7Stateful() in unicodeobject.c reads:

/* The decoder.  The only state we preserve is our read position,
 * i.e. how many characters we have consumed.  So if we end in the
 * middle of a shift sequence we have to back off the read position
 * and the output to the beginning of the sequence, otherwise we lose
 * all the shift state (seen bits, number of bits seen, high
 * surrogate). */

Changing that would require introducing a state object that the codec updates and from which decoding can be restarted.
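The effect of keeping only the read position can be observed from Python. The following is a sketch, assuming the UTF-7 incremental decoder is the buffered wrapper from the encodings package: when the input stops in the middle of a shift sequence, the unconsumed bytes are held back, and getstate() exposes them rather than any shift state. The variable names are only illustrative.

```python
import codecs

data = "\u00e4".encode("utf-7")      # one shift sequence
head, tail = data[:3], data[3:]      # split in the middle of it

decoder = codecs.getincrementaldecoder("utf-7")()
first = decoder.decode(head)         # output produced so far
pending, flag = decoder.getstate()   # bytes held back, plus extra state
second = decoder.decode(tail, final=True)
print(repr(first), pending, flag, repr(second))

# Restart a fresh decoder from the saved state and feed it the rest.
restarted = codecs.getincrementaldecoder("utf-7")()
restarted.setstate((pending, flag))
print(repr(restarted.decode(tail, final=True)))
```

A real state object would instead carry the seen bits, the bit count and any pending high surrogate, so the decoder would not have to re-read the start of the sequence.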

Also, the encoder does not buffer anything. To implement the suggested behaviour, it might have to buffer an unbounded amount of data.
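As a rough illustration of the non-buffering encoder (a sketch only; the sample text is arbitrary), each chunk passed to the incremental UTF-7 encoder is encoded on its own, so the chunked result may differ byte for byte from the one-shot result while still decoding to the same text:

```python
import codecs

text = "\u00e4\u00f6"

# One-shot encoding of the whole string.
whole = text.encode("utf-7")

# Incremental encoding, one character per call, then a final flush.
encoder = codecs.getincrementalencoder("utf-7")()
pieces = [encoder.encode(ch) for ch in text]
pieces.append(encoder.encode("", final=True))
chunked = b"".join(pieces)

print(whole, chunked)
print(whole.decode("utf-7") == chunked.decode("utf-7"))
```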
