Message 77219 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author	sjmachin
Recipients	sjmachin
Date	2008-12-07.11:00:33
SpamBayes Score	3.8638614e-06
Marked as misclassified	No
Message-id	<1228647635.62.0.380202881754.issue4574@psf.upfronthosting.co.za>
In-reply-to

Content

Problem in the newline handling in io.py, class
IncrementalNewlineDecoder, method decode. Text files are read in
128-byte chunks. Translating CR LF to \n requires special-case handling
when '\r' is the last character of a decoded chunk, because the
matching LF may be the first character of the next chunk. In that case
the code prepends b'\r' (a single byte) to the next chunk's raw bytes
and decodes the result. But in UTF-16, '\r' takes 2 bytes (b'\r\x00'
in UTF-16-LE), so the byte stream is now 1 byte out of alignment and
various failures are possible, including silently producing garbage
output from a truncated file with an odd number of bytes.
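The misalignment can be reproduced in a few lines. This is a hypothetical sketch, not the io.py code: the function names, the two-chunk split and the use of `codecs` incremental decoders are assumptions standing in for the io.py internals. It contrasts re-injecting the held-back CR as the single byte b'\r' with carrying it over as already-decoded text, which never touches the byte stream.

```python
import codecs

ENCODING = "utf-16-le"  # explicit byte order, so no BOM is involved

data = "ab\r\ncd".encode(ENCODING)
# Hypothetical split: the first chunk ends right after the encoded '\r'.
split = len("ab\r".encode(ENCODING))  # 6 bytes: 'a', 'b', '\r' at 2 bytes each
chunks = [data[:split], data[split:]]


def decode_prepending_cr_byte(chunks, encoding):
    """Flawed strategy (hypothetical): hold back a trailing '\\r' and
    re-inject it as the single byte b'\\r' before the next raw chunk."""
    decoder = codecs.getincrementaldecoder(encoding)()
    pieces = []
    carry = b""
    for i, chunk in enumerate(chunks):
        final = i == len(chunks) - 1
        text = decoder.decode(carry + chunk, final)
        carry = b""
        if text.endswith("\r") and not final:
            text = text[:-1]
            carry = b"\r"  # only 1 byte, but '\r' occupies 2 bytes in UTF-16
        pieces.append(text)
    return "".join(pieces).replace("\r\n", "\n")  # translate CR LF only


def decode_with_pending_cr(chunks, encoding):
    """Alternative (hypothetical): keep the trailing '\\r' as decoded text
    and prepend it to the next chunk's decoded output."""
    decoder = codecs.getincrementaldecoder(encoding)()
    pieces = []
    pending_cr = False
    for i, chunk in enumerate(chunks):
        final = i == len(chunks) - 1
        text = decoder.decode(chunk, final)
        if pending_cr:
            text = "\r" + text
            pending_cr = False
        if text.endswith("\r") and not final:
            text = text[:-1]
            pending_cr = True
        pieces.append(text)
    return "".join(pieces).replace("\r\n", "\n")  # translate CR LF only


print(repr(decode_with_pending_cr(chunks, ENCODING)))  # 'ab\ncd'

# Not the last chunk: the 7 misaligned bytes decode silently to garbage,
# with one stray byte left buffered inside the decoder.
garbled = codecs.getincrementaldecoder(ENCODING)().decode(b"\r" + chunks[1])
print(repr(garbled))

# Last chunk: the odd byte count makes the decoder raise instead.
try:
    decode_prepending_cr_byte(chunks, ENCODING)
except UnicodeDecodeError as exc:
    print("byte-prepend strategy failed:", exc.reason)
```

Which of the two failures you see depends on whether the misaligned chunk is the final one, which matches the "various failures" described above.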

The attached script illustrates the problems.

History
Date	User	Action	Args
2008-12-07 11:00:35	sjmachin	set	recipients: + sjmachin
2008-12-07 11:00:35	sjmachin	set	messageid: <1228647635.62.0.380202881754.issue4574@psf.upfronthosting.co.za>
2008-12-07 11:00:34	sjmachin	link	issue4574 messages
2008-12-07 11:00:33	sjmachin	create