Message 172372 - Python tracker


Author	Marcus.Gröber
Recipients	Marcus.Gröber, lovelylain
Date	2012-10-08.11:19:25
Message-id	<1349695165.83.0.945675667383.issue15278@psf.upfronthosting.co.za>
In-reply-to

Content

I came across this today as well. A short way of summarizing this error seems to be:

Reading a file using readline (or "for line in file") fails if both of the following conditions are true:

•	A codec for a multi-byte encoding (e.g. UTF-8) is used, and
•	The first line of the file is at least 73 bytes long and contains a multi-byte sequence that starts before offset 72 and ends after offset 72.
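
To make the second condition concrete, here is a rough sketch of a check for it. It assumes the failure comes from the first 72 bytes of the line being decoded on their own, which is my reading of the offsets above and not something I have confirmed. The helper name first_chunk_splits_character and its chunk_size parameter are made up for illustration.

```python
def first_chunk_splits_character(line_bytes, chunk_size=72, encoding="utf-8"):
    """Return True if cutting line_bytes at chunk_size splits a character.

    Assumption: the reader decodes the first chunk_size bytes in isolation.
    """
    # Make sure the line as a whole is valid; this raises otherwise.
    line_bytes.decode(encoding)
    try:
        line_bytes[:chunk_size].decode(encoding)
    except UnicodeDecodeError:
        # The cut landed inside a multi-byte sequence.
        return True
    return False


# A 3-byte character (the euro sign) occupying byte offsets 71 to 73.
problem_line = ("a" * 71 + "\u20ac" + " tail\n").encode("utf-8")
# A pure ASCII line of the same length has no sequence to split.
harmless_line = ("a" * 78 + "\n").encode("utf-8")

problem_detected = first_chunk_splits_character(problem_line)
harmless_detected = first_chunk_splits_character(harmless_line)
```

A line that passes this check is one I would expect to trigger the error, but I have only seen it with the file I had at hand.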

At least for UTF-8 input files, it may be possible to work around this by opening the input file without a codec and then applying decode("utf-8") to each line.
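
A minimal sketch of that workaround, written for Python 3 and only lightly tried. The names read_utf8_lines and make_sample_file are mine, and the sample file exists only to exercise a first line of the problematic shape.

```python
import os
import tempfile


def read_utf8_lines(path):
    """Yield text lines from a UTF-8 file opened in binary mode (no codec)."""
    with open(path, "rb") as raw_file:
        for raw_line in raw_file:
            # Raw lines end at b"\n", a byte that never occurs inside a
            # UTF-8 multi-byte sequence, so each line decodes on its own.
            yield raw_line.decode("utf-8")


def make_sample_file():
    """Write a file whose first line has a 3-byte character at offsets 71-73."""
    first_line = "a" * 71 + "\u20ac" + " tail\n"
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "wb") as out:
        out.write(first_line.encode("utf-8"))
        out.write("second line\n".encode("utf-8"))
    return path


sample_path = make_sample_file()
try:
    lines = list(read_utf8_lines(sample_path))
finally:
    os.remove(sample_path)
```

Note that reading in binary mode does no newline translation, so a file with "\r\n" line endings keeps the "\r" in each decoded line. This approach also would not carry over to encodings such as UTF-16, where the newline byte can appear inside a longer code unit.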

History
Date	User	Action	Args
2012-10-08 11:19:25	Marcus.Gröber	set	recipients: + Marcus.Gröber, lovelylain
2012-10-08 11:19:25	Marcus.Gröber	set	messageid: <1349695165.83.0.945675667383.issue15278@psf.upfronthosting.co.za>
2012-10-08 11:19:25	Marcus.Gröber	link	issue15278 messages
2012-10-08 11:19:25	Marcus.Gröber	create