Message 120830 - Python tracker


Author	warner
Recipients	warner
Date	2010-11-09.01:05:11
Message-id	<1289264715.81.0.326279765732.issue10370@psf.upfronthosting.co.za>

Content
I noticed that the UnicodeDecodeError raised by open(fn).readlines() (i.e. using the default ASCII encoding) on a file that's actually UTF-8 reports the wrong offset for the first undecodable character. From what I can tell, it reports (offset % 4096) instead of the actual offset.

I've attached a test case. It emits "all good" when run against py2.x (well, after converting the print() calls back into statements), but reports an error at offset 4096 (reported as "0") on py3.1.2 and py3.2a3. I'm running on a Debian (sid) x86 box.

The misreported offset does not occur with read(), just with readlines().
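
For readers without the attachment, here is a minimal sketch of the kind of comparison described above. It is not the attached test case: the file layout (4096 ASCII bytes followed by one UTF-8 encoded "é"), the helper names build_sample and first_error_offset, and the explicit encoding="ascii" (used so the result does not depend on the locale's default encoding) are all assumptions made for illustration. What readlines() reports may differ between Python versions, so the sketch only prints it.

```python
import os
import tempfile


def build_sample(path, ascii_bytes=4096):
    """Write ascii_bytes of pure ASCII, then one non-ASCII UTF-8 character.

    Hypothetical helper: the layout is an assumption, not the attached test.
    Returns the byte offset of the first undecodable byte.
    """
    line = b"a" * 63 + b"\n"  # 64 bytes per line
    data = line * (ascii_bytes // 64) + "\u00e9\n".encode("utf-8")
    with open(path, "wb") as f:
        f.write(data)
    return ascii_bytes


def first_error_offset(path, method):
    """Call f.read() or f.readlines() and return UnicodeDecodeError.start.

    Returns None if no decoding error is raised.
    """
    with open(path, encoding="ascii") as f:
        try:
            getattr(f, method)()
        except UnicodeDecodeError as exc:
            return exc.start
    return None


fd, sample_path = tempfile.mkstemp()
os.close(fd)
try:
    expected = build_sample(sample_path)
    read_offset = first_error_offset(sample_path, "read")
    readlines_offset = first_error_offset(sample_path, "readlines")
finally:
    os.remove(sample_path)

print("expected offset:   ", expected)
print("read() reports:    ", read_offset)
print("readlines() reports:", readlines_offset)
```

If readlines() prints a value smaller than the expected offset, the reported position is presumably relative to the chunk being decoded rather than to the start of the file, which would match the (offset % 4096) pattern described above.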

History
Date	User	Action	Args
2010-11-09 01:05:16	warner	set	recipients: + warner
2010-11-09 01:05:15	warner	set	messageid: <1289264715.81.0.326279765732.issue10370@psf.upfronthosting.co.za>
2010-11-09 01:05:13	warner	link	issue10370 messages
2010-11-09 01:05:12	warner	create