
Author warner
Recipients warner
Date 2010-11-09.01:05:11
Message-id <1289264715.81.0.326279765732.issue10370@psf.upfronthosting.co.za>
In-reply-to
Content
I noticed that the UnicodeDecodeError raised by open(fn).readlines() (i.e. using the default ASCII encoding) on a file that's actually UTF-8 reports the wrong offset for the first undecodable character. From what I can tell, it reports (offset % 4096) instead of the actual offset.

I've attached a test case. It emits "all good" when run against py2.x (well, after converting the print() calls back into statements), but reports an error at offset 4096 (reported as "0") on py3.1.2 and py3.2a3. I'm running on a Debian (sid) x86 box.

The misreported offset does not occur with read(), just with readlines().
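
The attached test case is not reproduced in this message, so the following is only a minimal sketch of the kind of check described above, not the original script. It assumes Python 3, passes encoding="ascii" explicitly rather than relying on the locale default, and uses hypothetical helper names (first_error_offset, compare_offsets). It writes ASCII filler followed by a single UTF-8 character at a known byte offset, then records the start attribute of the UnicodeDecodeError raised by read() and by readlines().

```python
import os
import tempfile


def first_error_offset(path, method):
    """Call f.read() or f.readlines() and return the .start of the
    UnicodeDecodeError it raises, or None if decoding succeeds."""
    try:
        with open(path, encoding="ascii") as f:  # explicit encoding: an assumption
            getattr(f, method)()
    except UnicodeDecodeError as exc:
        return exc.start
    return None


def compare_offsets(bad_offset=4096):
    """Put the first non-ASCII byte at bad_offset and report what each
    method claims the failing offset is."""
    # 64-byte ASCII lines, padded so the filler is exactly bad_offset bytes long.
    filler = (b"a" * 63 + b"\n") * (bad_offset // 64) + b"a" * (bad_offset % 64)
    data = filler + "\u00e9".encode("utf-8") + b"\n"

    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        return {
            "expected": bad_offset,
            "read": first_error_offset(path, "read"),
            "readlines": first_error_offset(path, "readlines"),
        }
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(compare_offsets(4096))
```

If the "readlines" value differs from "expected" while the "read" value matches, that is the behaviour reported here. The chunk size used internally may differ between Python versions, so other values of bad_offset may be needed to show a mismatch.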
History
Date User Action Args
2010-11-09 01:05:16 warner set recipients: + warner
2010-11-09 01:05:15 warner set messageid: <1289264715.81.0.326279765732.issue10370@psf.upfronthosting.co.za>
2010-11-09 01:05:13 warner link issue10370 messages
2010-11-09 01:05:12 warner create