Message 130498 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author	ply
Recipients	ply
Date	2011-03-10.10:19:57
SpamBayes Score	5.2878697e-08
Marked as misclassified	No
Message-id	<1299752398.22.0.888282532941.issue11461@psf.upfronthosting.co.za>
In-reply-to

Content

Reading a UTF-16 text file with the 'codecs' module fails if a surrogate pair straddles the boundary of the 72-byte chunks that readline() reads.

The attached Python script fails with the message:
  UnicodeDecodeError: 'utf16' codec can't decode bytes in position 70-71: unexpected end of data
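The attached script is not reproduced here. As a hypothetical reconstruction (the line contents and the choice of U+1F600 are assumptions), the sketch below builds an in-memory UTF-16 stream with io.BytesIO instead of a file. It places the high surrogate at bytes 70-71, the last code unit of the first 72-byte read, which matches the position in the error above, and then calls readline() through codecs.getreader(). On the interpreter used for this report that call raised the UnicodeDecodeError; on an interpreter whose incremental UTF-16 decoder buffers a dangling high surrogate, it should return the line intact.

```python
import codecs
import io
import sys

# Hypothetical test data (not the original attachment): the 2-byte BOM plus
# 34 ASCII characters fill bytes 0-69, so U+1F600 (surrogate pair D83D DE00)
# starts at byte 70 and is split across the 72-byte readline() chunk.
line = "a" * 34 + "\U0001F600" + "b" * 10 + "\n"
raw = line.encode("utf-16")  # BOM followed by code units in native byte order

# The high surrogate occupies bytes 70-71, the end of the first chunk.
high = int.from_bytes(raw[70:72], sys.byteorder)

# Read the first line back through a codecs StreamReader, as in the report.
reader = codecs.getreader("utf-16")(io.BytesIO(raw))
result = reader.readline()

print(hex(high), result == line)
```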

The cause is that codecs.StreamReader.readline() splits the input data into chunks, namely:
  readsize = size or 72
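To illustrate why the chunk split matters, the sketch below uses the same hypothetical data, cuts the byte stream at offset 72 (where readsize = 72 would cut it), and compares two ways of decoding it. Decoding the 72-byte prefix as complete input fails with "unexpected end of data", because it ends in an unpaired high surrogate. A stateful decoder from codecs.getincrementaldecoder() is expected to hold those two bytes back until the low surrogate arrives in the next chunk. This only demonstrates the decoding behavior; it is not a patch for codecs.StreamReader.

```python
import codecs

# Same hypothetical data: the high surrogate of U+1F600 sits at bytes 70-71.
text = "a" * 34 + "\U0001F600" + "b" * 10 + "\n"
raw = text.encode("utf-16")
first_chunk, rest = raw[:72], raw[72:]  # split where readsize = 72 would

# Treating the first chunk as complete input fails: it ends mid-pair.
try:
    first_chunk.decode("utf-16")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# A stateful decoder buffers the dangling high surrogate (final=False)
# and completes the pair once the next chunk arrives.
decoder = codecs.getincrementaldecoder("utf-16")()
part1 = decoder.decode(first_chunk)
part2 = decoder.decode(rest, final=True)

print(error.reason if error else None)
print(part1 == "a" * 34, part1 + part2 == text)
```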

History
Date	User	Action	Args
2011-03-10 10:19:58	ply	set	recipients: + ply
2011-03-10 10:19:58	ply	set	messageid: <1299752398.22.0.888282532941.issue11461@psf.upfronthosting.co.za>
2011-03-10 10:19:57	ply	link	issue11461 messages
2011-03-10 10:19:57	ply	create