Author ply
Recipients ply
Date 2011-03-10.10:19:57
Message-id <1299752398.22.0.888282532941.issue11461@psf.upfronthosting.co.za>
In-reply-to
Content
Reading a UTF-16 text file with the 'codecs' module fails if a surrogate pair is located at the 72-character boundary.

The attached Python script fails with the message:
UnicodeDecodeError: 'utf16' codec can't decode bytes in position 70-71: unexpected end of data

The cause is that readline() splits the input data into chunks, namely
  readsize = size or 72
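
The chunk split described above can be sketched as follows. This is not the attached script: the sample text, the CHUNK constant, and the helper names are hypothetical, and the sketch assumes a current Python 3 interpreter whose incremental decoder buffers an incomplete surrogate pair (the interpreter in this report may behave differently). It shows that cutting the byte stream after 72 bytes separates the two halves of a surrogate pair, so decoding the first chunk as complete input fails, while a decoder that carries state across chunks recovers the text.

```python
import codecs

CHUNK = 72  # mirrors "readsize = size or 72" from the report (assumed chunk size)


def build_sample():
    # Hypothetical sample: BOM (2 bytes) + 34 ASCII characters (68 bytes) puts
    # the high surrogate of U+10000 at bytes 70-71 and the low one at 72-73.
    text = "a" * 34 + "\U00010000" + "\n"
    data = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
    return text, data


def decode_first_chunk_stateless(data):
    """Decode only the first chunk as if it were the complete input."""
    try:
        return data[:CHUNK].decode("utf-16")
    except UnicodeDecodeError as exc:
        # Return the exception so the caller can inspect it.
        return exc


def decode_in_chunks(data):
    """Decode chunk by chunk, letting the decoder keep incomplete sequences."""
    decoder = codecs.getincrementaldecoder("utf-16")()
    parts = []
    for start in range(0, len(data), CHUNK):
        chunk = data[start:start + CHUNK]
        final = start + CHUNK >= len(data)  # True only for the last chunk
        parts.append(decoder.decode(chunk, final))
    return "".join(parts)


text, data = build_sample()
result = decode_first_chunk_stateless(data)
print(type(result).__name__, result)
print(decode_in_chunks(data) == text)
```

The stateless call treats the lone high surrogate at the end of the chunk as truncated input, whereas the incremental decoder is told the chunk is not final and can wait for the low surrogate in the next chunk.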
History
Date                 User  Action  Args
2011-03-10 10:19:58  ply   set     recipients: + ply
2011-03-10 10:19:58  ply   set     messageid: <1299752398.22.0.888282532941.issue11461@psf.upfronthosting.co.za>
2011-03-10 10:19:57  ply   link    issue11461 messages
2011-03-10 10:19:57  ply   create