
Author jcope
Recipients jcope
Date 2010-08-13.19:27:01
Message-id <1281727623.94.0.00268299366408.issue9593@psf.upfronthosting.co.za>
Content
The IO readlines() facility incorrectly processes UTF-8 files for some unknown reason. Specifically, the call returns too many entries in the resulting list after the character sequence "\x85 blah", which gets split into ("\x85 ", "blah") according to the resulting array. My workaround for this issue is not elegant, especially since I need to keep the newline characters:

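# BEGIN: workaround
# fs_in is a text file object opened earlier in the script.
a_str_whole = fs_in.read()
fs_in.close()
# Split on "\n" only, then re-attach "\n" to every line except the last.
a_str_lines = a_str_whole.split("\n")
for idx in range(len(a_str_lines) - 1):
    a_str_lines[idx] += "\n"
# END: workaround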
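A possible alternative that avoids re-attaching the newlines by hand, as a minimal sketch assuming Python 3 and a UTF-8 file: open the file with `newline="\n"`, so the io text layer treats only `"\n"` as a line terminator and returns it with each line. `read_lines_lf_only` is a hypothetical helper name, not part of the original script. One difference from the workaround above: if the file ends with `"\n"`, `readlines()` does not produce the trailing empty string that `split("\n")` does.

```python
import os
import tempfile


def read_lines_lf_only(path, encoding="utf-8"):
    """Return the file's lines, splitting only on "\n" and keeping it.

    With newline="\n", input lines are terminated only by "\n", so a
    character such as U+0085 (NEL) stays inside its line.
    """
    with open(path, encoding=encoding, newline="\n") as fs_in:
        return fs_in.readlines()


if __name__ == "__main__":
    # Hypothetical sample file: U+0085 in the middle of the first line.
    with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
        tmp.write("first\x85 blah\nsecond\n".encode("utf-8"))
        path = tmp.name
    try:
        print(read_lines_lf_only(path))
    finally:
        os.remove(path)
```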

Attached is an example script that demonstrates the problem.
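The attached script is not reproduced here. The sketch below is a hypothetical minimal reproduction, assuming Python 3 and assuming the file was read through a `codecs` stream reader (for example via `codecs.open`), which the report does not confirm. `codecs.StreamReader.readlines()` splits the decoded text with `str.splitlines()`, and `str.splitlines()` treats U+0085 (NEXT LINE, NEL) as a line boundary, which would explain the extra split after `"\x85"`. A `TextIOWrapper` (what the built-in `open()` returns in text mode) only ends lines at `"\n"`, `"\r"` or `"\r\n"`.

```python
import codecs
import io

# Hypothetical sample: one logical line containing U+0085 (NEL), then a second line.
DATA = "first\x85 blah\nsecond\n".encode("utf-8")


def via_codecs(data):
    # codecs StreamReader.readlines() splits with str.splitlines(),
    # which treats U+0085 as a line boundary.
    return codecs.getreader("utf-8")(io.BytesIO(data)).readlines()


def via_io(data):
    # io.TextIOWrapper in universal-newlines mode only recognises
    # "\n", "\r" and "\r\n" as line endings.
    return io.TextIOWrapper(io.BytesIO(data), encoding="utf-8").readlines()


print(via_codecs(DATA))  # ['first\x85', ' blah\n', 'second\n']
print(via_io(DATA))      # ['first\x85 blah\n', 'second\n']
```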
History
Date                 User   Action  Args
2010-08-13 19:27:04  jcope  set     recipients: + jcope
2010-08-13 19:27:03  jcope  set     messageid: <1281727623.94.0.00268299366408.issue9593@psf.upfronthosting.co.za>
2010-08-13 19:27:02  jcope  link    issue9593 messages
2010-08-13 19:27:02  jcope  create