Message 113818 - Python tracker


Author	jcope
Recipients	jcope
Date	2010-08-13.19:27:01
SpamBayes Score	3.1207506e-08
Marked as misclassified	No
Message-id	<1281727623.94.0.00268299366408.issue9593@psf.upfronthosting.co.za>

Content
The IO readlines() facility incorrectly processes UTF-8 files, for reasons I have not been able to identify. Specifically, the call returns too many entries in the resulting list of lines: after a character sequence such as "\x85 blah", the line is cut into ("\x85 ", "blah"), according to the returned list. My workaround for this issue is not elegant, especially since I need to keep the newline characters:

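# BEGIN: WTF
# Read the whole file, then split on "\n" only, so "\x85" is left alone.
a_str_whole = fs_in.read()
fs_in.close()
a_str_lines = a_str_whole.split("\n")
# Re-append the newline that split() removed (the last piece had none).
for idx in range(len(a_str_lines) - 1):
    a_str_lines[idx] += "\n"
# END: WTF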
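
For readers who want to try the workaround without an open file, here is a minimal self-contained sketch of the same idea. The helper name split_lines_keepends is hypothetical and not part of the original report. Unlike the snippet above, it also drops the empty trailing piece that split("\n") produces when the text ends with a newline, which is an added assumption about the desired result.

```python
def split_lines_keepends(text):
    # Hypothetical helper: split only on "\n" and keep the terminator,
    # so characters such as "\x85" never start a new entry.
    parts = text.split("\n")
    # Every piece except the last was followed by "\n" in the input.
    lines = [part + "\n" for part in parts[:-1]]
    # Keep a final unterminated line, but not an empty trailing piece.
    if parts[-1]:
        lines.append(parts[-1])
    return lines


if __name__ == "__main__":
    print(split_lines_keepends("one\x85 blah\ntwo\n"))
```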

Attached is an example script that defines the problem clearly.
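
The attached script is not reproduced here. As a rough illustration only, the sketch below shows one way extra entries can appear around "\x85": in Python 3, str.splitlines() treats U+0085 (NEL) as a line boundary, while io.StringIO breaks lines only on "\n" by default. Whether this is what happens inside the readlines() call from the report is an assumption, not something this message confirms, and the sample text is made up.

```python
import io

# Made-up sample text; "\x85" is U+0085 (NEL) once decoded to str.
sample = "first\x85 blah\nsecond\n"

# str.splitlines() counts U+0085 as a line boundary, so it yields an extra entry.
by_splitlines = sample.splitlines(keepends=True)

# io.StringIO with its default newline handling breaks only on "\n".
by_io = io.StringIO(sample).readlines()

print(len(by_splitlines), by_splitlines)
print(len(by_io), by_io)
```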

History
Date	User	Action	Args
2010-08-13 19:27:04	jcope	set	recipients: + jcope
2010-08-13 19:27:03	jcope	set	messageid: <1281727623.94.0.00268299366408.issue9593@psf.upfronthosting.co.za>
2010-08-13 19:27:02	jcope	link	issue9593 messages
2010-08-13 19:27:02	jcope	create