Author nyamatongwe
Recipients nyamatongwe
Date 2009-08-07.09:14:12
Message-id <1249636455.06.0.336549280075.issue6664@psf.upfronthosting.co.za>
In-reply-to
Content
Unicode defines two dedicated line-ending characters: Line Separator
(U+2028) and Paragraph Separator (U+2029). The readlines method of the
file object returned by the built-in open does not treat these characters
as line ends, although the object returned by
codecs.open(..., encoding='utf-8') does.

The attached program creates a UTF-8 file containing three lines, the
second of which ends with a Paragraph Separator. The program then reads
this file back in as a text file, and only two lines are seen.

The desired behaviour is for the file to be read in as three lines.
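
For readers without the attachment, the following is a minimal sketch of
the comparison described above, not the attached program itself. The
helper name read_both_ways and the sample text are made up for
illustration. It uses codecs.getreader('utf-8') as a stand-in for
codecs.open, on the assumption that both go through the same StreamReader
line splitting.

```python
import codecs
import os
import tempfile

PARAGRAPH_SEPARATOR = "\u2029"


def read_both_ways(text):
    """Write text as UTF-8, then read it back with two different readers.

    Returns (lines from built-in open, lines from a codecs StreamReader).
    Hypothetical helper, for illustration only.
    """
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        # Write raw bytes so no newline translation happens on any platform.
        with os.fdopen(fd, "wb") as f:
            f.write(text.encode("utf-8"))

        # Built-in open in text mode: only \n, \r and \r\n end a line.
        with open(path, encoding="utf-8") as f:
            builtin_lines = f.readlines()

        # codecs StreamReader: splits the decoded text with str.splitlines,
        # which also recognises U+2028 and U+2029.
        with open(path, "rb") as raw:
            codecs_lines = codecs.getreader("utf-8")(raw).readlines()
    finally:
        os.remove(path)
    return builtin_lines, codecs_lines


# Three lines; the second one ends with a Paragraph Separator.
sample = "line one\n" + "line two" + PARAGRAPH_SEPARATOR + "line three\n"
builtin_lines, codecs_lines = read_both_ways(sample)
print(len(builtin_lines), len(codecs_lines))
```

If the two counts differ, the built-in reader has merged the second and
third lines, which is the behaviour reported here.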
History
Date                 User         Action  Args
2009-08-07 09:14:15  nyamatongwe  set     recipients: + nyamatongwe
2009-08-07 09:14:15  nyamatongwe  set     messageid: <1249636455.06.0.336549280075.issue6664@psf.upfronthosting.co.za>
2009-08-07 09:14:13  nyamatongwe  link    issue6664 messages
2009-08-07 09:14:12  nyamatongwe  create