
Author jdwhitley
Recipients georg.brandl, jaywalker, jdwhitley, pitrou, sjmachin, vstinner
Date 2009-03-09.05:11:29
Message-id <1236575492.47.0.0523080771295.issue4847@psf.upfronthosting.co.za>
In-reply-to
Content
Hi all,

This patch assumes UTF-8 encoding for files opened in 'rb' mode.

That is:

1. Check whether each line is a unicode or a bytes object.
2. If it is bytes, get a char* reference to its internal buffer.
3. Use PyUnicode_FromString to create a new unicode object from the
   char*. This step assumes UTF-8.
4. Get a Py_UNICODE reference to the unicode object's internal buffer
   and continue as before.
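
For illustration only, here is a rough Python-level analogue of the steps above. It is not the patch itself (which works at the C level), and the helper name as_text_lines is made up for this sketch. It just shows the same idea: leave str lines alone and decode bytes lines on the assumption that they are UTF-8.

```python
import csv
import io


def as_text_lines(lines, encoding="utf-8"):
    """Yield str lines, decoding any bytes lines (UTF-8 is an assumption)."""
    for line in lines:
        if isinstance(line, bytes):
            # Analogue of steps 2-3: interpret the raw bytes as UTF-8 text.
            # Input that is not valid UTF-8 raises UnicodeDecodeError here.
            line = line.decode(encoding)
        # Analogue of step 4: hand a text line on to the normal parsing path.
        yield line


# Simulate a file opened with 'rb': iterating it yields bytes lines.
raw = io.BytesIO("name,city\r\nJosé,Zürich\r\n".encode("utf-8"))
rows = list(csv.reader(as_text_lines(raw)))
```

The obvious weakness is the same as in the patch: a binary file in some other encoding (Latin-1, say) will either fail to decode or decode to the wrong characters.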

Is this in the right direction at all?

Cheers,

Jervis
History
Date User Action Args
2009-03-09 05:11:33 jdwhitley set recipients: + jdwhitley, georg.brandl, sjmachin, pitrou, vstinner, jaywalker
2009-03-09 05:11:32 jdwhitley set messageid: <1236575492.47.0.0523080771295.issue4847@psf.upfronthosting.co.za>
2009-03-09 05:11:31 jdwhitley link issue4847 messages
2009-03-09 05:11:30 jdwhitley create