
Author Devin Jeanpierre
Recipients Devin Jeanpierre, benjamin.peterson, petri.lehtinen, r.david.murray, tim.peters
Date 2011-06-24.08:40:44
Content
You're right, and good catch. If a doctest starts with a "#coding:XXX" line, this approach would break.

One option is to replace the call to tokenize.tokenize with a call to tokenize._tokenize, passing 'utf-8' as a parameter. The downside is that _tokenize is a private, undocumented API. The alternative is to manually prepend a coding line that specifies UTF-8, so that any coding line inside the doctest would be ignored.
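
A rough sketch of the prepend-a-coding-line option (the names and the sample source here are made up, not from the patch):

import io
import tokenize

# Hypothetical doctest source text; in the real code this would be the
# example source that doctest hands to the tokenizer.
source = '# coding: latin-1\nprint("héllo")\n'

# Prepend an explicit UTF-8 cookie and encode. tokenize's encoding
# detection stops at the cookie on the first line, so a cookie that was
# inside the doctest is treated as an ordinary comment. Note that this
# shifts the reported token line numbers by one.
readline = io.BytesIO(b"# coding: utf-8\n" + source.encode("utf-8")).readline

for tok in tokenize.tokenize(readline):
    print(tok)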

My preferred option would be to add the ability to tokenize unicode strings to the tokenize API, and then use that. I can file a separate ticket for it if that sounds good, since it's probably useful to others too.
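
Roughly, I mean a supported spelling of something like the sketch below, which tokenizes a str directly (here via the existing generate_tokens, which accepts str lines):

import io
import tokenize

# generate_tokens() takes a readline callable that returns str, so there
# is no encoding detection step and no coding cookie to worry about.
source = 'print("héllo")\n'
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    print(tok)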

One other thing to be worried about -- I'm not sure how doctest itself treats tests with leading "coding:XXX" lines. I'd hope it ignores them; if it doesn't, this gets more complicated and the approaches above wouldn't work.
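
Concretely, I mean a (made-up) test like this, where the first example line looks like a coding declaration:

import doctest

def f():
    """
    >>> # coding: latin-1
    >>> print("héllo")
    héllo
    """

# Running the examples shows whether the comment line is simply executed
# as ordinary source (harmless) or given special treatment.
doctest.run_docstring_examples(f, {"f": f}, verbose=True)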

I'll see if I have time this weekend to play around with this (and add more test cases to the patch accordingly).