Message138893
You're right, and good catch. If a doctest starts with a "#coding:XXX" line, I'd expect this to break.
One option is to replace the call to tokenize.tokenize with a call to tokenize._tokenize and pass 'utf-8' as a parameter. Downside: that's a private and undocumented API. The alternative is to manually add a coding line that specifies UTF-8, so that any coding line in the doctest would be ignored.
My preferred option would be to add the ability to read unicode to the tokenize API, and then use that. I can file a separate ticket if that sounds good, since it's probably useful to others too.
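To make the concern concrete, here is a minimal sketch (not part of the patch). It assumes a recent Python 3, where tokenize.generate_tokens accepts a readline that returns str. The sample source and the string_tokens helper are made up for illustration.

```python
import io
import tokenize

# Hypothetical doctest source: the text is really UTF-8, but it carries a
# coding line that claims otherwise.
source = '# coding: latin-1\nname = "é"\n'
raw = source.encode("utf-8")


def string_tokens(tokens):
    """Return the text of every STRING token in a token stream."""
    return [tok.string for tok in tokens if tok.type == tokenize.STRING]


# Bytes API: tokenize.tokenize honors the coding line, so the UTF-8 bytes
# are decoded as latin-1 and the string literal comes out garbled.
from_bytes = string_tokens(tokenize.tokenize(io.BytesIO(raw).readline))

# str API: the text is already decoded, so the coding line is just a comment
# and the literal survives unchanged.
from_text = string_tokens(tokenize.generate_tokens(io.StringIO(source).readline))

print(from_bytes)
print(from_text)
```

The second call is roughly what "reading unicode through the tokenize API" would look like from the caller's side: no private function, and no extra coding line to keep in sync.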
One other thing to be worried about: I'm not sure how doctest would treat tests with leading "coding:XXX" lines. I'd hope it ignores them. If it doesn't, then this is more complicated and the above stuff wouldn't work.
I'll see if I have the time this weekend to play around with this (and add more test cases to the patch accordingly).
Date | User | Action | Args
2011-06-24 08:40:45 | Devin Jeanpierre | set | recipients: + Devin Jeanpierre, tim.peters, benjamin.peterson, r.david.murray, petri.lehtinen
2011-06-24 08:40:45 | Devin Jeanpierre | set | messageid: <1308904845.51.0.466556934052.issue11909@psf.upfronthosting.co.za>
2011-06-24 08:40:44 | Devin Jeanpierre | link | issue11909 messages
2011-06-24 08:40:44 | Devin Jeanpierre | create |