
Author Devin Jeanpierre
Recipients Devin Jeanpierre, benjamin.peterson, petri.lehtinen, r.david.murray, tim.peters
Date 2011-06-24.08:40:44
Content
You're right, and good catch. If a doctest starts with a "#coding:XXX" line, this approach would break.

One option is to replace the call to tokenize.tokenize with a call to tokenize._tokenize, passing 'utf-8' as a parameter. The downside is that _tokenize is a private, undocumented API. The alternative is to manually prepend a coding line that specifies UTF-8, so that any coding line inside the doctest would be ignored.
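
A rough sketch of the prepend-a-coding-line option (the names and the sample source here are made up, not from the patch):

import io
import tokenize

# Hypothetical doctest source text; in the real code this would be the
# example source that doctest hands to the tokenizer.
source = '# coding: latin-1\nprint("héllo")\n'

# Prepend an explicit UTF-8 cookie and encode. tokenize's encoding
# detection stops at the cookie on the first line, so a cookie that was
# inside the doctest is treated as an ordinary comment. Note that this
# shifts the reported token line numbers by one.
readline = io.BytesIO(b"# coding: utf-8\n" + source.encode("utf-8")).readline

for tok in tokenize.tokenize(readline):
    print(tok)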

My preferred option would be to add the ability to tokenize unicode strings to the tokenize API, and then use that. I can file a separate ticket for it if that sounds good, since it's probably useful to others too.
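
Roughly, I mean a supported spelling of something like the sketch below, which tokenizes a str directly (here via the existing generate_tokens, which accepts str lines):

import io
import tokenize

# generate_tokens() takes a readline callable that returns str, so there
# is no encoding detection step and no coding cookie to worry about.
source = 'print("héllo")\n'
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    print(tok)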

One other thing to be worried about -- I'm not sure how doctest itself treats tests with leading "coding:XXX" lines. I'd hope it ignores them; if it doesn't, this gets more complicated and the approaches above wouldn't work.
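
Concretely, I mean a (made-up) test like this, where the first example line looks like a coding declaration:

import doctest

def f():
    """
    >>> # coding: latin-1
    >>> print("héllo")
    héllo
    """

# Running the examples shows whether the comment line is simply executed
# as ordinary source (harmless) or given special treatment.
doctest.run_docstring_examples(f, {"f": f}, verbose=True)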

I'll see if I have time this weekend to play around with this (and add more test cases to the patch accordingly).