Message 172850 - Python tracker

Author	eric.snow
Recipients	eric.snow
Date	2012-10-14.05:49:10
Message-id	<1350193751.15.0.784217855065.issue16223@psf.upfronthosting.co.za>
In-reply-to

Content
If you pass an iterable of tokens and none of them is an ENCODING token, tokenize.untokenize() returns a str rather than bytes.  This is contrary to what the docs say [1]:

   It returns bytes, encoded using the ENCODING token, which is the
   first token sequence output by tokenize().
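
A minimal reproduction of the behavior described above, as I understand it. The one-line source and the variable names are made up for illustration only:

```python
import io
import tokenize

# Illustrative source; any valid Python source should behave the same way.
source = b"x = 1\n"
tokens = list(tokenize.tokenize(io.BytesIO(source).readline))

# With the leading ENCODING token present, untokenize() returns bytes.
with_encoding = tokenize.untokenize(tokens)

# With the ENCODING token filtered out, untokenize() returns str instead.
without_encoding = tokenize.untokenize(
    tok for tok in tokens if tok.type != tokenize.ENCODING
)

print(type(with_encoding).__name__)
print(type(without_encoding).__name__)
```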

Either the docs should be clarified or untokenize() should be fixed.  My vote is to fix it.  It could check that the first token is an ENCODING token and raise an exception if it is not.  Alternatively, it could fall back to using 'utf-8' by default.

[1] http://docs.python.org/py3k/library/tokenize.html#tokenize.untokenize
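
To make the two options concrete, here is a rough sketch written as a wrapper around the existing function rather than as a patch to the tokenize module. The name untokenize_bytes and its strict and default_encoding parameters are hypothetical and not part of the stdlib:

```python
import io
import tokenize


def untokenize_bytes(tokens, strict=False, default_encoding="utf-8"):
    """Hypothetical wrapper: always return bytes from untokenize()."""
    tokens = list(tokens)
    # Each token is a tuple whose first item is the token type.
    has_encoding = bool(tokens) and tokens[0][0] == tokenize.ENCODING
    if strict and not has_encoding:
        # Option 1: refuse input that does not start with ENCODING.
        raise ValueError("first token must be an ENCODING token")
    result = tokenize.untokenize(tokens)
    if isinstance(result, str):
        # Option 2: no ENCODING token was seen, so fall back to a default.
        result = result.encode(default_encoding)
    return result


# Illustrative input: a token stream with the ENCODING token stripped.
stripped = [
    tok
    for tok in tokenize.tokenize(io.BytesIO(b"x = 1\n").readline)
    if tok.type != tokenize.ENCODING
]
fallback = untokenize_bytes(stripped)  # bytes, encoded with the default

try:
    untokenize_bytes(stripped, strict=True)
    strict_raised = False
except ValueError:
    strict_raised = True
```

The wrapper only shows the intended behavior; a real fix would presumably live inside untokenize() itself and choose one of the two options.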

History
Date	User	Action	Args
2012-10-14 05:49:11	eric.snow	set	recipients: + eric.snow
2012-10-14 05:49:11	eric.snow	set	messageid: <1350193751.15.0.784217855065.issue16223@psf.upfronthosting.co.za>
2012-10-14 05:49:11	eric.snow	link	issue16223 messages
2012-10-14 05:49:10	eric.snow	create