Message 181349 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	Tyler.Crompton
Recipients	Tyler.Crompton
Date	2013-02-04.16:51:28
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1359996688.85.0.198838954965.issue17125@psf.upfronthosting.co.za>
In-reply-to

Content
Line 402 in lib/python3.3/tokenize.py, contains the following line: if first.startswith(BOM_UTF8): BOM_UTF8 is a bytes object. str.startswith does not accept bytes objects. I was able to use tokenize.tokenize only after making the following changes: Change line 402 to the following: if first.startswith(BOM_UTF8.decode()): Add these two lines at line 374: except AttributeError: line_string = line Change line 485 to the following: try: line = line.decode(encoding) except AttributeError: pass I do not know if these changes are correct as I have not fully tested this module after these changes, but it started working for me. This is the meat of my invokation of tokenize.tokenize: import tokenize with open('example.py') as file: # opening a file encoded as UTF-8 for token in tokenize.tokenize(file.readline): print(token) I am not suggesting that these changes are correct, but I do believe that the current implementation is incorrect. I am also unsure as to what other versions of Python are affected by this.

Line 402 in lib/python3.3/tokenize.py, contains the following line:

    if first.startswith(BOM_UTF8):

BOM_UTF8 is a bytes object. str.startswith does not accept bytes objects. I was able to use tokenize.tokenize only after making the following changes:

Change line 402 to the following:

    if first.startswith(BOM_UTF8.decode()):

Add these two lines at line 374:

        except AttributeError:
            line_string = line

Change line 485 to the following:

            try:
                line = line.decode(encoding)
            except AttributeError:
                pass

I do not know if these changes are correct as I have not fully tested this module after these changes, but it started working for me. This is the meat of my invokation of tokenize.tokenize:

import tokenize

with open('example.py') as file: # opening a file encoded as UTF-8
	for token in tokenize.tokenize(file.readline):
		print(token)

I am not suggesting that these changes are correct, but I do believe that the current implementation is incorrect. I am also unsure as to what other versions of Python are affected by this.

History
Date	User	Action	Args
2013-02-04 16:51:28	Tyler.Crompton	set	recipients: + Tyler.Crompton
2013-02-04 16:51:28	Tyler.Crompton	set	messageid: <1359996688.85.0.198838954965.issue17125@psf.upfronthosting.co.za>
2013-02-04 16:51:28	Tyler.Crompton	link	issue17125 messages
2013-02-04 16:51:28	Tyler.Crompton	create