Message181349
Line 402 of lib/python3.3/tokenize.py contains the following line:

    if first.startswith(BOM_UTF8):
BOM_UTF8 is a bytes object, and str.startswith does not accept bytes arguments. I was able to use tokenize.tokenize only after making the following changes:
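The type mismatch can be confirmed in isolation (a minimal check, not taken from tokenize.py itself):

```python
from codecs import BOM_UTF8  # b'\xef\xbb\xbf', the same constant tokenize uses

# Calling str.startswith with a bytes prefix raises TypeError, which is
# the failure mode when `first` is a str while BOM_UTF8 is bytes.
try:
    "print('hello')".startswith(BOM_UTF8)
except TypeError as exc:
    print("TypeError:", exc)
```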
Change line 402 to the following:

    if first.startswith(BOM_UTF8.decode()):
Add these two lines at line 374:

    except AttributeError:
        line_string = line
Change line 485 to the following:

    try:
        line = line.decode(encoding)
    except AttributeError:
        pass
I do not know whether these changes are correct, as I have not fully tested the module after making them, but it started working for me. This is the meat of my invocation of tokenize.tokenize:
    import tokenize

    with open('example.py') as file:  # opening a file encoded as UTF-8
        for token in tokenize.tokenize(file.readline):
            print(token)
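The same failure can be reproduced without a file on disk by handing tokenize.tokenize a readline that yields str, just as a text-mode file object does (a self-contained reproduction, not from the original report):

```python
import io
import tokenize

# A StringIO behaves like a text-mode file: its readline returns str,
# so the BOM check at line 402 (str.startswith(BOM_UTF8)) fails.
text = io.StringIO("x = 1\n")
try:
    for token in tokenize.tokenize(text.readline):
        print(token)
except TypeError as exc:
    print("TypeError:", exc)
```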
I am not suggesting that these changes are correct, but I do believe that the current implementation is incorrect. I am also unsure which other versions of Python are affected by this.
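For comparison, the unpatched module does work when readline yields bytes, e.g. from a file opened in binary mode; whether that requirement is intended is the open question here (a sketch, using an in-memory stream in place of example.py):

```python
import io
import tokenize

# In-memory stand-in for open('example.py', 'rb'): readline yields bytes,
# which is what tokenize.tokenize's encoding detection expects.
source = io.BytesIO(b"x = 1\n")
for token in tokenize.tokenize(source.readline):
    print(token)
```

With a real file, opening it in binary mode (`open('example.py', 'rb')`) sidesteps the str/bytes mismatch without any changes to the module.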
Date                | User           | Action | Args
2013-02-04 16:51:28 | Tyler.Crompton | set    | recipients: + Tyler.Crompton
2013-02-04 16:51:28 | Tyler.Crompton | set    | messageid: <1359996688.85.0.198838954965.issue17125@psf.upfronthosting.co.za>
2013-02-04 16:51:28 | Tyler.Crompton | link   | issue17125 messages
2013-02-04 16:51:28 | Tyler.Crompton | create |