Message135836
Tokenizing `' 1 2 3` versus `''' 1 2 3` yields different results.
Tokenizing `' 1 2 3` gives:
1,0-1,1: ERRORTOKEN "'"
1,2-1,3: NUMBER '1'
1,4-1,5: NUMBER '2'
1,6-1,7: NUMBER '3'
2,0-2,0: ENDMARKER ''
while tokenizing `''' 1 2 3` yields:
Traceback (most recent call last):
  File "prog.py", line 4, in <module>
    tokenize.tokenize(iter(["''' 1 2 3"]).next)
  File "/usr/lib/python2.6/tokenize.py", line 169, in tokenize
    tokenize_loop(readline, tokeneater)
  File "/usr/lib/python2.6/tokenize.py", line 175, in tokenize_loop
    for token_info in generate_tokens(readline):
  File "/usr/lib/python2.6/tokenize.py", line 296, in generate_tokens
    raise TokenError, ("EOF in multi-line string", strstart)
tokenize.TokenError: ('EOF in multi-line string', (1, 0))
Apparently tokenize resumes tokenizing after the erroneous quote when it is a single quote, but not when it is a triple quote. I guess this is because re-tokenizing the rest of the file after an unclosed triple quote would be expensive; however, I've also been told that it's very strange, and possibly wrong, for tokenize to be inconsistent in this way.
If this is the right behavior, I'd like it to be documented. This sort of thing is confusing and potentially misleading for users of the tokenize module. At least, when I saw how single quotes were handled, I incorrectly assumed that all quotes were handled that way.
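For anyone who wants to compare the two cases side by side, here is a minimal sketch. It is written for Python 3 (using `tokenize.generate_tokens` with `io.StringIO`) rather than the Python 2.6 calls in the traceback above, and the helper name `tokenize_until_error` is made up for illustration. It collects whatever tokens are produced before a `TokenError`, if any, so both inputs can be inspected the same way. The exact tokens and error messages may differ between Python versions, so no particular output is claimed here.

```python
import io
import tokenize


def tokenize_until_error(source):
    """Hypothetical helper: return (tokens, error) for a source string.

    tokens is a list of (token_name, token_string) pairs produced before
    any failure; error is the tokenize.TokenError raised, or None.
    """
    tokens = []
    error = None
    try:
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            tokens.append((tokenize.tok_name[tok.type], tok.string))
    except tokenize.TokenError as exc:
        error = exc
    return tokens, error


# Compare the unclosed single quote with the unclosed triple quote.
for source in ["' 1 2 3", "''' 1 2 3"]:
    tokens, error = tokenize_until_error(source)
    print(repr(source))
    for name, string in tokens:
        print("   ", name, repr(string))
    if error is not None:
        print("    TokenError:", error)
```

Catching `TokenError` in one place means a caller does not need to know in advance which kind of unclosed quote the tokenizer recovers from.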
Date | User | Action | Args
2011-05-12 14:19:30 | Devin Jeanpierre | set | recipients: + Devin Jeanpierre
2011-05-12 14:19:30 | Devin Jeanpierre | set | messageid: <1305209970.17.0.31373709531.issue12063@psf.upfronthosting.co.za>
2011-05-12 14:19:29 | Devin Jeanpierre | link | issue12063 messages
2011-05-12 14:19:29 | Devin Jeanpierre | create |