Author: duncf
Recipients: duncf
Date: 2009-01-22.01:38:55
Message-id: <1232588343.15.0.723713394492.issue5028@psf.upfronthosting.co.za>
Content
According to the documentation for tokenize.generate_tokens:

"The generator produces 5-tuples with these members: the token type; the
token string; a 2-tuple (srow, scol) of ints specifying the row and
column where the token begins in the source; a 2-tuple (erow, ecol) of
ints specifying the row and column where the token ends in the source;
and the line on which the token was found. The line passed (the last
tuple item) is the logical line; continuation lines are included."

It seems, though, that the "logical line" (the last element of the tuple)
is actually the physical line, unless the token being returned extends
past the end of that line. As an example, consider a test file test.py:

foo = """
%s """ % 'bar'

>>> import pprint, tokenize
>>> pprint.pprint(list(tokenize.generate_tokens(open('test.py').readline)))
[(1, 'foo', (1, 0), (1, 3), 'foo = """\n'),
 (51, '=', (1, 4), (1, 5), 'foo = """\n'),
 (3, '"""\n%s """', (1, 6), (2, 6), 'foo = """\n%s """ % \'bar\'\n'),
 (51, '%', (2, 7), (2, 8), '%s """ % \'bar\'\n'),
 (3, "'bar'", (2, 9), (2, 14), '%s """ % \'bar\'\n'),
 (4, '\n', (2, 14), (2, 15), '%s """ % \'bar\'\n'),
 (0, '', (3, 0), (3, 0), '')]
>>> 

Since there is only one logical line, I would expect the first six tokens
(everything except the final ENDMARKER) to have the same fifth element.
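If a caller needs the logical line, one workaround is to rebuild it from the token coordinates rather than relying on the fifth element. The following is a minimal sketch, assuming Python 3's `tokenize.generate_tokens`. `tokens_with_logical_line` is a hypothetical helper, not part of the `tokenize` module. It treats a NEWLINE (or ENDMARKER) token as the end of a logical line, and flushes blank or comment-only lines (which end in NL with no NEWLINE) as their own groups:

```python
import io
import tokenize


def tokens_with_logical_line(source):
    """Pair every token with the logical line it belongs to.

    Hypothetical workaround: the logical line is rebuilt from the start/end
    rows of the tokens instead of being read from the tuple's last element.
    """
    physical = source.splitlines(keepends=True)
    result = []
    pending = []  # tokens of the logical line still being collected
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        pending.append(tok)
        # NEWLINE (or ENDMARKER) closes a logical line.
        closes_line = tok.type in (tokenize.NEWLINE, tokenize.ENDMARKER)
        # A blank or comment-only line ends with NL and no NEWLINE; flush it
        # on its own so it is not glued onto the following statement.
        ignorable = tok.type == tokenize.NL and all(
            t.type in (tokenize.NL, tokenize.COMMENT) for t in pending
        )
        if closes_line or ignorable:
            first_row = pending[0].start[0]
            last_row = tok.end[0]
            logical = "".join(physical[first_row - 1:last_row])
            result.extend((t, logical) for t in pending)
            pending = []
    return result


source = 'foo = """\n%s """ % \'bar\'\n'
for tok, logical in tokens_with_logical_line(source):
    print(tokenize.tok_name[tok.type], repr(tok.string), repr(logical))
```

On the test.py source above, this pairs every token except ENDMARKER with the full two-line statement.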
History
Date                 User   Action  Args
2009-01-22 01:39:03  duncf  set     recipients: + duncf
2009-01-22 01:39:03  duncf  set     messageid: <1232588343.15.0.723713394492.issue5028@psf.upfronthosting.co.za>
2009-01-22 01:39:01  duncf  link    issue5028 messages
2009-01-22 01:38:59  duncf  create