Title: Return namedtuples from tokenize token generator
Type: enhancement Stage:
Components: Library (Lib) Versions: Python 3.1, Python 2.7
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: georg.brandl, mallyvai, rhettinger, superbobry
Priority: normal Keywords: needs review

Created on 2009-04-27 20:26 by mallyvai, last changed 2021-03-11 11:09 by superbobry. This issue is now closed.

File name Uploaded Description
mallyvai_tokenize.patch mallyvai, 2009-04-27 20:26 Patch to decorate the generate_tokens function in tokenize and return namedtuples.
tokenize.diff rhettinger, 2009-04-27 21:25 Alternate patch
Messages (7)
msg86691 - (view) Author: Vaibhav Mallya (mallyvai) Date: 2009-04-27 20:26
Returning an anonymous 5-tuple seems like a suboptimal interface since
it's so easy to accidentally confuse, for example, the indices of start
and end. I've used tokenize for several scripts in the past few weeks
and I've always ended up writing some sort of wrapper function for
generate_tokens that names the returned tuple's fields to help me avoid
mistakes like this.

I'd like to propose the following patch that simply decorates the
generate_tokens function and names the fields of the tuples it yields.
Since each result is a namedtuple, it should be fully backwards compatible
with the existing interface, while also allowing attribute access such as:

next_token.start.row, next_token.start.col
next_token.end.row, next_token.end.col

If this seems like a reasonable way to do things, I'd be happy to submit
relevant doc patches as well as the corresponding patch for 3.0.
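
A minimal sketch of the kind of wrapper described in this message, written against current Python 3. The names Position, NamedToken and named_tokens are hypothetical and are not taken from the attached patch; the sketch only illustrates the nested start.row / start.col access being proposed.

```python
import io
import tokenize
from collections import namedtuple

# Hypothetical names for illustration; not taken from the attached patch.
Position = namedtuple("Position", ["row", "col"])
NamedToken = namedtuple("NamedToken", ["type", "string", "start", "end", "line"])


def named_tokens(readline):
    """Wrap tokenize.generate_tokens and name the fields of each 5-tuple."""
    for tok_type, string, start, end, line in tokenize.generate_tokens(readline):
        # start and end arrive as (row, col) pairs; wrap them so the two
        # coordinates can be read by name instead of by index.
        yield NamedToken(tok_type, string, Position(*start), Position(*end), line)


first = next(named_tokens(io.StringIO("x = 1\n").readline))
print(first.string, first.start.row, first.start.col)
```

Because namedtuples are tuples, index-based code such as first[2][0] keeps working alongside first.start.row.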
msg86699 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-04-27 21:25
Comments on the earlier patch:
* No need for an inner row/col namedtuple.  That would add little value.
* The name "Token" is already used in the module for a different purpose.
* The wrapper is nice looking, but it is better to in-line this patch.

Attaching an alternate patch.
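
For comparison, here is a short sketch of the flat design as it appears in current Python 3, where generate_tokens yields TokenInfo namedtuples with the fields type, string, start, end and line, and start and end remain plain (row, col) tuples. The attached patch itself is not reproduced here, and the sample source string is made up for the example.

```python
import io
import tokenize

source = "total = price * 2\n"  # made-up input for the example
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
tok = tokens[0]

# Named access to the outer fields.
print(tok.string, tok.start, tok.end)

# Positional unpacking still works, so older 5-tuple code is unaffected.
tok_type, string, start, end, line = tok
```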
msg86701 - (view) Author: Vaibhav Mallya (mallyvai) Date: 2009-04-27 21:50
Well, the reason I put in the inner row/col namedtuple initially was
because the first mistake I made with the original module was mixing up
the row/col indices for a particular case. It certainly caused all sorts
of weird headaches. :o)

I mean, it seems like there's no real reason it "should" be (row,col)
instead of (col,row) in the returned tuple; that is, it feels like the
ordering is arbitrary in and of itself.

I really feel that allowing for start.row and start.col would make the
interface completely explicit and valid semantically.

Agreed with the other two points, however.

Also, I take it there's going to be a need for an addendum to the test
suite, since the interface is being modified?
msg86703 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-04-27 21:57
I strongly prefer that there not be inner named tuples.  That is going
overboard.  FWIW, row/col order is a very common convention, especially
when the row refers to a line number in a text block and the column refers
to a character position within the row.
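
A small sketch of the (row, col) convention as tokenize uses it in current Python 3: the row is a 1-based line number and the column is a 0-based offset into that line. The sample source is made up for the example.

```python
import io
import tokenize

source = "a = 1\nbb = 22\n"  # made-up two-line input
lines = source.splitlines(keepends=True)

found = []
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    if tok.type == tokenize.NUMBER:
        (row, col), (end_row, end_col) = tok.start, tok.end
        # Rows count from 1, columns from 0, so the token text can be
        # recovered by slicing the corresponding source line.
        assert lines[row - 1][col:end_col] == tok.string
        found.append((tok.string, tok.start))

print(found)
```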
msg86773 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2009-04-28 23:20
Note that in tokenize.diff, "TokenInfo" should be in __all__ instead of
"Token".  I agree with Raymond on the inner tuples.
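
The point of the __all__ entry is that names listed there are what "from tokenize import *" binds. A quick check against current Python 3, as a sketch:

```python
import tokenize

# True when the namedtuple class is part of the module's public star-import API.
exported = "TokenInfo" in tokenize.__all__
print(exported, tokenize.TokenInfo._fields)
```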
msg86774 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-04-29 00:35
Committed in r72086.
Needs backporting to 2.7.
msg388497 - (view) Author: Sergei Lebedev (superbobry) * Date: 2021-03-11 11:09
> I strongly prefer that there not be inner named tuples. 

Hi Raymond, do you still strongly prefer (row, col) to remain unnamed? If so, could you comment on what makes you prefer that apart from (row, col) being more common than (col, row)? 

Are there any performance implications/runtime costs associated with making (row, col) a namedtuple?
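
One way to look into the cost question is to time construction of a plain tuple against construction of a namedtuple. This is only a rough sketch: Position is a hypothetical name, the iteration count is arbitrary, and the numbers will vary by machine and Python version, so no result is claimed here.

```python
import timeit
from collections import namedtuple

Position = namedtuple("Position", ["row", "col"])  # hypothetical name

# Use variables rather than literals so the plain tuple is actually
# built on every iteration instead of being folded into a constant.
namespace = {"row": 3, "col": 7, "Position": Position}

plain_seconds = timeit.timeit("(row, col)", globals=namespace, number=100_000)
named_seconds = timeit.timeit("Position(row, col)", globals=namespace, number=100_000)

print(f"plain tuple: {plain_seconds:.4f}s  namedtuple: {named_seconds:.4f}s")
```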
Date User Action Args
2021-03-11 11:09:22  superbobry  set  nosy: + superbobry; messages: + msg388497
2009-04-29 00:35:54  rhettinger  set  status: open -> closed; resolution: fixed; messages: + msg86774
2009-04-28 23:20:25  georg.brandl  set  nosy: + georg.brandl; messages: + msg86773
2009-04-28 02:45:59  rhettinger  set  assignee: rhettinger
2009-04-27 21:57:13  rhettinger  set  messages: + msg86703
2009-04-27 21:50:02  mallyvai  set  messages: + msg86701
2009-04-27 21:25:05  rhettinger  set  files: + tokenize.diff; versions: - Python 2.6, Python 3.0; nosy: + rhettinger; messages: + msg86699; keywords: + needs review, - patch
2009-04-27 20:26:55  mallyvai  create