Title: Ellipsis_token.type != token.ELLIPSIS
Type: enhancement Stage: resolved
Components: Documentation Versions: Python 3.7, Python 3.6
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: docs@python Nosy List: Aivar.Annamaa, Mariatta, berker.peksag, docs@python, terry.reedy
Priority: normal Keywords: easy, patch

Created on 2017-09-08 09:24 by Aivar.Annamaa, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 3469 merged python-dev, 2017-09-09 09:18
PR 3525 merged Mariatta, 2017-09-13 03:29
PR 3526 merged Mariatta, 2017-09-13 03:51
Messages (14)
msg301687 - (view) Author: Aivar Annamaa (Aivar.Annamaa) * Date: 2017-09-08 09:24
Type code for ellipsis token doesn't match with the constant token.ELLIPSIS:
import io
import token
import tokenize

source = "..."

tokens = list(tokenize.tokenize(io.BytesIO(source.encode('utf-8')).readline))
ellipsis = tokens[1]

print(ellipsis)
print(token.ELLIPSIS)

This code outputs the following in Python 3.5 and 3.6:

> TokenInfo(type=53 (OP), string='...', start=(1, 0), end=(1, 3), line='...')
> 52

and the following in Python 3.4:

> TokenInfo(type=52 (OP), string='...', start=(1, 0), end=(1, 3), line='...')
> 51
msg301688 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2017-09-08 09:40
Thank you for the report, but this behavior is already documented in the tokenize documentation:

    To simplify token stream handling, all Operators and Delimiters
    tokens are returned using the generic token.OP token type. The
    exact type can be determined by checking the exact_type property
    on the named tuple returned from tokenize.tokenize().

If you replace the following line




you will see that it prints '52'.
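
The difference between the two attributes can be checked directly. This is a minimal sketch, not the exact replacement line referred to above; the helper name first_real_token is made up for illustration. Numeric token values differ between Python versions, so the sketch prints token names rather than numbers.

```python
import io
import token
import tokenize


def first_real_token(source):
    """Return the first token after the ENCODING token (hypothetical helper)."""
    readline = io.BytesIO(source.encode("utf-8")).readline
    tokens = list(tokenize.tokenize(readline))
    return tokens[1]  # tokens[0] is always the ENCODING token


ellipsis = first_real_token("...")

# .type holds the generic OP type; .exact_type recovers the specific one.
print(token.tok_name[ellipsis.type])
print(token.tok_name[ellipsis.exact_type])
print(ellipsis.exact_type == token.ELLIPSIS)
```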
msg301689 - (view) Author: Aivar Annamaa (Aivar.Annamaa) * Date: 2017-09-08 09:57
But according to the listings in the documentation, Ellipsis is neither an operator nor a delimiter.
msg301691 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2017-09-08 10:03
If you look at the list of delimiters, '.' (period) is listed as a delimiter, and the following sentence answers your question:

    A sequence of three periods has a special meaning as an ellipsis literal.
msg301694 - (view) Author: Aivar Annamaa (Aivar.Annamaa) * Date: 2017-09-08 11:37
But ellipsis is a distinct token, not a sequence of three period tokens.

Also, I can't see how we could conceptually treat ellipsis as a delimiter or operator -- it's a literal.

I still think either documentation or implementation needs to be fixed here.
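
The claim that ellipsis is a single token can be checked with tokenize itself. This is a small sketch; op_tokens is a hypothetical helper name, and ". . ." is used only as tokenizer input (it is not valid Python syntax).

```python
import io
import token
import tokenize


def op_tokens(source):
    """Return (string, exact type name) for every OP token in source."""
    readline = io.BytesIO(source.encode("utf-8")).readline
    return [
        (tok.string, token.tok_name[tok.exact_type])
        for tok in tokenize.tokenize(readline)
        if tok.type == token.OP
    ]


print(op_tokens("..."))    # a single token
print(op_tokens(". . ."))  # three separate tokens
```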
msg301697 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2017-09-08 13:31
Please don't reopen an issue if it was closed by a core developer.

It's not clear to me what exactly you want to change in the implementation or documentation.

     A sequence of three periods has a special meaning as an ellipsis literal.

literally describes how the ELLIPSIS token is identified in Parser/tokenizer.c (see PyToken_ThreeChars in that file). So a sequence of three periods is identified as an ellipsis literal, which is an expression in Python.

Do you want to change tokenize.tokenize() (it's in Lib/) so that it will return

    TokenInfo(type=52 (ELLIPSIS), ...)

instead of

    TokenInfo(type=53 (OP), ...)

? Note that ELLIPSIS has been added to tokenize.EXACT_TOKEN_TYPES in issue 24622. To me, since it has been added to tokenize.EXACT_TOKEN_TYPES there is no need to special case ELLIPSIS in Lib/
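
The mapping mentioned here can be inspected as follows. This is a sketch that assumes tokenize.EXACT_TOKEN_TYPES is reachable as a module attribute in the running Python version (in newer releases the table itself is defined in the token module).

```python
import token
import tokenize

# Table mapping operator/delimiter strings to their exact token types.
exact = tokenize.EXACT_TOKEN_TYPES

print("..." in exact)
print(exact["..."] == token.ELLIPSIS)
print(token.tok_name[exact["."]])
```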

Or do you want to clarify

    To simplify token stream handling, all Operators and Delimiters tokens are returned using the generic token.OP token type.

msg301707 - (view) Author: Aivar Annamaa (Aivar.Annamaa) * Date: 2017-09-08 17:36
(Sorry, I didn't mean to challenge the authority of a core developer. I simply didn't notice that adding a comment reopens the issue. I hope this time I selected correct parameters and this doesn't happen again)

I'm trying to rephrase my concern. 

Initially I thought there was a mistake in the tokenizer or in the token module.

After you pointed out the documentation about token.OP and exact_type, I'm worried about a smaller detail. The documentation only talks about *operators* and *delimiters* having the type attribute set to token.OP. According to my understanding (and also the listings in the documentation), ellipsis is neither an operator nor a delimiter.

Am I right about this? 

(I understand that the source representation of both ELLIPSIS and DOT tokens contains period *character(s)*, but I don't see why this is relevant when we discuss the properties of *tokens*.)

Anyway, if ellipsis is neither an operator nor a delimiter (and if there is a reason why it is treated similarly to operators and delimiters), then I recommend updating the documentation by replacing

> all Operators and Delimiters tokens
> are returned using the generic token.OP token type

with

> all Operators, Delimiters and Ellipsis tokens
> are returned using the generic token.OP token type

(I understand, that this is not a serious issue. If you prefer not to discuss it further then I'm happy to leave it as it is.)
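
To illustrate the proposed wording, here is a small sketch that lists every token reported with the generic OP type in a sample line. The sample source string is made up for illustration.

```python
import io
import token
import tokenize

source = "x = a[...] + 1\n"  # made-up sample line
readline = io.BytesIO(source.encode("utf-8")).readline

# Collect (string, generic type name, exact type name) for each OP token.
rows = [
    (tok.string, token.tok_name[tok.type], token.tok_name[tok.exact_type])
    for tok in tokenize.tokenize(readline)
    if tok.type == token.OP
]

for row in rows:
    print(row)
```

Operators, delimiters and the ellipsis all appear with the generic type OP; only the exact type tells them apart.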
msg301713 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2017-09-08 18:22
I think we should add 'and Ellipsis' in the tokenize doc to get
'all Operators and Delimiters tokens and Ellipsis are returned ...'.  I would actually prefer 'Operator and Delimiter tokens' but I don't know if the 's' is needed to trigger the current linkage.
msg301725 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2017-09-08 19:44
Thank you for your detailed response, Aivar. I agree that adding 'and Ellipsis' would make the tokenize documentation clearer. Would you like to send a pull request?
msg301765 - (view) Author: Aivar Annamaa (Aivar.Annamaa) * Date: 2017-09-09 09:33
Here is the PR:

(It's my first, so I don't know if I should also update the NEWS file or add the "skip news" label. I signed the CLA, so I hope this warning goes away.)
msg301777 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2017-09-09 16:08
Once the CLA * shows up after your name, one of us should remove the CLA needed tag and the bot will verify.  I believe tagging as trivial would also work.
msg302028 - (view) Author: Mariatta (Mariatta) * (Python committer) Date: 2017-09-13 03:24
New changeset 5f8fbf917ebf2398aa75a1f271617e2e50ab7c88 by Mariatta (Aivar Annamaa) in branch 'master':
bpo-31394: Clarify documentation about token type attribute (GH-3469)
msg302029 - (view) Author: Mariatta (Mariatta) * (Python committer) Date: 2017-09-13 03:43
New changeset 5513e888e9a742156c35ce7ab628407d8cf9e1f0 by Mariatta in branch '3.6':
[3.6] bpo-31394: Clarify documentation about token type attribute (GH-3469) (GH-3525)
msg302030 - (view) Author: Mariatta (Mariatta) * (Python committer) Date: 2017-09-13 04:00
New changeset ea0f7c26cef4550bf4db1a9bae17d41b79ab7c0d by Mariatta in branch 'master':
bpo-31394: Make tokenize.rst PEP 8-compliant (GH-3526)
