classification
Title: Ellipsis_token.type != token.ELLIPSIS
Type: enhancement Stage: resolved
Components: Documentation Versions: Python 3.7, Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: docs@python Nosy List: Aivar.Annamaa, Mariatta, berker.peksag, docs@python, terry.reedy
Priority: normal Keywords: easy, patch

Created on 2017-09-08 09:24 by Aivar.Annamaa, last changed 2017-09-13 04:04 by Mariatta. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 3469 merged python-dev, 2017-09-09 09:18
PR 3525 merged Mariatta, 2017-09-13 03:29
PR 3526 merged Mariatta, 2017-09-13 03:51
Messages (14)
msg301687 - (view) Author: Aivar Annamaa (Aivar.Annamaa) * Date: 2017-09-08 09:24
The type code of the ellipsis token doesn't match the constant token.ELLIPSIS:
---------------------------------------
import io
import token
import tokenize

source = "..."

# tokens[0] is the ENCODING token, so the ellipsis is tokens[1]
tokens = list(tokenize.tokenize(io.BytesIO(source.encode('utf-8')).readline))
ellipsis = tokens[1]

print(ellipsis)
print(token.ELLIPSIS)
-----------------------------------------

This code outputs the following in Python 3.5 and 3.6:

> TokenInfo(type=53 (OP), string='...', start=(1, 0), end=(1, 3), line='...')
> 52

and the following in Python 3.4:

> TokenInfo(type=52 (OP), string='...', start=(1, 0), end=(1, 3), line='...')
> 51
msg301688 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2017-09-08 09:40
Thank you for the report, but this behavior is already documented at https://docs.python.org/3/library/tokenize.html

    To simplify token stream handling, all Operators and Delimiters
    tokens are returned using the generic token.OP token type. The
    exact type can be determined by checking the exact_type property
    on the named tuple returned from tokenize.tokenize().

If you replace the following line

    print(ellipsis)

with

    print(ellipsis.exact_type)

you will see that it prints '52'.
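
As a minimal sketch of the distinction described above, the following compares the generic type with exact_type for the ellipsis token. The helper name first_token is hypothetical and only used for illustration; names are printed instead of numbers because the numeric values differ between Python versions.

```python
import io
import token
import tokenize


def first_token(source):
    """Return the first real token of `source` (hypothetical helper)."""
    readline = io.BytesIO(source.encode("utf-8")).readline
    tokens = list(tokenize.tokenize(readline))
    return tokens[1]  # tokens[0] is the ENCODING token


ellipsis = first_token("...")

# The generic type is OP; the exact type is ELLIPSIS.
print(token.tok_name[ellipsis.type])
print(token.tok_name[ellipsis.exact_type])
```

Comparing against token.ELLIPSIS therefore only works with exact_type, not with type.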
msg301689 - (view) Author: Aivar Annamaa (Aivar.Annamaa) * Date: 2017-09-08 09:57
But it looks like Ellipsis is neither an operator nor a delimiter: https://docs.python.org/3/reference/lexical_analysis.html#operators
msg301691 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2017-09-08 10:03
If you look at https://docs.python.org/3/reference/lexical_analysis.html#delimiters, '.' (period) is listed as a delimiter, and the following sentence answers your question:

    A sequence of three periods has a special meaning as an ellipsis literal.
msg301694 - (view) Author: Aivar Annamaa (Aivar.Annamaa) * Date: 2017-09-08 11:37
But ellipsis is a distinct token, not a sequence of three period tokens.

Also, I can't see how we could conceptually treat ellipsis as a delimiter or operator -- it's a literal.

I still think either documentation or implementation needs to be fixed here.
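
A short sketch of the point that the ellipsis is a single token rather than three period tokens. The helper op_tokens is a hypothetical name introduced for illustration; it lists every token that tokenize reports with the generic OP type, together with the name of its exact type.

```python
import io
import token
import tokenize


def op_tokens(source):
    """Return (string, exact type name) for each OP token (hypothetical helper)."""
    readline = io.BytesIO(source.encode("utf-8")).readline
    return [
        (tok.string, token.tok_name[tok.exact_type])
        for tok in tokenize.tokenize(readline)
        if tok.type == token.OP
    ]


# Three adjacent periods form one token; separated periods form three.
print(op_tokens("..."))
print(op_tokens(". . ."))
```

In both cases the generic type is OP, which is why the documentation wording matters here.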
msg301697 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2017-09-08 13:31
Please don't reopen an issue if it was closed by a core developer.

It's not clear to me what exactly you want to change in the implementation or documentation.

     A sequence of three periods has a special meaning as an ellipsis literal.

literally describes how the ELLIPSIS token is identified in Parser/tokenizer.c (see PyToken_ThreeChars in that file). So a sequence of three periods is identified as an ellipsis literal, which is an expression in Python.

Do you want to change tokenize.tokenize() (it's in Lib/tokenize.py) so it will return

    TokenInfo(type=52 (ELLIPSIS), ...)

instead of

    TokenInfo(type=53 (OP), ...)

? Note that ELLIPSIS has been added to tokenize.EXACT_TOKEN_TYPES in issue 24622. To me, since it has been added to tokenize.EXACT_TOKEN_TYPES there is no need to special case ELLIPSIS in Lib/tokenize.py.

Or do you want to clarify

    To simplify token stream handling, all Operators and Delimiters tokens are returned using the generic token.OP token type.

at https://docs.python.org/3/library/tokenize.html?
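
To illustrate the role of tokenize.EXACT_TOKEN_TYPES mentioned above, here is a rough sketch of the lookup. The function exact_type_of is a hypothetical re-implementation written for this example, not the code in Lib/tokenize.py; it assumes EXACT_TOKEN_TYPES maps token strings to exact type codes and is reachable as an attribute of the tokenize module.

```python
import io
import token
import tokenize


def exact_type_of(tok):
    """Hypothetical re-implementation of the exact_type lookup, for illustration."""
    if tok.type == token.OP and tok.string in tokenize.EXACT_TOKEN_TYPES:
        return tokenize.EXACT_TOKEN_TYPES[tok.string]
    return tok.type


source = "x = a[...] + b.c\n"
readline = io.BytesIO(source.encode("utf-8")).readline
tokens = list(tokenize.tokenize(readline))

# Show the generic type next to the exact type for every token.
for tok in tokens:
    print(
        repr(tok.string),
        tokenize.tok_name[tok.type],
        tokenize.tok_name[exact_type_of(tok)],
    )
```

Because '...' is a key in that table, the ellipsis gets its exact type the same way operators and delimiters do, without a special case.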
msg301707 - (view) Author: Aivar Annamaa (Aivar.Annamaa) * Date: 2017-09-08 17:36
(Sorry, I didn't mean to challenge the authority of a core developer. I simply didn't notice that adding a comment reopens the issue. I hope that this time I selected the correct parameters and that this doesn't happen again.)

Let me try to rephrase my concern.

Initially I thought there was a mistake in the tokenizer or in the token module.

After you pointed out the documentation about token.OP and exact_type, I'm worried about a smaller detail. The documentation only talks about *operators* and *delimiters* having the type attribute set to token.OP. According to my understanding (and also the listings at https://docs.python.org/3/reference/lexical_analysis.html#operators), ellipsis is neither an operator nor a delimiter.

Am I right about this?

(I understand that the source representation of both ELLIPSIS and DOT tokens contains period *character(s)*, but I don't see why this is relevant when we discuss the properties of *tokens*.)

Anyway, if ellipsis is neither an operator nor a delimiter (and if there is a reason why it is treated similarly to operators and delimiters), then I recommend updating the documentation by replacing

> all Operators and Delimiters tokens 
> are returned using the generic token.OP token type

with 

> all Operators, Delimiters and Ellipsis tokens
> are returned using the generic token.OP token type


(I understand that this is not a serious issue. If you prefer not to discuss it further, then I'm happy to leave it as it is.)
msg301713 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2017-09-08 18:22
I think we should add 'and Ellipsis' in the tokenize doc to get
'all Operators and Delimiters tokens and Ellipsis are returned ...'.  I would actually prefer 'Operator and Delimiter tokens' but I don't know if the 's' is needed to trigger the current linkage.
msg301725 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2017-09-08 19:44
Thank you for your detailed response, Aivar. I agree that adding 'and Ellipsis' would make the tokenize documentation clearer. Would you like to send a pull request?
msg301765 - (view) Author: Aivar Annamaa (Aivar.Annamaa) * Date: 2017-09-09 09:33
Here is the PR: https://github.com/python/cpython/pull/3469

(It's my first, so I don't know if I should also update the NEWS file or add the "skip news" label. I signed the CLA, so I hope this warning goes away.)
msg301777 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2017-09-09 16:08
Once the CLA * shows up after your name, one of us should remove the CLA needed tag and the bot will verify.  I believe tagging as trivial would also work.
msg302028 - (view) Author: Mariatta (Mariatta) * (Python committer) Date: 2017-09-13 03:24
New changeset 5f8fbf917ebf2398aa75a1f271617e2e50ab7c88 by Mariatta (Aivar Annamaa) in branch 'master':
bpo-31394: Clarify documentation about token type attribute (GH-3469)
https://github.com/python/cpython/commit/5f8fbf917ebf2398aa75a1f271617e2e50ab7c88
msg302029 - (view) Author: Mariatta (Mariatta) * (Python committer) Date: 2017-09-13 03:43
New changeset 5513e888e9a742156c35ce7ab628407d8cf9e1f0 by Mariatta in branch '3.6':
[3.6] bpo-31394: Clarify documentation about token type attribute (GH-3469) (GH-3525)
https://github.com/python/cpython/commit/5513e888e9a742156c35ce7ab628407d8cf9e1f0
msg302030 - (view) Author: Mariatta (Mariatta) * (Python committer) Date: 2017-09-13 04:00
New changeset ea0f7c26cef4550bf4db1a9bae17d41b79ab7c0d by Mariatta in branch 'master':
bpo-31394: Make tokenize.rst PEP 8-compliant (GH-3526)
https://github.com/python/cpython/commit/ea0f7c26cef4550bf4db1a9bae17d41b79ab7c0d
History
Date User Action Args
2017-09-13 04:04:26 Mariatta set status: open -> closed
    resolution: fixed
    stage: patch review -> resolved
2017-09-13 04:00:02 Mariatta set messages: + msg302030
2017-09-13 03:51:28 Mariatta set pull_requests: + pull_request3526
2017-09-13 03:43:06 Mariatta set messages: + msg302029
2017-09-13 03:29:21 Mariatta set pull_requests: + pull_request3525
2017-09-13 03:24:06 Mariatta set nosy: + Mariatta
    messages: + msg302028
2017-09-09 16:08:41 terry.reedy set messages: + msg301777
2017-09-09 09:33:05 Aivar.Annamaa set messages: + msg301765
2017-09-09 09:18:50 python-dev set keywords: + patch
    stage: needs patch -> patch review
    pull_requests: + pull_request3461
2017-09-08 19:44:13 berker.peksag set keywords: + easy
    type: enhancement
    messages: + msg301725
2017-09-08 18:22:41 terry.reedy set assignee: docs@python
    components: + Documentation, - Interpreter Core
    versions: + Python 3.7, - Python 3.4, Python 3.5
    nosy: + terry.reedy, docs@python
    messages: + msg301713
    stage: resolved -> needs patch
2017-09-08 17:36:16 Aivar.Annamaa set messages: + msg301707
2017-09-08 13:31:17 berker.peksag set messages: + msg301697
2017-09-08 11:37:35 Aivar.Annamaa set status: closed -> open
    title: Ellipsis token.type != token.ELLIPSIS -> Ellipsis_token.type != token.ELLIPSIS
    type: behavior -> (no value)
    messages: + msg301694
    resolution: not a bug -> (no value)
2017-09-08 10:03:42 berker.peksag set messages: + msg301691
2017-09-08 09:57:20 Aivar.Annamaa set messages: + msg301689
2017-09-08 09:40:34 berker.peksag set status: open -> closed
    type: behavior
    nosy: + berker.peksag
    messages: + msg301688
    resolution: not a bug
    stage: resolved
2017-09-08 09:24:31 Aivar.Annamaa create