
Generate all tokens related code and docs from Grammar/Tokens #74640

Closed
serhiy-storchaka opened this issue May 24, 2017 · 12 comments
Assignees: serhiy-storchaka
Labels: 3.8 (only security fixes), interpreter-core (Objects, Python, Grammar, and Parser dirs), stdlib (Python modules in the Lib dir), type-feature (A feature request or enhancement)

Comments

@serhiy-storchaka
Member

BPO 30455
Nosy @vstinner, @benjaminp, @bitdancer, @meadori, @serhiy-storchaka, @matrixise, @albertjan
PRs
  • bpo-30455: Generate tokens related C code and docs from token.py. #1860
  • Add additional generated files to .gitattributes #9343
  • bpo-30455: Generate all token related code and docs from Grammar/Tokens. #10370
  • bpo-35224: PEP 572 Implementation #10497
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2018-12-22.09:26:57.225>
    created_at = <Date 2017-05-24.12:21:49.802>
    labels = ['interpreter-core', '3.8', 'type-feature', 'library']
    title = 'Generate all tokens related code and docs from Grammar/Tokens'
    updated_at = <Date 2018-12-24.14:48:06.911>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2018-12-24.14:48:06.911>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2018-12-22.09:26:57.225>
    closer = 'serhiy.storchaka'
    components = ['Interpreter Core', 'Library (Lib)']
    creation = <Date 2017-05-24.12:21:49.802>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 30455
    keywords = ['patch']
    message_count = 12.0
    messages = ['294350', '294356', '294361', '294363', '294753', '294754', '294833', '329375', '330053', '332195', '332205', '332459']
    nosy_count = 7.0
    nosy_names = ['vstinner', 'benjamin.peterson', 'r.david.murray', 'meador.inge', 'serhiy.storchaka', 'matrixise', 'Albert-Jan Nijburg']
    pr_nums = ['1860', '9343', '10370', '10497']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue30455'
    versions = ['Python 3.8']

    @serhiy-storchaka
    Member Author

    Currently Lib/token.py is generated from Include/token.h. This contradicts the common practice in which the C code is generated from the Python code (see for example opcode.py and sre_constants.py). In addition, the table in Parser/tokenizer.c has to be kept in sync with Include/token.h by hand.

    Generating Include/token.h and Parser/tokenizer.c from Lib/token.py would be simpler and more reliable.
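
    As an illustration of the proposed direction, here is a minimal sketch (a hypothetical script, not the actual CPython generator; the output path and header layout are made up) that reads the token definitions Lib/token.py already exposes via tok_name and emits matching #define lines for a C header:

        # Hypothetical illustration of generating a C header from Lib/token.py.
        # Not the actual generator; the output path and layout are assumptions.
        import token

        def generate_token_h(path):
            lines = ["/* Auto-generated from Lib/token.py -- do not edit. */\n",
                     "#ifndef Py_TOKEN_H\n",
                     "#define Py_TOKEN_H\n\n"]
            # token.tok_name maps token numbers to names (e.g. token.PLUS -> 'PLUS').
            for value, name in sorted(token.tok_name.items()):
                lines.append("#define %-15s %d\n" % (name, value))
            lines.append("\n#endif /* !Py_TOKEN_H */\n")
            with open(path, "w") as f:
                f.writelines(lines)

        if __name__ == "__main__":
            generate_token_h("token.h.generated")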

    @serhiy-storchaka serhiy-storchaka added 3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels May 24, 2017
    @vstinner
    Member

    I like the idea.

    @matrixise
    Member

    I can work on it

    @serhiy-storchaka
    Member Author

    I have already written a patch.

    @serhiy-storchaka serhiy-storchaka self-assigned this May 24, 2017
    @serhiy-storchaka
    Member Author

    PR 1860 causes the following files to be generated from token.py:

    A new Makefile target, regen-token, regenerates these files.

    The dict EXACT_TOKEN_TYPES, which maps operator strings to token names, is now generated automatically and has been moved from tokenize.py to token.py. The tokens COMMENT, NL and ENCODING, previously used only in tokenize.py, are now also added to token.py, as in bpo-25324.
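
    For illustration, assuming the behaviour that eventually shipped in Python 3.8, the relocated definitions can be checked directly from the token module:

        # Quick check of the relocated definitions (Python 3.8+).
        import token

        # Exact operator strings map to token type numbers.
        assert token.EXACT_TOKEN_TYPES['+'] == token.PLUS
        assert token.tok_name[token.EXACT_TOKEN_TYPES['**=']] == 'DOUBLESTAREQUAL'

        # COMMENT, NL and ENCODING are now defined in token.py as well.
        print(token.tok_name[token.COMMENT], token.tok_name[token.NL],
              token.tok_name[token.ENCODING])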

    @albertjan
    Mannequin

    albertjan mannequin commented May 30, 2017

    I think this covers all the changes from PR bpo-1608. It looks a lot nicer too, building it every time from the Makefile.

    You may want to add to the docs that token.py is now the source of the tokens.

    @serhiy-storchaka
    Member Author

    The regular expression tokenize.Funny can also be generated. There is not enough information to distinguish between Operator, Bracket and Special, but it seems this isn't needed.

    Some token names could also be generated from Grammar/Grammar, but that would need an additional mapping between token strings and names ('+' <-> PLUS, etc.), as sketched below.
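
    A rough sketch of deriving such a string <-> name mapping from a Grammar/Tokens-style file, assuming a format of one token name per line, optionally followed by its quoted literal, with blank lines and '#' comments ignored:

        # Hypothetical parser for a Grammar/Tokens-style file: each line holds a
        # token name, optionally followed by its quoted literal string.
        def parse_tokens_file(path):
            names = []            # token names in declaration order
            string_to_name = {}   # e.g. '+' -> 'PLUS'
            with open(path) as f:
                for line in f:
                    line = line.split('#', 1)[0].strip()
                    if not line:
                        continue
                    parts = line.split()
                    names.append(parts[0])
                    if len(parts) > 1:
                        string_to_name[parts[1].strip("'")] = parts[0]
            return names, string_to_name

        # For a line such as:  PLUS  '+'
        # this yields string_to_name['+'] == 'PLUS'.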

    @serhiy-storchaka
    Member Author

    The alternate PR 10370 generates all files from a single source file, Grammar/Tokens, using a single script, Tools/scripts/generate_token.py.

    In addition, the script doesn't rewrite files whose content is unchanged, so it can be used with read-only sources.
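
    A minimal sketch of that "don't rewrite unchanged files" behaviour (a hypothetical helper, not the code actually in generate_token.py):

        # Write the file only when the generated content differs from what is
        # already on disk, so regeneration on an up-to-date, read-only source
        # tree touches nothing.
        def update_file(path, new_content):
            try:
                with open(path, "r") as f:
                    if f.read() == new_content:
                        return False   # up to date; nothing written
            except OSError:
                pass                   # missing or unreadable: write it below
            with open(path, "w") as f:
                f.write(new_content)
            return True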

    @serhiy-storchaka serhiy-storchaka added 3.8 only security fixes and removed 3.7 (EOL) end of life labels Nov 6, 2018
    @serhiy-storchaka serhiy-storchaka changed the title Generate C code from token.py and not vice versa Generate all tokens related code and docs from Grammar/Tokens Nov 6, 2018
    @serhiy-storchaka
    Member Author

    Could anybody please review? There are two alternative PRs: PR 1860 and PR 10370. The difference between them is that the former uses Lib/token.py as the source, while the latter uses Grammar/Tokens as the source and generates Lib/token.py as well.

    @serhiy-storchaka
    Member Author

    If there are no objections, I am going to merge PR 10370 in a few days.

    @vstinner
    Member

    > If there are no objections, I am going to merge PR 10370 in a few days.

    LGTM. I guess that PR 9343 should be closed once PR 10370 is merged.

    @serhiy-storchaka
    Member Author

    New changeset 8ac6581 by Serhiy Storchaka in branch 'master':
    bpo-30455: Generate all token related code and docs from Grammar/Tokens. (GH-10370)

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022