Title: Generate all tokens related code and docs from Grammar/Tokens
Type: enhancement Stage: resolved
Components: Interpreter Core, Library (Lib) Versions: Python 3.8
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: Albert-Jan Nijburg, benjamin.peterson, matrixise, meador.inge, r.david.murray, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2017-05-24 12:21 by serhiy.storchaka, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 1860 closed serhiy.storchaka, 2017-05-30 12:52
PR 9343 emilyemorehouse, 2018-09-17 14:46
PR 10370 merged serhiy.storchaka, 2018-11-06 19:14
PR 10497 emilyemorehouse, 2018-11-20 19:28
Messages (12)
msg294350 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-05-24 12:21
Currently Lib/ is generated from Include/token.h. This contradicts the common practice of generating the C code from the Python code (see for example and). In addition, the table in Parser/tokenizer.c has to be maintained manually to match Include/token.h.

Generating Include/token.h and Parser/tokenizer.c from Lib/ would be simpler and more reliable.
msg294356 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-05-24 14:29
I like the idea.
msg294361 - (view) Author: Stéphane Wirtel (matrixise) * (Python committer) Date: 2017-05-24 15:08
I can work on it
msg294363 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-05-24 15:20
I have already written a patch.
msg294753 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-05-30 13:02
PR 1860 makes the following files be generated from

* Include/token.h
* Parser/token.c. New file containing the array of token names _PyParser_TokenNames, and functions PyToken_OneChar(), PyToken_TwoChars(), PyToken_ThreeChars(), moved from Parser/tokenizer.c.
* Doc/library/ New file containing the list of constants; it is included in Doc/library/token.rst.

New Makefile target regen-token regenerates these files.

The dict EXACT_TOKEN_TYPES that maps operator strings to token names is now automatically generated and moved from to Tokens COMMENT, NL and ENCODING used only in are now added in as in issue25324.
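
A minimal sketch of the kind of generation described above, not the code from PR 1860. It assumes a hypothetical in-memory token table (TOKENS) and hypothetical helper names (make_header, make_exact_token_types); the real input format and output layout may differ.

```python
# Hypothetical token table: (token name, operator string or None).
# The real source of truth in the PR is a file, not this list.
TOKENS = [
    ("ENDMARKER", None),
    ("NAME", None),
    ("NUMBER", None),
    ("PLUS", "+"),
    ("MINUS", "-"),
    ("DOUBLESTAR", "**"),
]


def make_header(tokens):
    """Return C '#define NAME value' lines, numbering tokens from 0."""
    lines = [
        f"#define {name:<15} {value}"
        for value, (name, _string) in enumerate(tokens)
    ]
    return "\n".join(lines) + "\n"


def make_exact_token_types(tokens):
    """Return a dict mapping operator strings to token names."""
    return {string: name for name, string in tokens if string is not None}


if __name__ == "__main__":
    print(make_header(TOKENS), end="")
    print(make_exact_token_types(TOKENS))
```

Both outputs come from the same table, so the C definitions and the Python mapping cannot drift apart, which is the point of generating them.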
msg294754 - (view) Author: Albert-Jan Nijburg (Albert-Jan Nijburg) * Date: 2017-05-30 13:14
I think this covers all the changes from PR #1608. It looks a lot nicer too, building it every time from the Makefile.

You may want to add to the docs that is now the source of the tokens.
msg294833 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-05-31 11:10
The regular expression tokenize.Funny can also be generated. There is not enough information to distinguish between Operator, Bracket and Special, but it seems this isn't needed.

Some token names can be generated from Grammar/Grammar. But an additional mapping is needed for the relations between token strings and names ('+' <-> PLUS, etc.).
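
As an illustration of generating such a regular expression from token strings, here is a rough sketch. It is not the tokenize implementation: OPERATOR_STRINGS is a hypothetical sample list, make_operator_pattern is a hypothetical helper, and no distinction is made between Operator, Bracket and Special.

```python
import re

# Hypothetical sample of operator strings; the real list would come
# from the generated token table.
OPERATOR_STRINGS = ["+", "-", "*", "**", "**=", "->", "(", ")", "..."]


def make_operator_pattern(strings):
    """Build one alternation that matches any of the given strings.

    Longer strings are placed first so that '**=' is tried before '**'
    and '*', because regex alternation takes the first alternative
    that matches.
    """
    ordered = sorted(strings, key=len, reverse=True)
    return "(" + "|".join(re.escape(s) for s in ordered) + ")"


OPERATOR_RE = re.compile(make_operator_pattern(OPERATOR_STRINGS))
```

Escaping each string with re.escape() keeps characters such as '*', '(' and '.' literal in the resulting pattern.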
msg329375 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-11-06 19:18
Alternate PR 10370 generates all files from a single file Grammar/Tokens using a single script Tools/scripts/

In addition, the script doesn't write files when the content has not changed. Thus it can be used with read-only sources.
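
The "don't write when unchanged" behaviour can be sketched roughly as follows. This is an illustration under assumptions, not the script from the PR; update_file is a hypothetical helper name.

```python
import os
import tempfile


def update_file(path, content):
    """Write content to path only if it differs from what is there.

    Returns True if the file was written, False if it was left alone.
    When nothing changed the file is never opened for writing, so an
    up-to-date file in a read-only tree causes no error.
    """
    try:
        with open(path, encoding="utf-8") as f:
            if f.read() == content:
                return False
    except FileNotFoundError:
        pass  # The file does not exist yet, so it has to be created.
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "generated.h")
        print(update_file(target, "#define PLUS 14\n"))  # first write
        print(update_file(target, "#define PLUS 14\n"))  # unchanged
```

Skipping the write also leaves the modification time untouched, so make does not rebuild targets that depend on an unchanged generated file.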
msg330053 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-11-18 17:25
Could anybody please review this? There are two alternate PRs: PR 1860 and PR 10370. The difference between them is that the former uses Lib/ as a source, and the latter uses Grammar/Tokens as a source and generates Lib/ too.
msg332195 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-12-20 07:54
If there are no objections I am going to merge PR 10370 in a few days.
msg332205 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-12-20 09:57
> If there are no objections I am going to merge PR 10370 in a few days.

LGTM. I guess that PR 9343 should be closed once PR 10370 is merged.
msg332459 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-12-24 14:48
New changeset 8ac658114dec4964479baecfbc439fceb40eaa79 by Serhiy Storchaka in branch 'master':
bpo-30455: Generate all token related code and docs from Grammar/Tokens. (GH-10370)
Date User Action Args
2022-04-11 14:58:46 admin set github: 74640
2018-12-24 14:48:06 serhiy.storchaka set messages: + msg332459
2018-12-22 09:26:57 serhiy.storchaka set status: open -> closed
resolution: fixed
stage: patch review -> resolved
2018-12-20 09:57:27 vstinner set messages: + msg332205
2018-12-20 07:54:30 serhiy.storchaka set messages: + msg332195
2018-11-20 19:28:26 emilyemorehouse set pull_requests: + pull_request9865
2018-11-18 17:25:34 serhiy.storchaka set messages: + msg330053
2018-11-06 19:27:01 serhiy.storchaka set title: Generate C code from and not vice versa -> Generate all tokens related code and docs from Grammar/Tokens
versions: + Python 3.8, - Python 3.7
2018-11-06 19:18:30 serhiy.storchaka set messages: + msg329375
2018-11-06 19:14:43 serhiy.storchaka set pull_requests: + pull_request9671
2018-09-17 14:46:12 emilyemorehouse set pull_requests: + pull_request8783
2018-02-14 10:32:23 serhiy.storchaka set pull_requests: - pull_request5479
2018-02-14 10:31:47 zach.ware set keywords: + patch
pull_requests: + pull_request5479
2017-05-31 11:10:44 serhiy.storchaka set messages: + msg294833
2017-05-30 13:14:53 Albert-Jan Nijburg set messages: + msg294754
2017-05-30 13:02:08 serhiy.storchaka set messages: + msg294753
stage: patch review
2017-05-30 12:52:07 serhiy.storchaka set pull_requests: + pull_request1943
2017-05-24 15:20:37 serhiy.storchaka set assignee: serhiy.storchaka
messages: + msg294363
2017-05-24 15:08:17 matrixise set nosy: + matrixise
messages: + msg294361
2017-05-24 14:29:18 vstinner set messages: + msg294356
2017-05-24 12:21:49 serhiy.storchaka create