
Generate all tokens related code and docs from Grammar/Tokens #74640

Closed
serhiy-storchaka opened this issue May 24, 2017 · 12 comments
Assignees: serhiy-storchaka
Labels: 3.8 (only security fixes), interpreter-core (Objects, Python, Grammar, and Parser dirs), stdlib (Python modules in the Lib dir), type-feature (A feature request or enhancement)

Comments

@serhiy-storchaka
Member

BPO 30455
Nosy @vstinner, @benjaminp, @bitdancer, @meadori, @serhiy-storchaka, @matrixise, @albertjan
PRs
  • bpo-30455: Generate tokens related C code and docs from token.py. #1860
  • Add additional generated files to .gitattributes #9343
  • bpo-30455: Generate all token related code and docs from Grammar/Tokens. #10370
  • bpo-35224: PEP 572 Implementation #10497
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2018-12-22.09:26:57.225>
    created_at = <Date 2017-05-24.12:21:49.802>
    labels = ['interpreter-core', '3.8', 'type-feature', 'library']
    title = 'Generate all tokens related code and docs from Grammar/Tokens'
    updated_at = <Date 2018-12-24.14:48:06.911>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2018-12-24.14:48:06.911>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2018-12-22.09:26:57.225>
    closer = 'serhiy.storchaka'
    components = ['Interpreter Core', 'Library (Lib)']
    creation = <Date 2017-05-24.12:21:49.802>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 30455
    keywords = ['patch']
    message_count = 12.0
    messages = ['294350', '294356', '294361', '294363', '294753', '294754', '294833', '329375', '330053', '332195', '332205', '332459']
    nosy_count = 7.0
    nosy_names = ['vstinner', 'benjamin.peterson', 'r.david.murray', 'meador.inge', 'serhiy.storchaka', 'matrixise', 'Albert-Jan Nijburg']
    pr_nums = ['1860', '9343', '10370', '10497']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue30455'
    versions = ['Python 3.8']

    @serhiy-storchaka
    Member Author

    Currently Lib/token.py is generated from Include/token.h. This contradicts the common practice in which the C code is generated from the Python code (see for example opcode.py and sre_constants.py). In addition, the table in Parser/tokenizer.c has to be kept in sync with Include/token.h by hand.

    Generating Include/token.h and Parser/tokenizer.c from Lib/token.py would be simpler and more reliable.
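
    As an illustration of the proposed direction, here is a minimal sketch (a hypothetical script, not the actual CPython generator; the output path and header layout are made up) that reads the token definitions Lib/token.py already exposes via tok_name and emits matching #define lines for a C header:

        # Hypothetical illustration of generating a C header from Lib/token.py.
        # Not the actual generator; the output path and layout are assumptions.
        import token

        def generate_token_h(path):
            lines = ["/* Auto-generated from Lib/token.py -- do not edit. */\n",
                     "#ifndef Py_TOKEN_H\n",
                     "#define Py_TOKEN_H\n\n"]
            # token.tok_name maps token numbers to names (e.g. token.PLUS -> 'PLUS').
            for value, name in sorted(token.tok_name.items()):
                lines.append("#define %-15s %d\n" % (name, value))
            lines.append("\n#endif /* !Py_TOKEN_H */\n")
            with open(path, "w") as f:
                f.writelines(lines)

        if __name__ == "__main__":
            generate_token_h("token.h.generated")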

    @serhiy-storchaka serhiy-storchaka added 3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels May 24, 2017
    @vstinner
    Member

    I like the idea.

    @matrixise
    Member

    I can work on it

    @serhiy-storchaka
    Member Author

    I have already written a patch.

    @serhiy-storchaka serhiy-storchaka self-assigned this May 24, 2017
    @serhiy-storchaka
    Member Author

    PR 1860 causes the following files to be generated from token.py:

    A new Makefile target, regen-token, regenerates these files.

    The dict EXACT_TOKEN_TYPES, which maps operator strings to token names, is now generated automatically and has been moved from tokenize.py to token.py. The tokens COMMENT, NL and ENCODING, previously used only in tokenize.py, are now also added to token.py, as in bpo-25324.
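
    For illustration, assuming the behaviour that eventually shipped in Python 3.8, the relocated definitions can be checked directly from the token module:

        # Quick check of the relocated definitions (Python 3.8+).
        import token

        # Exact operator strings map to token type numbers.
        assert token.EXACT_TOKEN_TYPES['+'] == token.PLUS
        assert token.tok_name[token.EXACT_TOKEN_TYPES['**=']] == 'DOUBLESTAREQUAL'

        # COMMENT, NL and ENCODING are now defined in token.py as well.
        print(token.tok_name[token.COMMENT], token.tok_name[token.NL],
              token.tok_name[token.ENCODING])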

    @albertjan
    Mannequin

    albertjan mannequin commented May 30, 2017

    I think this covers all the changes from PR bpo-1608. It looks a lot nicer too, building it every time from the Makefile.

    You may want to add to the docs that token.py is now the source of the tokens.

    @serhiy-storchaka
    Member Author

    The regular expression tokenize.Funny can also be generated. There is not enough information to distinguish between Operator, Bracket and Special, but it seems this isn't needed.

    Some token names could also be generated from Grammar/Grammar, but that would need an additional mapping between token strings and names ('+' <-> PLUS, etc.), as sketched below.
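
    A rough sketch of deriving such a string <-> name mapping from a Grammar/Tokens-style file, assuming a format of one token name per line, optionally followed by its quoted literal, with blank lines and '#' comments ignored:

        # Hypothetical parser for a Grammar/Tokens-style file: each line holds a
        # token name, optionally followed by its quoted literal string.
        def parse_tokens_file(path):
            names = []            # token names in declaration order
            string_to_name = {}   # e.g. '+' -> 'PLUS'
            with open(path) as f:
                for line in f:
                    line = line.split('#', 1)[0].strip()
                    if not line:
                        continue
                    parts = line.split()
                    names.append(parts[0])
                    if len(parts) > 1:
                        string_to_name[parts[1].strip("'")] = parts[0]
            return names, string_to_name

        # For a line such as:  PLUS  '+'
        # this yields string_to_name['+'] == 'PLUS'.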

    @serhiy-storchaka
    Member Author

    The alternate PR 10370 generates all files from a single source file, Grammar/Tokens, using a single script, Tools/scripts/generate_token.py.

    In addition, the script doesn't rewrite files whose content is unchanged, so it can be used with read-only sources.
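
    A minimal sketch of that "don't rewrite unchanged files" behaviour (a hypothetical helper, not the code actually in generate_token.py):

        # Write the file only when the generated content differs from what is
        # already on disk, so regeneration on an up-to-date, read-only source
        # tree touches nothing.
        def update_file(path, new_content):
            try:
                with open(path, "r") as f:
                    if f.read() == new_content:
                        return False   # up to date; nothing written
            except OSError:
                pass                   # missing or unreadable: write it below
            with open(path, "w") as f:
                f.write(new_content)
            return True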

    @serhiy-storchaka serhiy-storchaka added 3.8 only security fixes and removed 3.7 (EOL) end of life labels Nov 6, 2018
    @serhiy-storchaka serhiy-storchaka changed the title Generate C code from token.py and not vice versa Generate all tokens related code and docs from Grammar/Tokens Nov 6, 2018
    @serhiy-storchaka
    Member Author

    Could anybody please review? There are two alternative PRs: PR 1860 and PR 10370. The difference between them is that the former uses Lib/token.py as the source, while the latter uses Grammar/Tokens as the source and generates Lib/token.py as well.

    @serhiy-storchaka
    Member Author

    If there are no objections, I am going to merge PR 10370 in a few days.

    @vstinner
    Member

    > If there are no objections, I am going to merge PR 10370 in a few days.

    LGTM. I guess that PR 9343 should be closed once PR 10370 is merged.

    @serhiy-storchaka
    Member Author

    New changeset 8ac6581 by Serhiy Storchaka in branch 'master':
    bpo-30455: Generate all token related code and docs from Grammar/Tokens. (GH-10370)

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022