Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PEP 617: new PEG-based parser #84514

Closed
gvanrossum opened this issue Apr 19, 2020 · 110 comments
Closed

PEP 617: new PEG-based parser #84514

gvanrossum opened this issue Apr 19, 2020 · 110 comments
Labels
3.9 only security fixes type-feature A feature request or enhancement

Comments

@gvanrossum
Copy link
Member

BPO 40334
Nosy @gvanrossum, @rhettinger, @ned-deily, @serhiy-storchaka, @Carreau, @asottile, @lysnikolaou, @pablogsal, @miss-islington, @isidentical, @hauntsaninja
PRs
  • bpo-40334: PEP 617: New PEG parser for CPython #19503
  • bpo-40334: Only run the CI with the new parser #19664
  • bpo-40334: Fix errors in parse_string.c with old compilers #19666
  • bpo-40334: Fix builds outside the source directory and regenerate autoconf files #19667
  • bpo-40334: Improve various PEG-Parser related stuff #19669
  • bpo-40334: Rename PyConfig.use_peg to _use_peg_parser #19670
  • bpo-40334: Compile extensions in test_peg_generator with C99 #19668
  • bpo-40334: Don't downcast from Py_ssize_t to int #19671
  • bpo-40334: Fix build errors and warnings in test_peg_generator #19672
  • bpo-40334: Suppress all output in test_peg_generator #19675
  • bpo-40334: Allow to run make regen-pegen without distutils #19684
  • bpo-40334: Use old compiler when compile mode is func_type #19692
  • bpo-40334: Rewrite test_c_parser to avoid memory leaks #19694
  • bpo-40334: Fix broken mkdir -p call in regen-pegen #19695
  • bpo-40334: Add What's New sections for PEP 617 and PEP 585 #19704
  • bpo-40334: Support CO_FUTURE_BARRY_AS_BDFL in the new parser #19721
  • bpo-40334: Add tests/test_peg_generator to the install files target #19723
  • bpo-40334: Support PyPARSE_DONT_IMPLY_DEDENT in the new parser #19736
  • bpo-40334: Catch E_EOF error when the tokenizer returns ERRORTOKEN #19743
  • bpo-40334: Don't skip test_parser:test_trigger_memory_error #19744
  • bpo-40334: Refactor peg_generator to receive a Tokens file when building c code #19745
  • bpo-40334: Fix shifting of nested fstrings #19771
  • bpo-40334: Disallow invalid single statements in the new parser #19774
  • bpo-40334: refactor and cleanup for the PEG generators #19775
  • bpo-40334: Fix test_peg_parser to actually use the old parser #19778
  • bpo-40334: Explicitly cast to int in pegen.c to fix a compiler warning #19779
  • bpo-40334: Support type comments #19780
  • bpo-40334: Improve column offsets for thrown syntax errors by Pegen #19782
  • bpo-40334: Support Python 3.6 in the PEG generator #19786
  • bpo-40334: Simplify type handling in the PEG c_generator #19818
  • bpo-40334: Add support for feature_version in new PEG parser #19827
  • bpo-40334: Ensure that tok->type_comments is set on every path #19828
  • bpo-40334: Refactor lambda_parameters similar to parameters #19830
  • bpo-40334: Correct return value of func_type_comment #19833
  • bpo-40334: unskip test_function_type in test_unparse with the new parser #19837
  • bpo-40334: Make the PyPegen* and PyParser* APIs more consistent #19839
  • bpo-40334: use the TOKENS file when checking dangling rules #19849
  • bpo-40334: regenerate metaparser as part of regen-all #19854
  • bpo-40334: Spacialized error message for invalid args after bare '*' #19865
  • bpo-40334: Set error_indicator in _PyPegen_raise_error #19887
  • bpo-40334: produce specialised errors for del #19911
  • bpo-40334: Fix error location upon parsing an invalid string literal #19962
  • bpo-40334: Add type to the assignment rule in the grammar file #19963
  • bpo-40334: Allow trailing comma in parenthesized context managers #19964
  • bpo-40334: Generate comments in the parser code to improve debugging #19966
  • bpo-40939: Add the new grammar to the grammar specification documentation #19969
  • bpo-40334: Error message for invalid default args in function call #19973
  • bpo-40334: Avoid collisions between parser variables and grammar variables #19987
  • bpo-40334: Improvements to error-handling code in the PEG parser #20003
  • bpo-40334: Do not show error caret if RAISE_SYNTAX_ERROR_NO_COL_OFFSE… #20020
  • bpo-40334: Always show the caret on SyntaxErrors #20050
  • bpo-40334: Correctly identify invalid target in assignment errors #20076
  • bpo-40334: Produce better error messages on invalid targets #20106
  • bpo-40334: Reproduce error message for type comments on bare '*' #20151
  • bpo-40334: Produce better error messages for non-parenthesized genexps #20153
  • bpo-40334: Correctly generate C parser when assigned var is None #20296
  • [3.9] bpo-40334: Produce better error messages for non-parenthesized genexps (GH-20153) #20307
  • bpo-40334: Support suppressing of multiple optional variables in Pegen #20367
  • [3.9] bpo-40334: Support suppressing of multiple optional variables in Pegen (GH-20367) #20368
  • [3.9] bpo-40334: Produce better error messages on invalid targets (GH-20106) #20973
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2020-06-19.07:54:58.363>
    created_at = <Date 2020-04-19.20:16:17.317>
    labels = ['type-feature', '3.9']
    title = 'PEP 617: new PEG-based parser'
    updated_at = <Date 2020-06-19.07:54:58.362>
    user = 'https://github.com/gvanrossum'

    bugs.python.org fields:

    activity = <Date 2020-06-19.07:54:58.362>
    actor = 'lys.nikolaou'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-06-19.07:54:58.363>
    closer = 'lys.nikolaou'
    components = []
    creation = <Date 2020-04-19.20:16:17.317>
    creator = 'gvanrossum'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 40334
    keywords = ['patch']
    message_count = 110.0
    messages = ['366803', '367009', '367050', '367057', '367059', '367060', '367061', '367063', '367064', '367066', '367067', '367068', '367069', '367071', '367073', '367075', '367076', '367084', '367097', '367105', '367112', '367113', '367114', '367115', '367121', '367126', '367135', '367136', '367137', '367139', '367140', '367141', '367142', '367155', '367158', '367160', '367170', '367171', '367190', '367191', '367196', '367237', '367243', '367246', '367439', '367440', '367448', '367467', '367470', '367471', '367473', '367474', '367521', '367564', '367576', '367577', '367580', '367582', '367598', '367602', '367610', '367637', '367713', '367772', '367815', '367822', '367835', '367836', '367839', '367840', '367843', '367845', '367850', '367854', '367862', '367866', '367891', '367914', '367916', '367974', '367976', '367978', '368004', '368034', '368288', '368302', '368304', '368329', '368330', '368566', '368567', '368593', '368595', '368598', '368599', '368663', '368796', '368892', '369090', '369166', '369203', '369288', '369536', '369553', '369554', '369837', '369840', '371841', '371842', '371848']
    nosy_count = 11.0
    nosy_names = ['gvanrossum', 'rhettinger', 'ned.deily', 'serhiy.storchaka', 'mbussonn', 'Anthony Sottile', 'lys.nikolaou', 'pablogsal', 'miss-islington', 'BTaskaya', 'hauntsaninja']
    pr_nums = ['19503', '19664', '19666', '19667', '19669', '19670', '19668', '19671', '19672', '19675', '19684', '19692', '19694', '19695', '19704', '19721', '19723', '19736', '19743', '19744', '19745', '19771', '19774', '19775', '19778', '19779', '19780', '19782', '19786', '19818', '19827', '19828', '19830', '19833', '19837', '19839', '19849', '19854', '19865', '19887', '19911', '19962', '19963', '19964', '19966', '19969', '19973', '19987', '20003', '20020', '20050', '20076', '20106', '20151', '20153', '20296', '20307', '20367', '20368', '20973']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue40334'
    versions = ['Python 3.9']

    @gvanrossum
    Copy link
    Member Author

    Note: PEP-617 is currently under review by the Steering Council, but if they approve we'd like to get it into alpha 6, and reviews are welcome (even though we're still finagling some corner cases of the implementation).

    @gvanrossum gvanrossum added 3.9 only security fixes labels Apr 19, 2020
    @vstinner
    Copy link
    Member

    Once PR 19503 will be merged, I would be interested to see if setjmp()/longjmp() can be avoided. I'm scared by these functions: they require that all functions in between setjmp() and longjmp() call stack have no state and don't need any cleanup at exit.

    @gvanrossum
    Copy link
    Member Author

    VIctor, you were very right about longjmp. See we-like-parsers#119

    @pablogsal
    Copy link
    Member

    New changeset c5fc156 by Pablo Galindo in branch 'master':
    bpo-40334: PEP-617 implementation: New PEG parser for CPython (GH-19503)
    c5fc156

    @pablogsal
    Copy link
    Member

    New changeset 458004b by Pablo Galindo in branch 'master':
    bpo-40334: Fix errors in parse_string.c with old compilers (GH-19666)
    458004b

    @vstinner
    Copy link
    Member

    bpo-40334: Fix errors in parse_string.c with old compilers (GH-19666)

    Pablo told me in private that he plans to add a buildbot worker to test the old parser. So this change is fine.

    @ned-deily
    Copy link
    Member

    Looks like the build changes do not work with a build directory outside of the source directory:

    mkdir build && cd build && ../configure && make

    gcc -c -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -std=c99 -Wextra -Wno-unused-result -Wno-unused-parameter -Wno-missing-field-initializers -Wstrict-prototypes -Werror=implicit-function-declaration -fvisibility=hidden -I../Include/internal -IObjects -IInclude -IPython -I. -I../Include -DPy_BUILD_CORE -o Parser/pegen/pegen.o ../Parser/pegen/pegen.c
    error: unable to open output file 'Parser/pegen/pegen.o': 'No such file or directory'
    1 error generated.
    make: *** [Parser/pegen/pegen.o] Error 1

    Lots of build processes depend on this, like for putting out a release.

    @pablogsal
    Copy link
    Member

    Looks like the build changes do not work with a build directory outside of the source directory:

    Opened #19667. Ned, could you review?

    @pablogsal
    Copy link
    Member

    New changeset a25f3c4 by Pablo Galindo in branch 'master':
    bpo-40334: Fix builds outside the source directory and regenerate autoconf files (GH-19667)
    a25f3c4

    @vstinner
    Copy link
    Member

    New changeset 1def775 by Victor Stinner in branch 'master':
    bpo-40334: Rename PyConfig.use_peg to _use_peg_parser (GH-19670)
    1def775

    @vstinner
    Copy link
    Member

    bpo-40334: Rename PyConfig.use_peg to _use_peg_parser (GH-19670)

    This change removes sys.flags.use_peg which was only used in tests. If someone sees a good reason, we may add it back. In the meanwhile, test.support.use_old_parser() can be used.

    I didn't like the idea of having a new sys.flags.use_peg flag appears in 3.9 but disappears in 3.10.

    FYI sys.flags still supports the tuple API! Example:

    >>> sys.flags[9]
    0

    I have no idea what is this 10th member of sys.flags :-)

    @vstinner
    Copy link
    Member

    test_peg_generator emits compiler warnings:

    https://buildbot.python.org/all/#/builders/612/builds/283

    0:08:01 load avg: 1.40 [423/423/1] test_peg_generator passed (7 min 4 sec)
    /tmp/tmp0vbqe8gp/parse.c: In function ‘start_rule’:
    /tmp/tmp0vbqe8gp/parse.c:32:35: warning: passing argument 2 of ‘_PyPegen_lookahead’ from incompatible pointer type [-Wincompatible-pointer-types]
    32 | _PyPegen_lookahead(1, _PyPegen_name_token, p)
    | ^~~~~~~~~~~~~~~~~~~
    | |
    | struct _expr * ()(Parser *)
    In file included from /tmp/tmp0vbqe8gp/parse.c:2:
    /home/buildbot/buildarea/3.x.cstratak-fedora-rawhide-aarch64.lto-pgo/build/Parser/pegen/pegen.h:90:36: note: expected ‘void * (
    )(Parser *)’ but argument is of type ‘struct _expr * ()(Parser *)’
    90 | int _PyPegen_lookahead(int, void *(func)(Parser *), Parser *);
    | ~~~~~~~^~~~~~~~~~~~~~~
    /tmp/tmp54o68k3n/parse.c: In function ‘start_rule’:
    /tmp/tmp54o68k3n/parse.c:32:35: warning: passing argument 2 of ‘_PyPegen_lookahead’ from incompatible pointer type [-Wincompatible-pointer-types]
    32 | _PyPegen_lookahead(0, _PyPegen_name_token, p)
    | ^~~~~~~~~~~~~~~~~~~
    | |
    | struct _expr * (
    )(Parser *)
    In file included from /tmp/tmp54o68k3n/parse.c:2:
    /home/buildbot/buildarea/3.x.cstratak-fedora-rawhide-aarch64.lto-pgo/build/Parser/pegen/pegen.h:90:36: note: expected ‘void * ()(Parser *)’ but argument is of type ‘struct _expr * ()(Parser *)’
    90 | int _PyPegen_lookahead(int, void *(func)(Parser *), Parser *);
    | ~~~~~~~^~~~~~~~~~~~~~~

    @lysnikolaou
    Copy link
    Contributor

    test_peg_generator emits compiler warnings

    These are expected. Is this a problem?

    @WadeSanchez WadeSanchez mannequin added performance Performance or resource usage labels Apr 23, 2020
    @ned-deily
    Copy link
    Member

    These are expected. Is this a problem?

    People and bots running tests normally expect to not see any output from each test in the default case unless there are errors. If these are supposed to be emitted, we should find a way to hide them (and check the results, if necessary) unless a verbose option is selected. Personally, I wouldn't consider that a showstopper for a6 but as something to be fixed by b1.

    @pablogsal
    Copy link
    Member

    New changeset ee40e4b by Pablo Galindo in branch 'master':
    bpo-40334: Don't downcast from Py_ssize_t to int (GH-19671)
    ee40e4b

    @pablogsal
    Copy link
    Member

    test_peg_generator emits compiler warnings:

    This is fixed by #19672

    @pablogsal
    Copy link
    Member

    Those warnings are legitimate: we cannot cast function pointers that have different return types from one to the other. This only happens in the test, but could happen in the grammar if we use lookahead with NAME (&NAME).

    @ned-deily
    Copy link
    Member

    bpo-40370 documents test_peg_generator breakage on an AIX buildbot.

    @pablogsal
    Copy link
    Member

    New changeset 1df5a9e by Pablo Galindo in branch 'master':
    bpo-40334: Fix build errors and warnings in test_peg_generator (GH-19672)
    1df5a9e

    @pablogsal
    Copy link
    Member

    New changeset 8d1cbff by Lysandros Nikolaou in branch 'master':
    bpo-40334: Suppress all output in test_peg_generator (GH-19675)
    8d1cbff

    @vstinner
    Copy link
    Member

    GCC warning on aarch64 Fedora Rawhide LTO + PGO 3.x buildbot:
    https://buildbot.python.org/all/#/builders/612/builds/286

    In function ‘assemble_lnotab’,
    inlined from ‘assemble_emit’ at Python/compile.c:5709:25,
    inlined from ‘assemble’ at Python/compile.c:6048:18:
    Python/compile.c:5663:19: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
    5663 | *lnotab++ = k;
    | ~~~~~~~~~~^~~

    @pablogsal
    Copy link
    Member

    GCC warning on aarch64 Fedora Rawhide LTO + PGO 3.x buildbot:

    Is this related to this issue? We didn't change that line

    @stratakis
    Copy link
    Mannequin

    stratakis mannequin commented Apr 23, 2020

    Is this related to this issue? We didn't change that line

    I can provide you access to the buildbot if you'd like to debug the issue.

    @vstinner
    Copy link
    Member

    Is this related to this issue? We didn't change that line

    I don't know. It's just that I spotted this compiler warnining while reviewing recent buildbot failures.

    @pablogsal
    Copy link
    Member

    I don't know. It's just that I spotted this compiler warnining while reviewing recent buildbot failures.

    Given that is a compiler warning and we didn't change those lines, I would say is not related to this PR :)

    @pablogsal
    Copy link
    Member

    New changeset ebebb64 by Lysandros Nikolaou in branch 'master':
    bpo-40334: Improve various PEG-Parser related stuff (GH-19669)
    ebebb64

    @pablogsal
    Copy link
    Member

    New changeset e10e7c7 by Lysandros Nikolaou in branch 'master':
    bpo-40334: Spacialized error message for invalid args after bare '*' (GH-19865)
    e10e7c7

    @pablogsal
    Copy link
    Member

    New changeset 999ec9a by Lysandros Nikolaou in branch 'master':
    bpo-40334: Add type to the assignment rule in the grammar file (GH-19963)
    999ec9a

    @pablogsal
    Copy link
    Member

    New changeset 99db2a1 by Pablo Galindo in branch 'master':
    bpo-40334: Allow trailing comma in parenthesised context managers (GH-19964)
    99db2a1

    @pablogsal
    Copy link
    Member

    New changeset 470aac4 by Pablo Galindo in branch 'master':
    bpo-40334: Generate comments in the parser code to improve debugging (GH-19966)
    470aac4

    @pablogsal
    Copy link
    Member

    New changeset 2f37c35 by Lysandros Nikolaou in branch 'master':
    bpo-40334: Fix error location upon parsing an invalid string literal (GH-19962)
    2f37c35

    @pablogsal
    Copy link
    Member

    New changeset 4638c64 by Lysandros Nikolaou in branch 'master':
    bpo-40334: Error message for invalid default args in function call (GH-19973)
    4638c64

    @gvanrossum
    Copy link
    Member Author

    New changeset ac7a92c by Pablo Galindo in branch 'master':
    bpo-40334: Avoid collisions between parser variables and grammar variables (GH-19987)
    ac7a92c

    @gvanrossum
    Copy link
    Member Author

    FWIW I propose that we add another Misc/NEWS for the same issue summarizing what happened between alpha6 and beta1. (A lot!)

    @pablogsal
    Copy link
    Member

    There is some discrepancy with the codeop module when running with the new parser:

    ./python -c "import codeop; codeop.CommandCompiler()('raise = 2\n\n', symbol='exec')"

    (No error)

    ./python -Xoldparser -c "import codeop; codeop.CommandCompiler()('raise = 2\n\n', symbol='exec')"
    ...
    File "<input>", line 1
    raise = 2
    ^
    SyntaxError: invalid syntax

    @Carreau
    Copy link
    Mannequin

    Carreau mannequin commented May 10, 2020

    Thanks Pablo for replying to my tweet, a couple of other non-failing case in
    https://bugs.python.org/issue40585 that I filled concurrently to your message.

    (also using this occasion to thanks all of you for your work on the new parser)

    @Carreau
    Copy link
    Mannequin

    Carreau mannequin commented May 10, 2020

    Seem to not be in the new parser but simply in codeop in particular def _maybe_compile

    The logic seem weird (but weird logic usually have a reason), it try to compile thrice by appending many \n.

    1. Why do that and not return the first successful compile directly ? I'm not sure.
    2. It does compare the repr of error when compile(source + "\n") and compile(source+'\n\n') and only raise if both are identical (which they are not in the new parser, they differ by \n...)
    SyntaxError('invalid syntax', ('<input>', 1, 6, 'def a-b')) 
    vs
    SyntaxError('invalid syntax', ('<input>', 1, 6, 'def a-b\n'))

    This logic seem to go back to the 2000s.

    @gvanrossum
    Copy link
    Member Author

    It definitely has its reasons -- there are pernicious edge cases. Probably revealed by the tests.

    @gvanrossum
    Copy link
    Member Author

    New changeset 27c0d9b by Shantanu in branch 'master':
    bpo-40334: produce specialized errors for invalid del targets (GH-19911)
    27c0d9b

    @pablogsal
    Copy link
    Member

    New changeset a15c9b3 by Lysandros Nikolaou in branch 'master':
    bpo-40334: Always show the caret on SyntaxErrors (GH-20050)
    a15c9b3

    @pablogsal
    Copy link
    Member

    New changeset 16ab070 by Pablo Galindo in branch 'master':
    bpo-40334: Correctly identify invalid target in assignment errors (GH-20076)
    16ab070

    @pablogsal
    Copy link
    Member

    New changeset 2c8cd06 by Lysandros Nikolaou in branch 'master':
    bpo-40334: Improvements to error-handling code in the PEG parser (GH-20003)
    2c8cd06

    @rhettinger
    Copy link
    Contributor

    One minor nit: We're getting a nuisance compiler warning:

    test_peg_generator passed (2 min 40 sec)
    /var/folders/1b/3499qp8s7xv5w0fvjcgl45100000gn/T/tmp90rifd5z/parse.c:97:19: warning: code will never be executed [-Wunreachable-code]
    p->mark = _mark;
    ^~~~~
    1 warning generated.

    @lysnikolaou
    Copy link
    Contributor

    I'm not getting any compiler warnings on macOS (clang) and on Ubuntu (gcc) and I can't find any related warnings on the Windows Github Actions logs either. Could you maybe post a verbose log of test_peg_generator, Raymond?

    @pablogsal
    Copy link
    Member

    New changeset 75b863a by Lysandros Nikolaou in branch 'master':
    bpo-40334: Reproduce error message for type comments on bare '*' in the new parser (GH-20151)
    75b863a

    @pablogsal
    Copy link
    Member

    New changeset f50516e by Batuhan Taskaya in branch 'master':
    bpo-40334: Correctly generate C parser when assigned var is None (GH-20296)
    f50516e

    @pablogsal
    Copy link
    Member

    New changeset ae14583 by Lysandros Nikolaou in branch 'master':
    bpo-40334: Produce better error messages for non-parenthesized genexps (GH-20153)
    ae14583

    @miss-islington
    Copy link
    Contributor

    New changeset 55c8923 by Miss Islington (bot) in branch '3.9':
    bpo-40334: Produce better error messages for non-parenthesized genexps (GH-20153)
    55c8923

    @pablogsal
    Copy link
    Member

    New changeset cba5031 by Batuhan Taskaya in branch 'master':
    bpo-40334: Support suppressing of multiple optional variables in Pegen (GH-20367)
    cba5031

    @miss-islington
    Copy link
    Contributor

    New changeset 82c274e by Miss Islington (bot) in branch '3.9':
    bpo-40334: Support suppressing of multiple optional variables in Pegen (GH-20367)
    82c274e

    @pablogsal
    Copy link
    Member

    New changeset 01ece63 by Lysandros Nikolaou in branch 'master':
    bpo-40334: Produce better error messages on invalid targets (GH-20106)
    01ece63

    @pablogsal
    Copy link
    Member

    New changeset a5442b2 by Lysandros Nikolaou in branch '3.9':
    [3.9] bpo-40334: Produce better error messages on invalid targets (GH-20106) (GH-20973)
    a5442b2

    @lysnikolaou
    Copy link
    Contributor

    All parts of the implementation are now in. New issues should be opened for any potential bugs.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    8 participants