Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove the old parser #85111

Closed
pablogsal opened this issue Jun 10, 2020 · 69 comments
Closed

Remove the old parser #85111

pablogsal opened this issue Jun 10, 2020 · 69 comments
Assignees
Labels
3.10 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@pablogsal
Copy link
Member

BPO 40939
Nosy @gvanrossum, @terryjreedy, @vstinner, @ned-deily, @encukou, @nnemkin, @hroncok, @lysnikolaou, @pablogsal, @miss-islington, @brandtbucher
PRs
  • bpo-40939: Remove the old parser #20768
  • bpo-40939: Generate keyword.py using pegen #20800
  • [3.9] bpo-40939: Generate keyword.py using the new parser (GH-20800) #20801
  • bpo-40939: Remove PEG parser easter egg #20802
  • [3.9] bpo-40939: Fix test_keyword for the old parser #20814
  • bpo-40939: Remove some extra references to PYTHONOLDPARSER #20815
  • bpo-40939: Clean and adapt the peg_generator directory after deleting the old parser #20822
  • bpo-40939: Remove the old parser (Part 2) #21005
  • [3.9] bpo-40939: Deprecate the PyParser_SimpleParse* functions #21012
  • bpo-40939: Rename PyPegen* functions to PyParser* #21016
  • [3.9] bpo-40939: Do not emit deprecation warnings inside CPython for old parser APIs #21025
  • [3.9] bpo-40939: Deprecate PyNode_Compile #21036
  • bpo-40939: run autoreconf to fix configure{.ac,} disparity #21152
  • bpo-40939: Delete remaining references to Grammar/Grammar #21624
  • bpo-40939: Add the new grammar to the grammar specification documentation #19969
  • [3.9] bpo-40939: Use the new grammar for the grammar specification documentation (GH-19969) #21641
  • bpo-40939: Remove even more references to the old parser #21642
  • bpo-41808: Add What's New 3.9 entry missing from master #22294
  • bpo-40939: Document removal of the old parser in 3.10 whatsnew #23321
  • bpo-40939: Restore some stable API functions incorrectly deleted #23606
  • bpo-40939: Remove documentation for PyParser_* & add porting notes #26855
  • [3.10] bpo-40939: Remove documentation for PyParser_* & add porting notes (GH-26855) #26898
  • Files
  • py_limited_api_example.zip: Example of using Py_CompileString with Py_LIMITED_API
  • py3.txt: Imports from python3.dll
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/pablogsal'
    closed_at = <Date 2020-12-02.05:17:21.232>
    created_at = <Date 2020-06-10.11:25:17.640>
    labels = ['interpreter-core', '3.10']
    title = 'Remove the old parser'
    updated_at = <Date 2021-06-28.09:36:20.843>
    user = 'https://github.com/pablogsal'

    bugs.python.org fields:

    activity = <Date 2021-06-28.09:36:20.843>
    actor = 'petr.viktorin'
    assignee = 'pablogsal'
    closed = True
    closed_date = <Date 2020-12-02.05:17:21.232>
    closer = 'pablogsal'
    components = ['Interpreter Core']
    creation = <Date 2020-06-10.11:25:17.640>
    creator = 'pablogsal'
    dependencies = []
    files = ['49642', '49647']
    hgrepos = []
    issue_num = 40939
    keywords = ['patch']
    message_count = 69.0
    messages = ['371182', '371264', '371265', '371266', '371267', '371268', '371270', '371271', '371288', '371289', '371290', '371291', '371292', '371301', '371307', '371308', '371310', '371311', '371316', '371327', '371929', '371940', '371942', '371953', '371977', '372023', '372025', '372026', '372027', '372370', '374282', '374346', '374410', '374413', '374416', '374417', '377083', '380826', '380828', '380855', '380863', '381095', '381103', '381112', '381136', '381137', '381140', '381172', '381176', '381182', '381227', '381232', '382129', '382130', '382134', '382285', '382287', '382289', '382293', '382299', '382323', '382324', '382325', '382329', '382337', '387017', '396348', '396488', '396622']
    nosy_count = 12.0
    nosy_names = ['gvanrossum', 'terry.reedy', 'vstinner', 'ned.deily', 'petr.viktorin', 'nnemkin', 'Igor.Skochinsky', 'hroncok', 'lys.nikolaou', 'pablogsal', 'miss-islington', 'brandtbucher']
    pr_nums = ['20768', '20800', '20801', '20802', '20814', '20815', '20822', '21005', '21012', '21016', '21025', '21036', '21152', '21624', '19969', '21641', '21642', '22294', '23321', '23606', '26855', '26898']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue40939'
    versions = ['Python 3.10']

    @pablogsal
    Copy link
    Member Author

    As stated in PEP-617, the old parser will be removed in Python 3.10

    @pablogsal pablogsal added the 3.10 only security fixes label Jun 10, 2020
    @pablogsal pablogsal self-assigned this Jun 10, 2020
    @pablogsal pablogsal added interpreter-core (Objects, Python, Grammar, and Parser dirs) 3.10 only security fixes labels Jun 10, 2020
    @pablogsal pablogsal self-assigned this Jun 10, 2020
    @pablogsal pablogsal added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Jun 10, 2020
    @pablogsal
    Copy link
    Member Author

    New changeset 9727694 by Lysandros Nikolaou in branch 'master':
    bpo-40939: Generate keyword.py using the new parser (GH-20800)
    9727694

    @vstinner
    Copy link
    Member

    >>> __new_parser__
      File "<stdin>", line 1
        __new_parser__
        ^
    SyntaxError: You found it!

    "new", "ex" or "ng" are not really future proof names. Can we rename the keyword to "__peg_parser__"?

    @lysnikolaou
    Copy link
    Contributor

    Can we rename the keyword to "__peg_parser__"?

    I guess we could just remove this, as soon as the old parser is out. We were only using this to differentiate between the two parsers, when we were testing enabling/disabling the old one. I could get a PR ready to be merged after #64967 is there.

    @pablogsal
    Copy link
    Member Author

    I guess we could just remove this, as soon as the old parser is out. We were only using this to differentiate between the two parsers, when we were testing enabling/disabling the old one. I could get a PR ready to be merged after #64967 is there.

    I would personally would like to keep the easter egg, but I assume is better to rename it to "__peg_parser__".

    @miss-islington
    Copy link
    Contributor

    New changeset 961edf7 by Miss Islington (bot) in branch '3.9':
    bpo-40939: Generate keyword.py using the new parser (GH-20800)
    961edf7

    @lysnikolaou
    Copy link
    Contributor

    I would personally would like to keep the easter egg, but I assume is better to rename it to "__peg_parser__".

    Ok then! On it.

    @vstinner
    Copy link
    Member

    I did'nt ask to remove the easter egg. I'm just asking to avoid the "new" name. In my experience, each time that a "new" thing happens, later we have to use "new extended", "new_v2" or worse name :-)

    Oh, if the name changes, please change it in 3.9 as well.

    Look at this amazing names of the 5 flavors of functions parsing a string:

    PyParser_ParseString()
    PyParser_ParseStringFlags()
    PyParser_ParseStringFlagsFilename()
    PyParser_ParseStringFlagsFilenameEx() <= public!
    PyParser_ParseStringObject()

    Same for parsing a file:

    PyParser_ParseFile()
    PyParser_ParseFileFlags()
    PyParser_ParseFileFlagsEx() <= public!
    PyParser_ParseFileObject()

    Or PyRun functions:

    PyRun_String()
    PyRun_AnyFile()
    PyRun_AnyFileEx() <= public!
    PyRun_AnyFileFlags()
    PyRun_SimpleString()
    PyRun_SimpleFile()
    PyRun_SimpleFileEx() <= public!
    PyRun_InteractiveOne()
    PyRun_InteractiveLoop()
    PyRun_File()
    PyRun_FileEx() <= public!
    PyRun_FileFlags()

    ceval.c:

    PyEval_EvalCode()
    PyEval_EvalCodeEx() <= public!
    _PyEval_EvalCodeWithName()
    _PyEval_EvalCode()

    I cannot count the number of "Ex" functions that we have :-)

    Py_Finalize() -> :Py_FinalizeEx() <= public!
    PyErr_Print() -> PyErr_PrintEx() <= public!
    PySys_SetArgv() -> PySys_SetArgvEx() <= public!
    PyErr_WarnEx() <= public!

    _PyBytes_FormatEx()
    _PyDict_MergeEx()
    _Py_DecodeLocaleEx(), _Py_EncodeLocaleEx()
    struct PyMemAllocatorEx
    _Py_DecodeUTF8Ex(), _Py_EncodeUTF8Ex()
    etc.

    @gvanrossum
    Copy link
    Member

    Honestly I see no reason to keep that easter egg. Can we remove it please?

    @vstinner
    Copy link
    Member

    New changeset 961edf7 by Miss Islington (bot) in branch '3.9':
    bpo-40939: Generate keyword.py using the new parser (GH-20800)

    This change broke this buildbot:

    AMD64 Arch Linux VintageParser 3.9:
    https://buildbot.python.org/all/#/builders/765/builds/67
    1 test failed: test_keyword

    @pablogsal
    Copy link
    Member Author

    Ok, let's remove it. Lysandros can you modify PR 20802 to remove it?

    @lysnikolaou
    Copy link
    Contributor

    Lysandros can you modify PR 20802 to remove it?

    Yup!

    @lysnikolaou
    Copy link
    Contributor

    This change broke this buildbot:

    #65001 fixes this breakage.

    @gvanrossum
    Copy link
    Member

    New changeset bcd7dee by Lysandros Nikolaou in branch 'master':
    bpo-40939: Remove PEG parser easter egg (new_parser) (bpo-20802)
    bcd7dee

    @pablogsal
    Copy link
    Member Author

    New changeset 1ed83ad by Pablo Galindo in branch 'master':
    bpo-40939: Remove the old parser (GH-20768)
    1ed83ad

    @vstinner
    Copy link
    Member

    A few remaining references to the good old times of the old parser:

    Programs/_testembed.c:488: putenv("PYTHONOLDPARSER=1");
    Programs/_testembed.c:676: putenv("PYTHONOLDPARSER=1");
    Tools/scripts/run_tests.py:29: if 'PYTHONOLDPARSER' not in os.environ:

    @pablogsal
    Copy link
    Member Author

    A few remaining references to the good old times of the old parser:

    Thanks Victor, opened #20815 to address those.

    @miss-islington
    Copy link
    Contributor

    New changeset 436b648 by Pablo Galindo in branch 'master':
    bpo-40939: Remove some extra references to PYTHONOLDPARSER (GH-20815)
    436b648

    @pablogsal
    Copy link
    Member Author

    New changeset 3782497 by Pablo Galindo in branch '3.9':
    [3.9] bpo-40939: Fix test_keyword for the old parser (GH-20814)
    3782497

    @pablogsal
    Copy link
    Member Author

    New changeset 756180b by Pablo Galindo in branch 'master':
    bpo-40939: Clean and adapt the peg_generator directory after deleting the old parser (GH-20822)
    756180b

    @nnemkin
    Copy link
    Mannequin

    nnemkin mannequin commented Jun 20, 2020

    Shouldn't the following files be deleted too?

    Include/bitset.h
    Include/grammar.h
    Include/graminit.h
    Include/parsetok.h
    Include/node.h
    Python/graminit.c
    Parser/node.c

    Also declarations:
    PyNode_Compile in Include/compile.h
    PyParser_SimpleParse* in Include/pythonrun.h

    And PyParser_ASTFrom* API need new implementations.

    @lysnikolaou
    Copy link
    Contributor

    I'm currently testing a commit that removes all these files on my fork, before I push it upstream. A question that I'm not 100% sure about is if we can already remove the symbol module. I guess it's okay since it got deprecated in 3.9 (bpo-40759) and the old parser is also out, but just to make sure.

    @gvanrossum
    Copy link
    Member

    You can delete symbol.py -- it has no use now that the old parser is gone.

    We should probably also update the regeneration targets in the Makefile. (At least review them.)

    @pablogsal
    Copy link
    Member Author

    New changeset 314858e by Lysandros Nikolaou in branch 'master':
    bpo-40939: Remove the old parser (Part 2) (GH-21005)
    314858e

    @pablogsal
    Copy link
    Member Author

    I will submit a PR today

    @lysnikolaou
    Copy link
    Contributor

    Pablo, have you already started on this? I didn't see your comment earlier and I've got a PR ready.

    @pablogsal
    Copy link
    Member Author

    Pablo, have you already started on this? I didn't see your comment earlier and I've got a PR ready.

    Yeah, but don't worry: submit your PR and I will review it :)

    @miss-islington
    Copy link
    Contributor

    New changeset c26d591 by Lysandros Nikolaou in branch 'master':
    bpo-40939: Document removal of the old parser in 3.10 whatsnew (GH-23321)
    c26d591

    @lysnikolaou
    Copy link
    Contributor

    Victor, Miro, both removal of the parser module and of all the C API functions are now documented in the 3.10 whatsnew document. Do you feel that this is enough for us to close this issue again?

    @hroncok
    Copy link
    Mannequin

    hroncok mannequin commented Nov 16, 2020

    Thanks. I feel like the What's new document should teach people what to do when they are hit by the removals. The removals are documented, but the developers who are affected have no clue what to do. What do you think?

    (Sorry I wasn't able to provide this feedback before the PR was merged.)

    @pablogsal
    Copy link
    Member Author

    The removals are documented, but the developers who are affected have no clue what to do. What do you think?

    Here is difficult to recommend a canonical Path because as I mentioned, there is no replacement for these functions because that functions returned CST nodes and those not exist anymore.

    @vstinner
    Copy link
    Member

    We should check of the 3 mentioned projects (mod_wsgi,
    kdevelop-python, unbound) use the removed functions to suggest a similar replacement. I understood that there is no drop-in replacement.

    unbound does not use to get the parsed Python code as CST, but uses PyParser_SimpleParseFile() just to display an error message to stderr. I understand that PyParser_ASTFromFileObject() + PyErr_Print() could be used.

    But PyParser_ASTFromFileObject() is low-level, it requires to pass an arena object. Maybe the *intent* here is to call compile() and display the error message? Pseudo-code:

    ---

    fp = fopen(filename, "r");
    bytes = readall(fp);
    PyObject *builtins = PyEval_GetBuiltins();
    obj = PyObject_CallMethod(builtins, "compile", "O", bytes);
    Py_DECREF(bytes);
    if (!obj) {
        PyErr_Print();
    }
    else {
        Py_DECREF(obj);
    }
    fclose(fp);

    This code is non-trivial :-( Should we provide a *new* C function doing that?

    Input: filename
    Output: code object

    Or maybe I just missed an existing function :-)

    @pablogsal
    Copy link
    Member Author

    This code is non-trivial :-( Should we provide a *new* C function doing that?

    We could discuss adding a new C function, but IMHO that code is not especially horrible or unreadable. I agree it could be simpler, though.

    @IgorSkochinsky
    Copy link
    Mannequin

    IgorSkochinsky mannequin commented Nov 30, 2020

    Hi All,

    It seems this patch removes some functions provided by the Stable ABI (PEP-384), most notably Py_CompileString. Was this the intention? If not, is there still a chance to reintroduce it before the release?

    @vstinner
    Copy link
    Member

    It seems this patch removes some functions provided by the Stable ABI (PEP-384), most notably Py_CompileString. Was this the intention? If not, is there still a chance to reintroduce it before the release?

    The functions removal is intentional and was approved by the PEP-617.

    But Py_CompileString() function was not removed, it's still in the master branch (future Python 3.10). Why do you think that it has been removed?

    @IgorSkochinsky
    Copy link
    Mannequin

    IgorSkochinsky mannequin commented Nov 30, 2020

    But Py_CompileString() function was not removed, it's still in the master branch (future Python 3.10). Why do you think that it has been removed?

    Thank you. It looked that way because of the removed block of lines in the commit 1ed83ad (pythonrun.c). We were also getting a missing symbol error. We'll check again to be sure.

    @IgorSkochinsky
    Copy link
    Mannequin

    IgorSkochinsky mannequin commented Dec 2, 2020

    Attached is a sample program which works on 3.9 but fails linking with 3.10.0a2

    The .so is missing the symbol:

    igor@LAPTOP:~/py_limited_api_example$ nm /home/igor/lib/libpython3.9.so | grep Py_CompileString
    0000000000212720 T Py_CompileString
    000000000020fe30 T Py_CompileStringExFlags
    0000000000212730 T Py_CompileStringFlags
    000000000020fd40 T Py_CompileStringObject

    igor@LAPTOP:~/py_limited_api_example$ nm /home/igor/lib/libpython3.10.so | grep Py_CompileString
    0000000000201a40 T Py_CompileStringExFlags
    0000000000201980 T Py_CompileStringObject

    Please stop breaking the Stable ABI :/

    @gvanrossum
    Copy link
    Member

    Hm, I wonder if there's a typo here in pythonrun.c:

    /* For use in Py_LIMITED_API */
    #undef Py_CompileString
    PyObject *
    PyCompileString(const char *str, const char *filename, int start)
    {
        return Py_CompileStringFlags(str, filename, start, NULL);
    }

    Shouldn't that function be named Py_CompileString (i.e. Py_ instead of Py)?

    This seems to be old code, but there's normally a macro Py_CompileString() that translates to Py_CompileStringFlags() in pythonrun.h:

    #ifdef Py_LIMITED_API
    PyAPI_FUNC(PyObject *) Py_CompileString(const char *, const char *, int);
    #else
    #define Py_CompileString(str, p, s) Py_CompileStringExFlags(str, p, s, NULL, -1)
    #define Py_CompileStringFlags(str, p, s, f) Py_CompileStringExFlags(str, p, s, f, -1)
    .
    .
    .

    @pablogsal
    Copy link
    Member Author

    Seems that commit 1ed83ad was deleting some functions that were not correctly covered by redirection macros. I have opened PR 23606 to restore those as they were.

    @pablogsal
    Copy link
    Member Author

    New changeset 46bd5ed by Pablo Galindo in branch 'master':
    bpo-40939: Restore some stable API functions incorrectly deleted (GH-23606)
    46bd5ed

    @IgorSkochinsky
    Copy link
    Mannequin

    IgorSkochinsky mannequin commented Dec 2, 2020

    Thanks for the fix! I'd like to submit a test to avoid this and similar issues in future. Are there any guidelines for this? Sorry if this is a wrong place to ask.

    @pablogsal
    Copy link
    Member Author

    I am currently working on test that checks that the stable API symbols are correctly exported. Unfortunately there is no official maintained list if those symbols, so is taking a while

    @IgorSkochinsky
    Copy link
    Mannequin

    IgorSkochinsky mannequin commented Dec 2, 2020

    I am currently working on test that checks that the stable API symbols are correctly exported.

    Thank you very much! For added motivation, the 3.8.0 release was unusable thanks to bpo-37633 which was somewhat similar (also in pythonrun). For reference, I've attached a list of symbols we currently use (a few more we have to import dynamically since they're not in the stable ABI but we'd like to keep that list as short as possible).

    @nnemkin
    Copy link
    Mannequin

    nnemkin mannequin commented Dec 2, 2020

    Unfortunately there is no official maintained list if those symbols, so is taking a while

    On Windows there is PC/python3dll.c.

    @pablogsal
    Copy link
    Member Author

    On Windows there is PC/python3dll.c.

    Even that is severely out of date, unfortunately. There are many functions that were removed in 3.8 and 3.7 that are still listed there.

    @pablogsal
    Copy link
    Member Author

    Opened https://bugs.python.org/issue42545 for that

    @vstinner
    Copy link
    Member

    FYI the unbound project was fixed by calling Py_CompileString() on Python 3.9 and newer:
    NLnetLabs/unbound@e0d426e

    @encukou
    Copy link
    Member

    encukou commented Jun 22, 2021

    • The removed functions are still listed in the documentation; this is confusing.

    • struct _node is not very usable in the C API, but what I saw in mod_wsgi was giving gave PyParser_* output to PyNode_Compile. I tried to write porting notes for this usage. It should also help with the "compile to check for syntax errors" case in unbound.

    @encukou
    Copy link
    Member

    encukou commented Jun 24, 2021

    New changeset 29987f7 by Petr Viktorin in branch 'main':
    bpo-40939: Remove documentation for PyParser_* & add porting notes (GH-26855)
    29987f7

    @encukou
    Copy link
    Member

    encukou commented Jun 28, 2021

    New changeset dc10264 by Miss Islington (bot) in branch '3.10':
    bpo-40939: Remove documentation for PyParser_* & add porting notes (GH-26855) (GH-26898)
    dc10264

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.10 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs)
    Projects
    None yet
    Development

    No branches or pull requests

    8 participants