Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Put back the ability to parse files where async/await aren't keywords #80156

Closed
gvanrossum opened this issue Feb 12, 2019 · 26 comments
Closed

Put back the ability to parse files where async/await aren't keywords #80156

gvanrossum opened this issue Feb 12, 2019 · 26 comments
Assignees
Labels
3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@gvanrossum
Copy link
Member

BPO 35975
Nosy @gvanrossum, @scoder, @encukou, @serhiy-storchaka, @1st1, @ilevkivskyi, @ethanhs, @miss-islington, @mayankasthana, @msullivan
PRs
  • bpo-35975: Support parsing earlier minor versions of Python 3 #12086
  • bpo-35975: Only use cf_feature_version if PyCF_ONLY_AST in cf_flags #21021
  • [3.9] bpo-35975: Only use cf_feature_version if PyCF_ONLY_AST in cf_flags #21022
  • [3.8] bpo-35975: Only use cf_feature_version if PyCF_ONLY_AST in cf_flags #21023
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/gvanrossum'
    closed_at = <Date 2020-06-29.16:40:15.196>
    created_at = <Date 2019-02-12.00:23:38.530>
    labels = ['interpreter-core', 'type-bug', '3.8']
    title = "Put back the ability to parse files where async/await aren't keywords"
    updated_at = <Date 2020-06-29.16:40:15.195>
    user = 'https://github.com/gvanrossum'

    bugs.python.org fields:

    activity = <Date 2020-06-29.16:40:15.195>
    actor = 'gvanrossum'
    assignee = 'gvanrossum'
    closed = True
    closed_date = <Date 2020-06-29.16:40:15.196>
    closer = 'gvanrossum'
    components = ['Interpreter Core']
    creation = <Date 2019-02-12.00:23:38.530>
    creator = 'gvanrossum'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 35975
    keywords = ['patch']
    message_count = 26.0
    messages = ['335277', '336785', '336786', '336788', '336789', '336790', '336795', '336800', '336831', '336879', '337243', '337432', '343749', '343786', '371938', '371943', '371949', '371974', '371975', '371983', '372006', '372019', '372484', '372485', '372486', '372602']
    nosy_count = 10.0
    nosy_names = ['gvanrossum', 'scoder', 'petr.viktorin', 'serhiy.storchaka', 'yselivanov', 'levkivskyi', 'ethan smith', 'miss-islington', 'masthana', 'msullivan']
    pr_nums = ['12086', '21021', '21022', '21023']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue35975'
    versions = ['Python 3.8']

    @gvanrossum
    Copy link
    Member Author

    Now that the ast module can be used to parse type comments, there’s one more feature I’d like to lobby for – a flag that modifies the grammar slightly so it resembles an older version of Python (going back to 3.4).

    This is used in mypy to decouple the Python version you’re running from the Python version for which you’re checking compatibility (useful when checking code that will be deployed on a system with a different Python version installed). I imagine this would be useful to other linters as well, and the implementation is mostly manipulating whether async and await are keywords. But if there’s pushback to this part I can live without it – the rest of the work is still useful.

    The implementation in typed_ast takes the form of a feature_version flag which is set to the minor Python version to be parsed. Setting it to 4 or lower would make async/await never keywords, setting it to 5 or 6 would make them conditional per PEP-492, and setting it to 7 or higher would make these unconditional keywords. A few minor uses of the same flag also reject f-strings, annotated assignments (PEP-526), and matrix multiply for versions where those don't exist.

    But I don't need all of those -- I really just need to be able to select the 3.5/3.6 rules for conditional async/await keywords, since all the other features are purely backwards compatible, whereas with async/await there is legal (and plentiful!) code that uses these as variable or function names that should be supported in 3.5/3.6 mode.

    Of course having it be a (minor) version number would still be more future-proof -- if say in 3.10 we add a match expression using some kind of conditional keyword hack, we might have to dust it off.

    Note that much of what I'm asking for would effectively roll back https://bugs.python.org/issue30406 -- sorry Jelle! (Though there's more to it -- Serhiy's introduction of Grammar/Token followed, and I would still need to thread some kind of flag all the way from ast.parse() to tokenizer.c.

    @gvanrossum gvanrossum added 3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Feb 13, 2019
    @gvanrossum gvanrossum self-assigned this Feb 13, 2019
    @gvanrossum gvanrossum added the type-bug An unexpected behavior, bug, or error label Feb 13, 2019
    @1st1
    Copy link
    Member

    1st1 commented Feb 28, 2019

    Well, it's not just rolling back async/await from being keywords. Since 3.7 it's possible to create async generator expressions in non-async functions. This wasn't possible to do with old hacks on the lexer. So if you want to revert the change you risk losing some functionality we enabled in 3.7

    @gvanrossum
    Copy link
    Member Author

    That should still work. The strategy is as follows:

    • For Python 3.7 or higher, 'async' and 'await' are *always* recognized.

    • For Python 3.5 and 3.6, the hacks from the PEP are used.

    @1st1
    Copy link
    Member

    1st1 commented Feb 28, 2019

    Actually, I think we could revert to old lexer hacks in 3.8 if we modify them to treat 'async for' tocken sequence as one meta-tocken.

    @gvanrossum
    Copy link
    Member Author

    But why would we? I already have a working solution. (Literally reverting Jelle's PR won't work anyway, because Serhiy changed how regen-token works.)

    @1st1
    Copy link
    Member

    1st1 commented Feb 28, 2019

    But why would we? I already have a working solution.

    I've heard some complaints that it's hard to migrate to Python 3.7 because async and await are keywords (although I think by now all popular libraries have migrated already), so in case you ever considered to revert that decision I think we could actually make it work. But for the current PR you're working on we don't need that, you're right.

    @gvanrossum
    Copy link
    Member Author

    And the same tokenizer trick that detects 'async def' could detect 'async for' easily. See https://github.com/python/cpython/pull/12086/files#diff-30b8266a4285de981f8b1b82a8cc6231R1418

    @1st1
    Copy link
    Member

    1st1 commented Feb 28, 2019

    And the same tokenizer trick that detects 'async def' could detect 'async for' easily.

    Exactly. I just never thought that we could support async generator expressions with the kind of hacks in tokenizer we had in 3.5-3.6. FWIW I still believe that it's better for async & await tokens to be proper keywords, but in case we ever realize we want them to become "soft" keywords we apparently have options.

    @serhiy-storchaka
    Copy link
    Member

    Would not be simpler to just drop the support of Python versions <3.7 in new MyPy versions?

    @gvanrossum
    Copy link
    Member Author

    Would not be simpler to just drop the support of Python versions <3.7 in new MyPy versions?

    Not really -- mypy has a lot of users who run it over (very) large code bases that can't easily be upgraded. Basically mypy has to support all Python versions that haven't reached their end of life yet. (And it's also important that mypy not be constrained to the same Python version as the target code.)

    I suppose we could just stick with the existing typed_ast that supports 3.4 through 3.7, but we actually have a use case for the end_lineno and end_col_offset fields that were just added to ast in 3.8, and in general we'd like to support any future grammar elements and ast features.

    @gvanrossum
    Copy link
    Member Author

    Everyone watching, the PR is now ready for review!

    @miss-islington
    Copy link
    Contributor

    New changeset 495da29 by Miss Islington (bot) (Guido van Rossum) in branch 'master':
    bpo-35975: Support parsing earlier minor versions of Python 3 (GH-12086)
    495da29

    @encukou
    Copy link
    Member

    encukou commented May 28, 2019

    It looks like this caused bpo-37072: PyAST_FromNodeObject doesn't ignore flags any more, and when PyNode_Compile passes NULL flags, it crashes.

    @gvanrossum
    Copy link
    Member Author

    Sorry for the oversight. Do you need help fixing that?

    @scoder
    Copy link
    Contributor

    scoder commented Jun 20, 2020

    I was made aware [1] that the addition of the "cf_feature_version" field broke the backwards compatibility of the "PyRun_StringFlags()" function [2]. Unlike what the docs of "PyCompilerFlags" [3] suggest, the new field is not ignored there but must now be set in order to enable the syntax features of the running Python version.

    Background: A Cython user was trying "exec()" on code with f-strings in Py3.8 and got a SyntaxError, because "cf_feature_version" had not been initialised.

    I can see two types of existing code being affected:

    1. applications or modules that execute user provided Python code (as in the bug report) for which users expect the syntax of the currently running Python runtime to be available.

    2. code that used to work in Py3.6/7 using the available Python syntax features and now runs into newly introduced syntax restrictions in Py3.8, such as f-strings no longer being available.

    Since Py3.8 has been out for a while with this change, and there were apparently little complaints in addition to bpo-37072, should we just update the documentation of "PyCompilerFlags" and "PyRun_StringFlags()" to mention the change there?

    [1] cython/cython#3695
    [2] https://docs.python.org/3/c-api/veryhigh.html#c.PyRun_StringFlags
    [3] https://docs.python.org/3/c-api/veryhigh.html#c.PyCompilerFlags

    @gvanrossum
    Copy link
    Member Author

    But it's a bug, right? The intention for sure is that the cf_feature_version field is only used when the PyCF_ONLY_AST flags is set in cf_flags, per the docs you cite as [3]. Why wouldn't it be possible to fix that?

    @gvanrossum gvanrossum reopened this Jun 20, 2020
    @scoder
    Copy link
    Contributor

    scoder commented Jun 20, 2020

    I wasn't sure which is better – solve it or leave it. But it seems a) easy to adapt to on user side, and b) solvable in CPython in a way that allows code that wants to adapt to work across 3.8.x versions. Only code that does not get adapted would risk failure in existing 3.8.x versions, and benefit from a fix in future 3.8.x releases.

    So, yeah, I think we should fix it then.

    @scoder scoder closed this as completed Jun 20, 2020
    @gvanrossum gvanrossum reopened this Jun 21, 2020
    @gvanrossum
    Copy link
    Member Author

    I'm fixing it -- can you verify? I have no easy way to test this. DO you think we should also add a note to the docs that this was broken in 3.8.0-3?

    @gvanrossum
    Copy link
    Member Author

    Hm, the backports have to be done manually, since the 3.8 parser and the 3.10 parser don't overlap, and 3.9 has both versions...

    @scoder
    Copy link
    Contributor

    scoder commented Jun 21, 2020

    PRs look good to me. Here is a test that fails for me with the master branch and works with your fixes. It has a slightly odd place in the subinterpreter tests, but I would argue that it's not entirely misplaced there. You would want to know when subinterpreters start rejecting valid code, wouldn't you? :)

    diff --git a/Lib/test/test_capi.py b/Lib/test/test_capi.py
    index 73e167a0b0..9bed8255e1 100644
    --- a/Lib/test/test_capi.py
    +++ b/Lib/test/test_capi.py
    @@ -627,6 +627,24 @@ class SubinterpreterTest(unittest.TestCase):
                 self.assertNotEqual(pickle.load(f), id(sys.modules))
                 self.assertNotEqual(pickle.load(f), id(builtins))
     
    +    def test_subinterps_recent_language_features(self):
    +        r, w = os.pipe()
    +        code = """if 1:
    +            import pickle
    +            with open({:d}, "wb") as f:
    +
    +                @(lambda x:x)  # Py 3.9
    +                def noop(x): return x
    +
    +                a = (b := f'1{{2}}3') + noop('x')  # Py3.8 (:=) / 3.6 (f'')
    +                pickle.dump(dict(a=a, b=b), f)
    +            """.format(w)
    +
    +        with open(r, "rb") as f:
    +            ret = support.run_in_subinterp(code)
    +            self.assertEqual(ret, 0)
    +            self.assertEqual(pickle.load(f), {'a': '123x', 'b': '123'})
    +
         def test_mutate_exception(self):
             """
             Exceptions saved in global module state get shared between
    diff --git a/Modules/_testcapimodule.c b/Modules/_testcapimodule.c
    index 808483ebd7..468e25ff9e 100644
    --- a/Modules/_testcapimodule.c
    +++ b/Modules/_testcapimodule.c
    @@ -3468,6 +3468,8 @@ run_in_subinterp(PyObject *self, PyObject *args)
         const char *code;
         int r;
         PyThreadState *substate, *mainstate;
    +    /* only initialise 'cflags.cf_flags' to test backwards compatibility */
    +    PyCompilerFlags cflags = {0};
     
         if (!PyArg_ParseTuple(args, "s:run_in_subinterp",
                               &code))
    @@ -3486,7 +3488,7 @@ run_in_subinterp(PyObject *self, PyObject *args)
             PyErr_SetString(PyExc_RuntimeError, "sub-interpreter creation failed");
             return NULL;
         }
    -    r = PyRun_SimpleString(code);
    +    r = PyRun_SimpleStringFlags(code, &cflags);
         Py_EndInterpreter(substate);
     
         PyThreadState_Swap(mainstate);

    @gvanrossum
    Copy link
    Member Author

    Okay, I will add that diff (I'll probably add async def to the set of features tested). Do you mind if I just incorporate it in my diff? I could add an acknowledgment in the commit message. Otherwise you could send a separate PR.

    @scoder
    Copy link
    Contributor

    scoder commented Jun 21, 2020

    Feel free to use the patch in any way you like.

    @gvanrossum
    Copy link
    Member Author

    New changeset 9d197c7 by Guido van Rossum in branch 'master':
    bpo-35975: Only use cf_feature_version if PyCF_ONLY_AST in cf_flags (bpo-21021)
    9d197c7

    @gvanrossum
    Copy link
    Member Author

    New changeset 2a1ee1d by Guido van Rossum in branch '3.9':
    [3.9] bpo-35975: Only use cf_feature_version if PyCF_ONLY_AST in cf_flags (bpo-21022)
    2a1ee1d

    @gvanrossum
    Copy link
    Member Author

    New changeset e653369 by Guido van Rossum in branch '3.8':
    [3.8] bpo-35975: Only use cf_feature_version if PyCF_ONLY_AST in cf_flags (bpo-21023)
    e653369

    @gvanrossum
    Copy link
    Member Author

    Fixed again. Thanks Stefan for finding this! I apologize for the bug.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants