Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PEG discrepancy on 'if {x} {a}: pass' #85825

Closed
gvanrossum opened this issue Aug 28, 2020 · 5 comments
Closed

PEG discrepancy on 'if {x} {a}: pass' #85825

gvanrossum opened this issue Aug 28, 2020 · 5 comments
Assignees
Labels
3.9 only security fixes 3.10 only security fixes type-bug An unexpected behavior, bug, or error

Comments

@gvanrossum
Copy link
Member

BPO 41659
Nosy @gvanrossum, @terryjreedy, @lysnikolaou
PRs
  • bpo-41659: Disallow curly brace directly after primary #22996
  • [3.9] bpo-41659: Disallow curly brace directly after primary (GH-22996) #23006
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/lysnikolaou'
    closed_at = <Date 2020-10-27.22:39:15.328>
    created_at = <Date 2020-08-28.21:43:15.737>
    labels = ['type-bug', '3.9', '3.10']
    title = "PEG discrepancy on 'if {x} {a}: pass'"
    updated_at = <Date 2020-10-27.22:39:15.328>
    user = 'https://github.com/gvanrossum'

    bugs.python.org fields:

    activity = <Date 2020-10-27.22:39:15.328>
    actor = 'lys.nikolaou'
    assignee = 'lys.nikolaou'
    closed = True
    closed_date = <Date 2020-10-27.22:39:15.328>
    closer = 'lys.nikolaou'
    components = []
    creation = <Date 2020-08-28.21:43:15.737>
    creator = 'gvanrossum'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 41659
    keywords = ['patch']
    message_count = 5.0
    messages = ['376048', '376065', '376412', '379792', '379809']
    nosy_count = 3.0
    nosy_names = ['gvanrossum', 'terry.reedy', 'lys.nikolaou']
    pr_nums = ['22996', '23006']
    priority = 'low'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue41659'
    versions = ['Python 3.9', 'Python 3.10']

    @gvanrossum
    Copy link
    Member Author

    I just noticed a subtle discrepancy between the old parser and the PEG parser.

    Consider this syntax error:

    if x {a}: pass
    

    The old parser places the caret at the '{':

    $ python3.8 -c 'if x { a } : pass'
      File "<string>", line 1
        if x { a } : pass
             ^
    SyntaxError: invalid syntax
    

    The PEG parser puts it at 'a':

    $ python3.10 -c 'if x { a } : pass'
      File "<string>", line 1
        if x { a } : pass
               ^
    SyntaxError: invalid syntax
    

    I don't think we should put much effort into fixing it -- it's just a curiosity. I suspect it's got to do with some lookahead.

    @gvanrossum gvanrossum added 3.10 only security fixes 3.9 only security fixes labels Aug 28, 2020
    @gvanrossum gvanrossum added type-bug An unexpected behavior, bug, or error 3.10 only security fixes 3.9 only security fixes labels Aug 28, 2020
    @gvanrossum gvanrossum added the type-bug An unexpected behavior, bug, or error label Aug 28, 2020
    @T8T9 T8T9 mannequin added 3.7 (EOL) end of life 3.8 only security fixes and removed 3.7 (EOL) end of life 3.8 only security fixes labels Aug 29, 2020
    @lysnikolaou
    Copy link
    Contributor

    I had a look at this. I have been testing with the input a{x}, which faces the same problem.

    It's actually because of an invalid_* rule. The call stack looks like this:

    ...
    invalid_comprehension_rule(Parser * p) (/home/lysnikolaou/repos/cpython/Parser/parser.c:15065)
    genexp_rule(Parser * p) (/home/lysnikolaou/repos/cpython/Parser/parser.c:11381)
    primary_raw(Parser * p) (/home/lysnikolaou/repos/cpython/Parser/parser.c:10361)
    primary_rule(Parser * p) (/home/lysnikolaou/repos/cpython/Parser/parser.c:10285)
    await_primary_rule(Parser * p) (/home/lysnikolaou/repos/cpython/Parser/parser.c:10240)
    ...

    The invalid_comprehension rule acecpts an LBRACE as the starting token and only fails after it's parsed it, which means that the parser fails with three tokens in the tokens array, the NAME which is valid, the LBRACE which is parsed for the invalid_comprehension rule and the NAME thereafter, upon which the parser fails and backs out of everything. Then, we look at the last token we've parsed and that's where we're placing the caret.

    Because of invalid_comprehension, we can even go as far as making the parser show a completely different error, for example:

    ➜ cpython git:(master) ✗ cat a.py
    a{*x for x in a}
    ➜ cpython git:(master) ✗ ./python a.py
    File "/home/lysnikolaou/repos/cpython/a.py", line 1
    a{*x for x in a}
    ^
    SyntaxError: iterable unpacking cannot be used in comprehension

    Or place the caret even further:

    ➜ cpython git:(master) ✗ cat a.py
    a{*x + a + b + c}
    ➜ cpython git:(master) ✗ ./python a.py
    File "/home/lysnikolaou/repos/cpython/a.py", line 1
    a{*x + a + b + c}
    ^
    SyntaxError: invalid syntax

    There's a simple fix, which is adding an alternative to the primary rule, that parses something along the lines of primary set and then call RAISE_SYNTAX_ERROR_KNOWN_LOCATION there, but that would have to come before the genexp alternative, which worries me because of the performance implications. primary is a left recursive rule that gets called VERY often, probably more than a few times even when parsing a single NAME.

    @terryjreedy
    Copy link
    Member

    Minimal example
    >>> a{ # or
    >>> a {
    In 3.8, this is immediately flagged as a SyntaxError.  In 3.9 and master, a continuation prompt is issued.  This strikes me as a parsing buglet that should preferably be fixed, as it implies that something valid *could* follow '{', thus misleading beginners.  On the other hand, after scanning my keyboard, '{' seems unique in being a legal symbol, unlike `, $, and ?, or combinations like +*, that can AFAIK never follow a name.  So it would need special handling.
    
    
    Side note: for the same reason I dislike the { change, I like the generic 3.9 change for legal operators without a second operand. 
    >>> a *
    Both flag as SyntaxError, but in 3.8, the caret is under '*', falsely implying that '*' cannot follow a name, while in 3.9, it is under the whitespace following, correct implying that the * is legal and that the problem is lack of a second expression (on the same line without continuation).

    @lysnikolaou
    Copy link
    Contributor

    New changeset 15acc4e by Lysandros Nikolaou in branch 'master':
    bpo-41659: Disallow curly brace directly after primary (GH-22996)
    15acc4e

    @lysnikolaou
    Copy link
    Contributor

    New changeset c4b58ce by Lysandros Nikolaou in branch '3.9':
    [3.9] bpo-41659: Disallow curly brace directly after primary (GH-22996) (bpo-23006)
    c4b58ce

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes 3.10 only security fixes type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants