Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify AST for slices #79003

Closed
serhiy-storchaka opened this issue Sep 27, 2018 · 15 comments
Closed

Simplify AST for slices #79003

serhiy-storchaka opened this issue Sep 27, 2018 · 15 comments
Labels
3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@serhiy-storchaka
Copy link
Member

BPO 34822
Nosy @gvanrossum, @brettcannon, @nascheme, @ncoghlan, @benjaminp, @serhiy-storchaka, @vedgar, @pablogsal, @thautwarm, @isidentical
PRs
  • bpo-34822: Simplify AST for subscription. #9605
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2020-03-10.16:54:21.854>
    created_at = <Date 2018-09-27.16:17:40.175>
    labels = ['interpreter-core', 'type-feature', 'library', '3.9']
    title = 'Simplify AST for slices'
    updated_at = <Date 2020-03-10.16:54:21.854>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2020-03-10.16:54:21.854>
    actor = 'serhiy.storchaka'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-03-10.16:54:21.854>
    closer = 'serhiy.storchaka'
    components = ['Interpreter Core', 'Library (Lib)']
    creation = <Date 2018-09-27.16:17:40.175>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 34822
    keywords = ['patch']
    message_count = 15.0
    messages = ['326570', '326991', '361997', '362016', '362224', '362234', '363583', '363592', '363593', '363597', '363598', '363648', '363650', '363756', '363829']
    nosy_count = 10.0
    nosy_names = ['gvanrossum', 'brett.cannon', 'nascheme', 'ncoghlan', 'benjamin.peterson', 'serhiy.storchaka', 'veky', 'pablogsal', 'thautwarm', 'BTaskaya']
    pr_nums = ['9605']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue34822'
    versions = ['Python 3.9']

    @serhiy-storchaka
    Copy link
    Member Author

    Currently an AST for expressions can have non-terminal nodes of two types: expr and slice. Every of them can be a one of several kinds. A slice can be of kind Index (just an expression), Slice (optional expressions separated by ":") or ExtSlice (slices separated by ",").

    For example, the expression "d[a, ..., b:c]" is represented as:

    Subscript(
        Name('d', Load()),
        ExtSlice(
            [
                Index(Name('a', Load())),
                Index(Constant(Ellipsis)),
                Slice(Name('b', Load()), Name('c', Load()), None)
            ]
        ),
        Load()
    )
    

    and the expression "d[a, ..., b]" is represented as:

    Subscript(
        Name('d', Load()),
        Index(
            Tuple(
                [
                    Name('a', Load()),
                    Constant(Ellipsis),
                    Name('b', Load())
                ],
                Load()
            )
        ),
        Load()
    )
    

    (note that ExtSlice is used only if there are slices in subexpressions).

    I suggest to get rid of the separate slice type. The Slice kind will be a kind of the expr type instead of the slice type. The ExtSlice kind will be always replaced with a Tuple, even if subexpressions contain slices. Nodes of the Index kind will be replaced with expr nodes to which they are referred. For example, the expression "d[a, ..., b:c]" will be represented as:

    Subscript(
        Name('d', Load()),
        Tuple(
            [
                Name('a', Load()),
                Constant(Ellipsis),
                Slice(Name('b', Load()), Name('c', Load()), None)
            ],
            Load()
        ),
        Load()
    )
    

    This will simplify the code for handling AST, especially the C code. The following PR removes around 400 lines of code (a half of them are generated, but others are handwritten). I hope that this regularization will help to write a general code for walking AST for expressions and remove more duplicated code in ast_opt.c, ast_unparse.c, and symtable.c. This even can help to solve a problem with crashes in handling too deep AST if implement the recursion without using the C stack (but this is dreams).

    This change is more breaking than a change in bpo-32892. What will continue to work:

    • The code for creating an AST when pass values as arguments: Index(value) will return just value, ExtSlice(slices) will return Tuple(slices, Load()).

    • NodeVisitor based processing. Methods visit_Index() and visit_ExtSlice() will be never invoked. The semantic of visit_Slice() will be not changed. visit_Tuple() will be invoked instead of visit_ExtSlice() for extended slices.

    • isinstance() and issubclass() checks for Slice. Subclassing of Slice.

    What will no longer work (with the current implementation):

    • The code that creates empty AST nodes and sets their attributes. node = Index(); node.value = value will no longer work.

    • The code that reads attributes of just created Index and ExtSlice nodes. Index(value) will return just value instead of a new object whose attribute "value" is a specified value. A list of subexpressions of ExtSlice(slices) will be accessible as the "elts" attribute instead of "dims" (because it is a Tuple).

    • isinstance() and issubclass() checks for Index and ExtSlice will always return False.

    • Subclassing of Index and ExtSlice. Instantiating subclasses of Index and ExtSlice will return the same result as for Index and ExtSlice, i.e. not instance of these classes.

    • The code that produces a Python code from AST will need to handle indexing with tuples specially (see Tools/parser/unparse.py) because d[(a, b)] is valid syntax (although parenthesis are redundant), but d[(a, b:c)] is not.

    Some limitations are caused by the desire for simplicity and can be removed. For example it is possible to add the "dims" alias of "elts" to Tuple, and make subclasses working as before. It is more hard and inefficient to keep isinstance() checks and attribute access for Index. If some breakage for Index is not avoidable, I'm not sure that it is worth to spent efforts for imitating the old behavior for ExtSlice.

    @serhiy-storchaka serhiy-storchaka added 3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Sep 27, 2018
    @nascheme
    Copy link
    Member

    nascheme commented Oct 3, 2018

    Hello Serhiy,

    I've not reviewed the patch but I trust that if you say it simplifies things, that's so. Perhaps the important question is if it is okay to change the AST in backwards incompatible ways within 3.x releases. As a library author who could be affected, I think it is okay. The AST is mostly an implementation detail and should not required to maintain compatibility. We expose it via the 'ast' module for low level tools that want to manipulate it. It is up to those users to handle backwards incompatible changes.

    That said, we shouldn't just change it for trivial reasons. That just causes work for 3rd party libraries. Removing 400 lines of code seems like sufficient reason.

    @isidentical
    Copy link
    Sponsor Member

    Serhiy, any plans to bump this patch to 3.9 and continue / merge? In general the benefits looks great, but on the other hand this might cause some breakage which I guess as @nascheme is OK in AST.

    @serhiy-storchaka
    Copy link
    Member Author

    Updated the PR.

    I can obviously be biased about my changes, so I need an approval of other core developer to merge them.

    I created several PRs to popular third-party projects which work with AST to support both old and new AST schemes.

    @serhiy-storchaka serhiy-storchaka added 3.9 only security fixes and removed 3.8 only security fixes labels Feb 15, 2020
    @vedgar
    Copy link
    Mannequin

    vedgar mannequin commented Feb 18, 2020

    I wrote some AST analyzers, and this would have simplified my code. So I welcome it. :-)

    However, it means people might be tempted to write

    a[b:c, d]    as    a[(b:c, d)]
    

    (or at least expect it to work -- same as a[b, c] can now be written as a[(b, c)]). We aren't going to allow this?

    @serhiy-storchaka
    Copy link
    Member Author

    No, this PR does not change the Python syntax. It only changes the AST representation.

    @gvanrossum
    Copy link
    Member

    Haven’t looked at the code but I welcome the simplification.

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Mar 7, 2020

    The one thing in the PR that makes me slightly wary is the point Vedran raised: in the old AST _Unparser code, the fact that index tuples containing slices should be printed without parentheses was encapsulated in the ExtSlice node type, but with Index and ExtSlice removed, that behaviour is now instead implemented by duplicating the visit_Tuple logic directly in visit_Subscript, purely to omit the surrounding parens.

    So I agree that the parse tree simplification is an improvement, but I'm wondering if it might make sense to add an "is_subscript" attribute to expression nodes.

    That way the Tuple vs ExtSlice and Expr vs Index distinctions would be preserved, but the representation of those distinctions would change in a way that meant that most consuming code didn't need to care that the distinctions existed (ExtSlice would become a Tuple with is_subscript set to True, and similarly, Index would become Expr instances with is_subscript set to True).

    It's been so long since I changed the AST, though, that I'm not sure how painful adding that attribute would be.

    @serhiy-storchaka
    Copy link
    Member Author

    It was added to produce nicer output.

    Currently:

    >> print(ast.unparse(ast.parse('a[i, j]')))

    a[(i, j)]

    With PR 9605:

    >> print(ast.unparse(ast.parse('a[i, j]')))

    a[i, j]

    The current code is not consistent with outputting parenthesis:

    >> print(ast.unparse(ast.parse('a[i:j, k]')))

    a[i:j, k]

    It also produces the same output for a[i:j] and a[i:j,] which have different AST and compiled to different bytecode (this is a bug).

    >> print(ast.unparse(ast.parse('a[i:j,]')))

    a[i:j]

    @vedgar
    Copy link
    Mannequin

    vedgar mannequin commented Mar 7, 2020

    Agree with the idea, but think the name is too narrow. How about parethesized? There are many contexts where parentheses look weird and can be omitted (e.g. after return statement), although subscripts are currently the only place where they are an outright error.

    @isidentical
    Copy link
    Sponsor Member

    Yes, there is an already PR about that the bug.

    Related PR: #17892

    @serhiy-storchaka
    Copy link
    Member Author

    Sorry, I did not know about your PR Batuhan and fixed this bug in bpo-39889.

    After fixing the bug in the current code PR 9605 only simplifies the code. So in this case the new code is not more complex than correctly written old code.

    @isidentical
    Copy link
    Sponsor Member

    Sorry, I did not know about your PR Batuhan and fixed this bug in bpo-39889.

    No problem.

    @gvanrossum
    Copy link
    Member

    I'm going to review the actual code next.

    Regarding the omission of parentheses in various contexts, I am all for that, but I consider it a separate issue (as it only pertains to ast.unparse()). The fix in #17892 should go in regardless.

    @serhiy-storchaka
    Copy link
    Member Author

    New changeset 13d52c2 by Serhiy Storchaka in branch 'master':
    bpo-34822: Simplify AST for subscription. (GH-9605)
    13d52c2

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants