Message 326570 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	serhiy.storchaka
Recipients	benjamin.peterson, brett.cannon, gvanrossum, nascheme, ncoghlan, serhiy.storchaka, thautwarm, vstinner, yselivanov
Date	2018-09-27.16:17:40
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1538065060.28.0.545547206417.issue34822@psf.upfronthosting.co.za>
In-reply-to

Content
Currently an AST for expressions can have non-terminal nodes of two types: expr and slice. Every of them can be a one of several kinds. A slice can be of kind Index (just an expression), Slice (optional expressions separated by ":") or ExtSlice (slices separated by ","). For example, the expression "d[a, ..., b:c]" is represented as: Subscript( Name('d', Load()), ExtSlice( [ Index(Name('a', Load())), Index(Constant(Ellipsis)), Slice(Name('b', Load()), Name('c', Load()), None) ] ), Load() ) and the expression "d[a, ..., b]" is represented as: Subscript( Name('d', Load()), Index( Tuple( [ Name('a', Load()), Constant(Ellipsis), Name('b', Load()) ], Load() ) ), Load() ) (note that ExtSlice is used only if there are slices in subexpressions). I suggest to get rid of the separate slice type. The Slice kind will be a kind of the expr type instead of the slice type. The ExtSlice kind will be always replaced with a Tuple, even if subexpressions contain slices. Nodes of the Index kind will be replaced with expr nodes to which they are referred. For example, the expression "d[a, ..., b:c]" will be represented as: Subscript( Name('d', Load()), Tuple( [ Name('a', Load()), Constant(Ellipsis), Slice(Name('b', Load()), Name('c', Load()), None) ], Load() ), Load() ) This will simplify the code for handling AST, especially the C code. The following PR removes around 400 lines of code (a half of them are generated, but others are handwritten). I hope that this regularization will help to write a general code for walking AST for expressions and remove more duplicated code in ast_opt.c, ast_unparse.c, and symtable.c. This even can help to solve a problem with crashes in handling too deep AST if implement the recursion without using the C stack (but this is dreams). This change is more breaking than a change in issue32892. What will continue to work: * The code for creating an AST when pass values as arguments: `Index(value)` will return just `value`, `ExtSlice(slices)` will return `Tuple(slices, Load())`. * NodeVisitor based processing. Methods visit_Index() and visit_ExtSlice() will be never invoked. The semantic of visit_Slice() will be not changed. visit_Tuple() will be invoked instead of visit_ExtSlice() for extended slices. * isinstance() and issubclass() checks for Slice. Subclassing of Slice. What will no longer work (with the current implementation): * The code that creates empty AST nodes and sets their attributes. `node = Index(); node.value = value` will no longer work. * The code that reads attributes of just created Index and ExtSlice nodes. `Index(value)` will return just `value` instead of a new object whose attribute "value" is a specified value. A list of subexpressions of ExtSlice(slices) will be accessible as the "elts" attribute instead of "dims" (because it is a Tuple). * isinstance() and issubclass() checks for Index and ExtSlice will always return False. * Subclassing of Index and ExtSlice. Instantiating subclasses of Index and ExtSlice will return the same result as for Index and ExtSlice, i.e. not instance of these classes. * The code that produces a Python code from AST will need to handle indexing with tuples specially (see Tools/parser/unparse.py) because d[(a, b)] is valid syntax (although parenthesis are redundant), but d[(a, b:c)] is not. Some limitations are caused by the desire for simplicity and can be removed. For example it is possible to add the "dims" alias of "elts" to Tuple, and make subclasses working as before. It is more hard and inefficient to keep isinstance() checks and attribute access for Index. If some breakage for Index is not avoidable, I'm not sure that it is worth to spent efforts for imitating the old behavior for ExtSlice.

Currently an AST for expressions can have non-terminal nodes of two types: expr and slice. Every of them can be a one of several kinds. A slice can be of kind Index (just an expression), Slice (optional expressions separated by ":") or ExtSlice (slices separated by ",").

For example, the expression "d[a, ..., b:c]" is represented as:

    Subscript(
        Name('d', Load()),
        ExtSlice(
            [
                Index(Name('a', Load())),
                Index(Constant(Ellipsis)),
                Slice(Name('b', Load()), Name('c', Load()), None)
            ]
        ),
        Load()
    )

and the expression "d[a, ..., b]" is represented as:

    Subscript(
        Name('d', Load()),
        Index(
            Tuple(
                [
                    Name('a', Load()),
                    Constant(Ellipsis),
                    Name('b', Load())
                ],
                Load()
            )
        ),
        Load()
    )

(note that ExtSlice is used only if there are slices in subexpressions).

I suggest to get rid of the separate slice type. The Slice kind will be a kind of the expr type instead of the slice type. The ExtSlice kind will be always replaced with a Tuple, even if subexpressions contain slices. Nodes of the Index kind will be replaced with expr nodes to which they are referred. For example, the expression "d[a, ..., b:c]" will be represented as:

    Subscript(
        Name('d', Load()),
        Tuple(
            [
                Name('a', Load()),
                Constant(Ellipsis),
                Slice(Name('b', Load()), Name('c', Load()), None)
            ],
            Load()
        ),
        Load()
    )

This will simplify the code for handling AST, especially the C code. The following PR removes around 400 lines of code (a half of them are generated, but others are handwritten). I hope that this regularization will help to write a general code for walking AST for expressions and remove more duplicated code in ast_opt.c, ast_unparse.c, and symtable.c. This even can help to solve a problem with crashes in handling too deep AST if implement the recursion without using the C stack (but this is dreams).

This change is more breaking than a change in issue32892. What will continue to work:

* The code for creating an AST when pass values as arguments: `Index(value)` will return just `value`, `ExtSlice(slices)` will return `Tuple(slices, Load())`.

* NodeVisitor based processing. Methods visit_Index() and visit_ExtSlice() will be never invoked. The semantic of visit_Slice() will be not changed. visit_Tuple() will be invoked instead of visit_ExtSlice() for extended slices.

* isinstance() and issubclass() checks for Slice. Subclassing of Slice.

What will no longer work (with the current implementation):

* The code that creates empty AST nodes and sets their attributes. `node = Index(); node.value = value` will no longer work.

* The code that reads attributes of just created Index and ExtSlice nodes. `Index(value)` will return just `value` instead of a new object whose attribute "value" is a specified value. A list of subexpressions of ExtSlice(slices) will be accessible as the "elts" attribute instead of "dims" (because it is a Tuple).

* isinstance() and issubclass() checks for Index and ExtSlice will always return False.

* Subclassing of Index and ExtSlice. Instantiating subclasses of Index and ExtSlice will return the same result as for Index and ExtSlice, i.e. not instance of these classes.

* The code that produces a Python code from AST will need to handle indexing with tuples specially (see Tools/parser/unparse.py) because d[(a, b)] is valid syntax (although parenthesis are redundant), but d[(a, b:c)] is not.

Some limitations are caused by the desire for simplicity and can be removed. For example it is possible to add the "dims" alias of "elts" to Tuple, and make subclasses working as before. It is more hard and inefficient to keep isinstance() checks and attribute access for Index. If some breakage for Index is not avoidable, I'm not sure that it is worth to spent efforts for imitating the old behavior for ExtSlice.

History
Date	User	Action	Args
2018-09-27 16:17:40	serhiy.storchaka	set	recipients: + serhiy.storchaka, gvanrossum, brett.cannon, nascheme, ncoghlan, vstinner, benjamin.peterson, yselivanov, thautwarm
2018-09-27 16:17:40	serhiy.storchaka	set	messageid: <1538065060.28.0.545547206417.issue34822@psf.upfronthosting.co.za>
2018-09-27 16:17:40	serhiy.storchaka	link	issue34822 messages
2018-09-27 16:17:40	serhiy.storchaka	create