Issue41746
Created on 2020-09-08 20:31 by pablogsal, last changed 2020-10-21 19:55 by lys.nikolaou. This issue is now closed.
Pull Requests | |||
---|---|---|---|
URL | Status | Linked | Edit |
PR 22223 | merged | pablogsal, 2020-09-13 02:43 | |
PR 22786 | closed | lys.nikolaou, 2020-10-19 21:51 | |
PR 22864 | merged | lys.nikolaou, 2020-10-21 19:03 |
Messages (21) | |||
---|---|---|---|
msg376592 - (view) | Author: Pablo Galindo Salgado (pablogsal) * ![]() |
Date: 2020-09-08 20:31 | |
Casting incorrect elements pulled from asdl_seq objects has been a pain when developing the new PEG parser and is a source of spooky-bug-at-a-distance problems in which the consequences of an incorrect casting from void* are felt much later. I propose to add a new field to asdl_seq objects, which is an enumeration of the possible types it can contain. The main ideas are: * By default, the enumeration will have UNDEFINED type. * We can add some extra macros mirroring asdl_seq_GET and asdl_seq_SET that will receive the expected type and in debug mode will assert that the type is correct. Something like: expr_ty item = asdl_set_GET_TYPED(sequence, n, EXPR); * Usage of asdl_seq_GET and asdl_seq_SET do not do extra checks. * To set the type information, we can add a new constructor: asdl_seq_new_typed(size, arena, TYPE); I think this change is worth because is not very invasive (old usage remains the same), we can slowly migrate only the parts that we need/want and will add some extra debugging possibilities for cases that has been quite challenging. |
|||
msg376595 - (view) | Author: Raymond Hettinger (rhettinger) * ![]() |
Date: 2020-09-08 20:45 | |
Let me know if I'm misunderstanding the proposal. Are you proposing a non-standard ASDL extension to that won't work with existing ASDL parsers? I thought the entire point of ASDL is that it was standard, portable, and had a fixed number of types easily implemented in many languages. FWIW, Here are some of my notes from a compiler project I worked on a while back: # ASDL's seven builtin types are: # - identifier, int, string, bytes, object, singleton, constant # - singleton: None, True or False # - constant can be None, whereas None means "no value" for object # # Resources for ASDL: # https://www.usenix.org/legacy/publications/library/proceedings/dsl97/full_papers/wang/wang.pdf # https://eli.thegreenplace.net/2014/06/04/using-asdl-to-describe-asts-in-compilers # https://www.oilshell.org/blog/2016/12/11.html |
|||
msg376596 - (view) | Author: Raymond Hettinger (rhettinger) * ![]() |
Date: 2020-09-08 20:49 | |
Within standard ASDL, the sequences already have an ASDL type. Here's a snippet that shows both builtin types and derived types: start = Program(stmt* procs, expr* calls) stmt = Procedure(identifier name, identifier* params, bool is_test, stmt body) | Block(int blocknum, stmt* stmts) | Assign(lvalue target, expr value) | If(expr cond, stmt action) | Loop(int times, bool fixed, body stmt) | AbortLoop(int blocknum) | QuitBlock(int blocknum) expr = BinOp(expr value1, str op, expr value2) | Number(int x) | Bool(bool x) | Id(identifier name) | Call(identifier name, expr* args) | lvalue lvalue = Output(bool is_bool) | Cell(int i) |
|||
msg376597 - (view) | Author: Batuhan Taskaya (BTaskaya) * ![]() |
Date: 2020-09-08 20:51 | |
We already have a specialized type for int (actually an enumeration); typedef struct { Py_ssize_t size; void *elements[1]; } asdl_seq; typedef struct { Py_ssize_t size; int elements[1]; } asdl_int_seq; Why not just include these in the Python-ast.h and auto-generate them in the asdl_c.py |
|||
msg376598 - (view) | Author: Pablo Galindo Salgado (pablogsal) * ![]() |
Date: 2020-09-08 20:52 | |
> Let me know if I'm misunderstanding the proposal. Are you proposing a non-standard ASDL extension to that won't work with existing ASDL parsers? No, I am not proposing any modification to the ASDL parsers nor the ASDL definition language. I am proposing a modification to the internal data structures we use internally to carry sequences of ASDL types. Currently, asdl_seq is a type that holds an array of void* pointers and the user is supposed to cast the individual items to the correct types, which is error prone. I am proposing to add a new field to our internal structures that allows the user to have some certainty over what these sequences contain. |
|||
msg376599 - (view) | Author: Pablo Galindo Salgado (pablogsal) * ![]() |
Date: 2020-09-08 20:53 | |
> Why not just include these in the Python-ast.h and auto-generate them in the asdl_c.py That's a possibility, but that would incur in a lot of new extra code, and some asdl_seq items in the parser contain non-standard types that only PEG uses, so is not possible to auto-generate them. |
|||
msg376600 - (view) | Author: Batuhan Taskaya (BTaskaya) * ![]() |
Date: 2020-09-08 20:53 | |
The problem with asdl_seq_int was that it was hard-coded since there was no entity that transpods data between different traversers / node visitors. PR 20193 tries to implement a general-purpose metadata system to eliminate that issue, and can be easily extended for this too! |
|||
msg376601 - (view) | Author: Batuhan Taskaya (BTaskaya) * ![]() |
Date: 2020-09-08 20:54 | |
> That's a possibility, but that would incur in a lot of new extra code, and some asdl_seq items in the parser contain non-standard types that only PEG uses, so is not possible to auto-generate them. Well, since the extra code will be autogenerated, guess it wouldn't be much problematic. But I see about the PEG thing. |
|||
msg376602 - (view) | Author: Pablo Galindo Salgado (pablogsal) * ![]() |
Date: 2020-09-08 20:57 | |
> Well, since the extra code will be autogenerated, guess it wouldn't be much problematic. It will because it means that generic functions that receive asdl_seq an need to cast aliased pointers because they will still receive asdl_seq* items and you now need to cars them to the appropriate type: int foo(asdl_seq* my_seq) { asdl_seq_expr* = (asdl_seq_expr*)my_seq } So you will end up with what I am proposing: a flag in asdl_seq that tells you of what type it is. |
|||
msg376603 - (view) | Author: Raymond Hettinger (rhettinger) * ![]() |
Date: 2020-09-08 21:04 | |
> No, I am not proposing any modification to the ASDL parsers nor > the ASDL definition language. I am proposing a modification to > the internal data structures we use internally to carry sequences > of ASDL types. Thanks for the clarification. Carry on :-) |
|||
msg376604 - (view) | Author: Pablo Galindo Salgado (pablogsal) * ![]() |
Date: 2020-09-08 21:15 | |
> Well, since the extra code will be autogenerated, guess it wouldn't be much problematic. But I see about the PEG thing. On the other hand, having explicit types for the sequences will allow us to get compile errors instead of runtime errors for the standard types, so this may be a good possibility. Maybe some mixture of both: * asdl_seq objects can contain an enum describing what they are. * We have specific specializations for adding to ASDL types. |
|||
msg376606 - (view) | Author: Guido van Rossum (gvanrossum) * ![]() |
Date: 2020-09-08 21:53 | |
I know it's moot now, but still -- what benefit do we get from using a "standard" like ASDL? All our tooling for it is custom for Python only. |
|||
msg376608 - (view) | Author: Pablo Galindo Salgado (pablogsal) * ![]() |
Date: 2020-09-08 22:04 | |
> I know it's moot now, but still -- what benefit do we get from using a "standard" like ASDL? All our tooling for it is custom for Python only. I think there are other tools or implementations of Python that use standard parsers for our ASDL file. In any case, AFAIU we don't need to modify the current ASDL definition as it already has type information that we need. For instance: | Block(int blocknum, stmt* stmts) that says that stmts is a sequence of stmt objects, but unfortunately, this is translated as an "asdl_seq" type, losing the information that it can only contain stmt types. This is what I am planning to address here. |
|||
msg376615 - (view) | Author: Raymond Hettinger (rhettinger) * ![]() |
Date: 2020-09-09 01:43 | |
> I know it's moot now, but still -- what benefit do we get > from using a "standard" like ASDL? The "standard" part of it isn't important. AFAICT, ASDL has a low adoption rate and is not maintained. IMO, the part that matters is that ADSL was carefully balanced to be sufficiently expressive while keeping it easy to implement and easy to automatically translate into different languages. Presumably, this will not only help other Python implementations and third-party tooling, it will also make life easier for us in the long run. My understanding of the origin of ASDL is that it aspired to solve a common problem in language design where people commonly described their abstract syntax in way that was too tightly bound to underlying implementation language. This caused long-run problems when reimplementing in other languages and when trying to automatically update downstream tools that interoperate with the AST. In this regard, my personal experience with ASDL has been favorable. I view it as the JSON spec of the AST world, intentionally minimal yet expressive. That said, I think it failed to establish itself as a standard and almost no tooling was created for it. The original authors expected that ASDL would sit side-by-side with BNF and regex notation. That was a pipe dream. |
|||
msg376617 - (view) | Author: Guido van Rossum (gvanrossum) * ![]() |
Date: 2020-09-09 03:40 | |
I guess some of my gripes about ASDL have to do with our tooling. For example I find it annoying that the 'kind' enums overlap, so if I have a void* that I know points to an AST node I can't look at the kind to tell what it is. Anyway, regarding Pablo's proposal, I think that if we do it, we should do a full migration, not a piecemeal one. I think the vast majority of asdl_seq_GET calls are immediately preceded by a cast anyway, and the majority of asdl_seq_SET calls are in generated code. |
|||
msg376619 - (view) | Author: Raymond Hettinger (rhettinger) * ![]() |
Date: 2020-09-09 05:55 | |
That all makes sense (including the full migration). |
|||
msg377010 - (view) | Author: Pablo Galindo Salgado (pablogsal) * ![]() |
Date: 2020-09-16 18:42 | |
New changeset a5634c406767ef694df49b624adf9cfa6c0d9064 by Pablo Galindo in branch 'master': bpo-41746: Add type information to asdl_seq objects (GH-22223) https://github.com/python/cpython/commit/a5634c406767ef694df49b624adf9cfa6c0d9064 |
|||
msg378962 - (view) | Author: Lysandros Nikolaou (lys.nikolaou) * ![]() |
Date: 2020-10-19 17:06 | |
Pablo, can this be closed now that PR22223 is merged? |
|||
msg379001 - (view) | Author: Pablo Galindo Salgado (pablogsal) * ![]() |
Date: 2020-10-19 19:45 | |
> Pablo, can this be closed now that PR22223 is merged? Nop, we still need to fix the CHECK macros that do type erasure (they cast the result to void). |
|||
msg379239 - (view) | Author: Lysandros Nikolaou (lys.nikolaou) * ![]() |
Date: 2020-10-21 19:53 | |
New changeset 2e5ca9e3f68b33abb7d2c66d22ffc18dec40641a by Lysandros Nikolaou in branch 'master': bpo-41746: Cast to typed seqs in CHECK macros to avoid type erasure (GH-22864) https://github.com/python/cpython/commit/2e5ca9e3f68b33abb7d2c66d22ffc18dec40641a |
|||
msg379240 - (view) | Author: Lysandros Nikolaou (lys.nikolaou) * ![]() |
Date: 2020-10-21 19:55 | |
Last bit of work is now done. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2020-10-21 19:55:02 | lys.nikolaou | set | status: open -> closed resolution: fixed messages: + msg379240 stage: patch review -> resolved |
2020-10-21 19:53:22 | lys.nikolaou | set | messages: + msg379239 |
2020-10-21 19:03:08 | lys.nikolaou | set | pull_requests: + pull_request21806 |
2020-10-19 21:51:20 | lys.nikolaou | set | pull_requests: + pull_request21743 |
2020-10-19 19:45:38 | pablogsal | set | messages: + msg379001 |
2020-10-19 17:06:43 | lys.nikolaou | set | messages: + msg378962 |
2020-09-16 18:42:07 | pablogsal | set | messages: + msg377010 |
2020-09-13 02:43:25 | pablogsal | set | keywords:
+ patch stage: patch review pull_requests: + pull_request21278 |
2020-09-09 05:55:04 | rhettinger | set | messages: + msg376619 |
2020-09-09 03:40:12 | gvanrossum | set | messages: + msg376617 |
2020-09-09 01:43:01 | rhettinger | set | messages: + msg376615 |
2020-09-08 22:04:09 | pablogsal | set | messages: + msg376608 |
2020-09-08 21:53:02 | gvanrossum | set | messages: + msg376606 |
2020-09-08 21:15:34 | pablogsal | set | messages: + msg376604 |
2020-09-08 21:04:39 | rhettinger | set | messages: + msg376603 |
2020-09-08 20:57:07 | pablogsal | set | messages: + msg376602 |
2020-09-08 20:54:32 | BTaskaya | set | messages: + msg376601 |
2020-09-08 20:53:25 | BTaskaya | set | messages: + msg376600 |
2020-09-08 20:53:11 | pablogsal | set | messages: + msg376599 |
2020-09-08 20:52:02 | pablogsal | set | messages: + msg376598 |
2020-09-08 20:51:28 | BTaskaya | set | messages: + msg376597 |
2020-09-08 20:49:03 | rhettinger | set | messages: + msg376596 |
2020-09-08 20:46:23 | BTaskaya | set | nosy:
+ BTaskaya |
2020-09-08 20:45:50 | rhettinger | set | nosy:
+ rhettinger messages: + msg376595 |
2020-09-08 20:31:17 | pablogsal | create |