
Author BTaskaya
Recipients BTaskaya, pablogsal, pfalcon, serhiy.storchaka
Date 2020-12-24.10:34:15
Message-id <1608806055.68.0.139746238425.issue42729@roundup.psfhosted.org>
In-reply-to
Content
> I propose to close that gap, and establish an API which would allow to parse token stream (iterable) into an AST. An initial implementation for CPython can (and likely should) be naive, making a loop thru surface program representation. 

There are different aspects to this problem (such as the maintenance cost of either exposing the underlying tokenizer, or building something like Python-ast.c to convert these two different token types back and forth; I'm a big -1 on both), but the thing I don't quite get is the use case.

What prevents you from using ast.parse(tokenize.untokenize(token_stream))? It is guaranteed that you won't miss anything (in terms of token positions), since it round-trips almost every case.
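A minimal sketch of that round trip, using only the stdlib tokenize and ast modules (the sample source string is illustrative):

```python
import ast
import io
import tokenize

source = "x = 1 + 2\n"

# tokenize.generate_tokens expects a readline callable, so wrap the
# source in a StringIO.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# Round-trip: untokenize back to source text, then parse to an AST.
# With full 5-tuple tokens, untokenize preserves the original layout.
reconstructed = tokenize.untokenize(tokens)
tree = ast.parse(reconstructed)

print(ast.dump(tree.body[0]))
```

Since untokenize reconstructs the source (including positions) from a complete token stream, the resulting AST is the same one you would get from parsing the original text directly.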

Also, tokens -> AST is not the only disconnected part of the underlying compiler. Stages like AST -> symbol table and AST -> optimized AST are also not available, and apparently not needed (since nobody else, except maybe me [regarding the AST -> ST conversion], has complained about these being missing).
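To illustrate the gap: the symbol-table stage is reachable only from source text via the stdlib symtable module; there is no public AST -> symbol table API. A small sketch (the sample function is illustrative):

```python
import symtable

source = "def f(a):\n    b = a + 1\n    return b\n"

# symtable.symtable takes source text, not an AST -- the same kind of
# "disconnect" between compiler stages described above.
table = symtable.symtable(source, "<example>", "exec")

# Descend into the namespace of the function f and list its symbols.
func = table.lookup("f").get_namespace()
print(sorted(sym.get_name() for sym in func.get_symbols()))
```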

I'd also suggest moving the discussion to Python-ideas, for a much larger audience.
History
Date User Action Args
2020-12-24 10:34:15 BTaskaya set recipients: + BTaskaya, pfalcon, serhiy.storchaka, pablogsal
2020-12-24 10:34:15 BTaskaya set messageid: <1608806055.68.0.139746238425.issue42729@roundup.psfhosted.org>
2020-12-24 10:34:15 BTaskaya link issue42729 messages
2020-12-24 10:34:15 BTaskaya create