ast Str type does not annotate the string type when it parses a python document #70073

Closed
myronww mannequin opened this issue Dec 16, 2015 · 14 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@myronww
Mannequin

myronww mannequin commented Dec 16, 2015

BPO 25885
Nosy @brettcannon, @birkenfeld, @ncoghlan, @vstinner, @ericvsmith, @benjaminp, @bitdancer, @myronww

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = <Date 2015-12-17.15:17:40.263>
created_at = <Date 2015-12-16.21:09:07.331>
labels = ['interpreter-core', 'type-bug', 'invalid']
title = 'ast Str type does not annotate the string type when it parses a python document'
updated_at = <Date 2015-12-17.15:17:40.230>
user = 'https://github.com/myronww'

bugs.python.org fields:

activity = <Date 2015-12-17.15:17:40.230>
actor = 'eric.smith'
assignee = 'none'
closed = True
closed_date = <Date 2015-12-17.15:17:40.263>
closer = 'eric.smith'
components = ['Interpreter Core']
creation = <Date 2015-12-16.21:09:07.331>
creator = 'myronww'
dependencies = []
files = []
hgrepos = []
issue_num = 25885
keywords = []
message_count = 14.0
messages = ['256529', '256530', '256531', '256532', '256538', '256544', '256545', '256546', '256557', '256558', '256562', '256592', '256593', '256607']
nosy_count = 8.0
nosy_names = ['brett.cannon', 'georg.brandl', 'ncoghlan', 'vstinner', 'eric.smith', 'benjamin.peterson', 'r.david.murray', 'myronww']
pr_nums = []
priority = 'normal'
resolution = 'not a bug'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue25885'
versions = ['Python 3.6']

@myronww
Mannequin Author

myronww mannequin commented Dec 16, 2015

The 'ast' module does not indicate the type of string, ''' or '"' or '"""', that it has encountered when it parses a python document.

This prevents accurate reproduction of the original parsed document by a writer walking over an instance of an abstract syntax tree.
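
For readers who want to see the reported behavior, here is a minimal sketch. It assumes a current Python 3 interpreter, where string literals appear as `ast.Constant` nodes rather than the `ast.Str` nodes of the 3.6 era; the variable names are illustrative only.

```python
import ast

# Three spellings of the same string literal; only the quote style differs.
sources = ["x = 'abc'", 'x = "abc"', 'x = """abc"""']

# ast.dump() renders each tree as text. Position attributes are omitted
# by default, so only the node structure and values are compared.
dumps = [ast.dump(ast.parse(src)) for src in sources]

# Number of distinct trees among the three spellings.
distinct_trees = len(set(dumps))
print(distinct_trees)
```

All three spellings should collapse to a single tree, which is the loss of quote-style information described above.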

@myronww myronww mannequin added extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error labels Dec 16, 2015
@SilentGhost SilentGhost mannequin added the stdlib Python modules in the Lib dir label Dec 16, 2015
@brettcannon
Member

Two things. One, this won't change in Python 2.7 due to compatibility, so this only applies to Python 3.6. Two, the AST has not been designed to support round-trip syntax transpiling, e.g. there is no way to get back comments in the source code.

Because the AST is not meant to round-trip on source code I'm going to close this as "not a bug" since I don't see any benefit in supporting this even if we gain comment nodes in the AST.

@bitdancer
Member

Unlike the tokenizer, I don't think the AST module makes any guarantee about being able to reproduce the original source. This might be a reasonable enhancement request, though.
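
As a rough illustration of the contrast with the tokenizer, the sketch below uses the standard `tokenize` module. The sample source string is made up for the example.

```python
import io
import tokenize

source = "x = '''abc'''\ny = \"b\"  # keep me\n"

# generate_tokens() takes a readline callable that returns str lines.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# STRING tokens carry the literal exactly as written, quotes included,
# and comments survive as COMMENT tokens.
string_tokens = [tok.string for tok in tokens if tok.type == tokenize.STRING]
comment_tokens = [tok.string for tok in tokens if tok.type == tokenize.COMMENT]

# untokenize() rebuilds source text from the token stream.
rebuilt = tokenize.untokenize(tokens)

print(string_tokens)
print(comment_tokens)
```

Because the token stream keeps quote characters and comments, it is the layer where round-tripping is supported; the AST is built after that information has been discarded.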

@bitdancer
Member

Oh, that was weird. I got a weird error message from roundup. Must have been because I was posting at the same time as Brett. I'll defer to his decision on this.

@myronww
Mannequin Author

myronww mannequin commented Dec 16, 2015

The purpose of a syntax tree is to represent the syntax, not the final result of processing that syntax. The information currently stored for strings loses syntax information, which seems to defeat the purpose of offering a syntax tree. I filed a separate bug because it is also combining strings and losing operators for string literals.

"Hello" + " World"

From the looks of the code, the above would result in one string type with "Hello World" and syntax information associated with the operator would be lost.

And as indicated, string type information is being lost as well. The user of the AST then has no way of getting the lost syntax information back once it is lost.
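
The claim about the `+` operator can be checked directly. The sketch below assumes Python 3.8 or later (where string literals are `ast.Constant` nodes) and compares explicit concatenation with implicit adjacent-literal concatenation, which is an addition for comparison and not part of the original report.

```python
import ast

# Explicit concatenation with "+" versus implicit adjacent-literal concatenation.
explicit = ast.parse('"Hello" + " World"', mode="eval").body
implicit = ast.parse('"Hello" " World"', mode="eval").body

# Node type and operator of the explicit form.
print(type(explicit).__name__, type(explicit.op).__name__)

# Node type and stored value of the implicit form.
print(type(implicit).__name__, repr(implicit.value))
```

With `ast.parse` and its default arguments, the explicit form should stay a `BinOp` with an `Add` operator, while only the implicit form is merged into one string node. Any folding of `"Hello" + " World"` happens later, when the code is compiled.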

@myronww myronww mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) and removed extension-modules C modules in the Modules dir stdlib Python modules in the Lib dir labels Dec 16, 2015
@myronww
Mannequin Author

myronww mannequin commented Dec 16, 2015

I am re-opening this because I believe it is an important issue for work I would eventually like to push into the Python core: Python code that can rewrite itself as a declaration or as an instance representation.

"I don't see any benefit in supporting this even if we gain comment nodes in the AST."

There would be a huge benefit to having accurate original syntax information of the document and being able to re-write the documents as it would enable self modifying code or software written code paradigms.

TestCases that can update themselves in various modes, full auto, interactive, etc.

Test data pattern generation objects, or data pattern generation objects, that can be easily and efficiently modified and then asked to write themselves back out to files.

The fact that there are other projects being worked on to provide round trip document to syntax tree and syntax tree to document transformations also indicates that the lack of accurate syntax storage in 'ast' is a problem that needs to be addressed.

I would prefer to work with the core python modules in order to provide dynamic code modification functionality rather than relying on a third party module as eventually I would like to push this into core python.

If a Python object has script behind it, I would like to eventually be able to tell the object to write itself to a file as an object declaration or as an object instance.

As Declaration:

class SomeObject(SomeBase):
    def some_method(self):
        print("blah blah")

As Instance:

   SomeObject()

@myronww myronww mannequin reopened this Dec 16, 2015
@bitdancer
Member

Please respect the decisions of the python core developers as to issue status (you can argue with us until we ask you to stop, but let us make the status change if you convince us :)

We aren't rejecting your ideas, just this bug report (as not being a bug). To pursue your ideas you will want to subscribe to the python-ideas mailing list and begin a discussion there. This is far too big a topic for the bug tracker.

@vstinner
Member

"There would be a huge benefit to having accurate original syntax information of the document and being able to re-write the documents as it would enable self modifying code or software written code paradigms."

You misunderstood the purpose of the AST. "Unparsing" an AST to retrieve the original source code is outside the scope of the Python AST structure, and it's not going to happen. For that, you must use another tool. See for example RedBaron, as I said at:

http://bugs.python.org/issue25886#msg256543

Being able to "unparse" an AST to regenerate the original Python source code requires complex operations. You have to store all formatting information (spaces, newlines, etc.). You also have to store comments. These are just a few examples.

If the AST contained formatting information, it would be more complex to handle.
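
To make the loss concrete, here is a small sketch using `ast.unparse()`, which was added to the standard library in Python 3.9, well after this discussion. It regenerates equivalent code from the tree but does not recover the original text; the sample line is invented for the example.

```python
import ast

source = 'greeting   =   "hi"  # say hello\n'

# ast.unparse() (Python 3.9+) regenerates source text from the tree alone,
# so anything the tree does not record cannot come back.
regenerated = ast.unparse(ast.parse(source))

print(regenerated)
```

The regenerated line should come back with normalized spacing, the unparser's own choice of quotes, and no comment, which are the categories of information listed above as missing from the tree.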

@myronww
Mannequin Author

myronww mannequin commented Dec 16, 2015

Why is unparsing out of the scope of the base syntax tree of a dynamic language such as python?

Is that just a personal opinion? Would adding proper storage of syntax information in the AST cause performance issues?

Is there some documentation that defines the scope of the AST module and indicates this is out of scope?

The syntax for numerical constants is properly stored by the AST module.

Example:
TEST_INT = 1 + 1

Why would the syntax of integer constants be more important than the syntax associated with comments, two string literals, or multi-line strings?
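
A quick check of how numeric literals are stored may help here. This is a sketch on a current Python 3 interpreter; `expr_dump` is a hypothetical helper introduced only for this example.

```python
import ast

def expr_dump(src):
    """Return the dumped AST of a single expression (illustrative helper)."""
    return ast.dump(ast.parse(src, mode="eval").body)

# Three spellings of the integer sixteen.
same_tree = expr_dump("0x10") == expr_dump("16") == expr_dump("1_6")

# The expression from the example above.
addition = ast.parse("1 + 1", mode="eval").body

print(same_tree)
print(type(addition).__name__)
```

The `+` in `1 + 1` is kept as a `BinOp` node, but the spelling of each number (hexadecimal, decimal, underscores) should be lost in the same way as string quote styles, so numeric literals do not appear to be treated differently from strings in this respect.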

@myronww myronww mannequin reopened this Dec 16, 2015
@vstinner
Member

"Would adding proper storage of syntax information in the AST cause performance issues?"

Exactly. And storing syntax information is useless for 90% of usages of AST. So Python is optimized for the most common cases.

Again, use a different project (like RedBaron) if you need *all* information in the AST.

@myronww
Mannequin Author

myronww mannequin commented Dec 16, 2015

Would it be prudent to add a parse flag to the AST module that could provide one of two types of AST: an optimized AST or a complete AST?

@vstinner
Member

"Would it be prudent to add a Parse flag to the AST module that could provide one of two types of AST's an optimized AST or a complete AST"

Adding syntax ("formatting", call it what you want) info to the AST requires adding new attributes to existing AST nodes and probably adding new AST nodes. It will make the code more complex: not only the code that produces the AST, but also code that uses the AST (static code analyzers, my AST optimizer project, etc.).

I don't think that it's worth it.
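
Without any new node attributes, a caller that still holds the original source text can sometimes recover a literal's spelling from the position information nodes already carry. The sketch below assumes Python 3.8 or later, where `ast.get_source_segment()` and end positions are available; it is one possible workaround, not something proposed in this thread.

```python
import ast

source = 'a = "double"\nb = \'\'\'triple\'\'\'\n'
tree = ast.parse(source)

# For every string constant, slice the original text using the node's
# recorded start and end positions.
literals = [
    ast.get_source_segment(source, node)
    for node in ast.walk(tree)
    if isinstance(node, ast.Constant) and isinstance(node.value, str)
]

print(literals)
```

This only works while the source string is kept alongside the tree, and it does nothing for comments, which have no node to hang a position on.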

@vstinner
Member

If you think that we are wrong, please follow the advice to start a discussion on python-ideas. This is how Python is developed; we have a workflow. Proposing ideas on the bug tracker works in some cases, but the AST is really a *core* feature of Python. You need deep discussions to change the core. To be crystal clear: *if* anyone is going to change the AST, IMHO a PEP is needed. Writing a PEP requires a specific workflow starting at python-ideas.

@ericvsmith
Member

I agree that the proposed change would require a PEP, and this should be discussed on python-ideas.

I also think there's very little chance such a change would be accepted, but that doesn't mean it's impossible.

I think using an external library is your best bet here. If you want to pursue this on python-ideas, you should discuss why being in the stdlib's ast module would be an improvement over using an external library.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022