classification
Title: pyclbr rewrite using AST
Type: enhancement Stage: patch review
Components: Library (Lib) Versions: Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: BTaskaya, Batuhan Taskaya, brandtbucher, pablogsal, terry.reedy
Priority: normal Keywords: patch

Created on 2020-01-21 13:29 by Batuhan Taskaya, last changed 2020-02-13 22:48 by brandtbucher.

Pull Requests
URL Status Linked Edit
PR 18103 open BTaskaya, 2020-01-21 13:40
Messages (4)
msg360397 - (view) Author: Batuhan Taskaya (Batuhan Taskaya) Date: 2020-01-21 13:29
pyclbr currently uses token streams to analyze but it can be alot simpler with usage of AST. There are already many flaws, including some comments about limitations of this token stream processing. 

I have a draft about this. Initial PR wont change any behavior, it will just make code much simpler with the usage of AST (just an addition to Function about handling of async functions, is_async field). If agreed I can propose a second PR (or append the inital one) that will enhance Function/Class objects with various identifiers (like keywords, metaclasses, end position information etc.). The second PR will be alot easier to do thanks to AST.
msg360440 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-01-22 01:19
Nice.  A few notes.

Docs: The initial doc paragraph now has 'module browser' rather than 'class browser'.  The change is needed 2 other places.  Could be part of this PR or another one.
* Chapter title: "Python class browser support".  The only instance of 'class browser' in the chapter.
* Module index: "Supports information extraction for a Python class browser."

Possible API changes based on IDLE module browser experience:

1. IDLE filters out the import nodes.  It would be nicer to not have to delete them by passing something like 'imports=False' (with True being the default for back compatibility).

2. With nested classes and functions added, the returned object is almost a tree of nodes, except that what should be the top 'Module' node is instead what should be its children dict.  This un-uniformity gets in the way of recursively processing the tree.

3. If there were a function that returned a Module node with its path and name, the same could be removed from all children except for new ImpClass nodes representing imported classes when imports=True.  Since users know the module passed to pyclbr, this might be true even without Module nodes.

4. If a user immediately inserts nodes into a tree widget, then the user may only need a iterator of minimal nodes (name, parent, startline).  This is enough when using tkinter.ttk.Treeview.   I have thought of doing this for IDLE by either simplifying a copy of the pyclbr code or traversing the AST, as the PR here does.  This would let top-level items appear in the widget as they are scanned instead of waiting until scanning is complete.
msg360458 - (view) Author: Batuhan Taskaya (BTaskaya) * (Python triager) Date: 2020-01-22 11:56
Thanks for the suggestions Terry, I didn't want to change any behavior or make any feature addition (except is_async as stated in issue description) in this PR. 

> The change is needed 2 other places.  Could be part of this PR or another one.

I have a friend who would like to make his first contribution, so if it is not going to be problem, he'll take care of class browser references.

> Possible API changes based on IDLE module browser experience

Yes, it would be way better use tree/node structre instead of returning raw dictionaries. I can draft something after PR 1803 is merged. 

I dont have much experience of IDLE's architecture, but I can try to help if you want on the 4th point. 


Another thing I was considering is deprecating `readmodule` function, as you stated it is now module browser not class browser and `readmodule_ex` is sufficient. I dont know if any core developer will sponsor this idea, but I think `readmodule` should die. Thanks again for the suggestions and PR reviews.
msg360506 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-01-22 21:06
A separate doc change issue and PR would be fine.  Should we add a note explaining the module name as a contraction of an originally restricted scope?  Make me nosy and invite review.

Actually, a doc issue for the module as is should *fully* explain readmodule_ex first (its entry is now incomplete), and then explain readmodule as a filtered version kept for back compatibility.  This could be a separate PR on the same issue, written by one of us, if too much for your friend.

I understood limits of this PR. I should have said change notes were intended for your 'second PR'. 

Changing the return value to a Module should mean a third function, which would then become the main function, as readmodule_ex would then be Module.children.

I have thought about making it possible to browse non-source modules, at least for the module being browsed.  I might implement that first in IDLE.
History
Date User Action Args
2020-02-13 22:48:40brandtbuchersetnosy: + brandtbucher
2020-01-22 21:06:45terry.reedysetmessages: + msg360506
2020-01-22 11:56:40BTaskayasetnosy: + BTaskaya
messages: + msg360458
2020-01-22 01:19:48terry.reedysetnosy: + terry.reedy

messages: + msg360440
title: pyclbr rewrite on AST -> pyclbr rewrite using AST
2020-01-21 14:47:02Batuhan Taskayasettype: enhancement
2020-01-21 13:40:05BTaskayasetkeywords: + patch
stage: patch review
pull_requests: + pull_request17491
2020-01-21 13:29:29Batuhan Taskayacreate