classification
Title: inspect.getsource returns incorrect source for classes when class definition is part of multiline strings
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.9
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Windson Yang, donovick, pablogsal, serhiy.storchaka, xtreak, yselivanov
Priority: normal Keywords: patch

Created on 2018-10-30 12:32 by xtreak, last changed 2020-04-29 03:59 by xtreak. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 10307 merged xtreak, 2018-11-03 11:29
PR 19587 closed serhiy.storchaka, 2020-04-18 18:18
Messages (14)
msg328909 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2018-10-30 12:32
inspect.getsource uses inspect.findsource that uses a regex to check for class declaration and starts matching against the regex from the start of the program. When there is a match it checks for the first character to be 'c' to return the line number from which getblock is used to get the class definition. If there are multiple matches and there is no line with 'c' as the first character in the line then they are sorted based on the number of whitespaces preceding class keyword. This poses the following problem :

1. When a class definition like string occurs as part of a multiline string before the actual definition then this causes the function to return the multiline string instead of actual definition. This occurs both while using the multiline string also as comment and also using them in variable definition in other places.

2. When the class is defined inside another class and there is a similar multiline string inside another class which is indented along the same indentation level of the class then they are sorted by whitespace where they are equal and then followed by line number. Since the class definition occurs after the multiline string it causes the multiline string to be taken up as the source of the class.

This was last changed in 89f507fe8c4 (Dec 2006) which is also a merge commit. I searched for issues and relevant test cases but I couldn't find any in the test suite or issue tracker regarding the same. Hence I am filing a new issue.

# backups/bpo35101.py

import inspect

class Bar:
    a = """
class MultilineStringVariable:
    ...
"""

class MultilineStringVariable:

    def foo(self):
        pass

'''
class MultilineStringComment:
    pass
'''

class MultilineStringComment:

    def foo(self):
        pass

class NestedClass:
    a = '''
    class Spam:
        ...
    '''

class Nested:

    class Spam:
        pass

print(inspect.getsource(MultilineStringVariable))
print(inspect.getsource(MultilineStringComment))
print(inspect.getsource(Nested.Spam))

# Incorrect results

$ ./python.exe ../backups/bpo35101.py
class MultilineStringVariable:
    ...

class MultilineStringComment:
    pass

    class Spam:
        ...
msg329097 - (view) Author: Windson Yang (Windson Yang) * Date: 2018-11-02 02:17
Yes, the code located at https://github.com/python/cpython/blob/7cd25434164882c2093ea41ccfc7b95a05cd5cbd/Lib/inspect.py#L794 

I think to use a comment status var would be a solution, Do you have any idea to fix it?
msg329112 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2018-11-02 07:53
I think parsing with regex especially on multiline cases like here is little tricky handling all cases. I find AST module to be a good fit. I tried using a callback where every class definition is taken up and then from that line to rest of the program getblock can be used to get the relevant class definition. This takes care of the cases I have listed and returns the correct source where we can filter out the class we need. Hence findsource can be modified to use ast instead of regex given that regex was introduced in 2006 and changing class syntax in future might need updating this regex. A sample program I tried but this is the first time I am using ast module and there might be better ways to filter the required classes. I will try raising a PR after some discussion to see if any better solution exists and more tests.

# foo_ast.py

import ast
import inspect


class MyClassVisitor(ast.NodeVisitor):

    def __init__(self, lines, *args, **kwargs):
        self.lines = lines
        super().__init__(*args, **kwargs)

    def visit_ClassDef(self, node):
        print('Found class "%s"' % node.name)
        print("Source")
        print(''.join(inspect.getblock(self.lines[node.lineno-1:])))

with open('../backups/bpo35101_1.py') as f:
    source = f.read()
    lines = source.splitlines(True)
    tree = ast.parse(source)
    MyClassVisitor(lines).visit(tree)

$ ./python.exe ../backups/foo_ast.py

Found class "Bar"
Source
class Bar:
    a = """
class MultilineStringVariable:
    ...
"""

Found class "MultilineStringVariable"
Source
class MultilineStringVariable:

    def foo(self):
        pass

Found class "MultilineStringComment"
Source
class MultilineStringComment:

    def foo(self):
        pass

Found class "NestedClass"
Source
class NestedClass:
    a = '''
    class Spam:
        ...
    '''

Found class "Nested"
Source
class Nested:

    class Spam:
        pass
msg329180 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2018-11-03 05:59
I tried ast module and it passes my test cases but I found another case since object.__name__ returns unqualified name thus matching against nested class names might conflict with the ones on the top level. Example is below where there is Spam at the module level and NestedA.Spam which is defined inside another class. getsource(NestedA.Spam) returns the source for Spam which is also unexpected. It seems ast module also doesn't call visit_ClassDef on nested class definitions thus my approach also couldn't detect NestedA.Spam and returns source for Spam. I wish class objects also had some kind of code attribute similar to functions so that line number is attached. I don't know the internals so I might be wrong here. I am adding Yury.

import inspect

class Egg:

    def func(self):
        pass

class NestedA:

    class Egg:
        pass


print(inspect.getsource(NestedA.Egg))
print(inspect.getsource(Egg))

# Output

class Egg:

    def func(self):
        pass

class Egg:

    def func(self):
        pass
msg329181 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-11-03 06:08
Use __qualname__.
msg329182 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2018-11-03 06:25
Thanks Serhiy, object.__qualname__ can be used to get qualified name of the object but ast nodes don't have qualified name for the node where I can match against it. node.name returns unqualified name and hence using ast module I can't differentiate between Spam and NestedA.Spam since both return 'Spam' for node.name . I am also using a recursive decorator found at https://stackoverflow.com/a/14661325/2610955 which solves problem iterating through nested classes. But for matching module level class object and nested class level object when I parse a syntax tree for Spam and NestedA.Spam both nodes for class definition return 'Spam'. Is there a way to get qualified names in ast module or do I need to store it somewhere? My current approach for inspect.findsource is as below : 


class ClassVisitor(ast.NodeVisitor):

    def recursive(func):
        """ decorator to make visitor work recursive """
        def wrapper(self, node):
            func(self, node)
            for child in ast.iter_child_nodes(node):
                self.visit(child)
        return wrapper

    def __init__(self, source, name, *args, **kwargs):
        self.source = source
        self.line_number = None
        self.name = name
        super().__init__(*args, **kwargs)

    @recursive
    def visit_ClassDef(self, node):
        # Need to match qualified name to differentiate between Spam and NestedA.Spam
        if node.name == self.name:
            # decrement by one since lines list starts with indexing by zero
            self.line_number = node.lineno - 1

name = object.__name__ # can use object.__qualname__ but node.name is not qualified
source = ''.join(lines)
tree = ast.parse(source)
class_visitor = ClassVisitor(source, name)
class_visitor.visit(tree)

if class_visitor.line_number is not None:
    return lines, class_visitor.line_number
else:
    raise OSError('could not find class definition')
msg329183 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-11-03 06:50
Use a stack of names. Every time when you enter a class and function nodes, push the current name on the stack, and pop it out after handling child nodes. Compare the qualified name with '.'.join(self.stack). Don't use the recursive decorator, it is not suitable for this, because you need to execute some code before and after handling child nodes. Just add a loop for child nodes in your handler.

Yet one thing that you should take to account: decorators. In you should return the line number of the first decorator if they are used. This is easy.

This is an interesting issue. I would take it if you do not already working on it. But I think it should be interesting and useful to you. Good practice. Once you get some working code, create a PR. I will suggest next steps.
msg329187 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2018-11-03 07:00
Thanks Serhiy, I am going with the same approach too using a Stack. I am working on it and I have fixed the set initial cases I reported with tests. I just stumbled upon the nested definitions and inner classes. I will raise a PR by end of today with my progress so that more cases along with tests can be covered during PR review.
msg329202 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-11-03 18:12
Well, now the hardest problem remains.

Sometimes there are more than one place where the class with the same __qualname__ is defined. See for example multiprocessing/heap.py:

if sys.platform == 'win32':
    class Arena(object):
        ...
else:
    class Arena(object):
        ...

It is hard to determine the correct place. But the current code return the first line, while PR 10307 currently returns the last line.

If the current behavior is more desirable, the code needs to stop searching after finding the first candidate. And the simplest way is to use exceptions:

class ClassVisitor(ast.NodeVisitor):
    def visit_ClassDef(self, node):
        ...
        if found:
            raise StopIterator(line_number)
        ...
...
try:
    class_visitor.visit(tree)
except StopIterator as e:
    line_number = e.value
else:
    raise OSError('could not find class definition')
msg329203 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2018-11-03 18:53
Thanks Serhiy for the details. I think there are also less tests for inspect.findsource with respect to classes though it's more robust than the regex approach. Thus there might be different effects which are correct now or selects an alternate candidate as you have mentioned. I think this is worth adding a test case once the behavior is finalized. 

On a general note I am curious is there any patch to add line number directly to the class as an attribute? I think inspect module uses object.co_firstlineno where object is a code object and works for function object. Is there any technical difficulty in this? I searched the mailing list and PEPs but couldn't come across anything. Also there are couple of issues around showing source from REPL where it's not saved to the file issue33826, issue12920.

There is one more frame pointer issue that is not working with class objects with a PR that fixes some more issues in findsource with respect to classes : issue35101.


Thanks
msg329204 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-11-03 19:11
As for the line number for class, I have not heard anything about this. I think there is no large technical difficulty, but there is a little need. The first line number of the code object is needed first at all for tracing and debugging. Function has attached code object for executing, thus the line number is available for function. But the code object that creates a class is no longer needed after the class is created. Attaching the line number to the class looks like attaching the line number of the "[]" expression to the list created by it.
msg329205 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-11-03 19:13
Yury, could you please take a look?
msg366727 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2020-04-18 16:19
New changeset 696136b993e11b37c4f34d729a0375e5ad544ade by Karthikeyan Singaravelan in branch 'master':
bpo-35113: Fix inspect.getsource to return correct source for inner classes (#10307)
https://github.com/python/cpython/commit/696136b993e11b37c4f34d729a0375e5ad544ade
msg367620 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2020-04-29 03:59
Closing this as fixed with the enhancement to show decorator for classes too for 3.9. Thank you all for the help on this.
History
Date User Action Args
2020-09-19 05:18:22xtreaklinkissue24078 superseder
2020-04-29 03:59:29xtreaksetstatus: open -> closed
versions: + Python 3.9, - Python 3.6, Python 3.7, Python 3.8
messages: + msg367620

resolution: fixed
stage: patch review -> resolved
2020-04-29 03:37:05xtreaklinkissue40317 superseder
2020-04-18 18:18:45serhiy.storchakasetpull_requests: + pull_request18924
2020-04-18 16:19:39xtreaksetmessages: + msg366727
2019-10-24 16:55:03donovicksetnosy: + donovick
2019-10-24 09:13:41xtreaklinkissue37922 superseder
2019-08-31 10:47:09serhiy.storchakalinkissue15856 superseder
2018-11-03 19:13:16serhiy.storchakasetmessages: + msg329205
2018-11-03 19:11:55serhiy.storchakasetmessages: + msg329204
2018-11-03 18:53:35xtreaksetmessages: + msg329203
2018-11-03 18:12:24serhiy.storchakasetmessages: + msg329202
2018-11-03 11:29:05xtreaksetkeywords: + patch
stage: patch review
pull_requests: + pull_request9614
2018-11-03 07:00:45xtreaksetmessages: + msg329187
2018-11-03 06:50:18serhiy.storchakasetmessages: + msg329183
2018-11-03 06:25:03xtreaksetmessages: + msg329182
2018-11-03 06:08:15serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg329181
2018-11-03 05:59:12xtreaksetnosy: + yselivanov
messages: + msg329180
2018-11-02 07:53:07xtreaksetmessages: + msg329112
2018-11-02 02:17:43Windson Yangsetnosy: + Windson Yang
messages: + msg329097
2018-10-31 19:53:34pablogsalsetnosy: + pablogsal
2018-10-30 12:32:57xtreakcreate