classification
Title: inspect.findsource doesn't handle shortened files gracefully
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 3.9, Python 3.8, Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: SilentGhost, lisroach, mib, thatch, vstinner, yselivanov
Priority: normal Keywords: patch

Created on 2019-06-05 19:46 by thatch, last changed 2019-06-28 17:46 by mib.

Pull Requests
URL Status Linked Edit
PR 13850 open mib, 2019-06-05 21:59
Messages (6)
msg344761 - (view) Author: Tim Hatch (thatch) * Date: 2019-06-05 19:46
inspect.findsource() can trigger IndexError when co_firstlineno is larger than len(linecache.getlines()).

If you have a situation where the file that linecache finds doesn't match the imported module, then you're not guaranteed that co_firstlineno on the code objects makes any sense.  We hit this in a special kind of par file, but it could be triggered by shortening the file, like doing a git checkout of an older version while it's running.

a.py (3 lines):
    # line 1
    # line 2
    def foo(): pass  # co_firstlineno=3

a.py.v2 (1 line):
    def foo(): pass

repro:
    import a
    # modification happens, cp a.py.v2 a.py
    inspect.getsource(a.foo)


Although linecache.getline() does bounds checking, `inspect.findsource()` (which is used by various other functions, including `inspect.stack()`) grabs all the lines and loops through them.  The bug is in this section of `inspect.py`:

    if iscode(object):
        if not hasattr(object, 'co_firstlineno'):
            raise OSError('could not find function definition')
        lnum = object.co_firstlineno - 1
        pat = re.compile(r'^(\s*def\s)|(\s*async\s+def\s)|(.*(?<!\w)lambda(:|\s))|^(\s*@)')
        while lnum > 0:
            if pat.match(lines[lnum]): break
            lnum = lnum - 1
        return lines, lnum
    raise OSError('could not find code object')

Found through future.utils.raise_from which actually doesn't need to be using `inspect.stack()`, it can switch to `sys._getframe(2)` or so.  We should have a PR ready shortly, but wondering if this can be backported to at least 3.6?
msg344766 - (view) Author: SilentGhost (SilentGhost) * (Python triager) Date: 2019-06-05 20:16
3.6 is in security-only mode, this doesn't look like a security issue to me
msg345921 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-17 21:53
inspect.getsource() starts with linecache.checkcache(file) which if the file modification time changed or if the file size changed. Using regular files on disk, I don't see how this bug is possible:


$ cat x.py 
import inspect
import os
import linecache






def foo(): pass

filename = __file__
linecache.getlines(filename)

if 1:
    stat = os.stat(filename)
    atime = stat.st_atime
    mtime = stat.st_mtime
    with open(filename, "w+") as fp:
        lines = [line for line in fp if line.strip()]
        fp.seek(0)
        fp.writelines(lines)
    os.utime(filename, (atime, mtime))

print(inspect.getsource(foo))

vstinner@apu$ /bin/cp x.py y.py; ./python y.py
Traceback (most recent call last):
  File "y.py", line 25, in <module>
  File "/home/vstinner/prog/python/master/Lib/inspect.py", line 985, in getsource
    lines, lnum = getsourcelines(object)
  File "/home/vstinner/prog/python/master/Lib/inspect.py", line 967, in getsourcelines
    lines, lnum = findsource(object)
  File "/home/vstinner/prog/python/master/Lib/inspect.py", line 798, in findsource
    raise OSError('could not get source code')
OSError: could not get source code


Does this bug only affect Facebook who use "compile the application into an executable package (.par)"?
https://github.com/python/cpython/pull/13850#pullrequestreview-249815779
msg345922 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-17 21:55
If the file content changed, I'm not sure that the folloing heuristic will always find the expected function:

        lnum = object.co_firstlineno - 1
        pat = re.compile(r'^(\s*def\s)|(\s*async\s+def\s)|(.*(?<!\w)lambda(:|\s))|^(\s*@)')
        while lnum > 0:
            if pat.match(lines[lnum]): break
            lnum = lnum - 1
        return lines, lnum

The regex doesn't search for the function name.
msg346840 - (view) Author: Michael Bejda (mib) * Date: 2019-06-28 17:33
I do not believe it is specific to the way we package things:

$ cat a.py:
import inspect








def foo(): pass

inspect.getsource(foo)
$ python3 -m compileall -b a.py
$ black a.py  # to remove the empty lines
$ python3 a.pyc  # Index error
Traceback (most recent call last):
  File "a.py", line 12, in <module>
  File "/usr/local/fbcode/gcc-5-glibc-2.23/lib/python3.6/inspect.py", line 968, in getsource
    lines, lnum = getsourcelines(object)
  File "/usr/local/fbcode/gcc-5-glibc-2.23/lib/python3.6/inspect.py", line 955, in getsourcelines
    lines, lnum = findsource(object)
  File "/usr/local/fbcode/gcc-5-glibc-2.23/lib/python3.6/inspect.py", line 828, in findsource
    if pat.match(lines[lnum]): break
IndexError: list index out of range


You are correct, the heuristic may not return the correct function. However, if we are, for any reason, unable to detect the source mismatch, it would be better to return 0 line (for the beginning of the module) or raise a SystemError instead of an IndexError (which the user does not expect).
msg346845 - (view) Author: Michael Bejda (mib) * Date: 2019-06-28 17:46
I don't think linecache.checkcache helps here. In your example, it would detect the modification, but I don't see how it could persist data if the compilation and execution are different processes.
History
Date User Action Args
2019-06-28 17:46:00mibsetmessages: + msg346845
2019-06-28 17:33:33mibsetnosy: + mib
messages: + msg346840
2019-06-17 21:55:26vstinnersetmessages: + msg345922
2019-06-17 21:53:39vstinnersetmessages: + msg345921
2019-06-05 21:59:36mibsetkeywords: + patch
stage: patch review
pull_requests: + pull_request13727
2019-06-05 20:16:45SilentGhostsetnosy: + SilentGhost, yselivanov

messages: + msg344766
versions: + Python 3.9, - Python 3.6
2019-06-05 19:46:40thatchsettype: behavior
2019-06-05 19:46:26thatchcreate