Title: Inspect library ignores comments at the end of a function (inspect.getsource)
Components: Library (Lib) Versions: Python 3.10, Python 3.9, Python 3.8
Nosy List: BTaskaya, iritkatriel, miss-islington, noureddine.hamid, taleinat, yselivanov
Created on 2020-10-22 10:48 by noureddine.hamid, last changed 2022-04-11 14:59 by admin.

Messages
Author: nhamid (noureddine.hamid) Date: 2020-10-22 10:48
inspect.getsource ignores comments at the end of a function.

for example this function:

def matmul_single(A, x, out):
  from numpy import matmul
  out[:] = matmul(A, x)
  # Some comment here...

using the inspect library:
>>> inspect.getsource(matmul_single)
'def matmul_single(A, x, out):\n  from numpy import matmul\n  out[:] = matmul(A, x)\n'

the result does not contain the comments at the end of the function.
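
A minimal sketch that reproduces the report without needing a source file on disk. It assumes the undocumented helper inspect.getblock, which getsource uses internally to cut a block out of a list of lines. The sample lines and the names block_of_first_def and trailing_comment_kept are illustrative only. On an interpreter that includes the fix from this issue the indented trailing comment should be kept, while older interpreters are expected to return only the first two lines.

```python
import inspect

# Illustrative source lines: f ends with an indented comment,
# followed by a column-0 comment that belongs to the module, not to f.
LINES = [
    "def f():\n",
    "    return 1\n",
    "    # trailing comment\n",
    "\n",
    "# Now comes g\n",
    "def g():\n",
    "    return 2\n",
]


def block_of_first_def(lines):
    """Return the lines that inspect considers part of the first def."""
    # inspect.getblock is undocumented; its use here is an assumption
    # about the internals rather than a supported API.
    return inspect.getblock(list(lines))


block = block_of_first_def(LINES)
trailing_comment_kept = "    # trailing comment\n" in block

print("".join(block), end="")
print("trailing comment kept:", trailing_comment_kept)
```

The column-0 comment before g should never be reported as part of f, whichever interpreter runs the sketch.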
Author: Irit Katriel (iritkatriel) Date: 2020-12-03 15:12
1. For a comment line, the tokenizer emits a COMMENT token followed by an NL token for the newline. inspect.BlockFinder.tokeneater keeps a "last" field holding the last line it has identified as belonging to the code block. Currently it updates that field when it sees a NEWLINE token, but not when it sees an NL token.

2. For a comment line, the tokenizer does not emit an INDENT/DEDENT token, so the indentation level when the comment is processed is assumed to be equal to that of the previous line.
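
The two points above can be checked with a short sketch that groups token type names by source line. The sample source and the helper name token_names_by_line are made up for illustration; only tokenize.generate_tokens and tokenize.tok_name come from the standard library.

```python
import io
import tokenize
from collections import defaultdict

# Illustrative source: a function whose last line is an indented comment.
SOURCE = "def f():\n    return 1\n    # trailing comment\n"


def token_names_by_line(source):
    """Map each line number to the names of the tokens that start on it."""
    names = defaultdict(list)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        names[tok.start[0]].append(tokenize.tok_name[tok.type])
    return dict(names)


names = token_names_by_line(SOURCE)
for lineno in sorted(names):
    print(lineno, names[lineno])
```

The statement line ends with NEWLINE, while the comment-only line carries just COMMENT and NL, with no INDENT or DEDENT of its own.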

PR 23630 aims to include comment lines in the block if their start column is after the start column of the opening line of the block:

   def f():
      return 42

     # this is a part of f
   # this is not a part of f
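
A rough sketch of the column rule as worded above, applied to the same example. It only illustrates the comparison of start columns and is not the code from the PR; the helper name classify_comments is hypothetical, and the merged change may use a different reference column.

```python
import io
import tokenize

# The example from the message, kept with its original indentation.
SOURCE = (
    "   def f():\n"
    "      return 42\n"
    "\n"
    "     # this is a part of f\n"
    "   # this is not a part of f\n"
)


def classify_comments(source):
    """Return (comment, belongs_to_block) pairs using the column rule."""
    block_col = None  # start column of the opening "def" line
    results = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if block_col is None and tok.type == tokenize.NAME and tok.string == "def":
            block_col = tok.start[1]
        elif tok.type == tokenize.COMMENT and block_col is not None:
            # A comment counts as part of the block only if it starts
            # strictly to the right of the opening line.
            results.append((tok.string, tok.start[1] > block_col))
    return results


for comment, in_block in classify_comments(SOURCE):
    print(in_block, comment)
```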
Author: Irit Katriel (iritkatriel) Date: 2020-12-03 15:16
For reference - this script:
import inspect
import tokenize
from pprint import pprint as pp

src = [
 'def f():\n',
 '    return 1\n',
 '    #that was fun',
 '\n',
 '#Now comes g\n',
 'def g():\n',
 '    return 2\n']

pp(list(tokenize.generate_tokens(iter(src).__next__)))

Output:

[TokenInfo(type=1 (NAME), string='def', start=(1, 0), end=(1, 3), line='def f():\n'),
 TokenInfo(type=1 (NAME), string='f', start=(1, 4), end=(1, 5), line='def f():\n'),
 TokenInfo(type=54 (OP), string='(', start=(1, 5), end=(1, 6), line='def f():\n'),
 TokenInfo(type=54 (OP), string=')', start=(1, 6), end=(1, 7), line='def f():\n'),
 TokenInfo(type=54 (OP), string=':', start=(1, 7), end=(1, 8), line='def f():\n'),
 TokenInfo(type=4 (NEWLINE), string='\n', start=(1, 8), end=(1, 9), line='def f():\n'),
 TokenInfo(type=5 (INDENT), string='    ', start=(2, 0), end=(2, 4), line='    return 1\n'),
 TokenInfo(type=1 (NAME), string='return', start=(2, 4), end=(2, 10), line='    return 1\n'),
 TokenInfo(type=2 (NUMBER), string='1', start=(2, 11), end=(2, 12), line='    return 1\n'),
 TokenInfo(type=4 (NEWLINE), string='\n', start=(2, 12), end=(2, 13), line='    return 1\n'),
 TokenInfo(type=60 (COMMENT), string='#that was fun', start=(3, 4), end=(3, 17), line='    #that was fun'),
 TokenInfo(type=61 (NL), string='', start=(3, 17), end=(3, 17), line='    #that was fun'),
 TokenInfo(type=61 (NL), string='\n', start=(4, 0), end=(4, 1), line='\n'),
 TokenInfo(type=60 (COMMENT), string='#Now comes g', start=(5, 0), end=(5, 12), line='#Now comes g\n'),
 TokenInfo(type=61 (NL), string='\n', start=(5, 12), end=(5, 13), line='#Now comes g\n'),
 TokenInfo(type=6 (DEDENT), string='', start=(6, 0), end=(6, 0), line='def g():\n'),
 TokenInfo(type=1 (NAME), string='def', start=(6, 0), end=(6, 3), line='def g():\n'),
 TokenInfo(type=1 (NAME), string='g', start=(6, 4), end=(6, 5), line='def g():\n'),
 TokenInfo(type=54 (OP), string='(', start=(6, 5), end=(6, 6), line='def g():\n'),
 TokenInfo(type=54 (OP), string=')', start=(6, 6), end=(6, 7), line='def g():\n'),
 TokenInfo(type=54 (OP), string=':', start=(6, 7), end=(6, 8), line='def g():\n'),
 TokenInfo(type=4 (NEWLINE), string='\n', start=(6, 8), end=(6, 9), line='def g():\n'),
 TokenInfo(type=5 (INDENT), string='    ', start=(7, 0), end=(7, 4), line='    return 2\n'),
 TokenInfo(type=1 (NAME), string='return', start=(7, 4), end=(7, 10), line='    return 2\n'),
 TokenInfo(type=2 (NUMBER), string='2', start=(7, 11), end=(7, 12), line='    return 2\n'),
 TokenInfo(type=4 (NEWLINE), string='\n', start=(7, 12), end=(7, 13), line='    return 2\n'),
 TokenInfo(type=6 (DEDENT), string='', start=(8, 0), end=(8, 0), line=''),
 TokenInfo(type=0 (ENDMARKER), string='', start=(8, 0), end=(8, 0), line='')]
Author: Tal Einat (taleinat) Date: 2020-12-04 16:45
New changeset 6e1eec71f59c344fb23c7977061dc2c97b77d51b by Irit Katriel in branch 'master':
bpo-42116: Fix inspect.getsource handling of trailing comments (GH-23630)
Author: miss-islington (miss-islington) Date: 2020-12-04 20:20
New changeset 81ac030d03bdaedd724603af6f89f9248a5f2700 by Miss Islington (bot) in branch '3.9':
bpo-42116: Fix inspect.getsource handling of trailing comments (GH-23630)
Author: miss-islington (miss-islington) Date: 2020-12-04 20:20
New changeset 3b14f18205b17d1634e21bd7bc48152247590d9f by Miss Islington (bot) in branch '3.8':
bpo-42116: Fix inspect.getsource handling of trailing comments (GH-23630)
Author: Tal Einat (taleinat) Date: 2020-12-04 20:21
Thank you for reporting this, Noureddine Hamid!

Thanks for the PR, Irit!
Author: nhamid (noureddine.hamid) Date: 2020-12-15 12:51
Thank you for the fix. I forgot to mention that Python 3.6 and Python 3.7 have this issue too.
Author: Irit Katriel (iritkatriel) Date: 2020-12-15 13:21
Thanks for the report. 3.6 is no longer maintained and 3.7 is getting security fixes only. So this won't be backported to those versions.
