
classification
Title: Inconsistent/incomplete disassembly of methods vs method source code
Type: behavior
Stage:
Components: Library (Lib)
Versions: Python 3.7

process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: smurthy, steven.daprano
Priority: normal
Keywords:

Created on 2020-02-29 15:27 by smurthy, last changed 2022-04-11 14:59 by admin.

Messages (7)
msg362985 - Author: S Murthy (smurthy) Date: 2020-02-29 15:27
I am using the dis module to compare source (and logical) lines of code with the corresponding bytecode instructions. I am a bit confused by the output of dis.dis when disassembling a given method vs the corresponding source string, e.g.

>>> import dis
>>> def f(x): return x**2
>>> dis.dis(f)
  1           0 LOAD_FAST                0 (x)
              2 LOAD_CONST               1 (2)
              4 BINARY_POWER
              6 RETURN_VALUE

This is the bytecode instruction block for the body only (not the method header), but dis.dis('def f(x): return x**2') produces the instructions for the header and body:

>>> dis.dis('def f(x): return x**2')
  1           0 LOAD_CONST               0 (<code object f at 0x10b0f7f60, file "<dis>", line 1>)
              2 LOAD_CONST               1 ('f')
              4 MAKE_FUNCTION            0
              6 STORE_NAME               0 (f)
              8 LOAD_CONST               2 (None)
             10 RETURN_VALUE

Disassembly of <code object f at 0x10b0f7f60, file "<dis>", line 1>:
  1           0 LOAD_FAST                0 (x)
              2 LOAD_CONST               1 (2)
              4 BINARY_POWER
              6 RETURN_VALUE

I have traced this difference to the different behaviour of dis.dis for methods vs source code strings:

def dis(x=None, *, file=None, depth=None):
    ...
    ...
    if hasattr(x, '__code__'):
        x = x.__code__
    ...
    # Perform the disassembly
    ...
    elif hasattr(x, 'co_code'): # Code object
        _disassemble_recursive(x, file=file, depth=depth)
    ...
    elif isinstance(x, str):    # Source code
        _disassemble_str(x, file=file, depth=depth)    
    ...

It appears as if the method body is contained in the code object produced from compiling the source (_try_compile(source, '<dis>', ...)) but not if the code object was obtained from f.__code__.

Why is this the case, and would it not be better for dis.dis to behave consistently for methods and source strings of methods, and to produce the complete instruction set, including any headers? The current behaviour of dis.dis also affects Bytecode(x): iterating over the instructions gives different results depending on whether x is a method or the source string of that method:

>>> for instr in dis.Bytecode(f): 
...     print(instr) 
...
Instruction(opname='LOAD_FAST', opcode=124, arg=0, argval='x', argrepr='x', offset=0, starts_line=1, is_jump_target=False)
Instruction(opname='LOAD_CONST', opcode=100, arg=1, argval=2, argrepr='2', offset=2, starts_line=None, is_jump_target=False)
Instruction(opname='BINARY_POWER', opcode=19, arg=None, argval=None, argrepr='', offset=4, starts_line=None, is_jump_target=False)
Instruction(opname='RETURN_VALUE', opcode=83, arg=None, argval=None, argrepr='', offset=6, starts_line=None, is_jump_target=False)

>>> for instr in dis.Bytecode(inspect.getsource(f)): 
...     print(instr) 
...
Instruction(opname='LOAD_CONST', opcode=100, arg=0, argval=<code object f at 0x11e4036f0, file "<disassembly>", line 1>, argrepr='<code object f at 0x11e4036f0, file "<disassembly>", line 1>', offset=0, starts_line=1, is_jump_target=False)
Instruction(opname='LOAD_CONST', opcode=100, arg=1, argval='f', argrepr="'f'", offset=2, starts_line=None, is_jump_target=False)
Instruction(opname='MAKE_FUNCTION', opcode=132, arg=0, argval=0, argrepr='', offset=4, starts_line=None, is_jump_target=False)
Instruction(opname='STORE_NAME', opcode=90, arg=0, argval='f', argrepr='f', offset=6, starts_line=None, is_jump_target=False)
Instruction(opname='LOAD_CONST', opcode=100, arg=2, argval=None, argrepr='None', offset=8, starts_line=None, is_jump_target=False)
Instruction(opname='RETURN_VALUE', opcode=83, arg=None, argval=None, argrepr='', offset=10, starts_line=None, is_jump_target=False)
msg362989 - Author: Steven D'Aprano (steven.daprano) (Python committer) Date: 2020-02-29 16:40
> would it not be better for dis.dis to behave consistently for methods and source strings of methods

I don't think so. The inputs are different. One is a string containing a `def` statement, which is an executable statement. The other is a function or method object, which may or may not have been created by a `def` statement.

(You can, although not easily, assemble a function object yourself without using either `def` or `lambda` directly.)

The `def` statement assembles a function object out of a pre-compiled body, and that's what dis shows when you give it a source code string that happens to contain a `def`. It's just another statement, like an import or loop or assignment. The contents of the body (the code object) aren't shown because the byte-code generated for a `def` knows nothing about the contents of the body.

If you want to know the contents of the body, you have to look at the body (the code object) itself.
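
The relationship between the `def` statement and the body can be sketched as follows. This is only an illustration: `find_code` is a hypothetical helper, not part of dis, and it assumes (as CPython does) that the nested code object is stored among the constants of the compiled module code.

```python
import dis
import types

source = "def f(x): return x**2"

# Compiling the string yields a module-level code object for the `def` statement.
module_code = compile(source, "<dis>", "exec")

def find_code(code, name):
    """Return the nested code object named `name` from code.co_consts (hypothetical helper)."""
    for const in code.co_consts:
        if isinstance(const, types.CodeType) and const.co_name == name:
            return const
    raise LookupError(name)

# The pre-compiled body that the `def` statement wraps in a function object.
body_code = find_code(module_code, "f")

# Executing the module code runs the `def` and binds a function object named f.
namespace = {}
exec(module_code, namespace)
f = namespace["f"]

# The function's __code__ carries the same bytecode as the nested constant.
print(f.__code__.co_code == body_code.co_code)

# Both calls therefore disassemble the same body.
dis.dis(f)
dis.dis(body_code)
```

In other words, dis.dis(f) starts from the body because that is all a function object holds; the instructions that built the function belong to the enclosing module code.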
msg362995 - Author: S Murthy (smurthy) Date: 2020-02-29 17:09
@steven.daprano In this case, the method f was created via def. Calling dis.dis(s), where s is the source code of f (say s = inspect.getsource(f)), shows the bytecode for both the header and the body, as the example I first posted shows:

>>> dis.dis('def f(x): return x**2')
  1           0 LOAD_CONST               0 (<code object f at 0x10b0f7f60, file "<dis>", line 1>)
              2 LOAD_CONST               1 ('f')
              4 MAKE_FUNCTION            0
              6 STORE_NAME               0 (f)
              8 LOAD_CONST               2 (None)
             10 RETURN_VALUE

Disassembly of <code object f at 0x10b0f7f60, file "<dis>", line 1>:
  1           0 LOAD_FAST                0 (x)
              2 LOAD_CONST               1 (2)
              4 BINARY_POWER
              6 RETURN_VALUE

The first block of instructions here is for the def statement, and the second block is for the return statement.
msg362997 - Author: S Murthy (smurthy) Date: 2020-02-29 17:21
BTW, how else are methods/functions created in Python except via def?
msg363040 - Author: Steven D'Aprano (steven.daprano) (Python committer) Date: 2020-03-01 03:48
> BTW, how else are methods/functions created in Python except via def?

Functions are objects like everything else in Python, so they have a 
type, which has a constructor:

    from types import FunctionType

The documentation for FunctionType is a bit thin, so you may need to 
experiment a bit to get the details right, but it can be done.
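
As a rough sketch of what that can look like, the example below builds a function from a code object without executing a `def` statement in the running program. The names `square` and `<example>` are made up for illustration, and pulling the body out of `co_consts` relies on CPython storing nested code objects there.

```python
from types import CodeType, FunctionType

# Compile source text to a module code object; the `def` is never executed here.
module_code = compile("def square(x): return x**2", "<example>", "exec")

# Pull the pre-compiled function body out of the module's constants.
body = next(c for c in module_code.co_consts if isinstance(c, CodeType))

# FunctionType(code, globals, name) wraps the code object in a callable function.
square = FunctionType(body, globals(), "square")

print(square(7))  # 49
```

This is essentially what the byte-code for a `def` statement does: it takes an existing code object and wraps it in a function object.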

Unfortunately copy.copy doesn't actually copy functions, which is in my 
opinion a bug (see #39805) but if you search the internet, you will find 
code that makes independent copies of function objects.
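
One common shape for such code is sketched below. `copy_function` is a hypothetical helper, not a standard-library function, and it copies only the attributes shown; treat it as a starting point rather than a complete solution.

```python
import types

def copy_function(func):
    """Return a new function object that shares func's code (hypothetical helper)."""
    new = types.FunctionType(
        func.__code__,      # same compiled body
        func.__globals__,   # same global namespace
        func.__name__,
        func.__defaults__,  # positional defaults (a tuple or None)
        func.__closure__,   # closure cells, if any
    )
    # Keyword-only defaults and the attribute dict get their own shallow copies.
    new.__kwdefaults__ = dict(func.__kwdefaults__) if func.__kwdefaults__ else None
    new.__dict__.update(func.__dict__)
    new.__doc__ = func.__doc__
    return new

def scale(x, factor=2):
    return x * factor

scale_copy = copy_function(scale)
scale_copy.__defaults__ = (10,)  # changing the copy leaves the original alone

print(scale(3), scale_copy(3))  # 6 30
```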

Methods are different from functions, and like functions, they 
too have a type with a constructor:

    from types import MethodType
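
A small sketch of that constructor in use, with a made-up `Greeter` class and `greet` function:

```python
from types import MethodType

class Greeter:
    def __init__(self, name):
        self.name = name

def greet(self):
    # A plain function; it only becomes a method once bound to an instance.
    return "hello " + self.name

g = Greeter("world")

# MethodType(function, instance) builds a bound method by hand.
bound = MethodType(greet, g)

print(bound())  # hello world

# The bound method just wraps the function and the instance.
print(bound.__func__ is greet, bound.__self__ is g)  # True True
```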
msg363042 - Author: Steven D'Aprano (steven.daprano) (Python committer) Date: 2020-03-01 04:24
Ah, I see now. I was using an older version of Python and the output of 
dis was different. It didn't recurse in to show the disassembly of the 
code object as well.
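
For comparison, the recursion can be switched off with the `depth` argument that dis.dis accepts since Python 3.7. The sketch below captures both outputs as text; `disassemble_to_text` is a hypothetical helper introduced only for this illustration.

```python
import dis
import io

source = "def f(x): return x**2"

def disassemble_to_text(src, depth=None):
    """Capture the output of dis.dis as a string (hypothetical helper)."""
    buffer = io.StringIO()
    dis.dis(src, file=buffer, depth=depth)
    return buffer.getvalue()

# depth=0 limits the output to the top-level code: the `def` statement only.
top_only = disassemble_to_text(source, depth=0)

# depth=None (the default) also descends into nested code objects.
recursive = disassemble_to_text(source)

print("Disassembly of" in top_only)
print("Disassembly of" in recursive)
```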

> The first block of instructions here is for the def statement, and
> the second block is for the return statement.

The first block of byte code is for the def statement, and the second 
block is for the code object, which may be more than just a return 
statement.

What would you expect the disassembly of this to show?

dis.dis("""
import y
def f(x):
    a = 2*x - 1
    return a**2

print('hello')
""")

Would you expect the disassembled code object to show up in that as 
well? I'm not sure what I would expect.
msg363043 - Author: S Murthy (smurthy) Date: 2020-03-01 06:58
Yes, I know that a method body may contain more than just the return statement - I was referring to the byte code for the method def f(x): return x**2.

I don't think the output of dis.dis is correct here for the source string of f. It doesn't make sense, for example, to iterate over Bytecode(f) and get only the instructions for the return statement, but then iterate over Bytecode(inspect.getsource(f)) and get only the byte code for the def. The documentation for dis.dis and Bytecode indicates that all the bytecode for a piece of code, whether specified as a method, callable, generator, async generator, coroutine, class, or source string, will be generated.
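
Until that changes, one possible workaround is to walk the nested code objects by hand. The sketch below is not how dis itself is implemented; `iter_all_instructions` is a hypothetical helper, and it assumes nested code objects are reachable through `co_consts`.

```python
import dis
import types

def iter_all_instructions(code):
    """Yield (code_object, instruction) pairs for code and every nested code object."""
    for instr in dis.get_instructions(code):
        yield code, instr
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            # Descend into function bodies, class bodies, comprehensions, etc.
            yield from iter_all_instructions(const)

source = "def f(x): return x**2"

# Bytecode.codeobj is the top-level code object compiled from the source string.
top = dis.Bytecode(source).codeobj

for code, instr in iter_all_instructions(top):
    print(code.co_name, instr.offset, instr.opname, instr.argrepr)
```

Iterating this way yields the instructions for the def statement first, followed by those for the body of f.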

History
Date                 User            Action  Args
2022-04-11 14:59:27  admin           set     github: 83981
2020-03-01 06:58:53  smurthy         set     messages: + msg363043
2020-03-01 04:24:00  steven.daprano  set     messages: + msg363042
2020-03-01 03:48:39  steven.daprano  set     messages: + msg363040
2020-02-29 17:21:30  smurthy         set     messages: + msg362997
2020-02-29 17:09:54  smurthy         set     messages: + msg362995
2020-02-29 16:40:10  steven.daprano  set     nosy: + steven.daprano; messages: + msg362989
2020-02-29 15:27:53  smurthy         create