Improve disassembly to show embedded code objects #56031

rhettinger · 2011-04-10T20:25:52Z

BPO	11822
Nosy	@rhettinger, @ncoghlan, @abalkin, @pitrou, @vstinner, @Bluehorn, @berkerpeksag, @serhiy-storchaka, @jleedev, @matrixise
PRs	bpo-11822: Improve disassembly to show embedded code objects. #1155 bpo-11822: Improve disassembly to show embedded code objects (limited depth). #1844
Files	issue11822.diff issue11822_nested_disassembly.diff: Updated and enhanced for 3.5 issue11822_nested_disassembly_with_off_switch.diff: "nested" levels below 0 now turn off the new feature

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/serhiy-storchaka'
closed_at = <Date 2017-06-11.11:12:25.596>
created_at = <Date 2011-04-10.20:25:51.850>
labels = ['3.7', 'type-feature', 'library']
title = 'Improve disassembly to show embedded code objects'
updated_at = <Date 2017-06-12.09:14:26.170>
user = 'https://github.com/rhettinger'

bugs.python.org fields:

activity = <Date 2017-06-12.09:14:26.170>
actor = 'vstinner'
assignee = 'serhiy.storchaka'
closed = True
closed_date = <Date 2017-06-11.11:12:25.596>
closer = 'serhiy.storchaka'
components = ['Library (Lib)']
creation = <Date 2011-04-10.20:25:51.850>
creator = 'rhettinger'
dependencies = []
files = ['23908', '36565', '36566']
hgrepos = []
issue_num = 11822
keywords = ['patch']
message_count = 25.0
messages = ['133478', '133524', '133542', '133543', '133544', '133545', '137415', '149198', '149199', '226525', '226531', '254220', '267425', '270751', '291715', '291718', '291720', '291721', '291765', '291780', '294648', '294650', '295124', '295703', '295761']
nosy_count = 11.0
nosy_names = ['rhettinger', 'ncoghlan', 'belopolsky', 'pitrou', 'vstinner', 'torsten', 'berker.peksag', 'serhiy.storchaka', 'jleedev', 'matrixise', 'Todd Dembrey']
pr_nums = ['1155', '1844']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue11822'
versions = ['Python 3.7']

rhettinger · 2011-04-10T20:25:52Z

Now that list comprehensions mask run their internals in code objects (the same way that genexps do), it is getting harder to use dis() to see what code is generated. For example, the pow() call isn't shown in the following disassembly:

>>> dis('[x**2 for x in range(3)]')
  1           0 LOAD_CONST               0 (<code object <listcomp> at 0x1005d1e88, file "<dis>", line 1>) 
              3 MAKE_FUNCTION            0 
              6 LOAD_NAME                0 (range) 
              9 LOAD_CONST               1 (3) 
             12 CALL_FUNCTION            1 
             15 GET_ITER             
             16 CALL_FUNCTION            1 
             19 RETURN_VALUE

I propose that dis() build-up a queue undisplayed code objects and then disassemble each of those after the main disassembly is done (effectively making it recursive and displaying code objects in the order that they are first seen in the disassembly). For example, the output shown above would be followed by a disassembly of its internal code object:

<code object <listcomp> at 0x1005d1e88, file "<dis>", line 1>:
1 0 BUILD_LIST 0
3 LOAD_FAST 0 (.0)
>> 6 FOR_ITER 16 (to 25)
9 STORE_FAST 1 (x)
12 LOAD_FAST 1 (x)
15 LOAD_CONST 0 (2)
18 BINARY_POWER
19 LIST_APPEND 2
22 JUMP_ABSOLUTE 6
>> 25 RETURN_VALUE

abalkin · 2011-04-11T14:42:04Z

Would you like to display lambdas as well?

>>> dis('lambda x: x**2')
  1           0 LOAD_CONST               0 (<code object <lambda> at 0x1005c9ad0, file "<dis>", line 1>) 
              3 MAKE_FUNCTION            0 
              6 RETURN_VALUE

<code object <lambda> at 0x1005cb140, file "<dis>", line 1>:
1 0 LOAD_FAST 0 (x)
3 LOAD_CONST 1 (2)
6 BINARY_POWER
7 RETURN_VALUE

I like the idea, but would rather see code objects expanded in-line, possibly indented rather than at the end.

pitrou · 2011-04-11T21:08:05Z

I think it should be enabled with an optional argument. Otherwise in some cases you'll get lots of additional output while you're only interested in the top-level code.

rhettinger · 2011-04-11T21:13:10Z

If you disassemble a function, you typically want to see all the code in that function. This isn't like pdb where you're choosing to step over or into another function outside the one being viewed.

pitrou · 2011-04-11T21:21:03Z

If you disassemble a function, you typically want to see all the code
in that function.

That depends on the function. If you do event-driven programming (say,
Twisted deferreds with addCallback()), you don't necessarily want to see
the disassembly of the callbacks that are passed to the various
framework functions. Also, if you do so recursively, it might become
*very* unwieldy.

So I don't think there's anything "typical" here. It depends on what you
intend to focus on.

abalkin · 2011-04-11T21:48:24Z

On Mon, Apr 11, 2011 at 5:21 PM, Antoine Pitrou <report@bugs.python.org> wrote:
..
Raymond>> If you disassemble a function, you typically want to see all the code
Raymond>> [defined] in that function.

+1 (with clarification in [])

If the function calls a function defined elsewhere, I don't want to
see the called function disassembly when I disassemble the caller. In
this case it is very easy to disassemble interesting functions with
separate dis() calls. In the case like the following, however:

def f():
  def g(x):
      return x**2
dis(f)
2           0 LOAD_CONST               1 (<code object g at
0x10055ce88, file "x.py", line 2>)
              3 MAKE_FUNCTION            0
              6 STORE_FAST               0 (g)
...

when I see '<code object g at 0x10055ce88, ..>', I have to do
something unwieldy such as

3 0 LOAD_FAST 0 (x)
3 LOAD_CONST 1 (2)
6 BINARY_POWER
7 RETURN_VALUE

That depends on the function. If you do event-driven programming (say,
Twisted deferreds with addCallback()), you don't necessarily want to see
the disassembly of the callbacks that are passed to the various
framework functions. Also, if you do so recursively, it might become
*very* unwieldy.

Can you provide some examples of this? Nested functions are typically
short and even if they are long, the size disassembly would be
proportional to the line count of the function being disassembled,
which is expected.

ncoghlan · 2011-06-01T05:10:23Z

Note that Yaniv Aknin (author of the Python's Innards series of blog posts) has a recursive dis variant that may be useful for inspiration:

https://bitbucket.org/yaniv_aknin/pynards/src/c4b61c7a1798/common/blog.py

As shown there, this recursive behaviour can also be useful for code_info/show_code.

Bluehorn · 2011-12-10T23:34:19Z

I offer the attached patch as a starting point to fulfill this feature request.

The patch changes the output to insert the disassembly of local code objects on the referencing line.

As that made the output unreadable to me, I added indentation for the nested code (by 4 spaces, hoping that nobody will nest code 10 levels deep :-)

This results in the following output for the original example:

>>> from dis import dis
>>> dis('[x**2 for x in range(3)]')
  1           0 LOAD_CONST               0 (<code object <listcomp> at 0x7f24a67dde40, file "<dis>", line 1>) 
  1               0 BUILD_LIST               0 
                  3 LOAD_FAST                0 (.0) 
            >>    6 FOR_ITER                16 (to 25) 
                  9 STORE_FAST               1 (x) 
                 12 LOAD_FAST                1 (x) 
                 15 LOAD_CONST               0 (2) 
                 18 BINARY_POWER         
                 19 LIST_APPEND              2 
                 22 JUMP_ABSOLUTE            6 
            >>   25 RETURN_VALUE         
              3 LOAD_CONST               1 ('<listcomp>') 
              6 MAKE_FUNCTION            0 
              9 LOAD_NAME                0 (range) 
             12 LOAD_CONST               2 (3) 
             15 CALL_FUNCTION            1 
             18 GET_ITER             
             19 CALL_FUNCTION            1 
             22 RETURN_VALUE

rhettinger · 2011-12-10T23:44:24Z

Thank you :-)

ncoghlan · 2014-09-07T01:01:43Z

Sorry for the long delay in doing anything with this patch. Unfortunately, trunk has moved on quite a bit since this patch was submitted, and it's no longer directly applicable.

However, the basic principle is sound, so this is a new patch that aligns with the changes made in 3.4 to provide an iterator based bytecode introspection API. It also changes the indenting to be based on the structure of the bytecode disassembly - nested lines start aligned with the opcode *name* on the preceding line. This will get unreadable with more than two or three levels of nesting, but at that point, hard to read disassembly for the top level function is the least of your worries. (A potentially useful option may to be add a flag to turn off the implicit recursion, easily restoring the old single level behaviour. I'd like the recursive version to be the default though, since it's far more useful given that Python 3 comprehensions all involve a nested code object)

A descriptive header makes the new output more self-explanatory. Note that I did try repeating the code object repr from the LOAD_CONST opcode in the new header - it was pretty unreadable, and redundant given the preceding line of disassembly.

Two examples, one showing Torsten's list comprehension from above, and another showing that the nested line numbers work properly.

This can't be applied as is - it's still missing tests, docs, and fixes to disassembly output tests that assume the old behaviour.

>>> dis.dis('[x**2 for x in range(3)]')
  1           0 LOAD_CONST               0 (<code object <listcomp> at 0x7f459ec4a0c0, file "<dis>", line 1>)
                Disassembly for nested code object
                  1       0 BUILD_LIST               0
                          3 LOAD_FAST                0 (.0)
                    >>    6 FOR_ITER                16 (to 25)
                          9 STORE_FAST               1 (x)
                         12 LOAD_FAST                1 (x)
                         15 LOAD_CONST               0 (2)
                         18 BINARY_POWER
                         19 LIST_APPEND              2
                         22 JUMP_ABSOLUTE            6
                    >>   25 RETURN_VALUE
              3 LOAD_CONST               1 ('<listcomp>')
              6 MAKE_FUNCTION            0
              9 LOAD_NAME                0 (range)
             12 LOAD_CONST               2 (3)
             15 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             18 GET_ITER
             19 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             22 RETURN_VALUE
>>> def f():
...     print("Hello")
...     def g():
...         for x in range(10):
...             yield x
...     return g
... 
>>> dis.dis(f)
  2           0 LOAD_GLOBAL              0 (print)
              3 LOAD_CONST               1 ('Hello')
              6 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
              9 POP_TOP

3 10 LOAD_CONST 2 (<code object g at 0x7f459ec4a540, file "<stdin>", line 3>)
Disassembly for nested code object
4 0 SETUP_LOOP 25 (to 28)
3 LOAD_GLOBAL 0 (range)
6 LOAD_CONST 1 (10)
9 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
12 GET_ITER
>> 13 FOR_ITER 11 (to 27)
16 STORE_FAST 0 (x)

              5      19 LOAD_FAST                0 (x)
                     22 YIELD_VALUE
                     23 POP_TOP
                     24 JUMP_ABSOLUTE           13
                >>   27 POP_BLOCK
                >>   28 LOAD_CONST               0 (None)
                     31 RETURN_VALUE
         13 LOAD_CONST               3 ('f.<locals>.g')
         16 MAKE_FUNCTION            0
         19 STORE_FAST               0 (g)

6 22 LOAD_FAST 0 (g)
25 RETURN_VALUE

ncoghlan · 2014-09-07T05:30:36Z

I didn't want to add a second argument to turn off the new behaviour, so I changed it such that passing a value < 0 for "nested" turns off the new feature entirely. Levels >= 0 enable it, defining which level to start with. The default level is "0" so there's no implied prefix, and nested code objects are displayed by default. This picks up at least comprehensions, lambda expressions and nested functions. I haven't checked how it handles nested classes yet.

I used this feature to get the old tests passing again by turning off the recursion feature. New tests for the new behaviour are still needed.

I also tweaked the header to show the *name* of the code object. The full repr is to noisy, but the generic message was hard to read when there were multiple nested code objects.

matrixise · 2015-11-06T21:16:35Z

Hi all,

For this feature, I have an other output:

stephane@sg1 /tmp> python3 dump_bytecode.py
<module>
--------
3 0 LOAD_BUILD_CLASS
1 LOAD_CONST 0 (<code object User at 0x10b830270, file "<show>", line 3>)
4 LOAD_CONST 1 ('User')
7 MAKE_FUNCTION 0
10 LOAD_CONST 1 ('User')
13 CALL_FUNCTION 2 (2 positional, 0 keyword pair)
16 STORE_NAME 0 (User)

8 19 LOAD_NAME 0 (User)
22 LOAD_CONST 2 ('user')
25 LOAD_CONST 3 ('password')
28 CALL_FUNCTION 2 (2 positional, 0 keyword pair)
31 STORE_NAME 1 (user)
34 LOAD_CONST 4 (None)
37 RETURN_VALUE

<module>.User
-------------
3 0 LOAD_NAME 0 (name)
3 STORE_NAME 1 (module)
6 LOAD_CONST 0 ('User')
9 STORE_NAME 2 (qualname)

4 12 LOAD_CONST 1 (<code object __init__ at 0x10b824270, file "<show>", line 4>)
15 LOAD_CONST 2 ('User.__init__')
18 MAKE_FUNCTION 0
21 STORE_NAME 3 (init)
24 LOAD_CONST 3 (None)
27 RETURN_VALUE

<module>.User.__init__
----------------------
5 0 LOAD_FAST 1 (email)
3 LOAD_FAST 0 (self)
6 STORE_ATTR 0 (email)

6 9 LOAD_FAST 2 (password)
12 LOAD_FAST 0 (self)
15 STORE_ATTR 1 (password)
18 LOAD_CONST 0 (None)
21 RETURN_VALUE

serhiy-storchaka · 2016-06-05T16:27:54Z

I like Stéphane's idea about placing the output for nested code object at the same level after the output for the main code object.

matrixise · 2016-07-18T13:15:21Z

hello, we can continue the discussion on this issue ?

serhiy-storchaka · 2017-04-15T14:30:46Z

The issue was open 6 years ago. The feature could be added in 3.3. But it still not implemented.

Since there are problems with outputting the disassembly of internal code objects expanded in-line, proposed patch just outputs them after the disassembly of the main code object (similar to original Raymond's proposition). This is simpler and doesn't make the output too wide.

ncoghlan · 2017-04-15T15:50:10Z

+1 for listing the nested code objects after the original one.

In reviewing Serhiy's patch, the core technical implementation looks OK to me, but I think we may want to go with a "depth" argument rather than a simple "recursive" flag.

My rationale for that relates to directly disassembling module and class source code:

dis(module_source, depth=1) # Module, class bodies, function bodies
dis(class_source, depth=1) # Class and method bodies

(with the default depth being 0, to disable recursive descent entirely)

Only if you set a higher depth than 1 would you start seeing things like closures, comprehension bodies, and nested classes.

With a simple all-or-nothing flag, I think module level recursive disassembly would be too noisy to be useful.

The bounded depth approach would also avoid a problem with invalid bytecode manipulations that manage to create a loop between two bytecode objects. While the *compiler* won't do that, there's no guarantee that the disassembler is being fed valid bytecode, so we should avoid exposing ourselves to any infinite loops in the display code.

serhiy-storchaka · 2017-04-15T16:36:04Z

The problem with the *depth* parameter is that it adds a burden of choosing the value for the end user. "Oh, there are more deeper code objects, I must increase the depth and rerun dis()!" I think in most cases when that parameter is specified it would be set to some large value like 999 because you don't want to set it too small. Compare for example with the usage of the attribute maxDiff in unittests.

The single depth parameter doesn't adds too much control. You can't enable disassembling functions and method bodies but disable disassembling comprehensions in functions. If you need more control, you should use non-recursive dis() and manually walk the tree of code objects.

How much output adds unlimited recursion in comparison with the recursion limited by the first level?

As for supporting invalid bytecode, currently the dis module doesn't support it (see bpo-26694).

ncoghlan · 2017-04-15T17:02:03Z

The problem I see is that we have conflicting requirements for the default behaviour:

if we modify dis() instead of adding a new function, then the default behaviour needs to be non-recursive for backwards compatibility reasons
if we allow the depth to be configurable, then we'd like the default behaviour to be to show everything

One potential resolution to that would be to define this as a new function, distree, rather than modifying dis and disassemble to natively support recursion. Then the new function could accept a depth argument (addressing my concerns), but have depth=None as the default (addressing your concerns).

If we wanted to allow even more control than that, then I think os.walk provides a useful precedent, where we'd add a new dis.walk helper that just looked at co_consts to find nested code objects without doing any of the other disassembly work.

The dis.walk() helper would produce an iterable of (depth, code, nested) 3-tuples, where:

the first item is the current depth in the compile tree
the second is the code object itself
the third is a list of nested code objects

Similar to os.walk(), editing the list of nested objects in place would let you control whether or not any further recursion took place.

rhettinger · 2017-04-16T21:22:07Z

if we modify dis() instead of adding a new function, then
the default behaviour needs to be non-recursive for
backwards compatibility reasons

I don't see how we have any backward compatibility issues. The dis() function is purely informational like help().

The problem is it doesn't show important information, list comprehensions are now effectively hidden from everyone who isn't clever and persistent.

I use dis() as a teaching aid in my Python courses and as a debugging tool when doing consulting. From my point of view, it is effectively broken in Python 3.

ncoghlan · 2017-04-17T04:42:59Z

Yeah, I was mixing this up with getargspec (et al), which get used by IDEs and similar tools. While third party tools do use the disassembler, they typically won't use its display logic directly unless they're just dumping the output to a terminal equivalent.

Given that, a "depth=None" parameter on dis and disassemble would provide the default behaviour of rendering the entire compilation tree, while still allowing turning off recursion entirely ("depth=0"), or limiting it to a desired number of levels ("depth=1", etc).

serhiy-storchaka · 2017-05-28T14:33:39Z

PR 1844 implements Nick's suggestion, but I don't like it. This complicates both the interface and the implementation.

help() also can produce very noisy output if request the documentation of the whole module. But the output of help() usually is piped through a pager. Would it help if pipe the output of dis() through a pager if the output file and stdin are attached to a terminal?

rhettinger · 2017-05-28T17:47:00Z

Would it help if pipe the output of dis() through a pager
if the output file and stdin are attached to a terminal?

-1 for adding a pager.

serhiy-storchaka · 2017-06-04T13:34:40Z

Could you please look at the patch Raymond? Is this what you wanted?

serhiy-storchaka · 2017-06-11T11:09:41Z

New changeset 1efbf92 by Serhiy Storchaka in branch 'master':
bpo-11822: Improve disassembly to show embedded code objects. (bpo-1844)
1efbf92

vstinner · 2017-06-12T09:14:26Z

Thanks Serhiy, it works and I like the result :-)

>>> def f():
...  def g():
...   return 3
...  return g
... 

>>> import dis; dis.dis(f)
  2           0 LOAD_CONST               1 (<code object g at 0x7f16ab2e2c40, file "<stdin>", line 2>)
              2 LOAD_CONST               2 ('f.<locals>.g')
              4 MAKE_FUNCTION            0
              6 STORE_FAST               0 (g)

4 8 LOAD_FAST 0 (g)
10 RETURN_VALUE

Disassembly of <code object g at 0x7f16ab2e2c40, file "<stdin>", line 2>:
3 0 LOAD_CONST 1 (3)
2 RETURN_VALUE

rhettinger added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Apr 10, 2011

rhettinger self-assigned this Apr 13, 2011

rhettinger removed their assignment Nov 10, 2015

serhiy-storchaka added the 3.7 (EOL) end of life label Apr 15, 2017

serhiy-storchaka self-assigned this May 30, 2017

serhiy-storchaka closed this as completed Jun 11, 2017

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve disassembly to show embedded code objects #56031

Improve disassembly to show embedded code objects #56031

rhettinger commented Apr 10, 2011

rhettinger commented Apr 10, 2011

abalkin commented Apr 11, 2011

pitrou commented Apr 11, 2011

rhettinger commented Apr 11, 2011

pitrou commented Apr 11, 2011

abalkin commented Apr 11, 2011

ncoghlan commented Jun 1, 2011

Bluehorn mannequin commented Dec 10, 2011

rhettinger commented Dec 10, 2011

ncoghlan commented Sep 7, 2014

ncoghlan commented Sep 7, 2014

matrixise commented Nov 6, 2015

serhiy-storchaka commented Jun 5, 2016

matrixise commented Jul 18, 2016

serhiy-storchaka commented Apr 15, 2017

ncoghlan commented Apr 15, 2017

serhiy-storchaka commented Apr 15, 2017

ncoghlan commented Apr 15, 2017

rhettinger commented Apr 16, 2017

ncoghlan commented Apr 17, 2017

serhiy-storchaka commented May 28, 2017

rhettinger commented May 28, 2017

serhiy-storchaka commented Jun 4, 2017

serhiy-storchaka commented Jun 11, 2017

vstinner commented Jun 12, 2017

Navigation Menu

Improve disassembly to show embedded code objects #56031

Improve disassembly to show embedded code objects #56031

Comments

rhettinger commented Apr 10, 2011

rhettinger commented Apr 10, 2011

abalkin commented Apr 11, 2011

pitrou commented Apr 11, 2011

rhettinger commented Apr 11, 2011

pitrou commented Apr 11, 2011

abalkin commented Apr 11, 2011

ncoghlan commented Jun 1, 2011

Bluehorn mannequin commented Dec 10, 2011

rhettinger commented Dec 10, 2011

ncoghlan commented Sep 7, 2014

ncoghlan commented Sep 7, 2014

matrixise commented Nov 6, 2015

serhiy-storchaka commented Jun 5, 2016

matrixise commented Jul 18, 2016

serhiy-storchaka commented Apr 15, 2017

ncoghlan commented Apr 15, 2017

serhiy-storchaka commented Apr 15, 2017

ncoghlan commented Apr 15, 2017

rhettinger commented Apr 16, 2017

ncoghlan commented Apr 17, 2017

serhiy-storchaka commented May 28, 2017

rhettinger commented May 28, 2017

serhiy-storchaka commented Jun 4, 2017

serhiy-storchaka commented Jun 11, 2017

vstinner commented Jun 12, 2017