classification
Title: Improve disassembly to show embedded code objects
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: Todd Dembrey, belopolsky, berker.peksag, haypo, jleedev, matrixise, ncoghlan, pitrou, rhettinger, serhiy.storchaka, torsten
Priority: normal Keywords: patch

Created on 2011-04-10 20:25 by rhettinger, last changed 2017-06-12 09:14 by haypo. This issue is now closed.

Files
File name Uploaded Description
issue11822.diff torsten, 2011-12-10 23:34
issue11822_nested_disassembly.diff ncoghlan, 2014-09-07 01:01 Updated and enhanced for 3.5
issue11822_nested_disassembly_with_off_switch.diff ncoghlan, 2014-09-07 05:30 "nested" levels below 0 now turn off the new feature
Pull Requests
URL Status Linked
PR 1155 closed serhiy.storchaka, 2017-04-15 14:22
PR 1844 merged serhiy.storchaka, 2017-05-28 14:19
Messages (25)
msg133478 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-04-10 20:25
Now that list comprehensions run their internals in code objects (the same way that genexps do), it is getting harder to use dis() to see what code is generated. For example, the pow() call isn't shown in the following disassembly:

>>> dis('[x**2 for x in range(3)]')
  1           0 LOAD_CONST               0 (<code object <listcomp> at 0x1005d1e88, file "<dis>", line 1>) 
              3 MAKE_FUNCTION            0 
              6 LOAD_NAME                0 (range) 
              9 LOAD_CONST               1 (3) 
             12 CALL_FUNCTION            1 
             15 GET_ITER             
             16 CALL_FUNCTION            1 
             19 RETURN_VALUE    

I propose that dis() build up a queue of undisplayed code objects and then disassemble each of those after the main disassembly is done (effectively making it recursive and displaying code objects in the order that they are first seen in the disassembly). For example, the output shown above would be followed by a disassembly of its internal code object:

<code object <listcomp> at 0x1005d1e88, file "<dis>", line 1>:
  1           0 BUILD_LIST               0 
              3 LOAD_FAST                0 (.0) 
        >>    6 FOR_ITER                16 (to 25) 
              9 STORE_FAST               1 (x) 
             12 LOAD_FAST                1 (x) 
             15 LOAD_CONST               0 (2) 
             18 BINARY_POWER         
             19 LIST_APPEND              2 
             22 JUMP_ABSOLUTE            6 
        >>   25 RETURN_VALUE
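
A minimal sketch of the queue-based idea proposed above, for illustration only; it is not the code that was eventually merged. The helper name `dis_queued` is hypothetical, and the order of `co_consts` is assumed to be a reasonable stand-in for "first seen in the disassembly". A lambda is used as the example because Python 3.12 and later inline comprehensions (PEP 709), so a list comprehension no longer reliably produces a nested code object.

```python
import dis
import io
import types
from collections import deque


def dis_queued(code, file=None):
    """Disassemble `code`, then every nested code object, breadth-first."""
    queue = deque([code])
    seen = set()  # ids of code objects already shown, so none is repeated
    while queue:
        co = queue.popleft()
        if id(co) in seen:
            continue
        seen.add(id(co))
        print(f"{co!r}:", file=file)
        dis.disassemble(co, file=file)  # disassembles this code object only
        print(file=file)
        # Assumption: order in co_consts stands in for "first seen" order.
        queue.extend(c for c in co.co_consts if isinstance(c, types.CodeType))


buf = io.StringIO()
dis_queued(compile("f = lambda x: x ** 2", "<demo>", "exec"), file=buf)
output = buf.getvalue()
print(output)
```

The module-level listing comes first, followed by a separate listing headed by the repr of the `<lambda>` code object.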
msg133524 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2011-04-11 14:42
Would you like to display lambdas as well?


>>> dis('lambda x: x**2')
  1           0 LOAD_CONST               0 (<code object <lambda> at 0x1005c9ad0, file "<dis>", line 1>) 
              3 MAKE_FUNCTION            0 
              6 RETURN_VALUE         

<code object <lambda> at 0x1005cb140, file "<dis>", line 1>:
  1           0 LOAD_FAST                0 (x) 
              3 LOAD_CONST               1 (2) 
              6 BINARY_POWER         
              7 RETURN_VALUE         


I like the idea, but would rather see code objects expanded in-line, possibly indented rather than at the end.
msg133542 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-04-11 21:08
I think it should be enabled with an optional argument. Otherwise in some cases you'll get lots of additional output while you're only interested in the top-level code.
msg133543 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-04-11 21:13
If you disassemble a function, you typically want to see all the code in that function.  This isn't like pdb where you're choosing to step over or into another function outside the one being viewed.
msg133544 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-04-11 21:21
> If you disassemble a function, you typically want to see all the code
> in that function.

That depends on the function. If you do event-driven programming (say,
Twisted deferreds with addCallback()), you don't necessarily want to see
the disassembly of the callbacks that are passed to the various
framework functions. Also, if you do so recursively, it might become
*very* unwieldy.

So I don't think there's anything "typical" here. It depends on what you
intend to focus on.
msg133545 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2011-04-11 21:48
On Mon, Apr 11, 2011 at 5:21 PM, Antoine Pitrou <report@bugs.python.org> wrote:
..
Raymond>> If you disassemble a function, you typically want to see all the code
Raymond>> [defined] in that function.

+1 (with clarification in [])

If the function calls a function defined elsewhere, I don't want to
see the called function disassembly when I disassemble the caller.  In
this case it is very easy to disassemble interesting functions with
separate dis() calls.  In the case like the following, however:

>>> def f():
...   def g(x):
...       return x**2
...
>>> dis(f)
  2           0 LOAD_CONST               1 (<code object g at 0x10055ce88, file "x.py", line 2>)
              3 MAKE_FUNCTION            0
              6 STORE_FAST               0 (g)
...

when I see '<code object g at 0x10055ce88, ..>', I have to do
something unwieldy such as

  3           0 LOAD_FAST                0 (x)
              3 LOAD_CONST               1 (2)
              6 BINARY_POWER
              7 RETURN_VALUE
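
The command that produced the listing above is not shown in the message. Presumably it involved digging the nested code object out of the outer function's constants by hand. A minimal sketch of that manual step follows, assuming the nested code object is selected by type rather than by a hard-coded index, since the index can differ between Python versions:

```python
import dis
import types


def f():
    def g(x):
        return x ** 2
    return g


# Nested code objects live in the enclosing code object's co_consts tuple.
nested = [c for c in f.__code__.co_consts if isinstance(c, types.CodeType)]
for co in nested:
    print(f"Disassembly of {co.co_name}:")
    dis.dis(co)
```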

>
> That depends on the function. If you do event-driven programming (say,
> Twisted deferreds with addCallback()), you don't necessarily want to see
> the disassembly of the callbacks that are passed to the various
> framework functions. Also, if you do so recursively, it might become
> *very* unwieldy.

Can you provide some examples of this?  Nested functions are typically
short and even if they are long, the size of the disassembly would be
proportional to the line count of the function being disassembled,
which is expected.
msg137415 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-06-01 05:10
Note that Yaniv Aknin (author of the Python's Innards series of blog posts) has a recursive dis variant that may be useful for inspiration:

https://bitbucket.org/yaniv_aknin/pynards/src/c4b61c7a1798/common/blog.py

As shown there, this recursive behaviour can also be useful for code_info/show_code.
msg149198 - Author: Torsten Landschoff (torsten) * Date: 2011-12-10 23:34
I offer the attached patch as a starting point to fulfill this feature request.

The patch changes the output to insert the disassembly of local code objects on the referencing line.

As that made the output unreadable to me, I added indentation for the nested code (by 4 spaces, hoping that nobody will nest code 10 levels deep :-)

This results in the following output for the original example:

>>> from dis import dis
>>> dis('[x**2 for x in range(3)]')
  1           0 LOAD_CONST               0 (<code object <listcomp> at 0x7f24a67dde40, file "<dis>", line 1>) 
  1               0 BUILD_LIST               0 
                  3 LOAD_FAST                0 (.0) 
            >>    6 FOR_ITER                16 (to 25) 
                  9 STORE_FAST               1 (x) 
                 12 LOAD_FAST                1 (x) 
                 15 LOAD_CONST               0 (2) 
                 18 BINARY_POWER         
                 19 LIST_APPEND              2 
                 22 JUMP_ABSOLUTE            6 
            >>   25 RETURN_VALUE         
              3 LOAD_CONST               1 ('<listcomp>') 
              6 MAKE_FUNCTION            0 
              9 LOAD_NAME                0 (range) 
             12 LOAD_CONST               2 (3) 
             15 CALL_FUNCTION            1 
             18 GET_ITER             
             19 CALL_FUNCTION            1 
             22 RETURN_VALUE
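
For illustration only (this is not the attached patch), the in-line layout described in this message can be sketched with `dis.get_instructions()`. The helper name `dis_inline` and the simplified column format are assumptions; the real `dis` output has more columns.

```python
import dis
import io
import types


def dis_inline(code, indent=0, file=None):
    """Print a simplified listing, expanding nested code objects in place."""
    pad = " " * indent
    for instr in dis.get_instructions(code):
        print(f"{pad}{instr.offset:>4} {instr.opname:<20} {instr.argrepr}", file=file)
        # A constant that is itself a code object gets its own indented listing.
        if isinstance(instr.argval, types.CodeType):
            dis_inline(instr.argval, indent + 4, file=file)


buf = io.StringIO()
dis_inline(compile("f = lambda x: x ** 2", "<demo>", "exec"), file=buf)
lines = buf.getvalue().splitlines()
print("\n".join(lines))
```

Each extra nesting level shifts the listing right by four columns, which is why deeply nested code quickly becomes wide.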
msg149199 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-12-10 23:44
Thank you :-)
msg226525 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-09-07 01:01
Sorry for the long delay in doing anything with this patch. Unfortunately, trunk has moved on quite a bit since this patch was submitted, and it's no longer directly applicable.

However, the basic principle is sound, so this is a new patch that aligns with the changes made in 3.4 to provide an iterator-based bytecode introspection API. It also changes the indenting to be based on the structure of the bytecode disassembly - nested lines start aligned with the opcode *name* on the preceding line. This will get unreadable with more than two or three levels of nesting, but at that point, hard to read disassembly for the top level function is the least of your worries. (A potentially useful option may be to add a flag to turn off the implicit recursion, easily restoring the old single level behaviour. I'd like the recursive version to be the default though, since it's far more useful given that Python 3 comprehensions all involve a nested code object.)

A descriptive header makes the new output more self-explanatory. Note that I did try repeating the code object repr from the LOAD_CONST opcode in the new header - it was pretty unreadable, and redundant given the preceding line of disassembly.

Two examples, one showing Torsten's list comprehension from above, and another showing that the nested line numbers work properly.

This can't be applied as is - it's still missing tests, docs, and fixes to disassembly output tests that assume the old behaviour.

>>> dis.dis('[x**2 for x in range(3)]')
  1           0 LOAD_CONST               0 (<code object <listcomp> at 0x7f459ec4a0c0, file "<dis>", line 1>)
                Disassembly for nested code object
                  1       0 BUILD_LIST               0
                          3 LOAD_FAST                0 (.0)
                    >>    6 FOR_ITER                16 (to 25)
                          9 STORE_FAST               1 (x)
                         12 LOAD_FAST                1 (x)
                         15 LOAD_CONST               0 (2)
                         18 BINARY_POWER
                         19 LIST_APPEND              2
                         22 JUMP_ABSOLUTE            6
                    >>   25 RETURN_VALUE
              3 LOAD_CONST               1 ('<listcomp>')
              6 MAKE_FUNCTION            0
              9 LOAD_NAME                0 (range)
             12 LOAD_CONST               2 (3)
             15 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             18 GET_ITER
             19 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             22 RETURN_VALUE
>>> def f():
...     print("Hello")
...     def g():
...         for x in range(10):
...             yield x
...     return g
... 
>>> dis.dis(f)
  2           0 LOAD_GLOBAL              0 (print)
              3 LOAD_CONST               1 ('Hello')
              6 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
              9 POP_TOP

  3          10 LOAD_CONST               2 (<code object g at 0x7f459ec4a540, file "<stdin>", line 3>)
                Disassembly for nested code object
                  4       0 SETUP_LOOP              25 (to 28)
                          3 LOAD_GLOBAL              0 (range)
                          6 LOAD_CONST               1 (10)
                          9 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                         12 GET_ITER
                    >>   13 FOR_ITER                11 (to 27)
                         16 STORE_FAST               0 (x)

                  5      19 LOAD_FAST                0 (x)
                         22 YIELD_VALUE
                         23 POP_TOP
                         24 JUMP_ABSOLUTE           13
                    >>   27 POP_BLOCK
                    >>   28 LOAD_CONST               0 (None)
                         31 RETURN_VALUE
             13 LOAD_CONST               3 ('f.<locals>.g')
             16 MAKE_FUNCTION            0
             19 STORE_FAST               0 (g)

  6          22 LOAD_FAST                0 (g)
             25 RETURN_VALUE
msg226531 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-09-07 05:30
I didn't want to add a second argument to turn off the new behaviour, so I changed it such that passing a value < 0 for "nested" turns off the new feature entirely. Levels >= 0 enable it, defining which level to start with. The default level is "0" so there's no implied prefix, and nested code objects are displayed by default. This picks up at least comprehensions, lambda expressions and nested functions. I haven't checked how it handles nested classes yet.

I used this feature to get the old tests passing again by turning off the recursion feature. New tests for the new behaviour are still needed.

I also tweaked the header to show the *name* of the code object. The full repr is too noisy, but the generic message was hard to read when there were multiple nested code objects.
msg254220 - Author: Stéphane Wirtel (matrixise) * Date: 2015-11-06 21:16
Hi all,

For this feature, I have another output:

stephane@sg1 /tmp> python3 dump_bytecode.py
<module>
--------
  3           0 LOAD_BUILD_CLASS
              1 LOAD_CONST               0 (<code object User at 0x10b830270, file "<show>", line 3>)
              4 LOAD_CONST               1 ('User')
              7 MAKE_FUNCTION            0
             10 LOAD_CONST               1 ('User')
             13 CALL_FUNCTION            2 (2 positional, 0 keyword pair)
             16 STORE_NAME               0 (User)

  8          19 LOAD_NAME                0 (User)
             22 LOAD_CONST               2 ('user')
             25 LOAD_CONST               3 ('password')
             28 CALL_FUNCTION            2 (2 positional, 0 keyword pair)
             31 STORE_NAME               1 (user)
             34 LOAD_CONST               4 (None)
             37 RETURN_VALUE

<module>.User
-------------
  3           0 LOAD_NAME                0 (__name__)
              3 STORE_NAME               1 (__module__)
              6 LOAD_CONST               0 ('User')
              9 STORE_NAME               2 (__qualname__)

  4          12 LOAD_CONST               1 (<code object __init__ at 0x10b824270, file "<show>", line 4>)
             15 LOAD_CONST               2 ('User.__init__')
             18 MAKE_FUNCTION            0
             21 STORE_NAME               3 (__init__)
             24 LOAD_CONST               3 (None)
             27 RETURN_VALUE

<module>.User.__init__
----------------------
  5           0 LOAD_FAST                1 (email)
              3 LOAD_FAST                0 (self)
              6 STORE_ATTR               0 (email)

  6           9 LOAD_FAST                2 (password)
             12 LOAD_FAST                0 (self)
             15 STORE_ATTR               1 (password)
             18 LOAD_CONST               0 (None)
             21 RETURN_VALUE
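
The `dump_bytecode.py` script itself is not included in the message. A rough sketch of how output in this style could be produced follows, assuming a hypothetical helper `dis_with_paths` that builds the dotted header from `co_name` values and walks `co_consts` depth-first:

```python
import dis
import io
import types

SOURCE = """
class User:
    def __init__(self, email, password):
        self.email = email
        self.password = password
"""


def dis_with_paths(code, prefix="", file=None):
    """Print a dotted-path header, the listing, then each nested code object."""
    path = f"{prefix}.{code.co_name}" if prefix else code.co_name
    print(path, file=file)
    print("-" * len(path), file=file)
    dis.disassemble(code, file=file)  # this code object only, no recursion
    print(file=file)
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            dis_with_paths(const, path, file=file)


buf = io.StringIO()
dis_with_paths(compile(SOURCE, "<show>", "exec"), file=buf)
headers = [line for line in buf.getvalue().splitlines() if line.startswith("<module>")]
print(buf.getvalue())
```

Every listing stays at the same indentation level, so the output does not grow wider as nesting gets deeper.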
msg267425 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-06-05 16:27
I like Stéphane's idea about placing the output for nested code objects at the same level after the output for the main code object.
msg270751 - Author: Stéphane Wirtel (matrixise) * Date: 2016-07-18 13:15
Hello, can we continue the discussion on this issue?
msg291715 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-04-15 14:30
The issue was opened 6 years ago. The feature could have been added in 3.3, but it is still not implemented.

Since there are problems with outputting the disassembly of internal code objects expanded in-line, the proposed patch just outputs them after the disassembly of the main code object (similar to Raymond's original proposal). This is simpler and doesn't make the output too wide.
msg291718 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-04-15 15:50
+1 for listing the nested code objects after the original one.

In reviewing Serhiy's patch, the core technical implementation looks OK to me, but I think we may want to go with a "depth" argument rather than a simple "recursive" flag.

My rationale for that relates to directly disassembling module and class source code:

- dis(module_source, depth=1) # Module, class bodies, function bodies
- dis(class_source, depth=1) # Class and method bodies

(with the default depth being 0, to disable recursive descent entirely)

Only if you set a higher depth than 1 would you start seeing things like closures, comprehension bodies, and nested classes.

With a simple all-or-nothing flag, I think module level recursive disassembly would be too noisy to be useful.

The bounded depth approach would also avoid a problem with invalid bytecode manipulations that manage to create a loop between two bytecode objects. While the *compiler* won't do that, there's no guarantee that the disassembler is being fed valid bytecode, so we should avoid exposing ourselves to any infinite loops in the display code.
msg291720 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-04-15 16:36
The problem with the *depth* parameter is that it puts the burden of choosing the value on the end user. "Oh, there are deeper code objects, I must increase the depth and rerun dis()!" I think in most cases when that parameter is specified it would be set to some large value like 999 because you don't want to set it too small. Compare for example with the usage of the attribute maxDiff in unittests.

A single depth parameter doesn't add much control. You can't enable disassembling function and method bodies but disable disassembling comprehensions in functions. If you need more control, you should use non-recursive dis() and manually walk the tree of code objects.

How much output does unlimited recursion add in comparison with recursion limited to the first level?

As for supporting invalid bytecode, currently the dis module doesn't support it (see issue26694).
msg291721 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-04-15 17:02
The problem I see is that we have conflicting requirements for the default behaviour:

- if we modify dis() instead of adding a new function, then the default behaviour needs to be non-recursive for backwards compatibility reasons
- if we allow the depth to be configurable, then we'd like the default behaviour to be to show everything

One potential resolution to that would be to define this as a new function, `distree`, rather than modifying `dis` and `disassemble` to natively support recursion. Then the new function could accept a `depth` argument (addressing my concerns), but have `depth=None` as the default (addressing your concerns).

If we wanted to allow even more control than that, then I think os.walk provides a useful precedent, where we'd add a new `dis.walk` helper that just looked at `co_consts` to find nested code objects without doing any of the other disassembly work.

The dis.walk() helper would produce an iterable of (depth, code, nested) 3-tuples, where:

- the first item is the current depth in the compile tree
- the second is the code object itself
- the third is a list of nested code objects

Similar to os.walk(), editing the list of nested objects in place would let you control whether or not any further recursion took place.
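
No `dis.walk` function exists in the standard library; the following is only a sketch of what such a helper might look like under the description above. The name `walk` and its exact behaviour are assumptions.

```python
import types


def walk(code, depth=0):
    """Yield (depth, code, nested) for `code` and every code object below it."""
    nested = [c for c in code.co_consts if isinstance(c, types.CodeType)]
    yield depth, code, nested
    # The caller may have edited `nested` in place while we were suspended,
    # so only the entries still present are descended into.
    for child in nested:
        yield from walk(child, depth + 1)


SOURCE = """
def outer():
    def inner():
        return lambda: 0
    return inner
"""
top = compile(SOURCE, "<demo>", "exec")

everything = [(depth, co.co_name) for depth, co, _ in walk(top)]

pruned = []
for depth, co, nested in walk(top):
    pruned.append(co.co_name)
    if depth >= 1:
        nested.clear()  # stop descending below the first level

print(everything)
print(pruned)
```

As with `os.walk()`, pruning works because the generator resumes after the `yield` and iterates over the same list object that the caller modified.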
msg291765 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-04-16 21:22
> if we modify dis() instead of adding a new function, then
> the default behaviour needs to be non-recursive for 
> backwards compatibility reasons

I don't see how we have any backward compatibility issues.  The dis() function is purely informational like help().  

The problem is that it doesn't show important information: list comprehensions are now effectively hidden from everyone who isn't clever and persistent.

I use dis() as a teaching aid in my Python courses and as a debugging tool when doing consulting.  From my point of view, it is effectively broken in Python 3.
msg291780 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-04-17 04:42
Yeah, I was mixing this up with getargspec (et al), which get used by IDEs and similar tools. While third party tools do use the disassembler, they typically won't use its display logic directly unless they're just dumping the output to a terminal equivalent.

Given that, a "depth=None" parameter on `dis` and `disassemble` would provide the default behaviour of rendering the entire compilation tree, while still allowing turning off recursion entirely ("depth=0"), or limiting it to a desired number of levels ("depth=1", etc).
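
A short usage sketch of the `depth` keyword as it appears in `dis.dis()` from Python 3.7 onwards. The functions `outer` and `render` are illustrative names only; the example counts the "Disassembly of ..." headers that `dis` prints before each nested code object.

```python
import dis
import io


def outer():
    def inner():
        return lambda: 0
    return inner


def render(depth):
    """Return the text that dis.dis() prints for `outer` at the given depth."""
    buf = io.StringIO()
    dis.dis(outer, file=buf, depth=depth)
    return buf.getvalue()


full = render(None)   # default: the whole tree (outer, inner, <lambda>)
top_only = render(0)  # recursion turned off
one_level = render(1)  # outer plus its directly nested code objects

for label, text in [("None", full), ("0", top_only), ("1", one_level)]:
    print(f"depth={label}: {text.count('Disassembly of')} nested listing(s)")
```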
msg294648 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-05-28 14:33
PR 1844 implements Nick's suggestion, but I don't like it. This complicates both the interface and the implementation.

help() can also produce very noisy output if you request the documentation of a whole module. But the output of help() is usually piped through a pager. Would it help to pipe the output of dis() through a pager if the output file and stdin are attached to a terminal?
msg294650 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-05-28 17:46
> Would it help to pipe the output of dis() through a pager
> if the output file and stdin are attached to a terminal?

-1 for adding a pager.
msg295124 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-06-04 13:34
Could you please look at the patch, Raymond? Is this what you wanted?
msg295703 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-06-11 11:09
New changeset 1efbf92e90ed2edf3f5bb5323340b26f318ff61e by Serhiy Storchaka in branch 'master':
bpo-11822: Improve disassembly to show embedded code objects. (#1844)
https://github.com/python/cpython/commit/1efbf92e90ed2edf3f5bb5323340b26f318ff61e
msg295761 - Author: STINNER Victor (haypo) * (Python committer) Date: 2017-06-12 09:14
Thanks Serhiy, it works and I like the result :-)

>>> def f():
...  def g():
...   return 3
...  return g
... 

>>> import dis; dis.dis(f)
  2           0 LOAD_CONST               1 (<code object g at 0x7f16ab2e2c40, file "<stdin>", line 2>)
              2 LOAD_CONST               2 ('f.<locals>.g')
              4 MAKE_FUNCTION            0
              6 STORE_FAST               0 (g)

  4           8 LOAD_FAST                0 (g)
             10 RETURN_VALUE

Disassembly of <code object g at 0x7f16ab2e2c40, file "<stdin>", line 2>:
  3           0 LOAD_CONST               1 (3)
              2 RETURN_VALUE
History
Date User Action Args
2017-06-12 09:14:26 haypo set messages: + msg295761
2017-06-11 11:12:25 serhiy.storchaka set status: open -> closed; resolution: fixed; stage: patch review -> resolved
2017-06-11 11:09:41 serhiy.storchaka set messages: + msg295703
2017-06-04 13:34:40 serhiy.storchaka set messages: + msg295124
2017-05-30 07:38:35 serhiy.storchaka set assignee: serhiy.storchaka
2017-05-28 17:46:59 rhettinger set messages: + msg294650
2017-05-28 14:33:39 serhiy.storchaka set messages: + msg294648
2017-05-28 14:19:13 serhiy.storchaka set pull_requests: + pull_request1931
2017-04-17 04:42:59 ncoghlan set messages: + msg291780
2017-04-16 21:22:06 rhettinger set messages: + msg291765
2017-04-15 17:02:03 ncoghlan set messages: + msg291721
2017-04-15 16:36:04 serhiy.storchaka set messages: + msg291720
2017-04-15 15:50:09 ncoghlan set messages: + msg291718
2017-04-15 14:30:46 serhiy.storchaka set messages: + msg291715; versions: + Python 3.7, - Python 3.6
2017-04-15 14:22:00 serhiy.storchaka set pull_requests: + pull_request1286
2016-07-18 13:15:21 matrixise set messages: + msg270751
2016-06-05 16:27:53 serhiy.storchaka set nosy: + serhiy.storchaka; messages: + msg267425
2015-12-03 17:17:52 Todd Dembrey set nosy: + Todd Dembrey
2015-11-10 07:27:13 rhettinger set assignee: rhettinger -> (no value)
2015-11-06 21:16:34 matrixise set nosy: + matrixise; messages: + msg254220; versions: + Python 3.6, - Python 3.5
2015-01-26 01:28:23 berker.peksag set nosy: + berker.peksag
2014-09-07 05:30:36 ncoghlan set files: + issue11822_nested_disassembly_with_off_switch.diff; messages: + msg226531
2014-09-07 01:01:46 ncoghlan set files: + issue11822_nested_disassembly.diff; messages: + msg226525
2014-08-16 12:42:16 haypo set nosy: + haypo
2014-08-12 19:49:09 rhettinger set stage: needs patch -> patch review; versions: + Python 3.5, - Python 3.4
2014-08-12 16:20:50 jleedev set nosy: + jleedev
2013-07-12 07:35:54 rhettinger set stage: needs patch; versions: + Python 3.4, - Python 3.3
2011-12-10 23:44:24 rhettinger set messages: + msg149199
2011-12-10 23:34:19 torsten set files: + issue11822.diff; nosy: + torsten; messages: + msg149198; keywords: + patch
2011-06-01 05:10:23 ncoghlan set nosy: + ncoghlan; messages: + msg137415
2011-04-13 02:28:27 rhettinger set assignee: rhettinger
2011-04-11 21:48:24 belopolsky set messages: + msg133545
2011-04-11 21:21:03 pitrou set messages: + msg133544
2011-04-11 21:13:10 rhettinger set messages: + msg133543
2011-04-11 21:08:05 pitrou set nosy: + pitrou; messages: + msg133542
2011-04-11 14:42:04 belopolsky set messages: + msg133524
2011-04-11 14:29:18 belopolsky set nosy: + belopolsky
2011-04-10 20:25:51 rhettinger create