Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve disassembly to show embedded code objects #56031

Closed
rhettinger opened this issue Apr 10, 2011 · 25 comments
Closed

Improve disassembly to show embedded code objects #56031

rhettinger opened this issue Apr 10, 2011 · 25 comments
Assignees
Labels
3.7 (EOL) end of life stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@rhettinger
Copy link
Contributor

BPO 11822
Nosy @rhettinger, @ncoghlan, @abalkin, @pitrou, @vstinner, @Bluehorn, @berkerpeksag, @serhiy-storchaka, @jleedev, @matrixise
PRs
  • bpo-11822: Improve disassembly to show embedded code objects. #1155
  • bpo-11822: Improve disassembly to show embedded code objects (limited depth). #1844
  • Files
  • issue11822.diff
  • issue11822_nested_disassembly.diff: Updated and enhanced for 3.5
  • issue11822_nested_disassembly_with_off_switch.diff: "nested" levels below 0 now turn off the new feature
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2017-06-11.11:12:25.596>
    created_at = <Date 2011-04-10.20:25:51.850>
    labels = ['3.7', 'type-feature', 'library']
    title = 'Improve disassembly to show embedded code objects'
    updated_at = <Date 2017-06-12.09:14:26.170>
    user = 'https://github.com/rhettinger'

    bugs.python.org fields:

    activity = <Date 2017-06-12.09:14:26.170>
    actor = 'vstinner'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2017-06-11.11:12:25.596>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)']
    creation = <Date 2011-04-10.20:25:51.850>
    creator = 'rhettinger'
    dependencies = []
    files = ['23908', '36565', '36566']
    hgrepos = []
    issue_num = 11822
    keywords = ['patch']
    message_count = 25.0
    messages = ['133478', '133524', '133542', '133543', '133544', '133545', '137415', '149198', '149199', '226525', '226531', '254220', '267425', '270751', '291715', '291718', '291720', '291721', '291765', '291780', '294648', '294650', '295124', '295703', '295761']
    nosy_count = 11.0
    nosy_names = ['rhettinger', 'ncoghlan', 'belopolsky', 'pitrou', 'vstinner', 'torsten', 'berker.peksag', 'serhiy.storchaka', 'jleedev', 'matrixise', 'Todd Dembrey']
    pr_nums = ['1155', '1844']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue11822'
    versions = ['Python 3.7']

    @rhettinger
    Copy link
    Contributor Author

    Now that list comprehensions mask run their internals in code objects (the same way that genexps do), it is getting harder to use dis() to see what code is generated. For example, the pow() call isn't shown in the following disassembly:

    >>> dis('[x**2 for x in range(3)]')
      1           0 LOAD_CONST               0 (<code object <listcomp> at 0x1005d1e88, file "<dis>", line 1>) 
                  3 MAKE_FUNCTION            0 
                  6 LOAD_NAME                0 (range) 
                  9 LOAD_CONST               1 (3) 
                 12 CALL_FUNCTION            1 
                 15 GET_ITER             
                 16 CALL_FUNCTION            1 
                 19 RETURN_VALUE    

    I propose that dis() build-up a queue undisplayed code objects and then disassemble each of those after the main disassembly is done (effectively making it recursive and displaying code objects in the order that they are first seen in the disassembly). For example, the output shown above would be followed by a disassembly of its internal code object:

    <code object <listcomp> at 0x1005d1e88, file "<dis>", line 1>:
    1 0 BUILD_LIST 0
    3 LOAD_FAST 0 (.0)
    >> 6 FOR_ITER 16 (to 25)
    9 STORE_FAST 1 (x)
    12 LOAD_FAST 1 (x)
    15 LOAD_CONST 0 (2)
    18 BINARY_POWER
    19 LIST_APPEND 2
    22 JUMP_ABSOLUTE 6
    >> 25 RETURN_VALUE

    @rhettinger rhettinger added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Apr 10, 2011
    @abalkin
    Copy link
    Member

    abalkin commented Apr 11, 2011

    Would you like to display lambdas as well?

    >>> dis('lambda x: x**2')
      1           0 LOAD_CONST               0 (<code object <lambda> at 0x1005c9ad0, file "<dis>", line 1>) 
                  3 MAKE_FUNCTION            0 
                  6 RETURN_VALUE         

    <code object <lambda> at 0x1005cb140, file "<dis>", line 1>:
    1 0 LOAD_FAST 0 (x)
    3 LOAD_CONST 1 (2)
    6 BINARY_POWER
    7 RETURN_VALUE

    I like the idea, but would rather see code objects expanded in-line, possibly indented rather than at the end.

    @pitrou
    Copy link
    Member

    pitrou commented Apr 11, 2011

    I think it should be enabled with an optional argument. Otherwise in some cases you'll get lots of additional output while you're only interested in the top-level code.

    @rhettinger
    Copy link
    Contributor Author

    If you disassemble a function, you typically want to see all the code in that function. This isn't like pdb where you're choosing to step over or into another function outside the one being viewed.

    @pitrou
    Copy link
    Member

    pitrou commented Apr 11, 2011

    If you disassemble a function, you typically want to see all the code
    in that function.

    That depends on the function. If you do event-driven programming (say,
    Twisted deferreds with addCallback()), you don't necessarily want to see
    the disassembly of the callbacks that are passed to the various
    framework functions. Also, if you do so recursively, it might become
    *very* unwieldy.

    So I don't think there's anything "typical" here. It depends on what you
    intend to focus on.

    @abalkin
    Copy link
    Member

    abalkin commented Apr 11, 2011

    On Mon, Apr 11, 2011 at 5:21 PM, Antoine Pitrou <report@bugs.python.org> wrote:
    ..
    Raymond>> If you disassemble a function, you typically want to see all the code
    Raymond>> [defined] in that function.

    +1 (with clarification in [])

    If the function calls a function defined elsewhere, I don't want to
    see the called function disassembly when I disassemble the caller. In
    this case it is very easy to disassemble interesting functions with
    separate dis() calls. In the case like the following, however:

    def f():
      def g(x):
          return x**2
    dis(f)
    2           0 LOAD_CONST               1 (<code object g at
    0x10055ce88, file "x.py", line 2>)
                  3 MAKE_FUNCTION            0
                  6 STORE_FAST               0 (g)
    ...

    when I see '<code object g at 0x10055ce88, ..>', I have to do
    something unwieldy such as

    3 0 LOAD_FAST 0 (x)
    3 LOAD_CONST 1 (2)
    6 BINARY_POWER
    7 RETURN_VALUE

    That depends on the function. If you do event-driven programming (say,
    Twisted deferreds with addCallback()), you don't necessarily want to see
    the disassembly of the callbacks that are passed to the various
    framework functions. Also, if you do so recursively, it might become
    *very* unwieldy.

    Can you provide some examples of this? Nested functions are typically
    short and even if they are long, the size disassembly would be
    proportional to the line count of the function being disassembled,
    which is expected.

    @rhettinger rhettinger self-assigned this Apr 13, 2011
    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Jun 1, 2011

    Note that Yaniv Aknin (author of the Python's Innards series of blog posts) has a recursive dis variant that may be useful for inspiration:

    https://bitbucket.org/yaniv_aknin/pynards/src/c4b61c7a1798/common/blog.py

    As shown there, this recursive behaviour can also be useful for code_info/show_code.

    @Bluehorn
    Copy link
    Mannequin

    Bluehorn mannequin commented Dec 10, 2011

    I offer the attached patch as a starting point to fulfill this feature request.

    The patch changes the output to insert the disassembly of local code objects on the referencing line.

    As that made the output unreadable to me, I added indentation for the nested code (by 4 spaces, hoping that nobody will nest code 10 levels deep :-)

    This results in the following output for the original example:

    >>> from dis import dis
    >>> dis('[x**2 for x in range(3)]')
      1           0 LOAD_CONST               0 (<code object <listcomp> at 0x7f24a67dde40, file "<dis>", line 1>) 
      1               0 BUILD_LIST               0 
                      3 LOAD_FAST                0 (.0) 
                >>    6 FOR_ITER                16 (to 25) 
                      9 STORE_FAST               1 (x) 
                     12 LOAD_FAST                1 (x) 
                     15 LOAD_CONST               0 (2) 
                     18 BINARY_POWER         
                     19 LIST_APPEND              2 
                     22 JUMP_ABSOLUTE            6 
                >>   25 RETURN_VALUE         
                  3 LOAD_CONST               1 ('<listcomp>') 
                  6 MAKE_FUNCTION            0 
                  9 LOAD_NAME                0 (range) 
                 12 LOAD_CONST               2 (3) 
                 15 CALL_FUNCTION            1 
                 18 GET_ITER             
                 19 CALL_FUNCTION            1 
                 22 RETURN_VALUE

    @rhettinger
    Copy link
    Contributor Author

    Thank you :-)

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Sep 7, 2014

    Sorry for the long delay in doing anything with this patch. Unfortunately, trunk has moved on quite a bit since this patch was submitted, and it's no longer directly applicable.

    However, the basic principle is sound, so this is a new patch that aligns with the changes made in 3.4 to provide an iterator based bytecode introspection API. It also changes the indenting to be based on the structure of the bytecode disassembly - nested lines start aligned with the opcode *name* on the preceding line. This will get unreadable with more than two or three levels of nesting, but at that point, hard to read disassembly for the top level function is the least of your worries. (A potentially useful option may to be add a flag to turn off the implicit recursion, easily restoring the old single level behaviour. I'd like the recursive version to be the default though, since it's far more useful given that Python 3 comprehensions all involve a nested code object)

    A descriptive header makes the new output more self-explanatory. Note that I did try repeating the code object repr from the LOAD_CONST opcode in the new header - it was pretty unreadable, and redundant given the preceding line of disassembly.

    Two examples, one showing Torsten's list comprehension from above, and another showing that the nested line numbers work properly.

    This can't be applied as is - it's still missing tests, docs, and fixes to disassembly output tests that assume the old behaviour.

    >>> dis.dis('[x**2 for x in range(3)]')
      1           0 LOAD_CONST               0 (<code object <listcomp> at 0x7f459ec4a0c0, file "<dis>", line 1>)
                    Disassembly for nested code object
                      1       0 BUILD_LIST               0
                              3 LOAD_FAST                0 (.0)
                        >>    6 FOR_ITER                16 (to 25)
                              9 STORE_FAST               1 (x)
                             12 LOAD_FAST                1 (x)
                             15 LOAD_CONST               0 (2)
                             18 BINARY_POWER
                             19 LIST_APPEND              2
                             22 JUMP_ABSOLUTE            6
                        >>   25 RETURN_VALUE
                  3 LOAD_CONST               1 ('<listcomp>')
                  6 MAKE_FUNCTION            0
                  9 LOAD_NAME                0 (range)
                 12 LOAD_CONST               2 (3)
                 15 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                 18 GET_ITER
                 19 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                 22 RETURN_VALUE
    >>> def f():
    ...     print("Hello")
    ...     def g():
    ...         for x in range(10):
    ...             yield x
    ...     return g
    ... 
    >>> dis.dis(f)
      2           0 LOAD_GLOBAL              0 (print)
                  3 LOAD_CONST               1 ('Hello')
                  6 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                  9 POP_TOP

    3 10 LOAD_CONST 2 (<code object g at 0x7f459ec4a540, file "<stdin>", line 3>)
    Disassembly for nested code object
    4 0 SETUP_LOOP 25 (to 28)
    3 LOAD_GLOBAL 0 (range)
    6 LOAD_CONST 1 (10)
    9 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
    12 GET_ITER
    >> 13 FOR_ITER 11 (to 27)
    16 STORE_FAST 0 (x)

                  5      19 LOAD_FAST                0 (x)
                         22 YIELD_VALUE
                         23 POP_TOP
                         24 JUMP_ABSOLUTE           13
                    >>   27 POP_BLOCK
                    >>   28 LOAD_CONST               0 (None)
                         31 RETURN_VALUE
             13 LOAD_CONST               3 ('f.<locals>.g')
             16 MAKE_FUNCTION            0
             19 STORE_FAST               0 (g)
    

    6 22 LOAD_FAST 0 (g)
    25 RETURN_VALUE

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Sep 7, 2014

    I didn't want to add a second argument to turn off the new behaviour, so I changed it such that passing a value < 0 for "nested" turns off the new feature entirely. Levels >= 0 enable it, defining which level to start with. The default level is "0" so there's no implied prefix, and nested code objects are displayed by default. This picks up at least comprehensions, lambda expressions and nested functions. I haven't checked how it handles nested classes yet.

    I used this feature to get the old tests passing again by turning off the recursion feature. New tests for the new behaviour are still needed.

    I also tweaked the header to show the *name* of the code object. The full repr is to noisy, but the generic message was hard to read when there were multiple nested code objects.

    @matrixise
    Copy link
    Member

    Hi all,

    For this feature, I have an other output:

    stephane@sg1 /tmp> python3 dump_bytecode.py
    <module>
    --------
    3 0 LOAD_BUILD_CLASS
    1 LOAD_CONST 0 (<code object User at 0x10b830270, file "<show>", line 3>)
    4 LOAD_CONST 1 ('User')
    7 MAKE_FUNCTION 0
    10 LOAD_CONST 1 ('User')
    13 CALL_FUNCTION 2 (2 positional, 0 keyword pair)
    16 STORE_NAME 0 (User)

    8 19 LOAD_NAME 0 (User)
    22 LOAD_CONST 2 ('user')
    25 LOAD_CONST 3 ('password')
    28 CALL_FUNCTION 2 (2 positional, 0 keyword pair)
    31 STORE_NAME 1 (user)
    34 LOAD_CONST 4 (None)
    37 RETURN_VALUE

    <module>.User
    -------------
    3 0 LOAD_NAME 0 (name)
    3 STORE_NAME 1 (module)
    6 LOAD_CONST 0 ('User')
    9 STORE_NAME 2 (qualname)

    4 12 LOAD_CONST 1 (<code object __init__ at 0x10b824270, file "<show>", line 4>)
    15 LOAD_CONST 2 ('User.__init__')
    18 MAKE_FUNCTION 0
    21 STORE_NAME 3 (init)
    24 LOAD_CONST 3 (None)
    27 RETURN_VALUE

    <module>.User.__init__
    ----------------------
    5 0 LOAD_FAST 1 (email)
    3 LOAD_FAST 0 (self)
    6 STORE_ATTR 0 (email)

    6 9 LOAD_FAST 2 (password)
    12 LOAD_FAST 0 (self)
    15 STORE_ATTR 1 (password)
    18 LOAD_CONST 0 (None)
    21 RETURN_VALUE

    @rhettinger rhettinger removed their assignment Nov 10, 2015
    @serhiy-storchaka
    Copy link
    Member

    I like Stéphane's idea about placing the output for nested code object at the same level after the output for the main code object.

    @matrixise
    Copy link
    Member

    hello, we can continue the discussion on this issue ?

    @serhiy-storchaka
    Copy link
    Member

    The issue was open 6 years ago. The feature could be added in 3.3. But it still not implemented.

    Since there are problems with outputting the disassembly of internal code objects expanded in-line, proposed patch just outputs them after the disassembly of the main code object (similar to original Raymond's proposition). This is simpler and doesn't make the output too wide.

    @serhiy-storchaka serhiy-storchaka added the 3.7 (EOL) end of life label Apr 15, 2017
    @ncoghlan
    Copy link
    Contributor

    +1 for listing the nested code objects after the original one.

    In reviewing Serhiy's patch, the core technical implementation looks OK to me, but I think we may want to go with a "depth" argument rather than a simple "recursive" flag.

    My rationale for that relates to directly disassembling module and class source code:

    • dis(module_source, depth=1) # Module, class bodies, function bodies
    • dis(class_source, depth=1) # Class and method bodies

    (with the default depth being 0, to disable recursive descent entirely)

    Only if you set a higher depth than 1 would you start seeing things like closures, comprehension bodies, and nested classes.

    With a simple all-or-nothing flag, I think module level recursive disassembly would be too noisy to be useful.

    The bounded depth approach would also avoid a problem with invalid bytecode manipulations that manage to create a loop between two bytecode objects. While the *compiler* won't do that, there's no guarantee that the disassembler is being fed valid bytecode, so we should avoid exposing ourselves to any infinite loops in the display code.

    @serhiy-storchaka
    Copy link
    Member

    The problem with the *depth* parameter is that it adds a burden of choosing the value for the end user. "Oh, there are more deeper code objects, I must increase the depth and rerun dis()!" I think in most cases when that parameter is specified it would be set to some large value like 999 because you don't want to set it too small. Compare for example with the usage of the attribute maxDiff in unittests.

    The single depth parameter doesn't adds too much control. You can't enable disassembling functions and method bodies but disable disassembling comprehensions in functions. If you need more control, you should use non-recursive dis() and manually walk the tree of code objects.

    How much output adds unlimited recursion in comparison with the recursion limited by the first level?

    As for supporting invalid bytecode, currently the dis module doesn't support it (see bpo-26694).

    @ncoghlan
    Copy link
    Contributor

    The problem I see is that we have conflicting requirements for the default behaviour:

    • if we modify dis() instead of adding a new function, then the default behaviour needs to be non-recursive for backwards compatibility reasons
    • if we allow the depth to be configurable, then we'd like the default behaviour to be to show everything

    One potential resolution to that would be to define this as a new function, distree, rather than modifying dis and disassemble to natively support recursion. Then the new function could accept a depth argument (addressing my concerns), but have depth=None as the default (addressing your concerns).

    If we wanted to allow even more control than that, then I think os.walk provides a useful precedent, where we'd add a new dis.walk helper that just looked at co_consts to find nested code objects without doing any of the other disassembly work.

    The dis.walk() helper would produce an iterable of (depth, code, nested) 3-tuples, where:

    • the first item is the current depth in the compile tree
    • the second is the code object itself
    • the third is a list of nested code objects

    Similar to os.walk(), editing the list of nested objects in place would let you control whether or not any further recursion took place.

    @rhettinger
    Copy link
    Contributor Author

    if we modify dis() instead of adding a new function, then
    the default behaviour needs to be non-recursive for
    backwards compatibility reasons

    I don't see how we have any backward compatibility issues. The dis() function is purely informational like help().

    The problem is it doesn't show important information, list comprehensions are now effectively hidden from everyone who isn't clever and persistent.

    I use dis() as a teaching aid in my Python courses and as a debugging tool when doing consulting. From my point of view, it is effectively broken in Python 3.

    @ncoghlan
    Copy link
    Contributor

    Yeah, I was mixing this up with getargspec (et al), which get used by IDEs and similar tools. While third party tools do use the disassembler, they typically won't use its display logic directly unless they're just dumping the output to a terminal equivalent.

    Given that, a "depth=None" parameter on dis and disassemble would provide the default behaviour of rendering the entire compilation tree, while still allowing turning off recursion entirely ("depth=0"), or limiting it to a desired number of levels ("depth=1", etc).

    @serhiy-storchaka
    Copy link
    Member

    PR 1844 implements Nick's suggestion, but I don't like it. This complicates both the interface and the implementation.

    help() also can produce very noisy output if request the documentation of the whole module. But the output of help() usually is piped through a pager. Would it help if pipe the output of dis() through a pager if the output file and stdin are attached to a terminal?

    @rhettinger
    Copy link
    Contributor Author

    Would it help if pipe the output of dis() through a pager
    if the output file and stdin are attached to a terminal?

    -1 for adding a pager.

    @serhiy-storchaka serhiy-storchaka self-assigned this May 30, 2017
    @serhiy-storchaka
    Copy link
    Member

    Could you please look at the patch Raymond? Is this what you wanted?

    @serhiy-storchaka
    Copy link
    Member

    New changeset 1efbf92 by Serhiy Storchaka in branch 'master':
    bpo-11822: Improve disassembly to show embedded code objects. (bpo-1844)
    1efbf92

    @vstinner
    Copy link
    Member

    Thanks Serhiy, it works and I like the result :-)

    >>> def f():
    ...  def g():
    ...   return 3
    ...  return g
    ... 
    
    >>> import dis; dis.dis(f)
      2           0 LOAD_CONST               1 (<code object g at 0x7f16ab2e2c40, file "<stdin>", line 2>)
                  2 LOAD_CONST               2 ('f.<locals>.g')
                  4 MAKE_FUNCTION            0
                  6 STORE_FAST               0 (g)

    4 8 LOAD_FAST 0 (g)
    10 RETURN_VALUE

    Disassembly of <code object g at 0x7f16ab2e2c40, file "<stdin>", line 2>:
    3 0 LOAD_CONST 1 (3)
    2 RETURN_VALUE

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    7 participants