Improve disassembly to show embedded code objects #56031
Now that list comprehensions run their internals in nested code objects (the same way that genexps do), it is getting harder to use dis() to see what code is generated. For example, the x**2 power operation isn't shown in the following disassembly:
>>> dis('[x**2 for x in range(3)]')
1 0 LOAD_CONST 0 (<code object <listcomp> at 0x1005d1e88, file "<dis>", line 1>)
3 MAKE_FUNCTION 0
6 LOAD_NAME 0 (range)
9 LOAD_CONST 1 (3)
12 CALL_FUNCTION 1
15 GET_ITER
16 CALL_FUNCTION 1
19 RETURN_VALUE
I propose that dis() build up a queue of undisplayed code objects and then disassemble each of those after the main disassembly is done (effectively making it recursive and displaying code objects in the order that they are first seen in the disassembly). For example, the output shown above would be followed by a disassembly of its internal code object:
<code object <listcomp> at 0x1005d1e88, file "<dis>", line 1>: |
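The queue-based proposal above can be sketched in a few lines. This is only an illustration, not the patch under discussion: `dis_with_nested` is a hypothetical helper name, and the example disassembles a lambda because newer CPython versions may inline comprehensions instead of compiling them to a separate code object.

```python
import dis
import io
import types
from collections import deque


def dis_with_nested(source, file=None):
    """Disassemble `source`, then each nested code object, breadth-first.

    Hypothetical helper for illustration; returns the names it disassembled.
    """
    queue = deque([compile(source, "<dis>", "exec")])
    names = []
    while queue:
        code = queue.popleft()
        names.append(code.co_name)
        print(f"Disassembly of {code.co_name}:", file=file)
        dis.disassemble(code, file=file)  # one level only, no recursion
        # Nested code objects are stored among the constants; queue them
        # so they are shown after the code that references them.
        queue.extend(c for c in code.co_consts if isinstance(c, types.CodeType))
    return names


buffer = io.StringIO()
names = dis_with_nested("square = lambda x: x**2", file=buffer)
print(buffer.getvalue())
```

Because the nested objects are appended to the end of the queue, the top-level disassembly stays contiguous and the output never grows wider, only longer.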
Would you like to display lambdas as well?
>>> dis('lambda x: x**2')
1 0 LOAD_CONST 0 (<code object <lambda> at 0x1005c9ad0, file "<dis>", line 1>)
3 MAKE_FUNCTION 0
6 RETURN_VALUE
<code object <lambda> at 0x1005cb140, file "<dis>", line 1>:
I like the idea, but would rather see code objects expanded in-line, possibly indented, rather than at the end. |
I think it should be enabled with an optional argument. Otherwise in some cases you'll get lots of additional output while you're only interested in the top-level code. |
If you disassemble a function, you typically want to see all the code in that function. This isn't like pdb where you're choosing to step over or into another function outside the one being viewed. |
That depends on the function. If you do event-driven programming (say, So I don't think there's anything "typical" here. It depends on what you |
On Mon, Apr 11, 2011 at 5:21 PM, Antoine Pitrou <report@bugs.python.org> wrote:
+1 (with clarification in [])
If the function calls a function defined elsewhere, I don't want to
def f():
    def g(x):
        return x**2
dis(f)
2 0 LOAD_CONST 1 (<code object g at 0x10055ce88, file "x.py", line 2>)
3 MAKE_FUNCTION 0
6 STORE_FAST 0 (g)
...
when I see '<code object g at 0x10055ce88, ..>', I have to do
3 0 LOAD_FAST 0 (x)
Can you provide some examples of this? Nested functions are typically |
Note that Yaniv Aknin (author of the Python's Innards series of blog posts) has a recursive dis variant that may be useful for inspiration: https://bitbucket.org/yaniv_aknin/pynards/src/c4b61c7a1798/common/blog.py As shown there, this recursive behaviour can also be useful for code_info/show_code. |
I offer the attached patch as a starting point to fulfill this feature request. The patch changes the output to insert the disassembly of local code objects on the referencing line. As that made the output unreadable to me, I added indentation for the nested code (by 4 spaces, hoping that nobody will nest code 10 levels deep :-) This results in the following output for the original example:
>>> from dis import dis
>>> dis('[x**2 for x in range(3)]')
1 0 LOAD_CONST 0 (<code object <listcomp> at 0x7f24a67dde40, file "<dis>", line 1>)
    1 0 BUILD_LIST 0
    3 LOAD_FAST 0 (.0)
    >> 6 FOR_ITER 16 (to 25)
    9 STORE_FAST 1 (x)
    12 LOAD_FAST 1 (x)
    15 LOAD_CONST 0 (2)
    18 BINARY_POWER
    19 LIST_APPEND 2
    22 JUMP_ABSOLUTE 6
    >> 25 RETURN_VALUE
3 LOAD_CONST 1 ('<listcomp>')
6 MAKE_FUNCTION 0
9 LOAD_NAME 0 (range)
12 LOAD_CONST 2 (3)
15 CALL_FUNCTION 1
18 GET_ITER
19 CALL_FUNCTION 1
22 RETURN_VALUE |
Thank you :-) |
Sorry for the long delay in doing anything with this patch. Unfortunately, trunk has moved on quite a bit since this patch was submitted, and it's no longer directly applicable. However, the basic principle is sound, so this is a new patch that aligns with the changes made in 3.4 to provide an iterator based bytecode introspection API. It also changes the indenting to be based on the structure of the bytecode disassembly - nested lines start aligned with the opcode *name* on the preceding line. This will get unreadable with more than two or three levels of nesting, but at that point, hard to read disassembly for the top level function is the least of your worries. (A potentially useful option may be to add a flag to turn off the implicit recursion, easily restoring the old single level behaviour. I'd like the recursive version to be the default though, since it's far more useful given that Python 3 comprehensions all involve a nested code object.) A descriptive header makes the new output more self-explanatory. Note that I did try repeating the code object repr from the LOAD_CONST opcode in the new header - it was pretty unreadable, and redundant given the preceding line of disassembly. Two examples, one showing Torsten's list comprehension from above, and another showing that the nested line numbers work properly. This can't be applied as is - it's still missing tests, docs, and fixes to disassembly output tests that assume the old behaviour.
>>> dis.dis('[x**2 for x in range(3)]')
1 0 LOAD_CONST 0 (<code object <listcomp> at 0x7f459ec4a0c0, file "<dis>", line 1>)
Disassembly for nested code object
1 0 BUILD_LIST 0
3 LOAD_FAST 0 (.0)
>> 6 FOR_ITER 16 (to 25)
9 STORE_FAST 1 (x)
12 LOAD_FAST 1 (x)
15 LOAD_CONST 0 (2)
18 BINARY_POWER
19 LIST_APPEND 2
22 JUMP_ABSOLUTE 6
>> 25 RETURN_VALUE
3 LOAD_CONST 1 ('<listcomp>')
6 MAKE_FUNCTION 0
9 LOAD_NAME 0 (range)
12 LOAD_CONST 2 (3)
15 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
18 GET_ITER
19 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
22 RETURN_VALUE
>>> def f():
... print("Hello")
... def g():
... for x in range(10):
... yield x
... return g
...
>>> dis.dis(f)
2 0 LOAD_GLOBAL 0 (print)
3 LOAD_CONST 1 ('Hello')
6 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
9 POP_TOP
3 10 LOAD_CONST 2 (<code object g at 0x7f459ec4a540, file "<stdin>", line 3>)
6 22 LOAD_FAST 0 (g) |
I didn't want to add a second argument to turn off the new behaviour, so I changed it such that passing a value < 0 for "nested" turns off the new feature entirely. Levels >= 0 enable it, defining which level to start with. The default level is "0" so there's no implied prefix, and nested code objects are displayed by default. This picks up at least comprehensions, lambda expressions and nested functions. I haven't checked how it handles nested classes yet. I used this feature to get the old tests passing again by turning off the recursion feature. New tests for the new behaviour are still needed. I also tweaked the header to show the *name* of the code object. The full repr is too noisy, but the generic message was hard to read when there were multiple nested code objects. |
Hi all,
For this feature, I have another output:
stephane@sg1 /tmp> python3 dump_bytecode.py
8 19 LOAD_NAME 0 (User)
<module>.User
4 12 LOAD_CONST 1 (<code object __init__ at 0x10b824270, file "<show>", line 4>)
<module>.User.__init__
6 9 LOAD_FAST 2 (password) |
I like Stéphane's idea about placing the output for nested code objects at the same level, after the output for the main code object. |
Hello, can we continue the discussion on this issue? |
The issue was opened 6 years ago. The feature could have been added in 3.3, but it is still not implemented. Since there are problems with outputting the disassembly of internal code objects expanded in-line, the proposed patch just outputs them after the disassembly of the main code object (similar to Raymond's original proposal). This is simpler and doesn't make the output too wide. |
+1 for listing the nested code objects after the original one. In reviewing Serhiy's patch, the core technical implementation looks OK to me, but I think we may want to go with a "depth" argument rather than a simple "recursive" flag. My rationale for that relates to directly disassembling module and class source code:
(with the default depth being 0, to disable recursive descent entirely) Only if you set a higher depth than 1 would you start seeing things like closures, comprehension bodies, and nested classes. With a simple all-or-nothing flag, I think module level recursive disassembly would be too noisy to be useful. The bounded depth approach would also avoid a problem with invalid bytecode manipulations that manage to create a loop between two bytecode objects. While the *compiler* won't do that, there's no guarantee that the disassembler is being fed valid bytecode, so we should avoid exposing ourselves to any infinite loops in the display code. |
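A bounded descent of this kind might look like the following sketch. `iter_code` is a hypothetical name, not part of the dis module, and the `seen` set is just one assumed way to guard against a cycle between hand-built code objects.

```python
import types


def iter_code(code, depth=1, _level=0, _seen=None):
    """Yield (level, code_object) pairs, descending at most `depth` levels.

    depth=0 yields only the top-level object; depth=None means unlimited.
    """
    seen = set() if _seen is None else _seen
    if id(code) in seen:  # guard against a loop in invalid, hand-built bytecode
        return
    seen.add(id(code))
    yield _level, code
    if depth is not None and _level >= depth:
        return
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code(const, depth, _level + 1, seen)


source = """
def outer():
    def inner():
        return lambda: 1
    return inner
"""
module = compile(source, "<example>", "exec")


def names_at(depth):
    """Names of the code objects reached with the given depth limit."""
    return [code.co_name for _, code in iter_code(module, depth=depth)]


print(names_at(0))
print(names_at(1))
print(names_at(None))
```

With a module as the starting point, a limit of 1 reaches top-level functions and classes, and only higher limits reach closures and other deeper objects.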
The problem with the *depth* parameter is that it adds the burden of choosing the value for the end user. "Oh, there are deeper code objects, I must increase the depth and rerun dis()!" I think in most cases when that parameter is specified it would be set to some large value like 999 because you don't want to set it too small. Compare for example with the usage of the attribute maxDiff in unittests. A single depth parameter doesn't add much control. You can't enable disassembling function and method bodies but disable disassembling comprehensions in functions. If you need more control, you should use non-recursive dis() and manually walk the tree of code objects. How much output does unlimited recursion add compared with recursion limited to the first level? As for supporting invalid bytecode, currently the dis module doesn't support it (see bpo-26694). |
The problem I see is that we have conflicting requirements for the default behaviour:
One potential resolution to that would be to define this as a new function, If we wanted to allow even more control than that, then I think os.walk provides a useful precedent, where we'd add a new The dis.walk() helper would produce an iterable of (depth, code, nested) 3-tuples, where:
Similar to os.walk(), editing the list of nested objects in place would let you control whether or not any further recursion took place. |
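As a rough sketch of the os.walk()-style idea, a generator can yield the list of nested code objects and honour in-place edits to it. No `dis.walk()` exists in the dis module; the `walk` function below is hypothetical, and the meaning given to the 3-tuple (current depth, code object, list of directly nested code objects) is an assumption based on the description above.

```python
import types


def walk(code, depth=0):
    """Hypothetical os.walk()-style traversal of nested code objects."""
    nested = [c for c in code.co_consts if isinstance(c, types.CodeType)]
    yield depth, code, nested
    # The caller may have edited `nested` in place before we resume here.
    for child in nested:
        yield from walk(child, depth + 1)


source = """
class User:
    def __init__(self, password):
        self.password = password

def helper():
    return lambda: None
"""
module = compile(source, "<example>", "exec")

visited = []
for depth, code, nested in walk(module):
    visited.append((depth, code.co_name))
    if code.co_name == "User":
        nested[:] = []  # prune: do not descend into the class's methods

print(visited)
```

Clearing the list for `User` skips `__init__` while the traversal of `helper` and its lambda continues, which is the same pruning idiom used with the directory list in os.walk().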
I don't see how we have any backward compatibility issues. The dis() function is purely informational, like help(). The problem is that it doesn't show important information: list comprehensions are now effectively hidden from everyone who isn't clever and persistent. I use dis() as a teaching aid in my Python courses and as a debugging tool when doing consulting. From my point of view, it is effectively broken in Python 3. |
Yeah, I was mixing this up with getargspec (et al), which get used by IDEs and similar tools. While third party tools do use the disassembler, they typically won't use its display logic directly unless they're just dumping the output to a terminal equivalent. Given that, a "depth=None" parameter on |
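For reference, `dis.dis()` in Python 3.7 and later accepts a keyword-only `depth` argument, where `None` means unlimited recursion and `0` means no recursion. The snippet below is a small usage sketch; `disassemble_to_text` is a hypothetical wrapper introduced only to capture the output.

```python
import dis
import io


def f():
    def g():
        return 3
    return g


def disassemble_to_text(obj, depth=None):
    """Return the dis.dis() output for obj as a string (illustrative wrapper)."""
    buffer = io.StringIO()
    dis.dis(obj, file=buffer, depth=depth)
    return buffer.getvalue()


top_only = disassemble_to_text(f, depth=0)  # only the body of f
everything = disassemble_to_text(f)         # f, then the nested g
print(everything)
```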
PR 1844 implements Nick's suggestion, but I don't like it. This complicates both the interface and the implementation. help() can also produce very noisy output if you request the documentation of a whole module. But the output of help() is usually piped through a pager. Would it help to pipe the output of dis() through a pager when the output file and stdin are attached to a terminal? |
-1 for adding a pager. |
Could you please look at the patch, Raymond? Is this what you wanted? |
Thanks Serhiy, it works and I like the result :-)
>>> def f():
... def g():
... return 3
... return g
...
>>> import dis; dis.dis(f)
2 0 LOAD_CONST 1 (<code object g at 0x7f16ab2e2c40, file "<stdin>", line 2>)
2 LOAD_CONST 2 ('f.<locals>.g')
4 MAKE_FUNCTION 0
6 STORE_FAST 0 (g)
4 8 LOAD_FAST 0 (g)
Disassembly of <code object g at 0x7f16ab2e2c40, file "<stdin>", line 2>: |