Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor the dis module to provide better building blocks for bytecode analysis #56025

Closed
eltoder mannequin opened this issue Apr 10, 2011 · 48 comments
Closed

Refactor the dis module to provide better building blocks for bytecode analysis #56025

eltoder mannequin opened this issue Apr 10, 2011 · 48 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@eltoder
Copy link
Mannequin

eltoder mannequin commented Apr 10, 2011

BPO 11816
Nosy @loewis, @rhettinger, @jcea, @ncoghlan, @abalkin, @pitrou, @vstinner, @alex, @meadori, @ericsnowcurrently, @takluyver, @1st1
Files
  • issue11816_get_opinfo_branch_20111204.diff: Remote diff against python.org after merging from default
  • 5ce60675e572.diff
  • dis_api.diff
  • dis_api2.diff: Updated patch after review
  • dis_api3.diff: + Bytecode docs and tests
  • test_peepholer.diff: use bytecode_helper in test_peepholer
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/ncoghlan'
    closed_at = <Date 2013-05-07.09:55:08.281>
    created_at = <Date 2011-04-10.00:20:08.483>
    labels = ['type-feature', 'library']
    title = 'Refactor the dis module to provide better building blocks for bytecode analysis'
    updated_at = <Date 2016-01-19.07:51:22.304>
    user = 'https://github.com/eltoder'

    bugs.python.org fields:

    activity = <Date 2016-01-19.07:51:22.304>
    actor = 'python-dev'
    assignee = 'ncoghlan'
    closed = True
    closed_date = <Date 2013-05-07.09:55:08.281>
    closer = 'ncoghlan'
    components = ['Library (Lib)']
    creation = <Date 2011-04-10.00:20:08.483>
    creator = 'eltoder'
    dependencies = []
    files = ['23849', '23853', '29051', '29114', '29694', '29695']
    hgrepos = ['94']
    issue_num = 11816
    keywords = ['patch']
    message_count = 48.0
    messages = ['133437', '133438', '133439', '133441', '133464', '133465', '133466', '133467', '133471', '133479', '133521', '142825', '142831', '143491', '144322', '144400', '144401', '144409', '146694', '148227', '148382', '148383', '148833', '148839', '148840', '148841', '148842', '148859', '163018', '179260', '181970', '181998', '182321', '186148', '188502', '188529', '188536', '188537', '188538', '188539', '188572', '188573', '188574', '188601', '188602', '188604', '188637', '258574']
    nosy_count = 16.0
    nosy_names = ['loewis', 'rhettinger', 'jcea', 'ncoghlan', 'belopolsky', 'pitrou', 'vstinner', 'ron_adam', 'alex', 'rfk', 'meador.inge', 'neologix', 'python-dev', 'eric.snow', 'takluyver', 'yselivanov']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue11816'
    versions = ['Python 3.4']

    @eltoder
    Copy link
    Mannequin Author

    eltoder mannequin commented Apr 10, 2011

    As discussed in bpo-11549 a couple of tests need to inspect disassembly of some code. Currently they have to override sys.stdout, run dis and restore stdout back. It would be much nicer if dis module provided functions that return disassembly as a string.

    Provided is a patch that adds file argument to most dis functions, defaulting to sys.stdout. On top of that there are 2 new functions: dis_to_str and disassembly_to_str that return disassembly as a string instead of writing it to a file.

    @eltoder eltoder mannequin added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Apr 10, 2011
    @rhettinger
    Copy link
    Contributor

    Inspecting the text disassembly is a bit fragile for testing. It would be better to scan a list of (opcode, oparg) pairs for given pattern (i.e. (LOAD_CONST, 3) where consts[3] --> some target value).

    @eltoder
    Copy link
    Mannequin Author

    eltoder mannequin commented Apr 10, 2011

    Agreed, but that would require rewriting of all tests in test_peepholer.

    @rhettinger
    Copy link
    Contributor

    Yep!

    @ncoghlan
    Copy link
    Contributor

    I really like the idea of adding some lower level infrastructure to dis to make it generator based, making the disassembly more amenable to programmatic manipulation.

    Consider if, for each line disassemble() currently prints, we had an underlying iterator that yielded a named tuple consisting of (index, opcode, oparg, linestart, details). I've created a proof-of-concept for that in my sandbox (http://hg.python.org/sandbox/ncoghlan/file/get_opinfo/Lib/dis.py) which adds a get_opinfo() function that does exactly. With disassemble() rewritten to use that, test_dis and test_peepholer still pass as currently written.

    Near-term, test_peepholer could easily continue to do what it does now (i.e. use the higher level dis() function and redirect sys.stdout). Longer term, it could be written to analyse the opcode stream instead of doing string comparisons.

    @ncoghlan
    Copy link
    Contributor

    Changed issue title to cover ideas like get_opinfo().

    @ncoghlan ncoghlan changed the title Add functions to return disassembly as string Refactor the dis module to provide better building blocks for bytecode analysis Apr 10, 2011
    @ncoghlan
    Copy link
    Contributor

    Oops, I forgot to edit my comment to match the OpInfo definition I used in the proof-of-concept:

    OpInfo = collections.namedtuple("OpInfo",
                "opindex opcode opname oparg details starts_line is_jump_target")

    @eltoder
    Copy link
    Mannequin Author

    eltoder mannequin commented Apr 10, 2011

    So in the near term, dis-based tests should continue to copy/paste sys.stdout redirection code?

    @ncoghlan
    Copy link
    Contributor

    If we decide our long term goal is the use of the opcode stream for programmatic access, then yes.

    @alex
    Copy link
    Member

    alex commented Apr 10, 2011

    FWIW in PyPy we have https://bitbucket.org/pypy/pypy/src/default/lib_pypy/disassembler.py which we use for some of our tools.

    @rhettinger rhettinger self-assigned this Apr 10, 2011
    @jcea
    Copy link
    Member

    jcea commented Apr 11, 2011

    Do not forget to update docs too.

    @ncoghlan ncoghlan assigned ncoghlan and unassigned rhettinger Aug 23, 2011
    @rhettinger
    Copy link
    Contributor

    Nick, I still want to work on this one.

    @ncoghlan
    Copy link
    Contributor

    The diff generator didn't work - I've uploaded the current patch manually to make it easier to review than it is in my bitbucket repo.

    I just noticed there's a missing element in the docs patch at the moment - to make testing easier, Ryan added a 'file' argument to the various print-based dis functions so the output can easily be captured in a StringIO object. The docs updates don't currently reflect that, they only cover the OpInfo and get_opinfo additions (along with a clarification of the dis module's slightly odd use of the term 'free').

    Aside from that, the core concept of the patch is pretty simple:

    • add dis.OpInfo and dis.get_opinfo() to make it easier to walk the bytecode programmatically
    • eliminate a lot of the logic duplication inside dis by refactoring more of the operations to internally rely on get_opinfo()
    • add a new test.bytecode_helper.BytecodeTestCase with some convenient assertions for checking code generation
    • update test_peepholer to be independent of the disassembly formatting details
    • add tests for the new features to test_dis (but keep the old detailed formatting tests)

    One potential criticism is the complexity of the 'expected output' for the new OpInfoTestCase, but it seemed worth it to vet the way the new code handles several cases. The programmatic nature makes the opcode sequences much easier to read and maintain than the corresponding formatted output tests would have been.

    These new tests also cover an error that the previous incarnation of the test suite missed completely (I had a bug at one point where I had incorrectly omitted the second half of the list of cell names - there was no test to check that the disassembler handled references to such names correctly)

    @ncoghlan ncoghlan assigned rhettinger and unassigned ncoghlan Aug 23, 2011
    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Sep 4, 2011

    Regenerated the get_opinfo patch against current 3.3 tip.

    Still haven't fixed the missing doc updates mentioned in my last message, though.

    @ncoghlan
    Copy link
    Contributor

    Attached patch should now be complete, including the documentation for the new keyword-only 'file' parameter on various dis module functions.

    @meadori
    Copy link
    Member

    meadori commented Sep 22, 2011

    I took a quick look over the final patch (I will do a more thorough
    review later). I like the general idea a lot. The first thing that
    popped out at me are the names 'OpInfo' and 'get_opinfo'.

    'OpInfo' makes it sound like information concerning only the opcode, but
    these objects really represent bytecode instructions. I see a lot
    of code in the future like:

        for opinfo in dis.get_opinfo(thing):
            process(opinfo)

    which seems vague. The following seems clearer to me:

        for instr in dis.bytecode_instructions(thing):
            process(instr)

    And instead of 'OpInfo' perhaps 'ByteCodeInstruction'. Even the current
    'dis' documentation uses the terminology "Byte Code Instruction".

    @ncoghlan
    Copy link
    Contributor

    'Op' is just an abbreviation of 'operation'. So 'operation code' becomes 'opcode' and 'operation information' becomes 'opinfo'. The fact that it comes for the 'dis' module gives the context that the *kind* of operation we're talking about is a Python byte code instruction.

    When people are hacking on bytecode in the future, they'll likely end up using get_opinfo() a fair bit, so swapping the succinct 'opinfo' for the verbose 'bytecode_instruction' strikes me as a poor trade-off.

    @meadori
    Copy link
    Member

    meadori commented Sep 22, 2011

    I agree that 'bytecode_instructions' is a long-winded. FWIW, I
    have worked on or with a fair amount instruction level things and
    "instruction" or "instr" seem to be the established domain terminology.

    Here are a few examples:

    @ncoghlan
    Copy link
    Contributor

    Bitbucket repo and attached patch updated relative to current tip.

    @ncoghlan
    Copy link
    Contributor

    Meador's suggested name change has grown on me, so I plan to switch the name of the new API to "get_instructions()" and the new class to "Instruction".

    @ncoghlan ncoghlan assigned ncoghlan and unassigned rhettinger Nov 24, 2011
    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Dec 4, 2011

    Grr, "Create Patch" insists on trying to produce a patch based on https://bitbucket.org/ncoghlan/cpython_sandbox/changesets/9512712044a6.

    That checkin is from *September* and ignores all my recent changes :P

    Relevant meta-tracker issue: http://psf.upfronthosting.co.za/roundup/meta/issue429

    Manual patch upload coming shortly...

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Dec 4, 2011

    OK, manual up-to-date patch attached.

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Dec 4, 2011

    @ron: Now that it has a reasonably clear signature, I could see my way clear to making the Instruction._disassemble() method public, which makes it easy for people to compose their own disassembly output.

    For all the other display methods, I prefer Ryan Kelly's suggestion of supporting a "file" argument which is then passed through to the underlying "print()" calls.

    This is inconsistent with what I originally did for the code_info() APIs (where I made a separate "give me the string" function), but it's the lowest impact change that avoids the need to capture stdout.

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Dec 5, 2011

    MvL pointed out I hadn't updated the Hg repo reference when I moved my sandbox over to BitBucket - the diff it was generating was from the last time I updated my pydotorg sandbox in order to try something on the buildbots.

    @ncoghlan
    Copy link
    Contributor

    Given the imminent 3.3 beta 1 feature freeze and the fact I would like to explore Ron's suggestion of a higher level ByteCode object to encapsulate a sequence of instructions (along with additional information from the code object), postponing this one.

    @ncoghlan ncoghlan removed their assignment Jun 17, 2012
    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Jan 7, 2013

    To clarify the vague allusion in my last comment, Ron's suggestion was along the lines of creating a dis.Bytecode object that encapsulated everything the dis module can figure out about a piece of compiled code.

    That would mean exposing the kind of info reported in a string by dis.code_info() as attributes/properties, and have the proposed "get_opinfo()" be the __iter__ method on the disassembled Bytecode objects.

    @takluyver
    Copy link
    Mannequin

    takluyver mannequin commented Feb 12, 2013

    I've updated Nick's patch so that test_dis and test_peephole pass again, and added a prototype ByteCode class (without any docs or tests for now, to allow for API discussion).

    The prototype ByteCode is instantiated with any of the objects that get_instructions already accepts (functions, methods, code strings & code objects). Iterating over it yields Instruction objects. It has info(), show_info() and display_code() methods, which correspond to the code_info(), show_code() and disassemble() functions.

    I've tried to go for names that make sense, rather than names that fit the existing pattern, because the existing pattern feels a bit messy. E.g. the show_code() function doesn't actually show the code, so I've called its method equivalent show_info().

    @ncoghlan
    Copy link
    Contributor

    Thanks Thomas! It's a promising start - a few more detailed comments in the patch review.

    I like the idea of creating the initial version as an object-oriented wrapper around the existing APIs, rather than completely refactoring the module to make everything else a functional wrapper around an underlying object-oriented implementation.

    @takluyver
    Copy link
    Mannequin

    takluyver mannequin commented Feb 18, 2013

    Updated version of the patch.

    Changed from review:

    • Included test.bytecode_helper module used by some tests
    • Updated docs to indicate that the changes are new in 3.4
    • ByteCode -> Bytecode
    • Added meaningful repr for Bytecode

    Still to do:

    • ? Re-expose attributes from code object on Bytecode instance
    • Tests & documentation for Bytecode class
    • Split changes into multiple patches

    @takluyver
    Copy link
    Mannequin

    takluyver mannequin commented Apr 6, 2013

    I've added docs and tests, and split the changes to test_peepholer into a separate patch.

    I haven't re-exposed details of the code object as attributes of Bytecode instances, because they're already available as e.g. bytecode.codeobj.co_names . I think it would be more confusing than useful to offer the same values in two places, though I'm open to discussion on this.

    I've re-organised the dis module docs a bit. I've put Bytecode at the top, as I think it's a more intuitive API than the functions, which have somewhat counter-intuitive names due to the module's history.

    @takluyver
    Copy link
    Mannequin

    takluyver mannequin commented May 6, 2013

    Ping - the latest patches (dis_api3 & test_peepholer) are ready for review when someone's got a moment. Thanks!

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented May 6, 2013

    I created bpo-17916 after realising that the new OO API doesn't yet provide an equivalent to dis.distb that returns an appropriate Bytecode object.

    (I don't think it makes sense to hold up this patch for that change)

    @ncoghlan ncoghlan self-assigned this May 6, 2013
    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented May 6, 2013

    Good thing test_peepholer was moved out to a separate patch - a failure of that picked up a bug in the new disassembly output (unifying the handling of name and constant dereferences had changed the way constant strings were reported in the disassembly, and the error was consistent in both the new implementation and in the new tests due to the way the expected test results had been generated)

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 6, 2013

    New changeset f65b867ce817 by Nick Coghlan in branch 'default':
    Issue bpo-11816: multiple improvements to the dis module
    http://hg.python.org/cpython/rev/f65b867ce817

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 6, 2013

    New changeset d3fee4c64654 by Nick Coghlan in branch 'default':
    Issue bpo-11816: switch test_peepholer to bytecode_helper
    http://hg.python.org/cpython/rev/d3fee4c64654

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented May 6, 2013

    And two-and-a-bit years later, we're done - thanks all, any further feedback or problems can be filed as a new issue :)

    @ncoghlan ncoghlan closed this as completed May 6, 2013
    @neologix
    Copy link
    Mannequin

    neologix mannequin commented May 6, 2013

    test_dis is failing on some buildbots:

    http://buildbot.python.org/all/builders/AMD64 Ubuntu LTS 3.x/builds/1674/steps/test/logs/stdio

    Re-running test 'test_dis' in verbose mode
    test test_dis crashed -- Traceback (most recent call last):
      File "/opt/python/3.x.langa-ubuntu/build/Lib/test/regrtest.py", line 1294, in runtest_inner
        the_module = importlib.import_module(abstest)
      File "/opt/python/3.x.langa-ubuntu/build/Lib/importlib/__init__.py", line 92, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "<frozen importlib._bootstrap>", line 1603, in _gcd_import
      File "<frozen importlib._bootstrap>", line 1584, in _find_and_load
      File "<frozen importlib._bootstrap>", line 1551, in _find_and_load_unlocked
      File "<frozen importlib._bootstrap>", line 591, in _check_name_wrapper
      File "<frozen importlib._bootstrap>", line 1053, in load_module
      File "<frozen importlib._bootstrap>", line 1034, in load_module
      File "<frozen importlib._bootstrap>", line 567, in module_for_loader_wrapper
      File "<frozen importlib._bootstrap>", line 901, in _load_module
      File "<frozen importlib._bootstrap>", line 297, in _call_with_frames_removed
      File "/opt/python/3.x.langa-ubuntu/build/Lib/test/test_dis.py", line 4, in <module>
        from test.bytecode_helper import BytecodeTestCase
    ImportError: No module named 'test.bytecode_helper'

    @neologix neologix mannequin reopened this May 6, 2013
    @pitrou
    Copy link
    Member

    pitrou commented May 6, 2013

    Yes, this is bytecode_helper hasn't been added to the repository.

    @pitrou
    Copy link
    Member

    pitrou commented May 6, 2013

    (this is *because*, sorry)

    @vstinner
    Copy link
    Member

    vstinner commented May 6, 2013

    Ping! The test is still failing.

    @takluyver
    Copy link
    Mannequin

    takluyver mannequin commented May 6, 2013

    bytecode_helper is there in dis_api3.diff - anyone with commit rights should be able to add it to the repository.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 6, 2013

    New changeset 84d1a0e32d3b by Nick Coghlan in branch 'default':
    Issue bpo-11816: Add missing test helper
    http://hg.python.org/cpython/rev/84d1a0e32d3b

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented May 7, 2013

    I checked in the missing file after I woke up this morning. Maybe I'll learn to use hg import instead of patch some day...

    Sorry for the noise.

    @ncoghlan ncoghlan closed this as completed May 7, 2013
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jan 19, 2016

    New changeset bf997b22df06 by Victor Stinner in branch '3.5':
    Fix BytecodeTestCase.assertNotInBytecode()
    https://hg.python.org/cpython/rev/bf997b22df06

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    7 participants