
classification
Title: Disassembly - improve documentation for bytecode instruction class and set source line no. attribute for every instruction
Type: enhancement
Stage:
Components: Library (Lib)
Versions: Python 3.11

process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: iritkatriel, smurthy
Priority: normal
Keywords:

Created on 2020-03-02 09:00 by smurthy, last changed 2022-04-11 14:59 by admin.

Messages (3)
msg363140 - Author: S Murthy (smurthy) * Date: 2020-03-02 09:00
I note that when disassembling a piece of source code (via source strings or code objects), the bytecode instruction objects in the resulting sequence (https://docs.python.org/3/library/dis.html#dis.Instruction) do not always have the `starts_line` attribute set. The storage and display of this line no. seem to depend on whether a given instruction is the first in the block of instructions that implements a given source line.

I think it would be better, for mapping source and logical lines of code to bytecode instruction blocks, to set `starts_line` for every instruction, and to amend the bytecode printing function (`dis._disassemble_bytes`) to keep the existing behaviour by detecting whether an instruction is the first in its instruction block.

ATM `Instruction` objects are created and yielded within this loop in `dis._get_instructions_bytes`:

def _get_instructions_bytes(code, varnames=None, names=None, constants=None,
                      cells=None, linestarts=None, line_offset=0):
    """Iterate over the instructions in a bytecode string.

    Generates a sequence of Instruction namedtuples giving the details of each
    opcode.  Additional information about the code's runtime environment
    (e.g. variable names, constants) can be specified using optional
    arguments.

    """
    labels = findlabels(code)
    starts_line = None
    for offset, op, arg in _unpack_opargs(code):
        if linestarts is not None:
            starts_line = linestarts.get(offset, None)
        ...
        ...

So it's this line

            starts_line = linestarts.get(offset, None)

which currently causes `starts_line` to be set to `None` for every instruction that isn't the first in an instruction block. `linestarts` is a dict that maps the offset of the first instruction of each instruction block to the corresponding source line number.
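The behaviour described above can be observed through the public API. The following is a minimal sketch and not part of the original report: the `add` function and the `starts_a_line` helper are made up for illustration. It assumes that `starts_line` is a line number or `None` up to Python 3.12, and a bool from Python 3.13 onwards.

```python
import dis


def add(a, b):
    c = a + b
    return c


def starts_a_line(ins):
    """Hypothetical helper: True if `ins` is the first instruction of a source line.

    Up to Python 3.12, starts_line is a line number or None.
    From Python 3.13, starts_line is a bool.
    """
    return ins.starts_line is not None and ins.starts_line is not False


instructions = list(dis.get_instructions(add))
starters = [ins for ins in instructions if starts_a_line(ins)]

# Mark the instructions that begin a source line with "*".
for ins in instructions:
    marker = "*" if starts_a_line(ins) else " "
    print(marker, ins.offset, ins.opname)
```

Only the instructions marked with `*` carry line information in `starts_line`; the unmarked ones are the cases this issue is about.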

My idea is to (1) change that line above to
 
            starts_line = linestarts.get(offset, starts_line)

which ensures every instruction has the corresponding source line no. set; (2) amend `Instruction._disassemble` to add a new optional argument `print_start_line`, with a default of `True`, that determines whether to print the source line no.; and (3) amend `dis._disassemble_bytes` to accept a new optional argument `start_line_by_block`, with a default of `True`, that can be used to preserve the existing behaviour of printing source line numbers once per instruction block.
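The carry-forward idea in step (1) can be tried out without touching `dis` itself. This is only a sketch under assumptions: `instruction_lines` and `add` are hypothetical names introduced here, and the helper rebuilds the `linestarts` dict with the public `dis.findlinestarts` instead of changing `_get_instructions_bytes`.

```python
import dis


def add(a, b):
    c = a + b
    return c


def instruction_lines(code):
    """Hypothetical helper: yield (instruction, line) pairs for a code object."""
    # Offset of the first instruction of each block -> source line number.
    linestarts = dict(dis.findlinestarts(code))
    line = None
    for ins in dis.get_instructions(code):
        # Same idea as `linestarts.get(offset, starts_line)` in the proposal:
        # keep the previous line when this offset does not start a new one.
        line = linestarts.get(ins.offset, line)
        yield ins, line


pairs = list(instruction_lines(add.__code__))
for ins, line in pairs:
    print(line, ins.offset, ins.opname)
```

Every instruction is printed with the line of the block it belongs to, which is what the modified loop would store on each `Instruction`.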

Does this sound OK? If so, I am happy to submit a PR.
msg363141 - Author: S Murthy (smurthy) * Date: 2020-03-02 09:10
The docstring for `Instruction` should be more precise. ATM the description of `starts_line` is

    line started by this opcode (if any), otherwise None

I think "source line started ..." would be a bit more precise and accurate.
msg410652 - Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2022-01-15 17:26
I don't think this change should be made - it would generate the same information in a slightly different format, which will break existing code while not making it possible to do anything we can't do now.
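As a rough illustration of how the same mapping can be derived from the existing output, the sketch below groups instructions by source line using only the `Instruction` stream. The names `group_by_line` and `add` are hypothetical, and the version check assumes that Python 3.13 and later expose a separate `line_number` field while `starts_line` is a bool there.

```python
import dis


def add(a, b):
    c = a + b
    return c


def group_by_line(code):
    """Hypothetical helper: map each source line to the opnames in its block."""
    blocks = {}
    current = None
    for ins in dis.get_instructions(code):
        if hasattr(ins, "line_number"):
            # Python 3.13+: starts_line is a bool, line_number holds the line.
            if ins.starts_line:
                current = ins.line_number
        elif ins.starts_line is not None:
            # Up to Python 3.12: starts_line is the line number itself.
            current = ins.starts_line
        blocks.setdefault(current, []).append(ins.opname)
    return blocks


blocks = group_by_line(add.__code__)
for line, opnames in blocks.items():
    print(line, opnames)
```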
History
Date                 User         Action  Args
2022-04-11 14:59:27  admin        set     github: 84004
2022-01-15 17:26:52  iritkatriel  set     nosy: + iritkatriel; messages: + msg410652; versions: + Python 3.11, - Python 3.7
2020-03-02 09:10:47  smurthy      set     messages: + msg363141
2020-03-02 09:00:46  smurthy      create