Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PEP 511: code.co_lnotab: use signed line number delta to support moving instructions in an optimizer #70295

Closed
vstinner opened this issue Jan 14, 2016 · 22 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@vstinner
Copy link
Member

BPO 26107
Nosy @gvanrossum, @brettcannon, @rhettinger, @vstinner, @markshannon, @serhiy-storchaka, @ztane
Files
  • lnotab.patch
  • lnotab-2.patch
  • lnotab-3.patch
  • lnotab-4.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2016-01-20.11:34:01.939>
    created_at = <Date 2016-01-14.09:12:43.356>
    labels = ['interpreter-core', 'type-feature']
    title = 'PEP 511: code.co_lnotab: use signed line number delta to support moving instructions in an optimizer'
    updated_at = <Date 2016-01-27.11:20:21.473>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2016-01-27.11:20:21.473>
    actor = 'python-dev'
    assignee = 'none'
    closed = True
    closed_date = <Date 2016-01-20.11:34:01.939>
    closer = 'vstinner'
    components = ['Interpreter Core']
    creation = <Date 2016-01-14.09:12:43.356>
    creator = 'vstinner'
    dependencies = []
    files = ['41613', '41645', '41646', '41652']
    hgrepos = []
    issue_num = 26107
    keywords = ['patch']
    message_count = 22.0
    messages = ['258189', '258190', '258191', '258197', '258513', '258514', '258515', '258520', '258522', '258523', '258527', '258529', '258547', '258549', '258552', '258556', '258628', '258668', '258669', '258672', '258760', '259015']
    nosy_count = 8.0
    nosy_names = ['gvanrossum', 'brett.cannon', 'rhettinger', 'vstinner', 'Mark.Shannon', 'python-dev', 'serhiy.storchaka', 'ztane']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue26107'
    versions = ['Python 3.6']

    @vstinner
    Copy link
    Member Author

    Python doesn't store the original line number in the .pyc file in the bytecode. Instead, an efficient table is used to find the line number from the current in the bytecode: code.co_lnotab.

    Basically, it's a list of (offset_delta, line_number_delta) pairs where offset_delta and line_number_delta are unsigned 8 bits numbers. If an offset delta is larger than 255, (offset_delta % 255, line_number_delta) and (offset_delta // 255, 0) pairs are emited. Same for line_number_delta. (In fact, more than two pairs can be created.)

    The format is described in Objects/lnotab_notes.txt.

    I implemented an optimizer which can generate *negative* line number. For example, the loop:

       for i in range(2):   # line 1
          print(i)          # line 2

    is replaced with:

       i = 0      # line 1
       print(i)   # line 2
       i = 1      # line 1
       print(i)   # line 2

    The third instruction has a negative line number delta.

    I'm not the first one hitting the issue, but it's just that no one proposed a patch before. Previous projects bitten by this issue:

    • issue bpo-10399: "AST Optimization: inlining of function calls"
    • issue bpo-11549: "Build-out an AST optimizer, moving some functionality out of the peephole optimizer"

    Attached patch changes the type of line number delta from unsigned 8-bit integer to *signed* 8-bit integer. If a line number delta is smaller than -128 or larger than 127, multiple pairs are created (as before).

    My code in Lib/dis.py is inefficient. Maybe unpack the full lnotab than *then* skip half of the bytes? (instead of calling struct.unpack times for each byte).

    The patch adds also "assert(Py_REFCNT(lnotab_obj) == 1);" to PyCode_Optimize(). The assertion never fails, but it's just to be extra safe.

    The patch renames variables in PyCode_Optimize() because I was confused between "offset" and "line numbers". IMHO variables were badly named.

    I changed the MAGIC_NUMBER of importlib, but it was already changed for f-string:

    # Python 3.6a0 3360 (add FORMAT_VALUE opcode bpo-25483)

    Is it worth to modify it again?

    You may have to recompile Python/importlib_external.h if it's not recompiled automatically (just touch the file before running make).

    Note: this issue is related to the PEP-511 (the PEP is not ready for a review, but it gives a better overview of the use cases.)

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jan 14, 2016

    New changeset c4a826184937 by Victor Stinner in branch 'default':
    PEP-511: add link to issue bpo-26107
    https://hg.python.org/peps/rev/c4a826184937

    @vstinner vstinner added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels Jan 14, 2016
    @vstinner
    Copy link
    Member Author

    Attached patch changes the type of line number delta from unsigned 8-bit integer to *signed* 8-bit integer. If a line number delta is smaller than -128 or larger than 127, multiple pairs are created (as before).

    The main visible change is that .pyc files can be a little bit larger and so use more disk space. Well... in fact only .pyc of files using line number delta larger than 127.

    @serhiy-storchaka
    Copy link
    Member

    A patch was proposed in bpo-16956. And bpo-17611 is related.

    @vstinner vstinner changed the title code.co_lnotab: use signed line number delta to support moving instructions in an optimizer PEP 511: code.co_lnotab: use signed line number delta to support moving instructions in an optimizer Jan 18, 2016
    @vstinner
    Copy link
    Member Author

    Patch version 2 to take Serhiy's review in account:

    • complete Objects/lnotab_notes.txt update. Replace (300, 300) delta example with (300, 200) delta for better readability (300 is the bytecode offset delta, 200 is the line number delta)
    • fix added assertion in peephole: don't check the reference counter for empty byte string (which can be the empty byte string singleton)
    • restore addrmap name in peephole
    • don't change importlib MAGIC: we only change it between Python minor versions, it was already changed for Python 3.6a0
    • assemble_lnotab() divide with positive numbers to avoid undefined behaviour on C
    • dis.py: use "if line_incr >= 0x80: line_incr -= 0x100" instead of struct.unpack() to convert unsigned to signed

    Note: avoid also useless "if (x != NULL)" checks before calling PyMem_Free(). PyMem_Free(NULL) is well specified: do nothing.

    @vstinner
    Copy link
    Member Author

    Patch version 3:

    • When I reviewed issue bpo-16956 patch, I noticed that I forgot to extract my changes on codeobject.c :-/ (FAT Python became a giant patch, it's hard to browse it!)
    • Fix typo in dis.py (regression of patch 2)

    @vstinner
    Copy link
    Member Author

    A patch was proposed in bpo-16956. And bpo-17611 is related.

    I don't see directly the link between this issue and the bpo-17611, but cool if it helps to implement new optimizations :-)

    I compared my patch with bpo-16956 patch:

    Additionally, my patch uses better names in the peephole optimizer, but it's not directly related to the issue. By the way, this change should be commited in a separated patch.

    I prefer to push my recent. By the way, it's up to date, whereas bpo-16956 patch requires a rebase.

    @serhiy-storchaka
    Copy link
    Member

    Yes, you patch supersedes bpo-16956 patch.

    Added new comments on Rietveld for lnotab_notes.txt.

    I afraid this patch can cause problems with code tracing where it is assumed that lines are increased monotonically and *instr_lb <= frame->f_lasti < *instr_ub. We should carefully analyze the effect of the patch on the tracing.

    Before committing you must ask Guido for approval. AFAIK his had objections against code transformations that make debugging harder.

    @vstinner
    Copy link
    Member Author

    We have many unit test in the Python test suite which rely on exact line numbers. Examples:

    • test_sys_settrace
    • test_pdb
    • test_dis

    I know them because they were all broken when my fatoptimizer project had bugs related to line numbers :-)

    With my patch, the full Python test suite pass whereas my patch doesn't modify any test.

    I afraid this patch can cause problems with code tracing where it is assumed that lines are increased monotonically and *instr_lb <= frame->f_lasti < *instr_ub. We should carefully analyze the effect of the patch on the tracing.

    First, my patch has no impact on frame->f_lasti.

    The trace module and test_sys_settrace use frame.f_lineno which PyFrame_GetLineNumber(). This function returns f->f_lineno if the frame has a trace function, or PyCode_Addr2Line(). PyCode_Addr2Line() and PyFrame_GetLineNumber() still work with my patch.

    When you trace a program, "negative line delta" and "negative instruction offset" are not new in Python: it's a basic requirement to support loop, when you compare two instructions seen by the tracer.

    To be clear, my patch does *not* introduce negative line number delta in co_lnotab. It only *adds support* for negative line number delta. If a tool decodes co_lnotab using 8-bit unsigned number for line number delta, the tool still works even with the patch. It only starts to return wrong line numbers if you start debugging a program which has negative line numbers.

    If you use fatoptimizer, you get such negative delta. But if you use an optimizer, you should be prepared to some subtle differences. The good practice is to disable all optimizers when you debug code. It's also really hard (or impossible) to debug C code optimized with -O3. I always use gcc -O0 to debug CPython.

    Before committing you must ask Guido for approval. AFAIK his had objections against code transformations that make debugging harder.

    Are you aware of tools decoding directly co_lnotab?

    Oh, I forgot the old Misc/gdbinit script which *does* decode directly co_lnotab. Does anyone still use it? If yes, it should also be updated.

    I failed to reproduce the bug with Misc/gdbinit, beacuse bug only occurs if you debug a program which uses negative line number, and CPython doesn't produce negative line number in co_lnotab by default... So it would be "nice" to also support negative line number in Misc/gdbinit, but maybe it's ok to let this old script dying? :-D

    @vstinner
    Copy link
    Member Author

    Are you aware of tools decoding directly co_lnotab?

    Ah! I found Ned Batchelder's coverage project which has a _bytes_lines() method "adapted from dis.py in the standard library". The method uses directly co_lnotab to compute line numbers.

    Ok, *this project* will have to be updated if it wants to support fatoptimizer and other code transformers producing negative line numbers.

    Maybe I can contribute to it with a patch if my change to CPython 3.6 is accepted ;-)

    @brettcannon
    Copy link
    Member

    I just wanted to comment on "don't change importlib MAGIC: we only change it between Python minor versions": that's actually not true. Feel free to up the number whenever you make a change that affects eval.c or bytecode. Otherwise .pyc files won't be regenerated. And that number is cheap anyway and isn't about to run out, so don't worry about updating it multiple times before the code sees a public release.

    @vstinner
    Copy link
    Member Author

    Brett Cannon added the comment:

    I just wanted to comment on "don't change importlib MAGIC: we only change it between Python minor versions": that's actually not true. Feel free to up the number whenever you make a change that affects eval.c or bytecode. Otherwise .pyc files won't be regenerated. And that number is cheap anyway and isn't about to run out, so don't worry about updating it multiple times before the code sees a public release.

    Since my patch may break setup of multiple python developers, it can
    be worth to increase this number, ok :-)

    @serhiy-storchaka
    Copy link
    Member

    But there is no need to increase it by 10. I suppose the gap is added to allow updating bytecode in maintained releases, but in process of developing next version we don't need this.

    @brettcannon
    Copy link
    Member

    There's technically no need to worry about ranged values as the magic number is purely an equality check to see if the interpreter matches what the .pyc was created with. I guess there might be third-party code that does a range check, but that's bad as importlib checks the raw bytes only; using a number is mostly a convenience for changing it.

    @serhiy-storchaka
    Copy link
    Member

    The launcher on Windows does a range check.

    @vstinner
    Copy link
    Member Author

    Patch version 4:

    • finish to update Objects/lnotab_notes.txt
    • update _PyCode_CheckLineNumber() in codeobject.c
    • set importlib MAGIC to 3361

    I don't expect my patch to be complete nor perfect. IMHO it's fine to adjust the code later if needed.

    I would like to integrate FAT Python changes step by step. It looks like the general idea of AST optimizers is well accepted.

    @ztane
    Copy link
    Mannequin

    ztane mannequin commented Jan 19, 2016

    Nice work, my bpo-21385 is also related. Basically, transforming non-Python code into Python meant that all line number information, which otherwise would have been useful for debugging, had to be discarded, or debug builds of Python would dump cores.

    So, bye "assert(d_lineno >= 0);", you won't be missed.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jan 20, 2016

    New changeset 775b74e0e103 by Victor Stinner in branch 'default':
    co_lnotab supports negative line number delta
    https://hg.python.org/cpython/rev/775b74e0e103

    @vstinner
    Copy link
    Member Author

    Nice work, my bpo-21385 is also related. Basically, transforming non-Python code into Python meant that all line number information, which otherwise would have been useful for debugging, had to be discarded, or debug builds of Python would dump cores.

    Ok, it looks like there are multiple use cases for negative line numbers, and the change doesn't really break anything in practice. I tried to explain exactly who is impacted and how to update the code in the Porting section of What's New in Python 3.6.

    For each review Serhiy. I pushed the the change to Python 3.6.

    @vstinner
    Copy link
    Member Author

    I closed issues bpo-16956 and bpo-21385 as duplicates of this issue.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jan 21, 2016

    New changeset c6fb1651ea2e by Victor Stinner in branch 'default':
    Issue bpo-26107: Fix typo in Objects/lnotab_notes.txt
    https://hg.python.org/cpython/rev/c6fb1651ea2e

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jan 27, 2016

    New changeset 16f60cd918e0 by Victor Stinner in branch 'default':
    PEP-511
    https://hg.python.org/peps/rev/16f60cd918e0

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants