This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Title: Change line number table format
Type: performance Stage: resolved
Components: Interpreter Core Versions: Python 3.9
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: BTaskaya, Mark.Shannon, brandtbucher, corona10, nedbat, pablogsal, serhiy.storchaka, skip.montanaro
Priority: normal Keywords: patch

Created on 2020-02-03 10:06 by Mark.Shannon, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 18326 closed Mark.Shannon, 2020-02-03 10:38
Messages (6)
msg361277 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2020-02-03 10:06
The current line number table format has two issues that need to be addressed.

1. There is no way to express that a bytecode does not have have a line number. The `END_ASYNC_FOR` bytecode, bytecodes for cleaning up the variable used to store exceptions in exception handles, and a few other cases, are all artificial and should have no line number.

2. It is inefficient to find a line number when tracing.
Currently, whenever the line number changes, the line number table must be re-scanned from the the start.

I propose to fix this by implementing a new line number table. 
Each instruction (currently pair of bytes) would have a one byte line-offset value. An offset of 0 indicates that the instruction has no line number.

In addition to the offset table there would be a table of bytecode-offset, base-line pairs. Following the pairs is the instruction count.
Adding the instruction count at the end means that the table is not just a table of start, line pairs, but also a table of (inclusive) start, line, (exclusive) end triples. This format makes it very easy to scan forwards and backwards.
Because each entry covers up to 255 lines, the table is very small.

The line of the bytecode at `n*2` (instruction `n`) is calculated as:

offset = lnotab[n]
if offset == 0:
    line = -1 # artificial
    line_base = scan_table_to_find(n)
    line = offset + line_base

The new format fixes the two issues listed above.
1. Having no line number is expressed by a 0 in the offset table.
2. Since the offset-base table is made up of absolute values, not relative ones, it can be reliably scanned backwards. It is even possible to use a binary search, although a linear scan will be faster in almost all cases. 

The number format would be larger than the old one. 
However, the code object is composed not only of code, but several tuples of names and constants as well, so increasing the size of the line number has a small effect overall.
msg361344 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-02-04 10:57
I planned to add a table for end line numbers, so every instruction will be associated with a range of lines.

For example, in

  x = {

the BUILD_SET instruction will be associated with lines 1-4. Currently it is associated only with line 1, and was associated with line 3 in 3.7.


  x = (

the BINARY_ADD instruction will be associated with lines 2-4. Currently it is associated with line 2, and was associated with line 4 in 3.7. The proposition about associating it with line 3 (containing the "+" operator) was rejected before.


  x = (

the LOAD_ATTR instruction will be associated with lines 2-4. Currently it is associated with line 2. The AST does not contain information about lines of "." and "b" tokens, but it contains information about the start and the end positions of "a.b".

This will help tools like which are confused be executing the same line several times and may allow to generate more informative tracebacks.

As for END_ASYNC_FOR, it will be associated with the range of the "async for" statement.

Maybe we could use more efficient format for encoding start and end line numbers.
msg361502 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2020-02-06 19:59
How would you handle bytecodes that have no line number?
msg362013 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-02-15 11:38
Every bytecode is generated from some AST node which has an associated range of lines. So there should not be bytecodes that have no line number.

If for some reasons you generate bytecodes that have no line number, you can just set lineno > end_lineno for them.
msg362407 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2020-02-21 14:33

Although the code generator is syntax directed, not all code has an explicit piece of syntax attached.

For example in the following code:
def foo():
    if x:
the compiler emits code to return from the function (LOAD_CONST None; RETURN_VALUE), but there is no explicit return, and no meaningful line number for the return.

Also consider, the artificial try-except block generated for async for loops and the cleanup code for named exception variables.
msg390409 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2021-04-07 09:27
PEP 626 fixed this
Date User Action Args
2022-04-11 14:59:26adminsetgithub: 83718
2021-04-07 09:28:07Mark.Shannonsetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2021-04-07 09:27:59Mark.Shannonsetmessages: + msg390409
2020-04-03 17:02:10corona10setnosy: + corona10
2020-02-21 14:33:04Mark.Shannonsetmessages: + msg362407
2020-02-20 22:40:57brandtbuchersetnosy: + brandtbucher
2020-02-15 11:38:28serhiy.storchakasetmessages: + msg362013
2020-02-14 21:38:25BTaskayasetnosy: + BTaskaya
2020-02-06 19:59:04Mark.Shannonsetnosy: + Mark.Shannon
messages: + msg361502
2020-02-04 10:57:15serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg361344
2020-02-03 10:38:00Mark.Shannonsetkeywords: + patch
stage: patch review
pull_requests: + pull_request17699
2020-02-03 10:07:05Mark.Shannonsetnosy: + skip.montanaro, nedbat, pablogsal, - Mark.Shannon
2020-02-03 10:06:34Mark.Shannoncreate