classification
Title: PEP 511: code.co_lnotab: use signed line number delta to support moving instructions in an optimizer
Type: enhancement Stage:
Components: Interpreter Core Versions: Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Mark.Shannon, brett.cannon, gvanrossum, python-dev, rhettinger, serhiy.storchaka, vstinner, ztane
Priority: normal Keywords: patch

Created on 2016-01-14 09:12 by vstinner, last changed 2016-01-27 11:20 by python-dev. This issue is now closed.

Files
File name Uploaded Description Edit
lnotab.patch vstinner, 2016-01-14 09:12 review
lnotab-2.patch vstinner, 2016-01-18 10:53 review
lnotab-3.patch vstinner, 2016-01-18 11:00 review
lnotab-4.patch vstinner, 2016-01-18 22:31 review
Messages (22)
msg258189 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-01-14 09:12
Python doesn't store the original line number in the .pyc file in the bytecode. Instead, an efficient table is used to find the line number from the current in the bytecode: code.co_lnotab.

Basically, it's a list of (offset_delta, line_number_delta) pairs where offset_delta and line_number_delta are unsigned 8 bits numbers. If an offset delta is larger than 255, (offset_delta % 255, line_number_delta) and (offset_delta // 255, 0) pairs are emited. Same for line_number_delta. (In fact, more than two pairs can be created.)

The format is described in Objects/lnotab_notes.txt.

I implemented an optimizer which can generate *negative* line number. For example, the loop:

   for i in range(2):   # line 1
      print(i)          # line 2

is replaced with:

   i = 0      # line 1
   print(i)   # line 2
   i = 1      # line 1
   print(i)   # line 2

The third instruction has a negative line number delta.

I'm not the first one hitting the issue, but it's just that no one proposed a patch before. Previous projects bitten by this issue:

* issue #10399: "AST Optimization: inlining of function calls"
* issue #11549: "Build-out an AST optimizer, moving some functionality out of the peephole optimizer"

Attached patch changes the type of line number delta from unsigned 8-bit integer to *signed* 8-bit integer. If a line number delta is smaller than -128 or larger than 127, multiple pairs are created (as before).

My code in Lib/dis.py is inefficient. Maybe unpack the full lnotab than *then* skip half of the bytes? (instead of calling struct.unpack times for each byte).

The patch adds also "assert(Py_REFCNT(lnotab_obj) == 1);" to PyCode_Optimize(). The assertion never fails, but it's just to be extra safe.

The patch renames variables in PyCode_Optimize() because I was confused between "offset" and "line numbers". IMHO variables were badly named.

I changed the MAGIC_NUMBER of importlib, but it was already changed for f-string:

#     Python 3.6a0  3360 (add FORMAT_VALUE opcode #25483)

Is it worth to modify it again?

You may have to recompile Python/importlib_external.h if it's not recompiled automatically (just touch the file before running make).

Note: this issue is related to the PEP 511 (the PEP is not ready for a review, but it gives a better overview of the use cases.)
msg258190 - (view) Author: Roundup Robot (python-dev) Date: 2016-01-14 09:15
New changeset c4a826184937 by Victor Stinner in branch 'default':
PEP 511: add link to issue #26107
https://hg.python.org/peps/rev/c4a826184937
msg258191 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-01-14 09:20
> Attached patch changes the type of line number delta from unsigned 8-bit integer to *signed* 8-bit integer. If a line number delta is smaller than -128 or larger than 127, multiple pairs are created (as before).

The main visible change is that .pyc files can be a little bit larger and so use more disk space. Well... in fact only .pyc of files using line number delta larger than 127.
msg258197 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-01-14 12:32
A patch was proposed in issue16956. And issue17611 is related.
msg258513 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-01-18 10:53
Patch version 2 to take Serhiy's review in account:

* complete Objects/lnotab_notes.txt update. Replace (300, 300) delta example with (300, 200) delta for better readability (300 is the bytecode offset delta, 200 is the line number delta)
* fix added assertion in peephole: don't check the reference counter for empty byte string (which can be the empty byte string singleton)
* restore addrmap name in peephole
* don't change importlib MAGIC: we only change it between Python minor versions, it was already changed for Python 3.6a0
* assemble_lnotab() divide with positive numbers to avoid undefined behaviour on C
* dis.py: use "if line_incr >= 0x80: line_incr -= 0x100" instead of struct.unpack() to convert unsigned to signed

Note: avoid also useless "if (x != NULL)" checks before calling PyMem_Free(). PyMem_Free(NULL) is well specified: do nothing.
msg258514 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-01-18 11:00
Patch version 3:

* When I reviewed issue #16956 patch, I noticed that I forgot to extract my changes on codeobject.c :-/ (FAT Python became a giant patch, it's hard to browse it!)
* Fix typo in dis.py (regression of patch 2)
msg258515 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-01-18 11:06
> A patch was proposed in issue16956. And issue17611 is related.

I don't see directly the link between this issue and the issue17611, but cool if it helps to implement new optimizations :-)

I compared my patch with issue16956 patch:

* my patch mentions also the change in Lib/importlib/_bootstrap_external.py
* my patch updates also dis.py
* my patch updates also Objects/lnotab_notes.txt
* issue16956 patch changes compiler_while(), this change is not directly related to making line number delta signed

Additionally, my patch uses better names in the peephole optimizer, but it's not directly related to the issue. By the way, this change should be commited in a separated patch.

I prefer to push my recent. By the way, it's up to date, whereas issue16956 patch requires a rebase.
msg258520 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-01-18 11:57
Yes, you patch supersedes issue16956 patch.

Added new comments on Rietveld for lnotab_notes.txt.

I afraid this patch can cause problems with code tracing where it is assumed that lines are increased monotonically and *instr_lb <= frame->f_lasti < *instr_ub. We should carefully analyze the effect of the patch on the tracing.

Before committing you must ask Guido for approval. AFAIK his had objections against code transformations that make debugging harder.
msg258522 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-01-18 13:31
We have many unit test in the Python test suite which rely on exact line numbers. Examples:

* test_sys_settrace
* test_pdb
* test_dis

I know them because they were all broken when my fatoptimizer project had bugs related to line numbers :-)

With my patch, the full Python test suite pass whereas my patch doesn't modify any test.


> I afraid this patch can cause problems with code tracing where it is assumed that lines are increased monotonically and *instr_lb <= frame->f_lasti < *instr_ub. We should carefully analyze the effect of the patch on the tracing.

First, my patch has no impact on frame->f_lasti.

The trace module and test_sys_settrace use frame.f_lineno which PyFrame_GetLineNumber(). This function returns f->f_lineno if the frame has a trace function, or PyCode_Addr2Line(). PyCode_Addr2Line() and PyFrame_GetLineNumber() still work with my patch.

When you trace a program, "negative line delta" and "negative instruction offset" are not new in Python: it's a basic requirement to support loop, when you compare two instructions seen by the tracer.

To be clear, my patch does *not* introduce negative line number delta in co_lnotab. It only *adds support* for negative line number delta. If a tool decodes co_lnotab using 8-bit unsigned number for line number delta, the tool still works even with the patch. It only starts to return wrong line numbers if you start debugging a program which has negative line numbers.

If you use fatoptimizer, you get such negative delta. But if you use an optimizer, you should be prepared to some subtle differences. The good practice is to disable all optimizers when you debug code. It's also really hard (or impossible) to debug C code optimized with -O3. I always use gcc -O0 to debug CPython.


> Before committing you must ask Guido for approval. AFAIK his had objections against code transformations that make debugging harder.

Are you aware of tools decoding directly co_lnotab?

Oh, I forgot the old Misc/gdbinit script which *does* decode directly co_lnotab. Does anyone still use it? If yes, it should also be updated.

I failed to reproduce the bug with Misc/gdbinit, beacuse bug only occurs if you debug a program which uses negative line number, and CPython doesn't produce negative line number in co_lnotab by default... So it would be "nice" to also support negative line number in Misc/gdbinit, but maybe it's ok to let this old script dying? :-D
msg258523 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-01-18 13:38
> Are you aware of tools decoding directly co_lnotab?

Ah! I found Ned Batchelder's coverage project which has a _bytes_lines() method "adapted from dis.py in the standard library". The method uses directly co_lnotab to compute line numbers.

Ok, *this project* will have to be updated if it wants to support fatoptimizer and other code transformers producing negative line numbers.

Maybe I can contribute to it with a patch if my change to CPython 3.6 is accepted ;-)
msg258527 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2016-01-18 17:01
I just wanted to comment on "don't change importlib MAGIC: we only change it between Python minor versions": that's actually not true. Feel free to up the number whenever you make a change that affects eval.c or bytecode. Otherwise .pyc files won't be regenerated. And that number is cheap anyway and isn't about to run out, so don't worry about updating it multiple times before the code sees a public release.
msg258529 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-01-18 17:08
Brett Cannon added the comment:
> I just wanted to comment on "don't change importlib MAGIC: we only change it between Python minor versions": that's actually not true. Feel free to up the number whenever you make a change that affects eval.c or bytecode. Otherwise .pyc files won't be regenerated. And that number is cheap anyway and isn't about to run out, so don't worry about updating it multiple times before the code sees a public release.

Since my patch may break setup of multiple python developers, it can
be worth to increase this number, ok :-)
msg258547 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-01-18 21:09
But there is no need to increase it by 10. I suppose the gap is added to allow updating bytecode in maintained releases, but in process of developing next version we don't need this.
msg258549 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2016-01-18 21:30
There's technically no need to worry about ranged values as the magic number is purely an equality check to see if the interpreter matches what the .pyc was created with. I guess there might be third-party code that does a range check, but that's bad as importlib checks the raw bytes only; using a number is mostly a convenience for changing it.
msg258552 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-01-18 21:49
The launcher on Windows does a range check.
msg258556 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-01-18 22:31
Patch version 4:

* finish to update Objects/lnotab_notes.txt
* update _PyCode_CheckLineNumber() in codeobject.c
* set importlib MAGIC to 3361

I don't expect my patch to be complete nor perfect. IMHO it's fine to adjust the code later if needed.

I would like to integrate FAT Python changes step by step. It looks like the general idea of AST optimizers is well accepted.
msg258628 - (view) Author: Antti Haapala (ztane) * Date: 2016-01-19 22:26
Nice work, my issue21385 is also related. Basically, transforming non-Python code into Python meant that all line number information, which otherwise would have been useful for debugging, had to be discarded, or debug builds of Python would dump cores.

So, bye "assert(d_lineno >= 0);", you won't be missed.
msg258668 - (view) Author: Roundup Robot (python-dev) Date: 2016-01-20 11:31
New changeset 775b74e0e103 by Victor Stinner in branch 'default':
co_lnotab supports negative line number delta
https://hg.python.org/cpython/rev/775b74e0e103
msg258669 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-01-20 11:34
> Nice work, my issue21385 is also related. Basically, transforming non-Python code into Python meant that all line number information, which otherwise would have been useful for debugging, had to be discarded, or debug builds of Python would dump cores.

Ok, it looks like there are multiple use cases for negative line numbers, and the change doesn't really break anything in practice. I tried to explain exactly who is impacted and how to update the code in the Porting section of What's New in Python 3.6.

For each review Serhiy. I pushed the the change to Python 3.6.
msg258672 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-01-20 11:37
I closed issues #16956 and #21385 as duplicates of this issue.
msg258760 - (view) Author: Roundup Robot (python-dev) Date: 2016-01-21 17:12
New changeset c6fb1651ea2e by Victor Stinner in branch 'default':
Issue #26107: Fix typo in Objects/lnotab_notes.txt
https://hg.python.org/cpython/rev/c6fb1651ea2e
msg259015 - (view) Author: Roundup Robot (python-dev) Date: 2016-01-27 11:20
New changeset 16f60cd918e0 by Victor Stinner in branch 'default':
PEP 511
https://hg.python.org/peps/rev/16f60cd918e0
History
Date User Action Args
2016-01-27 11:20:21python-devsetmessages: + msg259015
2016-01-21 17:12:46python-devsetmessages: + msg258760
2016-01-20 11:37:02vstinnersetmessages: + msg258672
2016-01-20 11:36:19vstinnerlinkissue16956 superseder
2016-01-20 11:35:39vstinnerlinkissue21385 superseder
2016-01-20 11:34:01vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg258669
2016-01-20 11:31:37python-devsetmessages: + msg258668
2016-01-19 22:26:54ztanesetnosy: + ztane
messages: + msg258628
2016-01-18 22:31:05vstinnersetfiles: + lnotab-4.patch

messages: + msg258556
2016-01-18 21:49:57serhiy.storchakasetmessages: + msg258552
2016-01-18 21:30:37brett.cannonsetmessages: + msg258549
2016-01-18 21:09:58serhiy.storchakasetmessages: + msg258547
2016-01-18 17:08:23vstinnersetmessages: + msg258529
2016-01-18 17:01:20brett.cannonsetmessages: + msg258527
2016-01-18 13:38:40vstinnersetmessages: + msg258523
2016-01-18 13:31:27vstinnersetmessages: + msg258522
2016-01-18 11:57:53serhiy.storchakasetmessages: + msg258520
2016-01-18 11:06:32vstinnersetmessages: + msg258515
2016-01-18 11:00:29vstinnersetfiles: + lnotab-3.patch

messages: + msg258514
2016-01-18 10:53:47vstinnersetfiles: + lnotab-2.patch

messages: + msg258513
2016-01-18 09:16:38vstinnersettitle: code.co_lnotab: use signed line number delta to support moving instructions in an optimizer -> PEP 511: code.co_lnotab: use signed line number delta to support moving instructions in an optimizer
2016-01-14 15:20:30vstinnersetnosy: + Mark.Shannon
2016-01-14 12:32:42serhiy.storchakasetnosy: + gvanrossum
messages: + msg258197
2016-01-14 09:20:10vstinnersetmessages: + msg258191
2016-01-14 09:16:09vstinnersettype: enhancement
components: + Interpreter Core
2016-01-14 09:15:43python-devsetnosy: + python-dev
messages: + msg258190
2016-01-14 09:12:43vstinnercreate