Author vstinner
Recipients brett.cannon, rhettinger, serhiy.storchaka, vstinner
Date 2016-01-14.09:12:36
Message-id <1452762763.46.0.511729382261.issue26107@psf.upfronthosting.co.za>
In-reply-to
Content
Python doesn't store the original line number of each instruction in the bytecode of the .pyc file. Instead, a compact table is used to find the line number from the current offset in the bytecode: code.co_lnotab.

Basically, it's a list of (offset_delta, line_number_delta) pairs where offset_delta and line_number_delta are unsigned 8-bit numbers. If an offset delta is larger than 255, it is split: offset_delta // 255 pairs of (255, 0) are emitted, followed by an (offset_delta % 255, line_number_delta) pair. A large line_number_delta is split in the same way. (So more than two pairs can be created.)
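
As a rough illustration of the unsigned format, here is a minimal Python sketch of an encoder and the matching lookup. The names encode_unsigned_lnotab and line_for_offset are hypothetical and not part of CPython; the splitting of large deltas is an assumption based on the description in this message and may differ in details from what compile.c emits.

```python
def encode_unsigned_lnotab(entries, first_line=1):
    """Encode (offset, line) pairs, sorted by offset, as unsigned byte pairs.

    Hypothetical helper: a sketch of the current format, not CPython code.
    """
    table = bytearray()
    prev_offset, prev_line = 0, first_line
    for offset, line in entries:
        offset_delta = offset - prev_offset
        line_delta = line - prev_line
        prev_offset, prev_line = offset, line
        if offset_delta < 0 or line_delta < 0:
            raise ValueError("unsigned format cannot encode negative deltas")
        if offset_delta == 0 and line_delta == 0:
            continue  # nothing changed, no pair needed
        # Large offset delta: emit (255, 0) pairs, keep the remainder.
        if offset_delta > 255:
            count, offset_delta = divmod(offset_delta, 255)
            table += bytes((255, 0)) * count
        # Large line delta: attach the offset to the first 255-line chunk,
        # then continue with a zero offset delta.
        while line_delta > 255:
            table += bytes((offset_delta, 255))
            offset_delta = 0
            line_delta -= 255
        table += bytes((offset_delta, line_delta))
    return bytes(table)


def line_for_offset(lnotab, target, first_line=1):
    """Return the line number of the instruction at bytecode offset target."""
    offset, line = 0, first_line
    for offset_delta, line_delta in zip(lnotab[0::2], lnotab[1::2]):
        offset += offset_delta
        if offset > target:
            break
        line += line_delta
    return line
```

With this sketch, an instruction on line 2 at offset 300 is stored as the pairs (255, 0) and (45, 1), and any attempt to go back to an earlier line raises ValueError, which is the limitation discussed in the rest of this message.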

The format is described in Objects/lnotab_notes.txt.

I implemented an optimizer which can generate *negative* line number deltas. For example, the loop:

   for i in range(2):   # line 1
      print(i)          # line 2

is replaced with:

   i = 0      # line 1
   print(i)   # line 2
   i = 1      # line 1
   print(i)   # line 2

The third instruction goes back from line 2 to line 1, so it has a negative line number delta (-1).

I'm not the first one to hit this issue, but no one has proposed a patch before. Previous projects bitten by this issue:

* issue #10399: "AST Optimization: inlining of function calls"
* issue #11549: "Build-out an AST optimizer, moving some functionality out of the peephole optimizer"

The attached patch changes the type of the line number delta from an unsigned 8-bit integer to a *signed* 8-bit integer. If a line number delta is smaller than -128 or larger than 127, multiple pairs are created (as before).
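
To make the signed variant concrete, here is a small Python sketch of how one step could be split into pairs whose line delta fits in a signed byte. It is not the code from the patch: split_line_delta and pack_pairs are hypothetical names, and the sketch assumes that offset_delta already fits in an unsigned byte.

```python
def split_line_delta(offset_delta, line_delta):
    """Split one step into pairs whose line delta is in -128..127.

    Hypothetical helper: assumes 0 <= offset_delta <= 255.
    """
    pairs = []
    while line_delta > 127 or line_delta < -128:
        chunk = 127 if line_delta > 0 else -128
        pairs.append((offset_delta, chunk))
        offset_delta = 0  # the offset is only consumed by the first pair
        line_delta -= chunk
    pairs.append((offset_delta, line_delta))
    return pairs


def pack_pairs(pairs):
    """Pack pairs as bytes: unsigned offset delta, two's complement line delta."""
    return bytes(b for offset_delta, line_delta in pairs
                 for b in (offset_delta, line_delta & 0xFF))
```

In the unrolled loop example, going from line 2 back to line 1 with a hypothetical offset delta of 6 gives the single pair (6, -1), stored as the bytes 0x06 0xFF.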

My code in Lib/dis.py is inefficient. Maybe unpack the full lnotab and *then* skip half of the bytes? (Instead of calling struct.unpack once for each byte.)
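
One possible way to avoid a struct.unpack call per byte is sketched below: slice the table into offset deltas and line deltas, and unpack all line deltas as signed bytes in a single call. This is only a sketch under the signed format proposed here, not the code from the patch; iter_line_starts is a hypothetical name, and the table is assumed to have an even length.

```python
import struct


def iter_line_starts(lnotab, first_line=1):
    """Yield (offset, line) each time the line changes.

    Hypothetical helper: expects a table of (unsigned offset delta,
    signed line delta) byte pairs.
    """
    count = len(lnotab) // 2
    offset_deltas = lnotab[0::2]  # indexing bytes gives unsigned ints
    # Unpack every line delta as a signed byte in one call.
    line_deltas = struct.unpack('%db' % count, lnotab[1::2])

    offset, line = 0, first_line
    last_line = None
    for offset_delta, line_delta in zip(offset_deltas, line_deltas):
        if offset_delta:
            if line != last_line:
                yield offset, line
                last_line = line
            offset += offset_delta
        line += line_delta
    if line != last_line:
        yield offset, line
```

For example, the table bytes([6, 1, 6, 0xFF, 6, 1]) describes four instructions on lines 1, 2, 1 and 2, which is the shape of the unrolled loop above (the offsets are made up).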

The patch also adds "assert(Py_REFCNT(lnotab_obj) == 1);" to PyCode_Optimize(). The assertion never fails; it's just there to be extra safe.

The patch renames variables in PyCode_Optimize() because I kept confusing "offsets" and "line numbers". IMHO the variables were badly named.

I changed the MAGIC_NUMBER of importlib, but it was already changed for f-strings:

#     Python 3.6a0  3360 (add FORMAT_VALUE opcode #25483)

Is it worth modifying it again?

You may have to recompile Python/importlib_external.h if it's not recompiled automatically (just touch the file before running make).

Note: this issue is related to PEP 511 (the PEP is not ready for review, but it gives a better overview of the use cases).
History
Date                 User      Action  Args
2016-01-14 09:12:46  vstinner  set     recipients: + vstinner, brett.cannon, rhettinger, serhiy.storchaka
2016-01-14 09:12:43  vstinner  set     messageid: <1452762763.46.0.511729382261.issue26107@psf.upfronthosting.co.za>
2016-01-14 09:12:43  vstinner  link    issue26107 messages
2016-01-14 09:12:42  vstinner  create