This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Second pass of PyCode_Optimize
Type: performance Stage:
Components: Interpreter Core Versions: Python 3.5
process
Status: closed Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: brett.cannon, christian.heimes, eric.snow, llllllllll, rhettinger, yselivanov
Priority: normal Keywords: patch

Created on 2015-04-20 09:09 by llllllllll, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
secondpass.patch llllllllll, 2015-04-20 09:09 review
Messages (9)
msg241626 - (view) Author: Joe Jevnik (llllllllll) * Date: 2015-04-20 09:09
There are a lot of optimizations that are being missed by only running a single pass of PyCode_Optimize. I originally started by trying to optimize for De Morgan's Laws in if tests; however, I realized that the issue actually went away if you run the optimizer twice. I imagine that this would provide a lot of other performance increases because some of the patterns don't show up until the optimizer has already run once. For example:

>>> def f(a, b):
...     if not a and not b:
...         return True
...     return False
... 

On default, this generates:
>>> dis(f)
  2           0 LOAD_FAST                0 (a)
              3 UNARY_NOT
              4 POP_JUMP_IF_FALSE       18
              7 LOAD_FAST                1 (b)
             10 UNARY_NOT
             11 POP_JUMP_IF_FALSE       18

  3          14 LOAD_CONST               1 (True)
             17 RETURN_VALUE

  4     >>   18 LOAD_CONST               2 (False)
             21 RETURN_VALUE


If we run the optimizer again, we can collapse some of the UNARY_NOT -> POP_JUMP_IF_FALSE back into POP_JUMP_IF_TRUE, like this:
>>> dis(f)
  2           0 LOAD_FAST                0 (a)
              3 POP_JUMP_IF_TRUE        16
              6 LOAD_FAST                1 (b)
              9 POP_JUMP_IF_TRUE        16

  3          12 LOAD_CONST               1 (True)
             15 RETURN_VALUE

  4     >>   16 LOAD_CONST               2 (False)
             19 RETURN_VALUE


Also, Python/importlib.h blew up in the diff; should this be included in the patch?
msg241641 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2015-04-20 13:19
> Also, Python/importlib.h blew up in the diff; should this be included in
the patch?

Yes.  It is the frozen bytecode for bootstrapping importlib as the import
system.  So changes to bytecode generation like this will propagate there.
msg241647 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2015-04-20 13:45
Is there any noticeable performance increase with the patch?

Please attach results from https://hg.python.org/benchmarks
msg241652 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2015-04-20 14:29
I would like to keep the time spent in the optimizer itself to a minimum.  Also, it should keep focused on patterns that actually occur in code as opposed to contrived bits of code.   Tim and Guido let us put it the optimizer only on the condition that it be kept simple and with low overhead.   The work that has been needed for a long time was to move a number of the optimizations (such as constant folding) out of the peepholer optimizer and replace them with AST manipulations just upstream from code generation (to reduce the need for the peepholer to partially disassemble and re-interpret the bytecode).
msg241654 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2015-04-20 14:49
How about restricting double pass optimization to code objects that have the CO_OPTIMIZED bit set? It doesn't affect normal code execution.
msg241659 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2015-04-20 15:21
Just FYI, http://bugs.python.org/issue11549 can act as a tracking issue for an AST optimization path that runs prior to the peepholer.
msg241660 - (view) Author: Joe Jevnik (llllllllll) * Date: 2015-04-20 15:28
I am actually getting inconsistent results from running benchmark; I am pretty surprised by this. I could look into making the second pass only happen if the code has the CO_OPTIMIZED bit set; however, based on the results I am seeing from the benchmark runs, I am not sure it is worth it. Does the benchmark tool recompile the code every run?
msg241662 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2015-04-20 15:31
> Does the benchmark tool recompile the code every run?

Make sure you've bumped magic number in importlib/bootstrap; then make clean; make
msg241669 - (view) Author: Joe Jevnik (llllllllll) * Date: 2015-04-20 16:44
I tried bumping the magic number; however, I still cannot get consistent results when running benchmark locally. I guess I will close this as I cannot consistently show an improvement.
History
Date User Action Args
2022-04-11 14:58:15adminsetgithub: 68202
2015-04-20 16:44:29llllllllllsetstatus: open -> closed

messages: + msg241669
2015-04-20 15:31:04yselivanovsetmessages: + msg241662
2015-04-20 15:28:57llllllllllsetmessages: + msg241660
2015-04-20 15:21:39brett.cannonsetnosy: + brett.cannon
messages: + msg241659
2015-04-20 14:49:28christian.heimessetnosy: + christian.heimes
messages: + msg241654
2015-04-20 14:29:00rhettingersetnosy: + rhettinger
messages: + msg241652
2015-04-20 13:45:05yselivanovsetnosy: + yselivanov
messages: + msg241647
2015-04-20 13:19:22eric.snowsetnosy: + eric.snow
messages: + msg241641
2015-04-20 09:09:20llllllllllcreate