classification
Title: optimize eval_frame
Type:
Stage:
Components: Interpreter Core
Versions: Python 2.4
process
Status: closed
Resolution:
Dependencies:
Superseder:
Assigned To: nnorwitz
Nosy List: nnorwitz, rhettinger
Priority: normal
Keywords: patch

Created on 2003-12-21 18:30 by nnorwitz, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
optimize2.patch nnorwitz, 2003-12-21 18:30
Messages (6)
msg45061 - Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2003-12-21 18:30
There are several different parts to this patch which
are separable.  They each seemed to have a small
benefit.  It would be interesting for others to test
this patch in whole and in different parts to see if
speed can be improved.  I generally got between 1% and
10% improvement.  I used pystone, pybench, and the
total time to run all regression tests.  Runs were on a
RH9 Linux/Athlon 650.  I used a non-debug build (so gcc
3.2 with -O3).  All regression tests pass with these
changes.

I removed the register keyword from many variables.  This
seemed to have little to no effect, so I'm not sure about
those changes.  opcode does not need to be initialized to 0.  I
removed the freevars variable since it is rarely used.

I think the largest benefit was from adding the gotos
for opcodes which set why:  BREAK_LOOP, CONTINUE_LOOP,
RETURN_VALUE, YIELD_VALUE.  This skips many tests whose
outcomes are known a priori from the opcode.
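
The goto idea can be illustrated with a toy dispatch loop. This is only a sketch under stated assumptions, not the code from the patch or from ceval.c: the opcodes (OP_PUSH, OP_ADD, OP_RETURN), the why_code values, and the function run() are all hypothetical names invented for illustration. The point it shows is that an opcode which already knows why it is leaving the loop can jump straight to the exit instead of going through the generic post-opcode check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy opcodes; these are not CPython's real opcode numbers. */
enum { OP_PUSH, OP_ADD, OP_RETURN };

/* Reasons for leaving the dispatch loop, loosely modelled on `why`. */
enum why_code { WHY_NOT, WHY_RETURN, WHY_EXCEPTION };

/* Run a toy bytecode program and return its result, or -1 on error. */
int run(const int *code, size_t len)
{
    int stack[16];
    int sp = 0;
    int retval = -1;
    enum why_code why = WHY_NOT;
    size_t pc = 0;

    while (pc < len) {
        int opcode = code[pc++];
        switch (opcode) {
        case OP_PUSH:
            assert(sp < 16 && pc < len);
            stack[sp++] = code[pc++];
            break;
        case OP_ADD:
            if (sp < 2) { why = WHY_EXCEPTION; break; }
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_RETURN:
            if (sp < 1) { why = WHY_EXCEPTION; break; }
            retval = stack[--sp];
            why = WHY_RETURN;
            /* The exit reason is already known, so skip the generic check. */
            goto fast_exit;
        default:
            why = WHY_EXCEPTION;
            break;
        }
        /* Generic post-opcode check, reached by every opcode that breaks. */
        if (why != WHY_NOT)
            break;
    }
fast_exit:
    return why == WHY_RETURN ? retval : -1;
}
```

For example, the program { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_RETURN } returns 5 through the fast exit, while a program that underflows the stack leaves through the generic check and returns -1.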

I removed the special check for list in UNPACK_SEQUENCE
since this path is rarely used.
(http://coverage.livinglogic.de/file.phtml?file%5fid=12442339)
I also removed the prediction for JUMP_IF_TRUE since
this wasn't executed often (see previous URL).

I added 2 opcodes for calling functions with 0 or 1
arguments.  This removed a lot of code in
call_function().  By removing test branches in several
places, this seemed to speed up the code.  However, it
seemed that just specializing for 0 arguments was
better than for 1 arg.  I'm not sure if the
specialization for 1 argument provides much benefit. 
Both of these specializations could possibly be
improved to speed things up.
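
To make the specialization idea concrete, here is a minimal sketch that assumes a toy calling convention. Nothing in it is taken from the patch or from call_function(): the names toy_callee, call_generic, call_noargs, and sum_args are hypothetical. It only contrasts a generic path, which must validate a run-time argument count and locate the arguments on the stack, with a zero-argument path that has none of that work to do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callee signature: receives its arguments as an array. */
typedef int (*toy_callee)(const int *args, int nargs);

/* Generic path: the argument count is only known at run time, so it is
   range-checked and the arguments are located on the stack.
   Returns -1 on error (a toy convention, not CPython's). */
int call_generic(toy_callee f, const int *stack, int sp, int nargs)
{
    assert(f != NULL);
    if (nargs < 0 || nargs > sp)
        return -1;                      /* not enough values on the stack */
    return f(stack + (sp - nargs), nargs);
}

/* Specialized path for a zero-argument call: no count to decode,
   no stack arithmetic, and no range check. */
int call_noargs(toy_callee f)
{
    assert(f != NULL);
    return f(NULL, 0);
}

/* Example callee: sums its arguments; yields 0 when called with none. */
int sum_args(const int *args, int nargs)
{
    int total = 0;
    for (int i = 0; i < nargs; i++)
        total += args[i];
    return total;
}
```

With a stack holding 1, 2, 3, call_generic(sum_args, stack, 3, 2) sums the top two values, while call_noargs(sum_args) reaches the callee without touching the stack at all. Whether removing those branches pays off in a real interpreter is exactly what would need to be measured.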
msg45062 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2003-12-24 08:20
I'll try these out and review the patch when I get back from 
vacation next week.  

The list special case for UNPACK_SEQUENCE and the 
prediction for JUMP_IF_TRUE should be left in -- they do 
provide speed-ups for code that exercises those features and 
they don't hurt the general cases.
msg45063 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-01-01 03:42
The patch is promising.  I'm able to measure a small speed-
up for the two new function opcodes and for the setwhy 
gotos.  Both optimizations make sense.

I don't measure a savings from not initializing opcode and 
oparg.  That change makes sense conceptually because the 
variables are always assigned before use; however, the 
surrounding control flow statements hide that fact from the 
compiler.  So, it is likely that they were initialized to 
suppress warnings on somebody's system.  If so, then that 
change should not be made.
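
The pattern described above can be sketched as follows. This is an illustration under assumptions, not the eval_frame code: the opcodes OP_NOP and OP_LOAD and the function run_last_load() are hypothetical. The variable oparg is written in one conditional and read in a separate switch, so a compiler has to correlate the two conditions to prove it is always assigned before use. The sketch keeps the initializers, since some compilers may otherwise emit a "may be used uninitialized" warning.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy opcodes; only OP_LOAD carries an argument. */
enum { OP_NOP = 0, OP_LOAD = 1 };

/* Return the argument of the last OP_LOAD in the program, or 0 if none. */
int run_last_load(const int *code, size_t len)
{
    int opcode = 0;   /* always overwritten before use inside the loop */
    int oparg = 0;    /* only read after the OP_LOAD branch has set it */
    int result = 0;
    size_t pc = 0;

    while (pc < len) {
        opcode = code[pc++];
        if (opcode == OP_LOAD) {        /* the assignment happens here ... */
            assert(pc < len);
            oparg = code[pc++];
        }
        switch (opcode) {
        case OP_LOAD:
            result = oparg;             /* ... and the use happens here */
            break;
        default:
            break;
        }
    }
    return result;
}
```

Dropping the two "= 0" initializers would not change the behavior of this function, which is why the change makes sense conceptually, but it would leave the correctness argument to each compiler's flow analysis.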

The other stuff should definitely be left out.  The effect of 
register variables will vary from compiler to compiler, so if 
you can't measure an improvement, it is best to leave it 
alone.  Some compilers do not do much in the way of 
optimization and the register declaration may be a valuable 
hint.

Please leave in the branch prediction for JUMP_IF_TRUE -- I 
put it in after finding measurable savings in real code.  While 
it doesn't come up often, when it does it should run as fast 
as possible.

The special case for UNPACK_SEQUENCE is up for grabs.  
When that case occurs, the speedup is substantial.  Also, 
given that the tuple check has failed, it becomes highly 
probable that the target is a list.  OTOH, this inlined code 
fattens the already voluminous code for eval_frame.  Maybe 
eliminating it will help someone's optimizer cope with all the 
code.  Use your judgement on this one.

Removing the freevars variable did not show any speedup. It
does keep one variable off the stack and shortens the startup 
time by a few instructions.  OTOH, the in-lined replacements 
for it result in a net expansion of code size and cause a 
microscopic slowdown whenever it is used.  I recommend 
leaving this one alone.

Executive summary:  Only make the two big changes that 
show measurable speedups and make conceptual sense.  
Leave the other stuff alone.

One other thought: try making custom benchmarks for 
targeted optimizations.  The broad spectrum benchmarks are 
too coarse to tell whether an improvement is really working.

Also, be sure to check with Guido before adding the new 
opcodes.

Ideally, each optimization should be loaded separately so its 
effects can be isolated and to allow any one to be backed out 
if necessary.
msg45064 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-02-06 18:37
Added a simplified version of the goto optimization.
See Python/ceval.c 2.374
msg45065 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-03-07 09:13
Neal, assigning back to you in case you want to pursue the
two new opcodes.
msg45066 - Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2004-10-21 03:25
No reason to take this further.
History
Date User Action Args
2022-04-11 14:56:01 admin set github: 39722
2003-12-21 18:30:29 nnorwitz create