This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Helping the compiler avoid memory references in PyEval_EvalFrameEx
Type: performance Stage: resolved
Components: Interpreter Core Versions: Python 2.6
process
Status: closed Resolution: works for me
Dependencies: Superseder:
Assigned To: Nosy List: collinwinter, gregory.p.smith, jyasskin, nnorwitz, pitrou
Priority: normal Keywords: patch

Created on 2008-03-09 17:38 by jyasskin, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
elim_mem_refs.patch jyasskin, 2008-03-09 17:38 review
elim_mem_refs.patch jyasskin, 2008-03-10 15:49 with x and err back in their original places review
Messages (10)
msg63426 - (view) Author: Jeffrey Yasskin (jyasskin) * (Python committer) Date: 2008-03-09 17:38
I'm using Apple's gcc-4.0.1 on a 2.33GHz Intel Core 2 Duo to test this.
Measurements from other compilers or other chips would be cool.

Specifically, this patch:
 1) Moves the declarations of most of the local variables inside the
for(;;) loop. That shortens their lifetimes so that the compiler can
skip spilling them to memory in some cases. Pushing them further down
into the individual case statements didn't change the generated assembly
in most cases, although there are probably a few opcodes where it would.
 2) Eliminates the initialization of w, and fixes the "possibly used
uninitialized" warning by changing how the PRINT opcodes initialize it
from stream. That lets my compiler avoid using its stack entry most of
the time.
 3) In two hot opcodes, LOAD_FAST and LOAD_CONST, changes the 'x' to a
'w'. 'x' is always written to memory because it's used for error
detection, while 'w' can stay on the stack.
 4) Changes --_Py_Ticker in the periodic things check to _Py_Ticker--.
Because _Py_Ticker is volatile, predecrement (at least on my compiler)
needs 3 memory accesses, while postdecrement gets away with 2.

Together, these changes are worth about 3% on pybench on my machine.
msg63427 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2008-03-09 18:45
I bet with just a little more work, you could get rid of t and stream. 
t is only used for a single set of opcodes (STORE_SLICE+n).  stream is
only used for the PRINT opcodes.  The code in print could be moved to a
function which might allow the compiler to do a better job.  I'll
benchmark this later on amd64 and amd x86 linux boxes.  Maybe mac ppc g4
if I'm adventurous. :-)
msg63428 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2008-03-09 18:47
Can't next_instr and stack_pointer move inside the for loop too?
msg63431 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2008-03-10 01:23
I've yet to run pybench, but I came get these warnings from
the compiler after applying the patch:

  ../Python/ceval.c: In function 'PyEval_EvalFrameEx':
  ../Python/ceval.c:772: warning: 'x' may be used uninitialized in this
function
  ../Python/ceval.c:770: warning: 'err' may be used uninitialized in
this function

Platform is Mac OS X 10.5.2 MacBook Pro, gcc version 4.0.1 (Apple Inc.
build 5465)
msg63432 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2008-03-10 01:43
Pybench doesn't show much difference for me, about 0.1% better on minimum
times.  A few tests are quite a bit worse (> 10%) with the patch
(Recursion, SimpleFloatArithmetic, StringPredicates, TryFinally).  A few
are quite a bit better (CompareFloatsIntegers, CompareIntegers,
DictWithFloatKeys).
msg63434 - (view) Author: Jeffrey Yasskin (jyasskin) * (Python committer) Date: 2008-03-10 05:51
Neal, t and stream aren't likely to have much effect since they're used
so little. next_instr and stack_pointer are used to communicate between
loops, so they can't move inside. I eagerly await your benchmark runs. :)

Skip, I managed to reproduce the warnings with gcc-4.2.1, and while
they're wrong, I see why they're happening. The if (throwflag) block
skips to on_error, which misses the initializations of x and err. The
right way to fix it is either to eliminate the error-reporting behavior
of x and err or to extract on_error into a function, but for this patch
I would probably just keep x and err out of the loop.

Your numbers made me look closer at my results for individual tests, and
I'm now confused about how reliable pybench is. My gcc-4.0 was build
#5367 on OS X 10.4.11, and MacPorts' gcc-4.2.1 (with a necessary
configure.in tweak) still shows a 1-2% gain overall. But contrary to
your numbers, it gives me a 10% speedup in Recursion and a 1% slowdown
in CompareFloatsIntegers. My big losers are SimpleListManipulation with
an 18% loss on 4.2 and CompareInternedStrings with a 20% loss on 4.0,
but those are both small winners (~5%) on the opposite compiler! I
wouldn't be surprised if the overall numbers were different between even
slightly different machines and compilers, but I'm really surprised to
see the same tests affected in opposite directions. Is that common with
pybench and compiler changes?
msg63439 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2008-03-10 12:44
Jeffrey> ... but I'm really surprised to see the same tests affected in
    Jeffrey> opposite directions. Is that common with pybench and compiler
    Jeffrey> changes?

I've no idea.  Marc-André Lemburg would be the person with the most insight
into pybench behavior I think.

Skip
msg63860 - (view) Author: Jeffrey Yasskin (jyasskin) * (Python committer) Date: 2008-03-18 03:03
Another data point: I just installed gcc-4.3.0, and the second patch
gives it a 6% speedup. On the downside, 4.3 is still about 9% slower
than 4.0. :-( Neal, do you have your measurements?
msg63903 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-03-18 07:10
We really could use an automated pybench runner on a dedicated machine
driven by a buildbot feeding its results into a database so that we had
a record of exactly when/what caused performance changes over time.

This sounds remarkably like some of the work I was doing at my last
job... ;)

I'll run some benchmarks using this patch of my own and on the grander
scale ponder what I need to make a dedicated automated Python benchmark
server happen.
msg188401 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-04 19:58
This avenue doesn't look promising at all. Closing.
History
Date User Action Args
2022-04-11 14:56:31adminsetgithub: 46515
2013-05-04 19:58:19pitrousetstatus: open -> closed
resolution: works for me
messages: + msg188401

stage: resolved
2010-05-20 20:34:45skip.montanarosetkeywords: patch, patch
nosy: - skip.montanaro
2009-01-08 01:16:01collinwintersetkeywords: patch, patch
nosy: + collinwinter
2008-04-28 12:49:45pitrousetnosy: + pitrou
2008-03-18 07:10:51gregory.p.smithsetkeywords: patch, patch
nosy: + gregory.p.smith
messages: + msg63903
2008-03-18 03:03:40jyasskinsetkeywords: patch, patch
type: behavior -> performance
2008-03-18 03:03:30jyasskinsetkeywords: patch, patch
messages: + msg63860
2008-03-10 15:49:47jyasskinsetfiles: - elim_mem_refs.patch
2008-03-10 15:49:32jyasskinsetkeywords: patch, patch
files: + elim_mem_refs.patch
2008-03-10 15:46:15jyasskinsetkeywords: patch, patch
files: + elim_mem_refs.patch
2008-03-10 12:44:54skip.montanarosetmessages: + msg63439
2008-03-10 05:51:45jyasskinsetkeywords: patch, patch
messages: + msg63434
2008-03-10 01:43:50skip.montanarosetkeywords: patch, patch
messages: + msg63432
2008-03-10 01:23:30skip.montanarosetkeywords: patch, patch
nosy: + skip.montanaro
messages: + msg63431
2008-03-09 18:47:40nnorwitzsetkeywords: patch, patch
messages: + msg63428
2008-03-09 18:45:43nnorwitzsetkeywords: patch, patch
nosy: + nnorwitz
messages: + msg63427
2008-03-09 17:38:51jyasskincreate