classification
Title: Faster why variable manipulation in ceval.c
Type: performance Stage: resolved
Components: Versions: Python 3.5
process
Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: Nosy List: BreamoreBoy, ajaksu2, blaisorblade, collinwinter, jyasskin, pitrou, rhettinger, serhiy.storchaka, vstinner
Priority: low Keywords:

Created on 2009-01-09 14:31 by skip.montanaro, last changed 2018-06-03 12:37 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description
unnamed skip.montanaro, 2009-01-09 14:31
gcc_4.2.4_linux_ia32_bench.txt ajaksu2, 2009-01-09 17:53 pybench results
unpatched.txt BreamoreBoy, 2015-03-15 17:09 pybench test output
Messages (16)
msg79470 - Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2009-01-09 14:31
The why_code enum in ceval.c has values which form a bit set.  Comparison of
the why variable against multiple values is going to be faster using bitwise
operations instead of logical ones.  For example, instead of

    why == WHY_RETURN || why == WHY_CONTINUE

the equivalent bitwise expression is

    why & (WHY_RETURN | WHY_CONTINUE)

which needs fewer operations (one versus three, treating the right-hand side
of & as a constant).  This is already done in one place.  The attached patch
converts all other expressions of the first form.
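A minimal sketch of why the two forms agree, assuming (as stated above) that every why_code value is a distinct single bit. The enum below is an illustrative stand-in with made-up values and only a subset of the codes, not a copy of ceval.c, and the function names are hypothetical. Because each code has exactly one bit set, why & (WHY_RETURN | WHY_CONTINUE) is nonzero exactly when why is one of those two codes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for ceval.c's why_code enum. The only property
 * relied on is that every value is a distinct power of two; the exact
 * values and the subset of codes shown are assumptions. */
enum why_code {
    WHY_NOT       = 0x01, /* no error */
    WHY_EXCEPTION = 0x02, /* exception occurred */
    WHY_RETURN    = 0x04, /* 'return' statement */
    WHY_BREAK     = 0x08, /* 'break' statement */
    WHY_CONTINUE  = 0x10, /* 'continue' statement */
};

/* Logical form: two comparisons joined by ||. */
static bool is_return_or_continue_logical(enum why_code why)
{
    return why == WHY_RETURN || why == WHY_CONTINUE;
}

/* Bitwise form: one AND against a mask that is a compile-time constant. */
static bool is_return_or_continue_bitwise(enum why_code why)
{
    return (why & (WHY_RETURN | WHY_CONTINUE)) != 0;
}

/* Asserts that both forms give the same answer for every code above. */
static void check_forms_agree(void)
{
    static const enum why_code all[] = {
        WHY_NOT, WHY_EXCEPTION, WHY_RETURN, WHY_BREAK, WHY_CONTINUE,
    };
    for (size_t i = 0; i < sizeof all / sizeof all[0]; i++)
        assert(is_return_or_continue_logical(all[i]) ==
               is_return_or_continue_bitwise(all[i]));
}
```

The equivalence depends entirely on the single-bit property: if some code had two bits set, the mask test could match values that the == chain would not.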

Also, there are some further manipulations of why in the loop after the
fast_block_end label.  The loop can only be entered if why != WHY_NOT.
Whenever the loop sets it to WHY_NOT, the loop breaks.  There is thus no
reason to test its value in the while expression.  Further, instead of
breaking out of the loop and then testing why != WHY_NOT again, the patch
jumps past that test to a new why_not_here label.
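The loop restructuring can be modelled in a toy form. This is a hypothetical simplification, not the actual ceval.c code: the block stack is an array of made-up block kinds, handles() is an invented stand-in for the real per-block tests, and unwind_before()/unwind_after() are hypothetical names that return whether the main interpreter loop should be left. The point is only that, given the precondition why != WHY_NOT, dropping the test from the while condition and jumping past the post-loop check with a label does not change behavior.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified why codes (distinct bits). */
enum why_code {
    WHY_NOT       = 0x01,
    WHY_EXCEPTION = 0x02,
    WHY_RETURN    = 0x04,
    WHY_BREAK     = 0x08,
};

/* Hypothetical stand-ins for SETUP_LOOP, SETUP_EXCEPT, SETUP_FINALLY. */
enum block_kind { LOOP_BLOCK, EXCEPT_BLOCK, FINALLY_BLOCK };

/* Invented stand-in for the real tests: does this block stop unwinding? */
static bool handles(enum block_kind b, enum why_code why)
{
    return (b == LOOP_BLOCK && why == WHY_BREAK)
        || (b == EXCEPT_BLOCK && why == WHY_EXCEPTION)
        || b == FINALLY_BLOCK;
}

/* Before: why is re-tested on every iteration and again after the loop.
 * Blocks are popped from the end. Returns true to leave the main loop. */
static bool unwind_before(enum why_code why, const enum block_kind *blocks, int n)
{
    while (why != WHY_NOT && n > 0) {
        if (handles(blocks[--n], why)) {
            why = WHY_NOT;           /* handler found */
            break;
        }
    }
    if (why != WHY_NOT)
        return true;                 /* still unwinding: leave the main loop */
    return false;                    /* resume executing bytecode */
}

/* After: the loop is only entered with why != WHY_NOT, so the while
 * condition tests only the block count, and finding a handler jumps
 * straight past the post-loop check. */
static bool unwind_after(enum why_code why, const enum block_kind *blocks, int n)
{
    assert(why != WHY_NOT);          /* precondition stated in the message */
    while (n > 0) {
        if (handles(blocks[--n], why)) {
            why = WHY_NOT;
            goto why_not_here;
        }
    }
    if (why != WHY_NOT)
        return true;
why_not_here:
    return false;
}

/* Compares both versions for every pending why and every stack of depth <= 3. */
static bool versions_agree(void)
{
    static const enum why_code pending[] = { WHY_EXCEPTION, WHY_RETURN, WHY_BREAK };
    enum block_kind stack[3];
    for (int w = 0; w < 3; w++) {
        for (int depth = 0, combos = 1; depth <= 3; depth++, combos *= 3) {
            for (int code = 0; code < combos; code++) {
                int c = code;
                for (int i = 0; i < depth; i++, c /= 3)
                    stack[i] = (enum block_kind)(c % 3);
                if (unwind_before(pending[w], stack, depth) !=
                    unwind_after(pending[w], stack, depth))
                    return false;
            }
        }
    }
    return true;
}
```

In ceval.c itself the post-loop test breaks out of the main interpreter loop rather than returning, and the block handling is more involved; the sketch keeps only the control-flow shape. It also shows why a label is used: a plain continue inside the inner while would only restart the inner loop.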

The attached patch implements these changes (against the py3k branch).  All
tests pass on my Mac except test_cmd_line (which has been failing for a
while).

Skip
msg79482 - Author: Daniel Diniz (ajaksu2) (Python triager) Date: 2009-01-09 17:53
Neat, gives a 10% speedup on a Celeron M with gcc 4.2.
msg79483 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-01-09 18:31
I get a 3% speedup on x86-64 with gcc 4.3.2.
The label "why_not_here" should be renamed to something more meaningful
IMO. Or you could just kill the label and use "continue" instead.
msg79484 - Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2009-01-09 18:40
Antoine> The label "why_not_here" should be renamed to something more
    Antoine> meaningful IMO. Or you could just kill the label and use
    Antoine> "continue" instead.

I thought "why_not_here" was meaningful.  "Here" is where you go when why ==
WHY_NOT.  I don't think continue will work.  The goto is coming out of an
inner loop.  If you continue from there you just continue the inner loop.  I
replaced a break with a goto.

Skip
msg79486 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-01-09 18:55
> I thought "why_not_here" was meaningful.

I don't know, when I see "goto why_not_here" it looks like a joke to
me :)

> I don't think continue will work.  The goto is coming out of an
> inner loop.  If you continue from there you just continue the inner loop.

Oops, sorry for the misguided advice.
msg79501 - Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2009-01-09 20:24
>> I thought "why_not_here" was meaningful.

    Antoine> I don't know, when I see "goto why_not_here" it looks like a
    Antoine> joke to me :)

Well, I think the enum name WHY_NOT is kind of a joke itself, but it's been
that way for so long I see no reason to change it.  I'll add a comment to
the label which describes the intent in plain(er) English.

Skip
msg79598 - Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2009-01-11 16:00
Pystone results:

apply why patch

py3k% rm $TMPDIR/*.[coi] ; make python.exe && rm -f /tmp/trash ; ./python.exe Lib/test/pystone.py
rm: /tmp/*.[coi]: No such file or directory
make: `python.exe' is up to date.
Pystone(1.1) time for 50000 passes = 1.08154
This machine benchmarks at 46230.2 pystones/second
py3k% rm $TMPDIR/*.[coi] ; make python.exe && rm -f /tmp/trash ; ./python.exe Lib/test/pystone.py
rm: /tmp/*.[coi]: No such file or directory
make: `python.exe' is up to date.
Pystone(1.1) time for 50000 passes = 1.08365
This machine benchmarks at 46140.4 pystones/second
py3k% rm $TMPDIR/*.[coi] ; make python.exe && rm -f /tmp/trash ; ./python.exe Lib/test/pystone.py
rm: /tmp/*.[coi]: No such file or directory
make: `python.exe' is up to date.
Pystone(1.1) time for 50000 passes = 1.08598
This machine benchmarks at 46041.4 pystones/second

remove patch

py3k% rm $TMPDIR/*.[coi] ; make python.exe && rm -f /tmp/trash ; ./python.exe Lib/test/pystone.py
rm: /tmp/*.[coi]: No such file or directory
make: `python.exe' is up to date.
Pystone(1.1) time for 50000 passes = 1.10093
This machine benchmarks at 45416.3 pystones/second
py3k% rm $TMPDIR/*.[coi] ; make python.exe && rm -f /tmp/trash ; ./python.exe Lib/test/pystone.py
rm: /tmp/*.[coi]: No such file or directory
make: `python.exe' is up to date.
Pystone(1.1) time for 50000 passes = 1.10458
This machine benchmarks at 45266.1 pystones/second
py3k% rm $TMPDIR/*.[coi] ; make python.exe && rm -f /tmp/trash ; ./python.exe Lib/test/pystone.py
rm: /tmp/*.[coi]: No such file or directory
make: `python.exe' is up to date.
Pystone(1.1) time for 50000 passes = 1.10333
This machine benchmarks at 45317.5 pystones/second
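One way to summarize the six runs above is to average each set and take the ratio. This small sketch only redoes that arithmetic on the pystones/second figures exactly as reported; it adds no new measurements, and the function names are hypothetical. With three runs per build it says nothing about statistical significance.

```c
#include <assert.h>

/* Pystones/second exactly as reported in the three runs of each build. */
static const double patched[]   = { 46230.2, 46140.4, 46041.4 };
static const double unpatched[] = { 45416.3, 45266.1, 45317.5 };

static double mean(const double *xs, int n)
{
    assert(n > 0);
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += xs[i];
    return sum / n;
}

/* Ratio of mean throughputs: > 1.0 means the patched build
 * ran more pystones per second than the unpatched one. */
static double pystone_speedup(void)
{
    return mean(patched, 3) / mean(unpatched, 3);
}
```

With these figures the means come out at roughly 46,137 and 45,333 pystones/second, a ratio of about 1.018, i.e. roughly 1.8% more pystones/second with the patch on that machine.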
msg79599 - Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2009-01-11 16:05
pybench comparison...

% ./python.exe Tools/pybench/pybench.py -s stock.out -c why.out
-------------------------------------------------------------------------------
PYBENCH 2.0
-------------------------------------------------------------------------------
* using CPython 3.1a0 (py3k:68444M, Jan 11 2009, 10:02:04) [GCC 4.0.1 (Apple Inc. build 5490)]
* disabled garbage collection
* system check interval set to maximum: 2147483647
* using timer: time.time

-------------------------------------------------------------------------------
Benchmark: stock.out
-------------------------------------------------------------------------------

    Rounds: 10
    Warp:   10
    Timer:  time.time

    Machine Details:
       Platform ID:    Darwin-9.6.0-i386-32bit
       Processor:      i386
    
    Python:
       Implementation: CPython
       Executable:     /Users/skip/src/python/py3k/python.exe
       Version:        3.1.0
       Compiler:       GCC 4.0.1 (Apple Inc. build 5490)
       Bits:           32bit
       Build:          Jan 11 2009 09:57:51 (#py3k:68444)
       Unicode:        UCS2


-------------------------------------------------------------------------------
Comparing with: why.out
-------------------------------------------------------------------------------

    Rounds: 10
    Warp:   10
    Timer:  time.time

    Machine Details:
       Platform ID:    Darwin-9.6.0-i386-32bit
       Processor:      i386
    
    Python:
       Implementation: CPython
       Executable:     /Users/skip/src/python/py3k/python.exe
       Version:        3.1.0
       Compiler:       GCC 4.0.1 (Apple Inc. build 5490)
       Bits:           32bit
       Build:          Jan 11 2009 10:02:04 (#py3k:68444M)
       Unicode:        UCS2


Test                             minimum run-time        average  run-time
                                 this    other   diff    this    other   diff
-------------------------------------------------------------------------------
          BuiltinFunctionCalls:   110ms   106ms   +4.1%   111ms   107ms   +3.7%
           BuiltinMethodLookup:    78ms    79ms   -0.7%    80ms    80ms   -0.4%
                 CompareFloats:   105ms   113ms   -7.3%   106ms   114ms   -6.8%
         CompareFloatsIntegers:   161ms   146ms  +10.8%   163ms   151ms   +7.7%
               CompareIntegers:   145ms   151ms   -3.8%   147ms   152ms   -3.1%
        CompareInternedStrings:   109ms   134ms  -18.9%   110ms   135ms  -18.9%
                  CompareLongs:    86ms    90ms   -4.5%    86ms    91ms   -4.8%
                CompareStrings:    89ms    88ms   +0.4%    91ms    89ms   +2.3%
    ComplexPythonFunctionCalls:   146ms   141ms   +3.1%   151ms   143ms   +5.9%
                 ConcatStrings:   124ms   123ms   +0.8%   126ms   129ms   -2.0%
               CreateInstances:   146ms   144ms   +1.9%   148ms   146ms   +1.2%
            CreateNewInstances:   110ms   109ms   +1.2%   111ms   110ms   +0.6%
       CreateStringsWithConcat:   171ms   158ms   +8.4%   175ms   165ms   +5.8%
                  DictCreation:    85ms    80ms   +6.5%    86ms    81ms   +6.8%
             DictWithFloatKeys:    98ms    99ms   -1.0%    99ms   100ms   -1.0%
           DictWithIntegerKeys:    94ms    97ms   -3.4%    94ms   103ms   -8.6%
            DictWithStringKeys:    84ms    88ms   -5.0%    84ms    92ms   -8.3%
                      ForLoops:    85ms    82ms   +3.0%    85ms    83ms   +3.1%
                    IfThenElse:   116ms   132ms  -12.1%   117ms   133ms  -12.5%
                   ListSlicing:   119ms   118ms   +0.1%   124ms   124ms   -0.1%
                NestedForLoops:   141ms   129ms   +9.1%   142ms   132ms   +7.8%
      NestedListComprehensions:   164ms   159ms   +3.4%   166ms   160ms   +3.8%
          NormalClassAttribute:   215ms   218ms   -1.2%   217ms   220ms   -1.2%
       NormalInstanceAttribute:   111ms   115ms   -3.0%   113ms   117ms   -3.7%
           PythonFunctionCalls:   104ms   123ms  -15.5%   104ms   124ms  -15.6%
             PythonMethodCalls:   144ms   144ms   -0.3%   148ms   149ms   -0.6%
                     Recursion:   180ms   195ms   -7.8%   191ms   198ms   -3.7%
                  SecondImport:   106ms   107ms   -1.1%   107ms   108ms   -1.1%
           SecondPackageImport:   118ms   117ms   +1.3%   119ms   118ms   +1.1%
         SecondSubmoduleImport:   154ms   156ms   -1.2%   156ms   157ms   -0.6%
       SimpleComplexArithmetic:    87ms    89ms   -1.9%    88ms    95ms   -7.4%
        SimpleDictManipulation:   180ms   176ms   +2.5%   183ms   178ms   +2.6%
         SimpleFloatArithmetic:    92ms    99ms   -7.3%    93ms   102ms   -9.0%
      SimpleIntFloatArithmetic:   127ms   127ms   -0.4%   128ms   129ms   -0.8%
       SimpleIntegerArithmetic:   127ms   136ms   -6.7%   128ms   138ms   -7.6%
      SimpleListComprehensions:   128ms   129ms   -0.4%   131ms   130ms   +0.2%
        SimpleListManipulation:    95ms   108ms  -12.5%    97ms   110ms  -11.6%
          SimpleLongArithmetic:    85ms    86ms   -2.0%    85ms    89ms   -4.4%
                    SmallLists:   154ms   153ms   +0.8%   157ms   156ms   +0.8%
                   SmallTuples:   159ms   157ms   +1.3%   161ms   159ms   +1.1%
         SpecialClassAttribute:   357ms   341ms   +4.8%   360ms   345ms   +4.3%
      SpecialInstanceAttribute:   110ms   113ms   -2.5%   111ms   115ms   -3.4%
                StringMappings:   293ms   312ms   -6.3%   298ms   315ms   -5.3%
              StringPredicates:   122ms   126ms   -2.6%   125ms   127ms   -1.9%
                 StringSlicing:   216ms   215ms   +0.5%   222ms   221ms   +0.8%
                     TryExcept:    76ms    71ms   +7.0%    77ms    71ms   +7.5%
                    TryFinally:    84ms    86ms   -1.2%    85ms    86ms   -1.4%
                TryRaiseExcept:    58ms    60ms   -3.4%    59ms    61ms   -3.6%
                  TupleSlicing:   174ms   158ms  +10.3%   181ms   162ms  +11.4%
                   WithFinally:   133ms   142ms   -6.2%   138ms   143ms   -3.8%
               WithRaiseExcept:   153ms   149ms   +2.6%   154ms   151ms   +2.0%
-------------------------------------------------------------------------------
Totals:                          6708ms  6772ms   -0.9%  6820ms  6897ms   -1.1%

(this=stock.out, other=why.out)
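The diff columns in the table above appear to be computed as this/other - 1, where "this" is stock.out and "other" is why.out; the Totals row is consistent with that (6708/6772 - 1 is about -0.9%). Read that way, a negative diff means the stock build was faster on that test, and the Totals row shows the patched build about 1% slower overall in this run. The helper below is a sketch of that formula, inferred from the printed numbers rather than taken from pybench's source; its name is hypothetical.

```c
#include <assert.h>

/* Relative difference as the pybench table appears to report it, in percent:
 * (this / other - 1) * 100, with this = stock.out and other = why.out.
 * The formula is inferred from the printed numbers, not copied from pybench. */
static double pybench_diff_percent(double this_ms, double other_ms)
{
    assert(other_ms > 0.0);
    return (this_ms / other_ms - 1.0) * 100.0;
}
```

For example, pybench_diff_percent(6708.0, 6772.0) is about -0.95 and pybench_diff_percent(6820.0, 6897.0) is about -1.12, matching the -0.9% and -1.1% totals once rounded to one decimal.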
msg79989 - Author: Collin Winter (collinwinter) * (Python committer) Date: 2009-01-16 23:47
Another data point: I've tested this patch applied to trunk on Core 2
Duo and Opteron 8214 HE machines using both gcc 4.0.3 and 4.3.1, and I'm
seeing mixed results. Pybench with warp 1 is between ~1.5% slower and
~1% faster, depending on gcc version (fairly consistent across
machines). 2to3 and two template systems I've tested are between 2.5%
slower and 2% faster depending on workload and gcc version.
msg79990 - Author: Paolo 'Blaisorblade' Giarrusso (blaisorblade) Date: 2009-01-17 00:16
Given a 10% speedup on some systems and statistically insignificant
changes on others, I would still apply the patch, if only because the
bitmask part simply makes more sense.

I'm not sure about the goto part, but it does straighten the
code. Anyway, just call the label "why_is_WHY_NOT",
"why_set_to_WHY_NOT" or something like that. Verbosity on a use-once
label used with goto should be encouraged: we're not Java programmers,
but we need to pay for using goto by increasing readability in other ways.

@collinwinter: since the differences you report are so small (similar to
the statistical noise I get on my machine), I would expect that you're
just seeing statistical noise rather than results that depend on
the GCC version, unless you performed statistical hypothesis testing
(confidence intervals and related stuff). If I had done the tens or
hundreds of repetitions needed for hypothesis testing, I'd state that
clearly, so I suppose you didn't, and that's fully acceptable since the
result is likely to be statistically insignificant anyway.
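As a rough illustration of the kind of check meant here, the sketch below computes Welch's t statistic for two sets of timings. Welch's t is one standard choice for comparing two samples with possibly unequal variances; it is not something anyone in this thread computed, and the function names are hypothetical. The sample input reuses the three patched and three unpatched pystone figures reported earlier in this issue, which is far short of the tens or hundreds of repetitions mentioned above. Turning t into a p-value would additionally need the Welch-Satterthwaite degrees of freedom and a t distribution.

```c
#include <assert.h>

/* Pystone figures reported earlier in this issue (pystones/second). */
static const double patched_runs[]   = { 46230.2, 46140.4, 46041.4 };
static const double unpatched_runs[] = { 45416.3, 45266.1, 45317.5 };

static double sample_mean(const double *xs, int n)
{
    assert(n > 0);
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += xs[i];
    return sum / n;
}

/* Unbiased sample variance (divides by n - 1). */
static double sample_variance(const double *xs, int n)
{
    assert(n > 1);
    double m = sample_mean(xs, n);
    double ss = 0.0;
    for (int i = 0; i < n; i++)
        ss += (xs[i] - m) * (xs[i] - m);
    return ss / (n - 1);
}

/* Welch's t statistic, squared:
 *   t^2 = (mean_a - mean_b)^2 / (var_a / n_a + var_b / n_b)
 * Returned squared so the sketch needs no sqrt() (and hence no libm);
 * t itself is sqrt(t^2) with the sign of mean_a - mean_b. */
static double welch_t_squared(const double *a, int na, const double *b, int nb)
{
    double diff = sample_mean(a, na) - sample_mean(b, nb);
    double se2 = sample_variance(a, na) / na + sample_variance(b, nb) / nb;
    assert(se2 > 0.0);
    return diff * diff / se2;
}
```

For the pystone runs this gives t^2 of about 131.5, so t is about 11.5: on that one machine the difference is large compared with the run-to-run spread. It says nothing about other machines, compilers or workloads, which is where the results reported in this issue disagree.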
msg220684 - Author: Mark Lawrence (BreamoreBoy) * Date: 2014-06-15 22:12
I'm guessing that a patch to ceval.c that's this old wouldn't apply cleanly now.  I'll rework it but only if the changes are highly likely to be accepted.  Given the mixed results previously reported this is not guaranteed.  Opinions please.
msg220690 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-06-16 02:33
Before evaluating this further, the timings should be updated for the current 3.5 code, using the various compilers on the different OSes.  Also, it would be nice to run Antoine's suite of benchmarks.
msg237299 - Author: Mark Lawrence (BreamoreBoy) * Date: 2015-03-05 22:44
Where do we find "Antoine's suite of benchmarks"?
msg237303 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-03-05 23:05
> Where do we find "Antoine's suite of benchmarks"?

https://hg.python.org/benchmarks
msg238154 - Author: Mark Lawrence (BreamoreBoy) * Date: 2015-03-15 17:09
I've finally remembered to attach the test output I got a week ago.  If you want me to run Antoine's test suite with any specific parameters please feel free to ask.
msg318540 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-06-03 12:37
This issue is out of date since the unwinding of the stack for "pseudo-exceptions" was moved from the interpreter to the compiler in issue17611.
History
Date User Action Args
2018-06-03 12:37:35 serhiy.storchaka set status: open -> closed
    resolution: out of date
    messages: + msg318540
    stage: resolved
2015-03-15 17:09:45 BreamoreBoy set files: + unpatched.txt
    messages: + msg238154
2015-03-05 23:05:15 vstinner set messages: + msg237303
2015-03-05 22:44:23 BreamoreBoy set nosy: + serhiy.storchaka
    messages: + msg237299
2014-06-16 02:33:58 rhettinger set priority: normal -> low
    versions: - Python 3.4
    nosy: + rhettinger, vstinner
    messages: + msg220690
2014-06-15 22:12:24 BreamoreBoy set versions: + Python 3.4, Python 3.5, - Python 3.0, Python 3.1
    nosy: + BreamoreBoy
    messages: + msg220684
    type: performance
2010-05-20 20:35:11 skip.montanaro set nosy: - skip.montanaro
2009-01-17 00:16:44 blaisorblade set messages: + msg79990
2009-01-16 23:47:54 collinwinter set messages: + msg79989
2009-01-16 17:15:26 jyasskin set nosy: + collinwinter, jyasskin
2009-01-13 05:49:03 blaisorblade set nosy: + blaisorblade
2009-01-11 16:05:31 skip.montanaro set messages: + msg79599
2009-01-11 16:00:12 skip.montanaro set messages: + msg79598
2009-01-09 20:24:47 skip.montanaro set messages: + msg79501
2009-01-09 18:55:57 pitrou set messages: + msg79486
2009-01-09 18:40:10 skip.montanaro set messages: + msg79484
2009-01-09 18:31:21 pitrou set nosy: + pitrou
    messages: + msg79483
2009-01-09 17:53:26 ajaksu2 set files: + gcc_4.2.4_linux_ia32_bench.txt
    nosy: + ajaksu2
    messages: + msg79482
    versions: + Python 3.0, Python 3.1
2009-01-09 14:31:52 skip.montanaro create