Author jcea
Recipients Garen, belopolsky, danchr, dhduvall, dmalcolm, fche, glyph, hazmat, jbaker, jcea, jmcp, laca, lasizoillo, loewis, mjw, movement, neologix, pitrou, rhettinger, robert.kern, ronaldoussoren, scox, serverhorror, sirg3, techtonik, twleung, wsanchez
Date 2011-12-11.22:09:09
SpamBayes Score 1.13159e-12
Marked as misclassified No
Message-id <1323641352.99.0.50739691847.issue13405@psf.upfronthosting.co.za>
In-reply-to
Content
Performance report for current 3.3 tip, using "pybench".

******** BASELINE: (original version without DTRACE patch)

-------------------------------------------------------------------------------
PYBENCH 2.1
-------------------------------------------------------------------------------
* using CPython 3.3.0a0 (default:70ba352f9586, Dec 11 2011, 22:46:09) [GCC 4.6.2]
* disabled garbage collection
* system check interval set to maximum: 2147483647
* using timer: time.time

Calibrating tests. Please wait... done.

Running 10 round(s) of the suite at warp factor 10:

* Round 1 done in 4.457 seconds.
* Round 2 done in 4.466 seconds.
* Round 3 done in 4.438 seconds.
* Round 4 done in 4.503 seconds.
* Round 5 done in 4.519 seconds.
* Round 6 done in 4.514 seconds.
* Round 7 done in 4.460 seconds.
* Round 8 done in 4.455 seconds.
* Round 9 done in 4.477 seconds.
* Round 10 done in 4.547 seconds.

-------------------------------------------------------------------------------
Benchmark: 2011-12-11 22:50:16
-------------------------------------------------------------------------------

    Rounds: 10
    Warp:   10
    Timer:  time.time

    Machine Details:
       Platform ID:    SunOS-5.10-i86pc-i386-32bit-ELF
       Processor:      i386
    
    Python:
       Implementation: CPython
       Executable:     /home/python/cpython/Tools/pybench/../../python
       Version:        3.3.0a0
       Compiler:       GCC 4.6.2
       Bits:           32bit
       Build:          Dec 11 2011 22:46:09 (#default:70ba352f9586)
       Unicode:        UCS4


Test                             minimum  average  operation  overhead
-------------------------------------------------------------------------------
          BuiltinFunctionCalls:     79ms     82ms    0.16us    0.291ms
           BuiltinMethodLookup:     58ms     59ms    0.06us    0.342ms
                 CompareFloats:     58ms     59ms    0.05us    0.390ms
         CompareFloatsIntegers:    114ms    118ms    0.13us    0.304ms
               CompareIntegers:     89ms     91ms    0.05us    0.589ms
        CompareInternedStrings:     72ms     75ms    0.05us    1.519ms
                  CompareLongs:     47ms     49ms    0.05us    0.340ms
                CompareStrings:     58ms     59ms    0.06us    1.013ms
    ComplexPythonFunctionCalls:     68ms     70ms    0.35us    0.501ms
                 ConcatStrings:     71ms     73ms    0.15us    0.560ms
               CreateInstances:    109ms    113ms    1.01us    0.456ms
            CreateNewInstances:     82ms     84ms    1.00us    0.364ms
       CreateStringsWithConcat:    107ms    111ms    0.11us    0.991ms
                  DictCreation:     55ms     57ms    0.14us    0.391ms
             DictWithFloatKeys:     82ms     84ms    0.09us    0.739ms
           DictWithIntegerKeys:     65ms     66ms    0.06us    0.992ms
            DictWithStringKeys:     58ms     59ms    0.05us    0.991ms
                      ForLoops:     63ms     65ms    2.59us    0.044ms
                    IfThenElse:     74ms     76ms    0.06us    0.739ms
                   ListSlicing:     46ms     47ms    3.39us    0.053ms
                NestedForLoops:     85ms     88ms    0.06us    0.003ms
      NestedListComprehensions:     86ms     88ms    7.33us    0.096ms
          NormalClassAttribute:    151ms    155ms    0.13us    0.529ms
       NormalInstanceAttribute:     86ms     87ms    0.07us    0.552ms
           PythonFunctionCalls:     65ms     66ms    0.20us    0.292ms
             PythonMethodCalls:     97ms    100ms    0.44us    0.184ms
                     Recursion:    114ms    116ms    2.33us    0.494ms
                  SecondImport:     70ms     72ms    0.72us    0.199ms
           SecondPackageImport:     70ms     73ms    0.73us    0.192ms
         SecondSubmoduleImport:    123ms    126ms    1.26us    0.192ms
       SimpleComplexArithmetic:     66ms     68ms    0.08us    0.391ms
        SimpleDictManipulation:    108ms    111ms    0.09us    0.493ms
         SimpleFloatArithmetic:     66ms     68ms    0.05us    0.591ms
      SimpleIntFloatArithmetic:     82ms     84ms    0.06us    0.589ms
       SimpleIntegerArithmetic:     82ms     85ms    0.06us    0.603ms
      SimpleListComprehensions:     71ms     74ms    6.18us    0.096ms
        SimpleListManipulation:     63ms     64ms    0.05us    0.646ms
          SimpleLongArithmetic:     60ms     62ms    0.09us    0.291ms
                    SmallLists:     86ms     89ms    0.13us    0.408ms
                   SmallTuples:     85ms     88ms    0.16us    0.455ms
         SpecialClassAttribute:    226ms    234ms    0.20us    0.547ms
      SpecialInstanceAttribute:     86ms     89ms    0.07us    0.533ms
                StringMappings:    277ms    286ms    1.13us    0.422ms
              StringPredicates:     96ms    100ms    0.14us    1.991ms
                 StringSlicing:     95ms     98ms    0.18us    0.836ms
                     TryExcept:     53ms     54ms    0.02us    0.741ms
                    TryFinally:     60ms     62ms    0.39us    0.390ms
                TryRaiseExcept:     35ms     38ms    0.60us    0.408ms
                  TupleSlicing:     75ms     78ms    0.30us    0.046ms
                   WithFinally:     86ms     89ms    0.55us    0.394ms
               WithRaiseExcept:     92ms     96ms    1.19us    0.512ms
-------------------------------------------------------------------------------
Totals:                           4353ms   4484ms


****** VERSION CALCULATING LINENUMBER IN EVERY FUNCTION CALL:

-------------------------------------------------------------------------------
PYBENCH 2.1
-------------------------------------------------------------------------------
* using CPython 3.3.0a0 (dtrace-issue13405:552edf5be75c, Dec 11 2011, 22:57:28) [GCC 4.6.2]
* disabled garbage collection
* system check interval set to maximum: 2147483647
* using timer: time.time

Calibrating tests. Please wait... done.

Running 10 round(s) of the suite at warp factor 10:

* Round 1 done in 4.880 seconds.
* Round 2 done in 4.876 seconds.
* Round 3 done in 4.899 seconds.
* Round 4 done in 4.887 seconds.
* Round 5 done in 4.872 seconds.
* Round 6 done in 4.913 seconds.
* Round 7 done in 4.888 seconds.
* Round 8 done in 4.865 seconds.
* Round 9 done in 4.888 seconds.
* Round 10 done in 4.864 seconds.

-------------------------------------------------------------------------------
Benchmark: 2011-12-11 22:58:51
-------------------------------------------------------------------------------

    Rounds: 10
    Warp:   10
    Timer:  time.time

    Machine Details:
       Platform ID:    SunOS-5.10-i86pc-i386-32bit-ELF
       Processor:      i386
    
    Python:
       Implementation: CPython
       Executable:     /home/python/cpython-jcea/Tools/pybench/../../python
       Version:        3.3.0a0
       Compiler:       GCC 4.6.2
       Bits:           32bit
       Build:          Dec 11 2011 22:57:28 (#dtrace-issue13405:552edf5be75c)
       Unicode:        UCS4


Test                             minimum  average  operation  overhead
-------------------------------------------------------------------------------
          BuiltinFunctionCalls:    129ms    131ms    0.26us    0.295ms
           BuiltinMethodLookup:     57ms     58ms    0.06us    0.346ms
                 CompareFloats:     61ms     62ms    0.05us    0.413ms
         CompareFloatsIntegers:    117ms    118ms    0.13us    0.295ms
               CompareIntegers:     90ms     90ms    0.05us    0.596ms
        CompareInternedStrings:     74ms     74ms    0.05us    1.520ms
                  CompareLongs:     53ms     53ms    0.05us    0.345ms
                CompareStrings:     57ms     58ms    0.06us    1.015ms
    ComplexPythonFunctionCalls:     84ms     85ms    0.43us    0.503ms
                 ConcatStrings:    105ms    106ms    0.21us    0.556ms
               CreateInstances:    117ms    118ms    1.05us    0.459ms
            CreateNewInstances:     87ms     88ms    1.05us    0.366ms
       CreateStringsWithConcat:    112ms    113ms    0.11us    0.995ms
                  DictCreation:     56ms     57ms    0.14us    0.395ms
             DictWithFloatKeys:     85ms     85ms    0.09us    0.746ms
           DictWithIntegerKeys:     68ms     68ms    0.06us    0.996ms
            DictWithStringKeys:     62ms     62ms    0.05us    0.997ms
                      ForLoops:     66ms     66ms    2.66us    0.045ms
                    IfThenElse:     80ms     80ms    0.06us    0.746ms
                   ListSlicing:     46ms     46ms    3.30us    0.058ms
                NestedForLoops:     88ms     89ms    0.06us    0.003ms
      NestedListComprehensions:     87ms     88ms    7.34us    0.103ms
          NormalClassAttribute:    152ms    154ms    0.13us    0.528ms
       NormalInstanceAttribute:     88ms     88ms    0.07us    0.530ms
           PythonFunctionCalls:     92ms     93ms    0.28us    0.295ms
             PythonMethodCalls:    113ms    114ms    0.51us    0.177ms
                     Recursion:    133ms    133ms    2.67us    0.496ms
                  SecondImport:     75ms     81ms    0.81us    0.194ms
           SecondPackageImport:     79ms     83ms    0.83us    0.195ms
         SecondSubmoduleImport:    136ms    139ms    1.39us    0.195ms
       SimpleComplexArithmetic:     69ms     69ms    0.08us    0.398ms
        SimpleDictManipulation:    156ms    158ms    0.13us    0.517ms
         SimpleFloatArithmetic:     68ms     68ms    0.05us    0.596ms
      SimpleIntFloatArithmetic:     83ms     83ms    0.06us    0.595ms
       SimpleIntegerArithmetic:     82ms     83ms    0.06us    0.594ms
      SimpleListComprehensions:     72ms     75ms    6.24us    0.097ms
        SimpleListManipulation:     96ms     97ms    0.08us    0.648ms
          SimpleLongArithmetic:     61ms     62ms    0.09us    0.295ms
                    SmallLists:    105ms    107ms    0.16us    0.395ms
                   SmallTuples:     90ms     92ms    0.17us    0.445ms
         SpecialClassAttribute:    223ms    226ms    0.19us    0.532ms
      SpecialInstanceAttribute:     87ms     88ms    0.07us    0.536ms
                StringMappings:    293ms    294ms    1.17us    0.426ms
              StringPredicates:    144ms    145ms    0.21us    1.833ms
                 StringSlicing:     94ms     95ms    0.17us    0.832ms
                     TryExcept:     59ms     59ms    0.03us    0.746ms
                    TryFinally:     89ms     90ms    0.56us    0.396ms
                TryRaiseExcept:     40ms     41ms    0.64us    0.395ms
                  TupleSlicing:     76ms     77ms    0.29us    0.042ms
                   WithFinally:     89ms     91ms    0.57us    0.397ms
               WithRaiseExcept:    101ms    103ms    1.29us    0.496ms
-------------------------------------------------------------------------------
Totals:                           4824ms   4883ms


****** CURRENT DTRACE PATCH, sacrifying memory for speed:

-------------------------------------------------------------------------------
PYBENCH 2.1
-------------------------------------------------------------------------------
* using CPython 3.3.0a0 (dtrace-issue13405:43d1a819a63d, Dec 11 2011, 22:40:40) [GCC 4.6.2]
* disabled garbage collection
* system check interval set to maximum: 2147483647
* using timer: time.time

Calibrating tests. Please wait... done.

Running 10 round(s) of the suite at warp factor 10:

* Round 1 done in 4.581 seconds.
* Round 2 done in 4.527 seconds.
* Round 3 done in 4.591 seconds.
* Round 4 done in 4.563 seconds.
* Round 5 done in 4.516 seconds.
* Round 6 done in 4.528 seconds.
* Round 7 done in 4.555 seconds.
* Round 8 done in 4.535 seconds.
* Round 9 done in 4.524 seconds.
* Round 10 done in 4.528 seconds.

-------------------------------------------------------------------------------
Benchmark: 2011-12-11 22:53:33
-------------------------------------------------------------------------------

    Rounds: 10
    Warp:   10
    Timer:  time.time

    Machine Details:
       Platform ID:    SunOS-5.10-i86pc-i386-32bit-ELF
       Processor:      i386
    
    Python:
       Implementation: CPython
       Executable:     /home/python/cpython-jcea/Tools/pybench/../../python
       Version:        3.3.0a0
       Compiler:       GCC 4.6.2
       Bits:           32bit
       Build:          Dec 11 2011 22:40:40 (#dtrace-issue13405:43d1a819a63d)
       Unicode:        UCS4


Test                             minimum  average  operation  overhead
-------------------------------------------------------------------------------
          BuiltinFunctionCalls:     78ms     79ms    0.16us    0.304ms
           BuiltinMethodLookup:     57ms     57ms    0.05us    0.356ms
                 CompareFloats:     60ms     61ms    0.05us    0.408ms
         CompareFloatsIntegers:    115ms    116ms    0.13us    0.304ms
               CompareIntegers:     90ms     91ms    0.05us    0.614ms
        CompareInternedStrings:     75ms     75ms    0.05us    1.552ms
                  CompareLongs:     53ms     54ms    0.05us    0.356ms
                CompareStrings:     57ms     59ms    0.06us    1.040ms
    ComplexPythonFunctionCalls:     74ms     76ms    0.38us    0.513ms
                 ConcatStrings:     70ms     73ms    0.15us    0.564ms
               CreateInstances:    113ms    116ms    1.04us    0.464ms
            CreateNewInstances:     85ms     87ms    1.03us    0.372ms
       CreateStringsWithConcat:    110ms    113ms    0.11us    1.030ms
                  DictCreation:     59ms     60ms    0.15us    0.406ms
             DictWithFloatKeys:     85ms     87ms    0.10us    0.770ms
           DictWithIntegerKeys:     70ms     70ms    0.06us    1.029ms
            DictWithStringKeys:     62ms     63ms    0.05us    1.029ms
                      ForLoops:     67ms     67ms    2.69us    0.046ms
                    IfThenElse:     79ms     80ms    0.06us    0.770ms
                   ListSlicing:     46ms     46ms    3.30us    0.057ms
                NestedForLoops:     88ms     89ms    0.06us    0.003ms
      NestedListComprehensions:     88ms     90ms    7.46us    0.100ms
          NormalClassAttribute:    155ms    157ms    0.13us    0.537ms
       NormalInstanceAttribute:     88ms     89ms    0.07us    0.540ms
           PythonFunctionCalls:     71ms     72ms    0.22us    0.305ms
             PythonMethodCalls:     98ms    100ms    0.44us    0.180ms
                     Recursion:    123ms    125ms    2.50us    0.512ms
                  SecondImport:     72ms     73ms    0.73us    0.210ms
           SecondPackageImport:     72ms     74ms    0.74us    0.210ms
         SecondSubmoduleImport:    126ms    129ms    1.29us    0.210ms
       SimpleComplexArithmetic:     68ms     69ms    0.08us    0.408ms
        SimpleDictManipulation:    117ms    118ms    0.10us    0.516ms
         SimpleFloatArithmetic:     66ms     67ms    0.05us    0.615ms
      SimpleIntFloatArithmetic:     82ms     82ms    0.06us    0.615ms
       SimpleIntegerArithmetic:     82ms     84ms    0.06us    0.614ms
      SimpleListComprehensions:     73ms     75ms    6.24us    0.104ms
        SimpleListManipulation:     64ms     65ms    0.06us    0.666ms
          SimpleLongArithmetic:     61ms     62ms    0.09us    0.304ms
                    SmallLists:     87ms     89ms    0.13us    0.406ms
                   SmallTuples:     84ms     86ms    0.16us    0.459ms
         SpecialClassAttribute:    228ms    234ms    0.19us    0.541ms
      SpecialInstanceAttribute:     88ms     90ms    0.08us    0.542ms
                StringMappings:    277ms    280ms    1.11us    0.434ms
              StringPredicates:     94ms     95ms    0.14us    1.868ms
                 StringSlicing:     94ms     96ms    0.17us    0.849ms
                     TryExcept:     58ms     59ms    0.03us    0.771ms
                    TryFinally:     64ms     66ms    0.41us    0.408ms
                TryRaiseExcept:     38ms     38ms    0.60us    0.408ms
                  TupleSlicing:     75ms     76ms    0.29us    0.043ms
                   WithFinally:     88ms     90ms    0.56us    0.408ms
               WithRaiseExcept:     97ms     99ms    1.24us    0.512ms
-------------------------------------------------------------------------------
Totals:                           4470ms   4545ms


The previous version that calculates linenumbers in each function call is about 9% slower that stock python. Sacrifying a bit of memory (O(2*n), with n is the size of bytecodes imported), the speed difference with stock python is 1.4%.

I think that paying 1.4% of performance hit is a good investment to enjoy DTrace probes in Python, being able to run an instrumentalized Python interpreter 100% of the time. In fact this instrumentalization can be used to locate hotspots in python interpreter and maybe improve overall performance.

Of course, platform with no dtrace or with dtrace support disabled when compiling python will no have any kind of performance hit.
History
Date User Action Args
2011-12-11 22:09:13jceasetrecipients: + jcea, loewis, rhettinger, ronaldoussoren, belopolsky, pitrou, wsanchez, movement, techtonik, serverhorror, glyph, laca, twleung, jbaker, robert.kern, sirg3, danchr, dhduvall, dmalcolm, mjw, Garen, neologix, lasizoillo, fche, hazmat, jmcp, scox
2011-12-11 22:09:12jceasetmessageid: <1323641352.99.0.50739691847.issue13405@psf.upfronthosting.co.za>
2011-12-11 22:09:12jcealinkissue13405 messages
2011-12-11 22:09:09jceacreate