Author rhettinger
Recipients mark.dickinson, rhettinger, serhiy.storchaka, skrah, tim.peters, vstinner
Date 2015-12-11.21:44:17
Message-id <1449870259.01.0.351863599839.issue25823@psf.upfronthosting.co.za>
Content
I verified that Clang and GCC both give the expected disassembly with Serhiy's patch.   We ought to restrict the #if to just the compilers that are known to optimize away the memcpy.
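Here is a minimal sketch of what such a restricted #if might look like. It assumes the oparg is stored as two bytes, low byte first, immediately after the opcode. The helper name `read_oparg` is hypothetical, and this is not Serhiy's actual patch. The guard lists GCC and Clang (Clang also defines `__GNUC__`) and uses their predefined `__BYTE_ORDER__` macro to limit the fast path to little-endian hosts.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode the 16-bit oparg stored low byte first.
 * p points at the first (low) byte of the argument. */
static inline unsigned int
read_oparg(const unsigned char *p)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    /* Fast path, only for compilers known to fold the fixed-size memcpy
     * into a single zero-extending 16-bit load (movzwl on x86-64).  On a
     * little-endian host the two bytes are already in native order. */
    uint16_t word;
    memcpy(&word, p, sizeof word);
    return word;
#else
    /* Portable fallback: assemble the value byte by byte. */
    return ((unsigned int)p[1] << 8) | p[0];
#endif
}
```

Using memcpy rather than casting `p` to `uint16_t *` avoids unaligned-access and strict-aliasing problems. On the listed compilers it should cost nothing.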

Clang (for 'BUILD_LIST_UNPACK')
-------------------------------
      .loc    10 2525 9               ## Python/ceval.c:2525:9
      movzwl  (%r13), %r9d
      addq    $2, %r13
  Ltmp2042:
      ##DEBUG_VALUE: PyEval_EvalFrameEx:next_instr <- R13

GCC (for 'BUILD_LIST_UNPACK')
----------------------------- 
  LM1275:
      movzwl  (%rdx), %r8d
  LVL1147:
      leaq    2(%rdx), %rbp

[Mark]
> Benchmarks showing dramatic real-world speed improvements ...

Much of the doubling of speed for core Python over the last decade has come one little step at a time, none of them individually "dramatic".  In general, if we have a chance to reduce the workload in the ceval inner loop, we should take it.

A simple benchmark on Clang shows a roughly 10+% speedup in code exercising simple and common opcodes that have an oparg. (There is no point in benchmarking the effect on opcodes like IMPORT_NAME, where the eval-loop overhead is already an insignificant proportion of the total work.)

Baseline version with CLANG Apple LLVM version 7.0.2 (clang-700.1.81)
  $ ./python.exe exercise_oparg.py 
  0.22484053499647416
  $ ./python.exe exercise_oparg.py 
  0.22687773499637842
  $ ./python.exe exercise_oparg.py 
  0.22026274001109414

Patched version with CLANG Apple LLVM version 7.0.2 (clang-700.1.81)
  $ ./python.exe exercise_oparg.py 
  0.19516360601119231
  $ ./python.exe exercise_oparg.py 
  0.20087355599389412
  $ ./python.exe exercise_oparg.py 
  0.1980393300036667
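Averaging the three runs of each build gives a rough size for the effect. This is plain arithmetic on the printed timings (rounded to four places), not a statistically careful comparison:

```latex
\bar t_{\text{base}} = \frac{0.2248 + 0.2269 + 0.2203}{3} \approx 0.2240,
\qquad
\bar t_{\text{patch}} = \frac{0.1952 + 0.2009 + 0.1980}{3} \approx 0.1980

1 - \frac{\bar t_{\text{patch}}}{\bar t_{\text{base}}} \approx 1 - \frac{0.1980}{0.2240} \approx 0.116,
\qquad
\frac{\bar t_{\text{base}}}{\bar t_{\text{patch}}} \approx 1.13
```

That is about an 11.6% reduction in run time, consistent with the "roughly 10+%" figure.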

To better isolate the effect, I suppose you could enable the READ_TIMESTAMP macros to precisely measure the effect of replacing five sequentially dependent instructions with two independent instructions, but likely all it would show you is that the two are cheaper than the five.
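Whatever the timing shows, the two-instruction form is only a valid replacement if the single 16-bit load yields exactly what the byte-wise expression did. Here is a minimal sketch of that check. It assumes the old decode is shaped like `(next_instr[-1] << 8) + next_instr[-2]`, with the low byte first. All function names are hypothetical. Checking all 65,536 byte pairs is cheap, and on a little-endian host the two decoders should agree everywhere.

```c
#include <stdint.h>
#include <string.h>

/* Byte-wise decode, modeled on the old NEXTARG-style expression:
 * two byte loads combined with a shift and an add. */
static unsigned int
oparg_bytewise(const unsigned char *p)
{
    return ((unsigned int)p[1] << 8) + p[0];
}

/* Word decode: a fixed-size memcpy that GCC/Clang compile to one
 * 16-bit load.  Matches oparg_bytewise only on little-endian hosts. */
static unsigned int
oparg_word(const unsigned char *p)
{
    uint16_t w;
    memcpy(&w, p, sizeof w);
    return w;
}

/* Returns 1 if the host stores the low byte of a uint16_t first. */
static int
host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Compares the two decoders over every (low, high) byte pair.
 * Returns 1 if they agree on all 65,536 inputs, 0 otherwise. */
static int
decoders_agree(void)
{
    unsigned char buf[2];
    for (unsigned int lo = 0; lo < 256; lo++) {
        for (unsigned int hi = 0; hi < 256; hi++) {
            buf[0] = (unsigned char)lo;
            buf[1] = (unsigned char)hi;
            if (oparg_bytewise(buf) != oparg_word(buf))
                return 0;
        }
    }
    return 1;
}
```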