Author sbrunthaler
Recipients cvrebert, dmalcolm, eric.snow, pitrou, sbrunthaler, skrah
Date 2012-05-16.16:37:50
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <CA+j1x0nH7v3CsdpUjnA=1LXGrwH8LU1MDjr8WLBDnnobSGTJpA@mail.gmail.com>
In-reply-to <1337178797.96.0.896388947076.issue14757@psf.upfronthosting.co.za>
Content
> Perhaps that's just me, but I find the performance gains rather limited given the sheer size of the changes.

Well there are a couple of things to keep in mind:

a) There is a substantial speedup potential in further interpretative
optimizations, but they come at increased complexity (mostly due to a
different instruction encoding). From the response on python-dev I
took away that this is not what people want.

b) The size is deceptive: the patch contains all resources, i.e., the
code gen *and* the generated files. I could split it up into three
separate patches to show that the *actual* intersection with existing
Python sources is very small. (Disregarding opcode.h, my guess is that
it's about 100 lines.)

c) There are no reasonable compatibility implications (modulo code that
checks specific opcode values) and the additional memory consumption is
essentially nil (<= 100 KiB, constant).
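
To illustrate the caveat in c): the kind of code that could be affected
is code that compares against hard-coded opcode numbers. A minimal
sketch of the more robust alternative, looking instructions up by name
through the standard dis module, is below. The helper uses_opcode and
the function sample are made up for this illustration and are not part
of the patch.

```python
import dis


def uses_opcode(func, opname):
    """Return True if func's bytecode contains an instruction named opname.

    Comparing by name avoids depending on the numeric opcode value,
    which may differ between interpreter versions.
    """
    return any(ins.opname == opname for ins in dis.get_instructions(func))


def sample(x):
    # Hypothetical function, used only to have some bytecode to inspect.
    return x


if __name__ == "__main__":
    print(uses_opcode(sample, "RETURN_VALUE"))
```

If the numeric value is really needed, dis.opmap maps an opcode name to
the number used by the running interpreter.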

There are further speedups available by ordering the interpreter
instructions (I have a paper on that called "Interpreter Instruction
Scheduling", and am currently working on a better algorithm [well, the
algorithm already exists, I'm just evaluating it].) I could easily add
that at no extra cost to the implementation, too.
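
As a rough illustration of the kind of data such an ordering could be
based on, the sketch below counts how often one instruction directly
follows another in a function's bytecode. This is an assumption made
for the example: it counts static pairs via the dis module, whereas a
real evaluation would presumably use dynamic counts from running
programs. The names opcode_pairs and sample are hypothetical.

```python
import dis
from collections import Counter


def opcode_pairs(func):
    """Count adjacent (opname, next opname) pairs in func's bytecode."""
    names = [ins.opname for ins in dis.get_instructions(func)]
    return Counter(zip(names, names[1:]))


def sample(a, b):
    # Hypothetical function, used only to have some bytecode to inspect.
    return a + b


if __name__ == "__main__":
    for (first, second), count in opcode_pairs(sample).most_common():
        print(first, "->", second, count)
```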

> Is there any non-micro benchmark where the performance gains are actually substantial (say, more than 20%)?

Hm, I don't know. Are there applications/frameworks running on Python
3 that I can benchmark with?

Based on my experience, the speedups should be achievable across the
board, primarily because the most frequent CALL_FUNCTION instructions
have optimized derivatives. Together with the arithmetic and
COMPARE_OP derivatives, this covers a wide array of dynamic instruction
frequency mixes. There exist further inlining capabilities, too, which
can easily be added to the code generator.
The only reason some benchmarks don't achieve the expected speedups
is that they use operations for which the code gen does not contain
optimized derivatives. There is still space for ~45 derivatives to
cover those (including some important application-specific ones).
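
For readers unfamiliar with the idea of optimized derivatives, here is
a toy sketch of the general technique: a generic instruction rewrites
itself into a specialized variant once it has seen suitable operands,
and the variant falls back to the generic form when its guard fails.
This is not the code in the patch. The instruction names PUSH, ADD and
INT_ADD, the list-based encoding, and the function run are all
hypothetical and chosen only to keep the example small.

```python
def run(code):
    """Execute a list of [opname, arg] instructions on a toy stack machine.

    Instructions are rewritten in place: a generic ADD that sees two ints
    becomes INT_ADD, and INT_ADD reverts to ADD when its guard fails.
    """
    stack = []
    for instr in code:
        op, arg = instr
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b = stack.pop()
            a = stack.pop()
            if type(a) is int and type(b) is int:
                instr[0] = "INT_ADD"  # specialize for the next execution
            stack.append(a + b)
        elif op == "INT_ADD":
            b = stack.pop()
            a = stack.pop()
            if type(a) is not int or type(b) is not int:
                instr[0] = "ADD"  # guard failed: revert to the generic form
            # A real interpreter would use a cheaper int-only path here.
            stack.append(a + b)
        else:
            raise ValueError("unknown instruction: %r" % (op,))
    return stack.pop()


if __name__ == "__main__":
    program = [["PUSH", 2], ["PUSH", 3], ["ADD", None]]
    print(run(program), program[2][0])
    print(run(program), program[2][0])
```

A benchmark only profits from this scheme if the operations it executes
have such a specialized variant, which is the point made above about
missing derivatives.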
History
Date User Action Args
2012-05-16 16:37:53 sbrunthaler set recipients: + sbrunthaler, pitrou, cvrebert, skrah, dmalcolm, eric.snow
2012-05-16 16:37:51 sbrunthaler link issue14757 messages
2012-05-16 16:37:50 sbrunthaler create