Author skip.montanaro
Recipients ajaksu2, alexandre.vassalotti, bboissin, blaisorblade, christian.heimes, djc, facundobatista, lemburg, pitrou, rhettinger, skip.montanaro, theatrus
Date 2009-01-04.13:14:14
SpamBayes Score 3.02385e-07
Marked as misclassified No
Message-id <1231074857.79.0.91001529339.issue4753@psf.upfronthosting.co.za>
In-reply-to
Content
I'm sure this is the wrong place to bring this up, but I had a
thought about simple JIT compilation coupled with the opcode
dispatch changes in this issue.

Consider this silly function:

    >>> def f(a, b):
    ...   result = 0
    ...   while b:
    ...     result += a
    ...     b -= 1
    ...   return result
    ... 

which compiles to

      2           0 LOAD_CONST               1 (0)
		  3 STORE_FAST               2 (result)

      3           6 SETUP_LOOP              32 (to 41)
	    >>    9 LOAD_FAST                1 (b)
		 12 JUMP_IF_FALSE           24 (to 39)
		 15 POP_TOP             

      4          16 LOAD_FAST                2 (result)
		 19 LOAD_FAST                0 (a)
		 22 INPLACE_ADD         
		 23 STORE_FAST               2 (result)

      5          26 LOAD_FAST                1 (b)
		 29 LOAD_CONST               2 (1)
		 32 INPLACE_SUBTRACT    
		 33 STORE_FAST               1 (b)
		 36 JUMP_ABSOLUTE            9
	    >>   39 POP_TOP             
		 40 POP_BLOCK           

      6     >>   41 LOAD_FAST                2 (result)
		 44 RETURN_VALUE        

What if you built and compiled a "Mini Me" version of
PyEval_EvalFrameEx on-the-fly which only contained the prologue and
epilog of the real function and a small switch statement which only
knew about the the byte-code instructions used by f()?  Would the
compiler be better able to optimize the code?  Would the
instructions' placement nearer to each other provide better cache
behavior?  Would branch prediction by CPU be improved?

Another possibility would be to eliminate the for(;;) ... switch
altogether and just inline the code for the individual instructions.
It would help if the body of each bytecode instruction was
implemented as a macro, e.g.:

    #define _POP_TOP() \
	PREDICTED(POP_TOP); \
	TARGET(POP_TOP) \
	v = POP(); \
	Py_DECREF(v); \
	FAST_DISPATCH();

The above function could (lots of hand-waving here) be "compiled" to
something like

    PyObject *
    _MiniMe(PyFrameObject *f, int throwflag)
    {
	_PyEVAL_FRAMEEX_PROLOG

	_LOAD_CONST(1)
	_STORE_FAST(2)
	_SETUP_LOOP(_41)
    _9:
	_LOAD_FAST(1)
	_JUMP_IF_FALSE(_39)
	_POP_TOP()
	_LOAD_FAST(2)
	_LOAD_FAST(0)
	_INPLACE_ADD()
	_STORE_FAST(2)
    _26:
	_LOAD_FAST(1)
	_LOAD_CONST(2)
	_INPLACE_SUBTRACT()
	_STORE_FAST(1)
	_JUMP_ABSOLUTE(_9)
    _39:
	_POP_TOP()
	_POP_BLOCK()
	_LOAD_FAST(2)
	_RETURN_VALUE()

	_PyEVAL_FRAMEEX_EPILOG
    }

and the resulting binary code saved as an attribute of the code
object.  Presumably there would be some decision made about whether
to compile a function into this form (maybe only after it's been
called N times?).
History
Date User Action Args
2009-01-04 13:14:18skip.montanarosetrecipients: + skip.montanaro, lemburg, rhettinger, facundobatista, pitrou, christian.heimes, ajaksu2, alexandre.vassalotti, djc, bboissin, blaisorblade, theatrus
2009-01-04 13:14:17skip.montanarosetmessageid: <1231074857.79.0.91001529339.issue4753@psf.upfronthosting.co.za>
2009-01-04 13:14:16skip.montanarolinkissue4753 messages
2009-01-04 13:14:14skip.montanarocreate