Message43171
Logged In: YES
user_id=34209
Alright, here is a re-worked patch, with a toggle to choose
between a blatant copy-paste and some refactoring; see below.
The patch works by creating a new opcode, CALL_ATTR, which
is used for all <expression>.<name>(<args>) occurances. What
<expression> and <args> are, is not important, they are
compiled separately.
The CALL_ATTR opcode implementation is optimized for two
special cases: one where <expression> resulted in an
(old-style) instance, and one where <expression> resulted in
an instance of a new-style type of which the tp_getattro is
PyObject_GenericGetAttr.
The PyInstance part is done by savagely short-cutting the
usual getattr dance for instances; if it sees anything but a
PyFunction, it will fall back to a slow path. The rationale
is that if X in 'X.spam(' is an old-style class, and that
expression is not going to raise an exception, it is very
rare for 'spam' to be anything but a PyFunction. Trying to
cope with all the idiosyncracies of would slow down the
common case too much.
The PyObject_GenericGetAttr version uses a slightly modified
version of PyObject_GenericGetAttr that, when finding a
descr of the desired name, doesn't call the 'descr_get'
method but returns a status flag. The caller (call_attr)
then decides based on the type of the descr whether to call
descr_get or not. It currently only specialcases
PyFunctions. PyCFunctions, PyStaticMethods and
PyClassMethods are tricky to specialcase and/or need some of
the massaging that descr_get would do. I have not yet looked
at other callable descr's.
I had initially rewritten PyObject_GenericGetAttr() to use
the modified version, but this appears to be a significant
performance hit in normal attribute retrieval (quite common,
on newstyle classes.) Likewise, Brett and I had refactored
the call_function part of call_attr and call_function into a
separate function, but that, too, was a big hit in the
common function-calling case. Unfortunately, not doing that
refactoring means a lot of copied code, so I included both
in the patch. It may be that the slow path can be optimized
by simplyfying the refactored parts so that the compiler
understands how to inline them (e.g. the stackpointer
fudging call_function/call_callable does.)
The default is the ugly-but-fast way, define
CALL_ATTR_SLOW_BUT_PRETTY_PATH to use the slow(er) path. The
slow(er) path is enough slower to nullify the benefit of the
patch in most of the benchmarks I ran; the fast path is only
slightly slower in some areas (probably due to cache
dynamics) but faster in every other situations, including
unexpected areas (that's not cache dynamics, of course,
that's just coder brilliance. :-)
However, finding a good benchmark is near impossible. I
added some newstyle-classes tests to PyBench, but even
normal tests were giving bizarrely negative results.
Checking those results with small scripts of timeit.py
showed entirely different results. And when pybench reported
a total 2% slowdown in the 'slow path' new code, it managed
to report that about 5% faster, consistently. timeit.py is
more consistent, and helped me determine the 'slow path' was
really slowing things down. Calling an empty method with no
arguments is about 20% faster for newstyle classes and about
30% for oldstyle classes, according to timeit.py.
Still no test for call_attr though.
I would love for people to test the code, both paths, and
give me input. I also welcome ideas on handling more
descr's, I may have missed a few unwritten rules about them.
|
|
Date |
User |
Action |
Args |
2007-08-23 15:21:36 | admin | link | issue709744 messages |
2007-08-23 15:21:36 | admin | create | |
|