Message 43171 - Python tracker

Author	twouters
Date	2003-04-18.22:22:11

Content

Alright, here is a re-worked patch, with a toggle to choose
between a blatant copy-paste and some refactoring; see below.

The patch works by creating a new opcode, CALL_ATTR, which
is used for all <expression>.<name>(<args>) occurrences. What
<expression> and <args> are is not important; they are
compiled separately.
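
To check which instructions a given CPython build emits for an
expression of this shape, something like the following may help. It
is only a sketch: attribute_call_opnames is a made-up helper name, it
relies on the dis module of a current Python 3 rather than the
compiler this patch targets, and the opcode names differ between
versions, so no particular output is promised. CALL_ATTR will only
show up if the build actually contains that opcode.

```python
import dis


def attribute_call_opnames(source="obj.spam(1, 2)"):
    """Compile `source` and return the opcode names this interpreter emits.

    Hypothetical helper for illustration; nothing is executed, so `obj`
    does not need to exist.
    """
    code = compile(source, "<example>", "eval")
    return [ins.opname for ins in dis.get_instructions(code)]


if __name__ == "__main__":
    # Print one opcode name per line for the default attribute call.
    for opname in attribute_call_opnames():
        print(opname)
```

Passing a different source string, e.g. "a.b.c(x)(y)", shows how the
sub-expressions are compiled separately from the final call.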

The CALL_ATTR opcode implementation is optimized for two
special cases: one where <expression> resulted in an
(old-style) instance, and one where <expression> resulted in
an instance of a new-style type of which the tp_getattro is
PyObject_GenericGetAttr.

The PyInstance part is done by savagely short-cutting the
usual getattr dance for instances; if it sees anything but a
PyFunction, it will fall back to a slow path. The rationale
is that if X in 'X.spam(' is an instance of an old-style
class, and that expression is not going to raise an
exception, it is very rare for 'spam' to be anything but a
PyFunction. Trying to cope with all the idiosyncrasies would
slow down the common case too much.

The PyObject_GenericGetAttr version uses a slightly modified
version of PyObject_GenericGetAttr that, when finding a
descr of the desired name, doesn't call the 'descr_get'
method but returns a status flag. The caller (call_attr)
then decides, based on the type of the descr, whether to call
descr_get or not. It currently only special-cases
PyFunctions. PyCFunctions, PyStaticMethods and
PyClassMethods are tricky to special-case and/or need some of
the massaging that descr_get would do. I have not yet looked
at other callable descrs.
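
The lookup-with-a-flag idea can be modelled in pure Python. This is a
rough sketch, not the C code from the patch: find_attr, the
needs_binding flag and the Spam class are names invented here, the
model runs on Python 3 (where only new-style classes exist), and it
ignores __getattr__ hooks and other corners of the real lookup.

```python
import types

_MISSING = object()


def find_attr(obj, name):
    """Model of the modified lookup: return (value, needs_binding).

    needs_binding is True when `value` is a non-data descriptor found on
    the type whose __get__ has NOT been called yet.
    """
    tp = type(obj)
    descr = _MISSING
    for klass in tp.__mro__:
        if name in vars(klass):
            descr = vars(klass)[name]
            break

    has_get = False
    if descr is not _MISSING:
        dtype = type(descr)
        has_get = hasattr(dtype, "__get__")
        is_data = has_get and (
            hasattr(dtype, "__set__") or hasattr(dtype, "__delete__")
        )
        if is_data:
            # Data descriptors win over the instance dict; bind at once.
            return dtype.__get__(descr, obj, tp), False

    inst_dict = getattr(obj, "__dict__", None)
    if inst_dict is not None and name in inst_dict:
        return inst_dict[name], False

    if descr is not _MISSING:
        # Leave the binding decision to the caller.
        return descr, has_get
    raise AttributeError(name)


def call_attr(obj, name, *args, **kwargs):
    """Call obj.<name>(*args), skipping the bound method for plain functions."""
    value, needs_binding = find_attr(obj, name)
    if needs_binding:
        if isinstance(value, types.FunctionType):
            # Fast path: pass obj as the first argument directly.
            return value(obj, *args, **kwargs)
        # Anything else (staticmethod, classmethod, ...) is bound as usual.
        value = type(value).__get__(value, obj, type(obj))
    return value(*args, **kwargs)


class Spam(object):
    def eggs(self, x):
        return ("eggs", x)

    @staticmethod
    def double(x):
        return 2 * x

    @classmethod
    def name(cls):
        return cls.__name__
```

With this model, call_attr(Spam(), "eggs", 3) takes the fast path,
while "double" and "name" go through the ordinary __get__ binding,
which mirrors the split between PyFunctions and the trickier descr
types.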

I had initially rewritten PyObject_GenericGetAttr() to use
the modified version, but this appears to be a significant
performance hit in normal attribute retrieval (quite common
on new-style classes). Likewise, Brett and I had refactored
the call_function part of call_attr and call_function into a
separate function, but that, too, was a big hit in the
common function-calling case. Unfortunately, not doing that
refactoring means a lot of copied code, so I included both
in the patch. It may be that the slow path can be optimized
by simplifying the refactored parts so that the compiler
understands how to inline them (e.g. the stackpointer
fudging that call_function/call_callable does).

The default is the ugly-but-fast way; define
CALL_ATTR_SLOW_BUT_PRETTY_PATH to use the slow(er) path. The
slow(er) path is slower by enough to nullify the benefit of
the patch in most of the benchmarks I ran; the fast path is
only slightly slower in some areas (probably due to cache
dynamics) but faster in every other situation, including
unexpected areas (that's not cache dynamics, of course,
that's just coder brilliance. :-)

However, finding a good benchmark is near impossible. I
added some new-style class tests to PyBench, but even
normal tests were giving bizarrely negative results.
Checking those results with small timeit.py scripts
showed entirely different results. And when PyBench reported
a total 2% slowdown in the 'slow path' new code, it managed
to report that about 5% faster, consistently. timeit.py is
more consistent, and helped me determine the 'slow path' was
really slowing things down. Calling an empty method with no
arguments is about 20% faster for new-style classes and about
30% faster for old-style classes, according to timeit.py.
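
For anyone wanting to repeat the empty-method measurement, a minimal
timeit sketch could look like this. The class name, the helper name
and the loop counts are assumptions made here, not the scripts used
for the numbers above; it targets a current Python 3, which has no
old-style classes, so it only covers the new-style case.

```python
import timeit


class Spam(object):
    def eggs(self):
        pass


def time_empty_method_call(number=200000, repeat=5):
    """Return the best observed seconds per call of an empty method.

    Taking the minimum over several repeats reduces noise from other
    processes on the machine.
    """
    obj = Spam()
    timer = timeit.Timer("obj.eggs()", globals={"obj": obj})
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number


if __name__ == "__main__":
    print("%.1f ns per call" % (time_empty_method_call() * 1e9))
```

Running the same script against a patched and an unpatched
interpreter gives the pair of numbers to compare.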

Still no test for call_attr though.

I would love for people to test the code, both paths, and
give me input. I also welcome ideas on handling more
descrs; I may have missed a few unwritten rules about them.

History
Date	User	Action	Args
2007-08-23 15:21:36	admin	link	issue709744 messages
2007-08-23 15:21:36	admin	create