Message50158
Results: 2.86% for 1 arg (len), 11.8% for 2 args (min), and 1.6% for pybench.
trunk-speed$ ./python.exe -m timeit 'for x in xrange(10000): len([])'
100 loops, best of 3: 4.74 msec per loop
trunk-speed$ ./python.exe -m timeit 'for x in xrange(10000): min(1,2)'
100 loops, best of 3: 8.03 msec per loop
trunk-clean$ ./python.exe -m timeit 'for x in xrange(10000): len([])'
100 loops, best of 3: 4.88 msec per loop
trunk-clean$ ./python.exe -m timeit 'for x in xrange(10000): min(1,2)'
100 loops, best of 3: 9.09 msec per loop
pybench goes from 5688.00 down to 5598.00
Details about the patch:
There are 2 unrelated changes. They both seem to provide equal benefits for calling varargs C functions. One is very simple: it inlines the call to a varargs C function rather than going through PyCFunction_Call(), which repeats checks whose results are already known at the call site. This moves meth and self up one block and breaks the C_TRACE into 2. (When looking at the patch, this will make sense, I hope.)
The other change is more dangerous. It modifies load_args() to hold on to tuples so they aren't repeatedly allocated and deallocated. The initialization is done one time in the new function _PyEval_Init(). It allocates 64 tuples of size 8 that are never deallocated. The idea is that there usually won't be more than 64 frames with 8 or fewer parameters active on the stack at any one time (stack depth). There are cases where this can degenerate, but for the most part it should be only marginally slower, and generally it should be a fair amount faster by skipping the alloc and dealloc and some extra work. Decrementing _last_index inside the needs_free blocks could improve behaviour.
This really needs comments added to the code, but I'm not going to get there tonight. I'd be interested in comments about the code.
Date | User | Action | Args
2007-08-23 15:48:37 | admin | link | issue1479611 messages |
2007-08-23 15:48:37 | admin | create | |