|
msg45295 - (view) |
Author: Michael Hudson (mwh)  |
Date: 2004-01-13 16:49 |
In ceval.c we find
/* XXX Perhaps we should create a specialized
PyFrame_New() that doesn't take locals, but does
take builtins without sanity checking them.
*/
This patch takes that idea rather further than you
might have expected... it creates a "light" subtype of
frame that assumes certain things about the frame,
gives this type its own free list (so it can assume
more about objects on the freelist) and converts light
frames into "heavy" frames when assumptions stop being
true.
Good for a ~5% improvement on "./python -s 'def f():
pass' 'f()'"; a bit less on pystone. It also conflicts
slightly with my function reorg patch -- apply that
first, apply this, ignore the reject and edit
func_caller_nofrees in funcobject.c to call
PyFrame_NewLight.
All three patches I just submitted together get ~6% on
pystone.
|
|
msg45296 - (view) |
Author: Jeremy Hylton (jhylton) |
Date: 2004-01-13 18:20 |
Logged In: YES
user_id=31392
I don't see any files attached.
|
|
msg45297 - (view) |
Author: Michael Hudson (mwh)  |
Date: 2004-01-13 18:23 |
Logged In: YES
user_id=6656
sigh
|
|
msg45298 - (view) |
Author: Armin Rigo (arigo) *  |
Date: 2004-01-14 17:35 |
Logged In: YES
user_id=4771
(Side note first: I'm not sure 'builtins = back->f_builtins'
is right.)
Is the whole subclassing complexity worth the effort, given
that the invariants of light frames only seem to be that
four specific fields are null? Changing the type of an
object under Python code's feet is calling for troubles.
Moreover it is bound to break code that expect
'type(frame)==FrameType', even if such code can be
considered bad style.
Moreover it requires a number of hacks here and there --
e.g. you turn a light frame into a "heavy" frame when
f_trace is set; is it on purpose that you don't do it when
f_locals is set?
I cannot seem to get reliable performance results on my
machine, but maybe you want to compare with the attached
patch which speeds up the regular PyFrame_New by putting
stronger invariants on all the frames in the free_list.
|
|
msg45299 - (view) |
Author: Michael Hudson (mwh)  |
Date: 2004-01-15 11:02 |
Logged In: YES
user_id=6656
I'm fairly sure this made a difference on my iBook; haven't
tried on x86.
It's possible that the correct response to this patch is to
add "... nah, not worth it" to the XXX comment in ceval.c...
|
|
msg45300 - (view) |
Author: Armin Rigo (arigo) *  |
Date: 2004-01-27 17:03 |
Logged In: YES
user_id=4771
Here is yet another try, which seems to perform better on my PentiumIII. I get the following speed improvements for this patch alone, for a loop calling an empty function:
zombie-frames.diff: 11.4% (PyStone 3.8%)
scary-frame-hacks.diff: 6.4% (PyStone 0.85%)
The idea is to get rid of the free_list and instead store the most recently finished ("zombie") frame in an internal field of the code object. This saves half of the frame creation overhead because half of the fields are already correct when the frame is reused, e.g. f_code, f_nlocals, f_stacksize, f_valuestack...
(you might need to cvs up frameobject.c before you can apply the patch)
|
|
msg45301 - (view) |
Author: Michael Hudson (mwh)  |
Date: 2004-02-02 11:20 |
Logged In: YES
user_id=6656
Did I mention this idea to you or did you come up with it
independently? I forget...
I'll try to time stuff on my iBook tomorrow.
|
|
msg45302 - (view) |
Author: Armin Rigo (arigo) *  |
Date: 2004-02-02 13:04 |
Logged In: YES
user_id=4771
I guess the idea was just in the air, after your published attempts.
Ideally I'd have liked to have the cached frame depend on the globals as well as the code object itself; I considered moving the cache field to function objects. This way you also save the f_globals and f_builtins initialization. There were problems but maybe we should try harder.
|
|
msg45303 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2004-03-01 12:20 |
Logged In: YES
user_id=80475
Armin's second patch gives gives the expected speedups on a
Pentium3 running WinME, and the test suite runs without
exception. I recommend accepting and applying this patch as
is. Further improvements can be considered separately.
|
|
msg45304 - (view) |
Author: Michael Hudson (mwh)  |
Date: 2004-03-01 12:23 |
Logged In: YES
user_id=6656
It slows recursive functions down a noticeable amount (did
this get noted anywhere? Maybe Armin & I just talked about
it on IRC), so that should be considered before this patch
is applied. But I think it's probably worth it, FWIW.
|
|
msg45305 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2004-03-01 13:25 |
Logged In: YES
user_id=80475
The effect on recursive functions could be mitigated by
restoring the freelist and falling back to it when
code->co_zombieframe == NULL.
I don't know if that is worth it. The current patch is a
code simplication as well as an optimization. Adding back
the freelist, adds a lot of clutter. Python is not
especially friendly to recursive functions anyway.
|
|
msg45306 - (view) |
Author: Armin Rigo (arigo) *  |
Date: 2004-03-01 13:59 |
Logged In: YES
user_id=4771
On a small recursive example it slows down from 2.64s to 3.26s.
This is a serious difference (20%). Is it bad enough to keep the
freelist ?
|
|
msg45307 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2004-03-01 15:50 |
Logged In: YES
user_id=80475
I think that's a question for python-dev.
|
|
msg45308 - (view) |
Author: Georg Brandl (georg.brandl) *  |
Date: 2006-02-20 11:44 |
Logged In: YES
user_id=849994
Can this be reviewed for 2.5? The relevant discussion on
python-dev is at
http://mail.python.org/pipermail/python-dev/2004-March/042871.html.
|
|
msg45309 - (view) |
Author: Richard Jones (richard) *  |
Date: 2006-05-22 17:25 |
Logged In: YES
user_id=6405
Patch modified and applied to python2.5. Mods:
1. updated to python2.5
2. reinstated use of free list
See the "rjones-funccall" branch in SVN.
|
|
msg45310 - (view) |
Author: Tim Peters (tim_one) *  |
Date: 2006-05-23 11:16 |
Logged In: YES
user_id=31435
Closed as Accepted. While re-adding the free list removed
the code simplification benefit, the measurable x-platform
speedup is well worth getting.
|