This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Title: Improved profiler
Type: Stage:
Components: Extension Modules Versions: Python 2.4
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: Nosy List: arigo, bdrosen, lcreighton, mwh, zseil
Priority: normal Keywords: patch

Created on 2005-06-01 16:05 by bdrosen, last changed 2022-04-11 14:56 by admin. This issue is now closed.

File name Uploaded Description Edit bdrosen, 2005-06-01 16:06 zip file contains the new module and test script/results
profile.c bdrosen, 2005-09-08 15:25 Updated version of the file
Messages (11)
msg48399 - (view) Author: Brett Rosen (bdrosen) Date: 2005-06-01 16:05
I tried using both hotshot and the old python profiler
and found them both to be inadequate.

The old python profiler seems to give accurate results
for timing, but is slow enough that it adds significant
overhead to what it is measuing (approximately 10 times). 
Furthermore, it has no ability to give detailed stats
children. (how much of the cumulative time was taken up
by each function called by a function)

The hotshot profiler is much faster (profiling) adding
only 30% overhead. However, it is extremely slow to load
the results from the log file. It does not currently
support detailed child stats, although I imagine that
it could be made to do so using the information in the
log file. The biggest problem with it, is that the
time results seem to be highly inaccurate. (doesn't
correspond to actual seconds, although they seem to be 
proportionally wrong)

To address these shortcomings, I wrote a new profiling
module. It adds about the same overhead (30%) as hotshot,
but is much faster in retrieving results. It supports
detailed child stats and gives accurate timing information
in milliseconds. The accompanying .py module could use
additional work though - because of the child stats, I
was not able to reuse the stats module like hotshot does.

I've included a simple test script that runs pystone
for all 3 profilers (and without the profiler) to give
a better idea of the differences. I've also included
a dump of the output of the script running under Windows
XP with python 2.4.1
msg48400 - (view) Author: Brett Rosen (bdrosen) Date: 2005-09-08 15:25
Logged In: YES 

I've enclosed an updated version of profile.c that fixes
a problem in the original patch
msg48401 - (view) Author: Laura Creighton (lcreighton) * Date: 2005-09-21 15:18
Logged In: YES 

We've been using Hotshot at Strakt for a while now.  We
tried your patch and
it worked precisely as advertised.  Thank you for writing it.

Laura Creighton
msg48402 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2005-09-21 15:57
Logged In: YES 

Thanks for your work.  The two existing profilers are
definitely not satisfactory in my opinion too (I am also
encountering crashes with

There are a number of details that should be addressed
before your profiler can be a replacement for the existing
ones, e.g. providing simple entry points and documentations,
and having the C code reviewed.  I am willing to help with
all this.

I hope you won't mind that I have checked your source code
in a public Subversion repository, where I am working a bit
on it together with Michael Hudson.  (Obviously, the goal
is to have the code eventually in the CPython CVS.)
msg48403 - (view) Author: Brett Rosen (bdrosen) Date: 2005-09-21 17:43
Logged In: YES 

I welcome any additional changes (and help making those changes)
that would improve the patch and help get it ready for
inclusion in CPython. 

The main detail that I knew that would need to be addressed
was  I didn't spend a lot of time working on it
because I generally use a wx TreeListCtrl object to view/sort
the results. What other details need to be addressed?
msg48404 - (view) Author: Michael Hudson (mwh) (Python committer) Date: 2005-09-21 18:35
Logged In: YES 

Well, you can see what we've done to your baby:

Mostly it's just C style conformance so far, though we've fixed a couple of 
little bugs too.
msg48405 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2005-09-23 07:46
Logged In: YES 

I replaced the linked lists with some kind of auto-balacing
trees; the linked lists were creating a huge overhead to
profile large programs.  The reason for not using plain
Python dicts instead is that PyCodeObjects are not very
good at being keys in dicts -- their hash computation takes
ages, and we'd prefer an identity mapping anyway.

The current SVN version is now the first profiler that
works and gives sensible results when profiling the PyPy
translation process.

getstats() is now producing tuples-with-attribute-names
instead of dicts (similar to os.stat()).  This was mostly
motivated by a lack of motivation to introduce error
checking everywhere in the dict-building code, but I think
it's a reasonable change.  There are only a couple of
places left in profile.c still missing checks for error
results or out-of-memory conditions.

The module exposes a simple but minimalistic
interface.  I suggest we keep and document it or a similar
one, but also support -- for compatibility -- the
convoluted interface of the existing
and/or hotshot, with the option to dump the stats to a file
and reload them.  However I don't think it makes sense to
use exactly the same format as pstats does (as far as I
can guess it doesn't support per-caller information).
msg48406 - (view) Author: Brett Rosen (bdrosen) Date: 2005-09-26 13:54
Logged In: YES 

I looked over the changes so far and they look reasonable. I
did have a few questions though:

1 Do we not need to Increment/Decrement references to
the code objects? We are using them as keys in the trees as
well as payload data later on, but I don't see how we are
guaranteed that they won't be reaped. (although it seems

2 Is it deliberate to use lsprof.YYY style names for some
of the objects (ie lsprof.Profiler) instead of _lsprof ? (is
this the normal convention?)

3 Do you have a feel for the performance differences
of using the tree instead of the lists? Doing the simple
benchmark test they seemed to be comparable, but
that test is pretty simple. I'm assuming tht in a large
program, the tree approach will be considerably faster?

4 In, is there a reason that the Stats class
does not derive from object? 
msg48407 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2005-09-27 21:03
Logged In: YES 

1 The Py_INCREF(_code) at line 86 is the same as in
  your original code; it should guarantee that the
  code object doesn't go away.  However, I forgot
  the corresponding Py_DECREF()...

2 It was a quick hack to have help(lsprof) display
  these types as well.  Now I'm no longer sure that
  we need help(lsprof) to display them anyway, so
  let's use the standard '_lsprof.XXX' names.

3 Profiling a large program took forever.  I
  interrupted it after 30 minutes when it showed no
  sign of wanting to go past the initial step that
  normally takes only a few minutes.  With rotating
  trees this step is fast again.  Maybe they are not
  an optimal structure, though, because it still
  takes something like three times longer to finish
  the whole program (normally takes half an hour).

4 No.

Checked in your proposed changes.  What is still
missing: deciding how much similar to the and API we need to be, and
writing some tests (ideally, having good test
coverage would be nice).
msg48408 - (view) Author: Ziga Seilnacht (zseil) * (Python committer) Date: 2007-03-06 20:16
Can this patch be closed? Python 2.5 has a new
cProfile module, which AFAIK, was derived from
this patch.
msg48409 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2007-03-07 07:12
Yes, this profiler is now available as the cProfile
module of Python 2.5.

For older Python versions, it is not packaged but
can be checked out from my svn repository at .
Date User Action Args
2022-04-11 14:56:11adminsetgithub: 42041
2005-06-01 16:05:58bdrosencreate