Author jcea
Recipients Garen, belopolsky, benjamin.peterson, danchr, dhduvall, dmalcolm, fche, glyph, hazmat, jbaker, jcea, jmcp, laca, lasizoillo, loewis, mjw, movement, neologix, pitrou, rhettinger, robert.kern, ronaldoussoren, scox, serverhorror, sirg3, techtonik, twleung, wsanchez
Date 2012-01-21.00:03:40
SpamBayes Score 1.11022e-16
Marked as misclassified No
Message-id <1327104223.53.0.217811935108.issue13405@psf.upfronthosting.co.za>
In-reply-to
Content
My point about DTrace probes is not performance per-se but observability.

An OS with DTrace support can be completely inspected and shows you ALL the inner working. You can see a particular pattern in the operating system and locate the syscalls and the process involved. If it is a compiled program, you can actually look inside it and locate the particular function responsible and, even, its arguments.

You can not do that easily with an interpreted program, because when you inspect the stacktrace, you see the C interpreter functions, not the actual interpreted functions running. So, to be appropriately integrated with the impressive observability of the entire system, the interpreter must cooperate. That is the role of this patch. And that role is crucial.

Just an example I was checking today, just for fun: an email server written in Python.

I am interested in knowing which function is involved in database "misses", and so real READ harddisk activity. Since Berkeley DB is used as the backend, and Oracle has included dtrace probes on the library (http://docs.oracle.com/cd/E17076_02/html/programmer_reference/program_perfmon.html), we can do something like this:

"""
dtrace -n "bdb1066:::mpool-miss {@misses[jstack(20)]=count();}"
"""

This on-liner will trace process "1066" and with plug into the Berkeley DB cache miss event, recording the python stack (up to 20 levels) at each event fire. We let the monitor script go for a few seconds and ^c it, showing something like:

"""
dtrace: description 'bdb1066:::mpool-miss ' matched 1 probe
^C


              libdb-5.3.so`__memp_fget+0x22cb
              libdb-5.3.so`__ham_get_cpage+0x140
              libdb-5.3.so`__ham_lookup+0x8b
              libdb-5.3.so`__hamc_get+0x3d5
              libdb-5.3.so`__dbc_iget+0x40d
              libdb-5.3.so`__dbc_get+0x9d
              libdb-5.3.so`__db_get+0xb8
              libdb-5.3.so`__db_get_pp+0x2db
              _pybsddb.so`DB_get+0x1f5
              libpython2.7.so.1.0`PyCFunction_Call+0x148
              libpython2.7.so.1.0`PyEval_EvalFrameEx+0x55f2
                [ /export/home/correo/durus-berkeleydbstorage/berkeleydb_storage.py:317 (_prefetch_thread) ]
              libpython2.7.so.1.0`PyEval_EvalCodeEx+0x7a4
              libpython2.7.so.1.0`function_call+0x18f
              libpython2.7.so.1.0`PyObject_Call+0x5c
              libpython2.7.so.1.0`PyEval_EvalFrameEx+0x1d8e
                [ /usr/local/lib/python2.7/threading.py:505 (run) ]
              libpython2.7.so.1.0`PyEval_EvalFrameEx+0x63ed
                [ /usr/local/lib/python2.7/threading.py:552 (__bootstrap_inner) ]
              libpython2.7.so.1.0`PyEval_EvalFrameEx+0x63ed
                [ /usr/local/lib/python2.7/threading.py:525 (__bootstrap) ]
              libpython2.7.so.1.0`PyEval_EvalCodeEx+0x7a4
              libpython2.7.so.1.0`function_call+0x9a
              libpython2.7.so.1.0`PyObject_Call+0x5c
                1

              libdb-5.3.so`__memp_fget+0x22cb
              libdb-5.3.so`__ham_get_cpage+0x140
              libdb-5.3.so`__ham_lookup+0x8b
              libdb-5.3.so`__hamc_get+0x3d5
              libdb-5.3.so`__dbc_iget+0x40d
              libdb-5.3.so`__dbc_get+0x9d
              libdb-5.3.so`__db_get+0xb8
              libdb-5.3.so`__db_get_pp+0x2db
              _pybsddb.so`DB_get+0x1f5
              libpython2.7.so.1.0`PyCFunction_Call+0x148
              libpython2.7.so.1.0`PyEval_EvalFrameEx+0x55f2
                [ /export/home/correo/durus-berkeleydbstorage/berkeleydb_storage.py:725 (load) ]
              libpython2.7.so.1.0`PyEval_EvalCodeEx+0x7a4
              libpython2.7.so.1.0`function_call+0x18f
              libpython2.7.so.1.0`PyObject_Call+0x5c
              libpython2.7.so.1.0`PyEval_EvalFrameEx+0x1d8e
                [ /export/home/correo/durus-berkeleydbstorage/berkeleydb_storage.py:430 (f) ]
              libpython2.7.so.1.0`PyEval_EvalCodeEx+0x7a4
              libpython2.7.so.1.0`PyEval_EvalFrameEx+0x4f89
                [ build/bdist.solaris-2.10-i86pc/egg/durus/storage_server.py:302 (_send_load_response) ]
              libpython2.7.so.1.0`PyEval_EvalFrameEx+0x63ed
                [ build/bdist.solaris-2.10-i86pc/egg/durus/storage_server.py:295 (handle_L) ]
              libpython2.7.so.1.0`PyEval_EvalFrameEx+0x63ed
                [ build/bdist.solaris-2.10-i86pc/egg/durus/storage_server.py:261 (handle) ]
              libpython2.7.so.1.0`PyEval_EvalFrameEx+0x63ed
                [ build/bdist.solaris-2.10-i86pc/egg/durus/storage_server.py:227 (serve) ]
                2

              libdb-5.3.so`__memp_fget+0x22cb
              libdb-5.3.so`__ham_get_cpage+0x140
              libdb-5.3.so`__ham_lookup+0x8b
              libdb-5.3.so`__hamc_get+0x3d5
              libdb-5.3.so`__dbc_iget+0x40d
              libdb-5.3.so`__dbc_get+0x9d
              libdb-5.3.so`__db_get+0xb8
              libdb-5.3.so`__db_get_pp+0x2db
              _pybsddb.so`DB_get+0x1f5
              libpython2.7.so.1.0`PyCFunction_Call+0x148
              libpython2.7.so.1.0`PyEval_EvalFrameEx+0x55f2
                [ /export/home/correo/durus-berkeleydbstorage/berkeleydb_storage.py:307 (_prefetch_thread) ]
              libpython2.7.so.1.0`PyEval_EvalCodeEx+0x7a4
              libpython2.7.so.1.0`function_call+0x18f
              libpython2.7.so.1.0`PyObject_Call+0x5c
              libpython2.7.so.1.0`PyEval_EvalFrameEx+0x1d8e
                [ /usr/local/lib/python2.7/threading.py:505 (run) ]
              libpython2.7.so.1.0`PyEval_EvalFrameEx+0x63ed
                [ /usr/local/lib/python2.7/threading.py:552 (__bootstrap_inner) ]
              libpython2.7.so.1.0`PyEval_EvalFrameEx+0x63ed
                [ /usr/local/lib/python2.7/threading.py:525 (__bootstrap) ]
              libpython2.7.so.1.0`PyEval_EvalCodeEx+0x7a4
              libpython2.7.so.1.0`function_call+0x9a
              libpython2.7.so.1.0`PyObject_Call+0x5c
               25

              libdb-5.3.so`__memp_fget+0x22cb
              libdb-5.3.so`__ham_get_cpage+0x140
              libdb-5.3.so`__ham_lookup+0x8b
              libdb-5.3.so`__hamc_get+0x3d5
              libdb-5.3.so`__dbc_iget+0x40d
              libdb-5.3.so`__dbc_get+0x9d
              libdb-5.3.so`__db_get+0xb8
              libdb-5.3.so`__db_get_pp+0x2db
              _pybsddb.so`DB_get+0x1f5
              libpython2.7.so.1.0`PyCFunction_Call+0x148
              libpython2.7.so.1.0`PyEval_EvalFrameEx+0x53ee
                [ /export/home/correo/durus-berkeleydbstorage/berkeleydb_storage.py:758 (end) ]
              libpython2.7.so.1.0`PyEval_EvalCodeEx+0x7a4
              libpython2.7.so.1.0`function_call+0x18f
              libpython2.7.so.1.0`PyObject_Call+0x5c
              libpython2.7.so.1.0`PyEval_EvalFrameEx+0x1d8e
                [ /export/home/correo/durus-berkeleydbstorage/berkeleydb_storage.py:444 (f) ]
              libpython2.7.so.1.0`PyEval_EvalCodeEx+0x7a4
              libpython2.7.so.1.0`PyEval_EvalFrameEx+0x4f89
                [ build/bdist.solaris-2.10-i86pc/egg/durus/storage_server.py:354 (handle_C) ]
              libpython2.7.so.1.0`PyEval_EvalFrameEx+0x63ed
                [ build/bdist.solaris-2.10-i86pc/egg/durus/storage_server.py:261 (handle) ]
              libpython2.7.so.1.0`PyEval_EvalFrameEx+0x63ed
                [ build/bdist.solaris-2.10-i86pc/egg/durus/storage_server.py:227 (serve) ]
              libpython2.7.so.1.0`PyEval_EvalFrameEx+0x63ed
                [ /export/home/correo/lmtp.py:1206 (storage_background) ]
               72
"""

We can see which function (even which thread) is causing the database miss. The number after each frame shows how many times we saw that particular stacktrace.

Let's suppose now I am interested in knowing how long SMTP transactions are. Let's do something like this:

"""
python$target:::function-entry
/copyinstr(arg1)=="lmtp2"/
{
  self->ts = timestamp;
}

python$target:::function-return
/(copyinstr(arg1)=="lmtp2") && (self->ts)/
{
  @time_lmtp = quantize(timestamp-self->ts);
  self->ts = 0;
}

"""

And the result after a few seconds: (time in nanoseconds)

"""
           value  ------------- Distribution ------------- count    
         8388608 |                                         0        
        16777216 |@@@@@@@@@@@@@@@@@                        8        
        33554432 |@@@@@@@@@@@@@@@@@@@@@@@                  11       
        67108864 |                                         0 
"""

(The values are pessimistic, because the activated probes have performance hit).

You can do things like monitor garbage collection events, count object allocation/deallocation to find memory leaks, relate network activity with disk activity, whatever you can imagine. And not only inside the program, but connect any activity in the system with the root cause.

Another example. In this particular program the database runs in a separate thread inside the same process, connected to the other threads via a socket. Sometimes the socket can timeout if the database is being slow. In that case, the thread with the timeout will close the connection to the database and will open a new connection, redoing the transaction (the system is ACID, so we can do this). How often is that happening?. No idea. I could change python code to log reconnections or I could run an one-liner dtrace script on my server hooking to the reconnection routine to get an inmediate feedback of how frequent is this event firing.

Note that during all this detailed monitoring, the email server keeps working. We don't stop it. We don't change a single line of production code. Any effect the probes could have (performance, for instance) will vanish as soon as the monitoring script is stopped.

And yes, you can do this, frequently, stopping the process and instrumentalizing it by hand. But that disturb the service, it is error prone and there are things you simply can't do because your are instrumentalizing your code, not the C libraries neither the entire operating system. For instance, you can't easily correlate events between two different processes. With DTrace, you can. 

Moreover, this kind of investigation is iterative. You make an hypothesis, write an small script to validate, run it and collect some data, and write another script to explore what you found. If you do that modifying the original sourcecode, you must patch and restart the process constantly. And your patch could crash the server, for instance.

DTrace is very useful too when a process has sporadic problems and you can't just stop it because the problem goes away if you do, and you can't easily find the reason.

So, yes, usually you are interested in performance, but that is a consequence of "visibility" into your program and the entire system. The real deal is the "observability".

About DTrace being a niche, current code works in Solaris and derivatives. It is a quite popular OS in servers. Beside, I plan to support FreeBSD and MacOS too. I think this could be easy (although MacOS lacks the stack dump feature, shame on you, Apple :). Moreover, there are interest in using this with Linux systemtap capabilities, via the stap/DTrace compatibility layer. That seems to be doable too.

You could say "OK, come back when you have ALL that support". Previous attempts failed because of this, to try to cover ALL bases. I am quite sure that if we commit current code and enable it in the current Solaris & OpenIndiana buildbots, other people is going to "kill" :-) to get support in FreeBSD and MacOS too, and Fedora folks -for instance- sure would provide systemtap.

About the maintenance burden, I am a core developer and quite committed to Solaris&family technology. I plan to be around for quite a while, actually. But this technology can be deleted COMPLETELY from current code simply deleting the conditional compilation (there is no 'convenient' code that is not conditionally compiled). You could drop the support anytime, since it would affect no python program at all. Python programs don't know anything about dtracing. DTrace is used by external tools.

There is a point about "let's keep this patch as an external patch that interested parties can apply by themselves". Beside considering that this work deserves to be in mainline python, a "patch" doesn't have buildbot testing and most users are stuck with whatever precompiled python their distributions choose to adopt. Even if linux distributions are interested, an external patch doesn't have the same credibility.

I have the python2.7 patch in production in my servers for the last month or so. 99.99% of the time I don't use it, and the performance hit I am paying for having this patch compiled in (it is completely optional via "configure") is about 1.4%. The other 0.01% of the time I really need it. REALLY. So badly that even decide what operating system I am using, or push me to work hard to create a patch useful to others.

So, yes. The code is intrusive. The code deals with a lot of internal machinery (PEP393 support in the ustack helper was quite difficult). It is going to break from time to time, sure. At the same time, I am committed to support it. And even if it is dropped in 3.4, no Python program will be affected.

Last but not least, other interpreted languages already have DTrace support:

* PHP:

 - http://techportal.ibuildings.com/2009/11/24/analysing-php-web-applications-with-dtrace/
 - https://wiki.php.net/rfc/dtrace

* PERL:

 - https://dgl.cx/2011/01/dtrace-and-perl

* RUBY:

 - http://ruby-dtrace.rubyforge.org/

 - https://rubygems.org/gems/ruby-dtrace

* HASKELL:

 - http://hackage.haskell.org/trac/ghc/wiki/DTrace

 - http://justtesting.org/profiling-garbage-collection-in-haskell-with

* JAVA:

 - http://docs.oracle.com/javase/6/docs/technotes/guides/vm/dtrace.html

* JAVASCRIPT:

 - https://wiki.mozilla.org/Performance/Optimizing_JavaScript_with_DTrace

Googling around you can see dtrace probes everywhere, from languages to libraries to, even, the X-Server: http://www.x.org/releases/X11R7.6/doc/xorg-server/Xserver-DTrace.html
History
Date User Action Args
2012-01-21 00:03:44jceasetrecipients: + jcea, loewis, rhettinger, ronaldoussoren, belopolsky, pitrou, wsanchez, movement, techtonik, benjamin.peterson, serverhorror, glyph, laca, twleung, jbaker, robert.kern, sirg3, danchr, dhduvall, dmalcolm, mjw, Garen, neologix, lasizoillo, fche, hazmat, jmcp, scox
2012-01-21 00:03:43jceasetmessageid: <1327104223.53.0.217811935108.issue13405@psf.upfronthosting.co.za>
2012-01-21 00:03:42jcealinkissue13405 messages
2012-01-21 00:03:40jceacreate