Fixed Py_DECREF and Py_CLEAR as well. 

Added tests for Py_INCREF and Py_XINCREF (if somebody has a better idea of how to test that INCREF doesn't leak, please let me know).
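One possible shape for such a leak check, sketched on a toy object instead of the real C API: run many INCREF/DECREF pairs, assert that each INCREF adds exactly one reference, and check that the net change is zero. `ToyObject`, `toy_incref`, `toy_decref` and `toy_refcount_delta` are hypothetical names; a real test would compare `Py_REFCNT` before and after instead.

```c
#include <assert.h>

/* Toy stand-in for PyObject: only the reference count is modeled
   (long is used here in place of Py_ssize_t). */
typedef struct {
    long ob_refcnt;
} ToyObject;

/* Hypothetical static inline versions of INCREF/DECREF. */
static inline void toy_incref(ToyObject *op) { op->ob_refcnt++; }
static inline void toy_decref(ToyObject *op) { op->ob_refcnt--; }

/* Runs n incref/decref pairs and returns the net change in the
   reference count; a non-zero result would indicate a leak. */
static long toy_refcount_delta(ToyObject *op, int n)
{
    long before = op->ob_refcnt;
    for (int i = 0; i < n; i++) {
        toy_incref(op);
        /* each INCREF must add exactly one reference */
        assert(op->ob_refcnt == before + 1);
        toy_decref(op);
    }
    return op->ob_refcnt - before;
}
```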

Removed the comment saying that Py_DECREF evaluates its argument multiple times, since that no longer applies.
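A minimal sketch of why that comment stops being relevant, assuming a simplified DECREF: a function-like macro that names its argument twice evaluates an argument with side effects twice when the count drops to zero, while a static inline function evaluates it exactly once. The toy macro only mimics the general shape of the old macro, not its exact text; all names (`TOY_DECREF_MACRO`, `toy_decref`, `get_obj`, and so on) are hypothetical.

```c
/* Toy object: only the reference count is modeled. */
typedef struct {
    long ob_refcnt;
} ToyObject;

static int eval_count = 0;          /* how many times get_obj() ran */
static ToyObject toy_obj = {1};

/* Argument expression with a visible side effect. */
static ToyObject *get_obj(void) { eval_count++; return &toy_obj; }

/* Placeholder deallocator; does nothing in this toy. */
static void toy_dealloc(ToyObject *op) { (void)op; }

/* Macro style: `op` appears twice in the expansion. */
#define TOY_DECREF_MACRO(op) \
    do { if (--(op)->ob_refcnt == 0) toy_dealloc(op); } while (0)

/* Inline-function style: the argument is evaluated once, at the call. */
static inline void toy_decref(ToyObject *op)
{
    if (--op->ob_refcnt == 0)
        toy_dealloc(op);
}

/* Each helper drops the count from 1 to 0 and reports how many
   times the argument expression was evaluated. */
static int count_evals_macro(void)
{
    eval_count = 0;
    toy_obj.ob_refcnt = 1;
    TOY_DECREF_MACRO(get_obj());
    return eval_count;
}

static int count_evals_inline(void)
{
    eval_count = 0;
    toy_obj.ob_refcnt = 1;
    toy_decref(get_obj());
    return eval_count;
}
```

With the count reaching zero, `count_evals_macro()` returns 2 and `count_evals_inline()` returns 1.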

Regarding performance: I've made a toy example (only these defines and a main function) to test how the gcc optimizer behaves in different cases. From what I see, if the expression looks like this (which covers the majority of cases in the code):
 PyObject* obj = Foo();

the assembly code produced with -O3 is the same before and after the patch.
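A sketch of what such a toy comparison might look like, assuming a stripped-down object type: `ToyObject`, `TOY_INCREF_MACRO`, `toy_incref`, `with_macro` and `with_inline` are hypothetical names, not the CPython definitions. `__attribute__((noinline))` is a GCC/Clang extension used here only so that `Foo()` stays a real call. Compiling with `gcc -O3 -S` and comparing the bodies of `with_macro` and `with_inline` is one way to check the observation above.

```c
/* Toy object: only the reference count is modeled. */
typedef struct {
    long ob_refcnt;
} ToyObject;

/* Old style: a function-like macro. */
#define TOY_INCREF_MACRO(op) ((op)->ob_refcnt++)

/* New style: a static inline function. */
static inline void toy_incref(ToyObject *op) { op->ob_refcnt++; }

static ToyObject global_obj = {1};

/* Stand-in for Foo(); kept out of line so the caller has to
   work with a pointer returned from a call. */
__attribute__((noinline)) static ToyObject *Foo(void)
{
    return &global_obj;
}

/* PyObject* obj = Foo(); followed by the macro INCREF. */
long with_macro(void)
{
    ToyObject *obj = Foo();
    TOY_INCREF_MACRO(obj);
    return obj->ob_refcnt;
}

/* Same pattern with the inline-function INCREF. */
long with_inline(void)
{
    ToyObject *obj = Foo();
    toy_incref(obj);
    return obj->ob_refcnt;
}
```

Both functions share `global_obj`, so calling `with_macro()` and then `with_inline()` returns 2 and then 3.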
