classification
Title: Reference-counting problem in sqlite
Type: crash Stage: resolved
Components: Extension Modules Versions: Python 3.9, Python 3.8, Python 3.7, Python 3.6, Python 3.5, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: benjamin.peterson, berker.peksag, gescheit, ghaering, miss-islington
Priority: normal Keywords: patch

Created on 2019-06-20 11:14 by gescheit, last changed 2019-11-28 15:42 by sir-sigurd. This issue is now closed.

Files
File name Uploaded Description Edit
bug.py gescheit, 2019-06-20 11:17
Pull Requests
URL Status Linked Edit
PR 14268 merged gescheit, 2019-06-20 14:07
PR 14725 merged miss-islington, 2019-07-13 03:16
PR 17413 open sir-sigurd, 2019-11-28 15:42
Messages (9)
msg346114 - (view) Author: Aleksandr Balezin (gescheit) * Date: 2019-06-20 11:14
There are a couple of bugs in sqlite bindings have been found related to reference-counting.
Short info:
sqlite connection class uses a dict to keep references to callbacks. This mechanism is not suitable for objects which equals but not the same. 
con = sqlite3.connect()
con.set_trace_callback(logger.debug) 
con.set_trace_callback(logger.debug)  # logger.debug == logger.debug is True, but logger.debug is logger.debug is False because logger.debug bound method is creating in every call
leads to segmentation fault during calling of trace_callback.
My patch fixes this behavior by using a dedicated variable for keeping references to each callback and using dict indexed by function name in case of named callbacks(e.g. create_function()).
Also, due to keeping objects in a variable or in a dict value, it is possible to use unhashable objects as callbacks. e.g. issue7478

Long version:
Sqlite under the hood use dict(called function_pinboard) to keep references to callbacks like progress_handler. 
It needs to decref callbacks objects after closing sqlite connection. 
This mechanism works tolerably(see bug with leaks) with functions but if you try to use bounded methods it causes a segmentation fault.
Let see how it works.

static PyObject *
Custom_set_callback(CustomObject *self, PyObject* args)
{
	PyObject* display_str;
	display_str = PyUnicode_FromFormat("set_callback called with cb=%R id=%i ob_refcnt=%i\n", args, args, args->ob_refcnt);
	PyObject_Print(display_str, stdout, Py_PRINT_RAW);
	if (PyDict_SetItem(self->function_pinboard, args, Py_None) == -1) return NULL;
	//sqlite3_trace(self->db, _trace_callback, trace_callback);
	self->callback_handler = args;
	display_str = PyUnicode_FromFormat("set_callback done for cb=%R id=%i ob_refcnt=%i\n", args, args, args->ob_refcnt);
	PyObject_Print(display_str, stdout, Py_PRINT_RAW);
	Py_RETURN_NONE;
}
static PyObject *
Custom_call_callback(CustomObject *self)
{
	PyObject* display_str;
	display_str = PyUnicode_FromFormat("call with id=%i ob_refcnt=%i\n", self->callback_handler ,
    self->callback_handler->ob_refcnt);
	PyObject_Print(display_str, stdout, Py_PRINT_RAW);
	Py_RETURN_NONE;
}

Python code:
>>>> class TEST:
        def log(self, msg=""):
            pass
>>>> t = TEST()
>>>> conn = Custom()
>>>> conn.set_trace_callback(t.log)
set_callback called with cb=<bound method TEST.log of <__main__.TEST object at 0x10bc60128>> id=196094408 ob_refcnt=1
set_callback done for cb=<bound method TEST.log of <__main__.TEST object at 0x10bc60128>> id=196094408 ob_refcnt=2
>>>> conn.set_trace_callback(t.log)
set_callback called with cb=<bound method TEST.log of <__main__.TEST object at 0x10bc60128>> id=196095112 ob_refcnt=1
set_callback done for cb=<bound method TEST.log of <__main__.TEST object at 0x10bc60128>> id=196095112 ob_refcnt=1
conn.call()
call with id=196095112 ob_refcnt=0

After second conn.set_trace_callback(t.log) call, object t.log reference-count is not increased because 't.log in self->function_pinboard' returns True thus self->function_pinboard[t.log] is not replaced and t.log is not increfed, but it replaces old object in self->callback_handler.
In the end, self->callback_handler keeps a pointer to t.log with ob_refcnt = 0.

Also, there is no cleaning of self->function_pinboard. This leads to leaks every object passed as callback(see test_leak() in bug.py).
msg346180 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2019-06-21 03:12
Why not use sqlite3_create_function_v2 to register a destructor for the function that does Py_DECREF and dispense with the dictionaries entirely?
msg346196 - (view) Author: Aleksandr Balezin (gescheit) * Date: 2019-06-21 09:09
Because destructor can be registered only for particular functions. For example sqlite3_progress_handler() can't register destructor.
msg346285 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2019-06-22 17:07
But there can be only one progress handler per connection, so you can just keep an single reference in the struct yourself.
msg346374 - (view) Author: Aleksandr Balezin (gescheit) * Date: 2019-06-24 11:16
At my point of view, dererencing from sqlite destructor has next flaws:
- relying on external library which can has bugs
- using different routine for reference-counting(complicates code)
- loosing support for old sqlite version

Of cause, if it's necessary to use sqlite3_create_function_v2, I'm ready to make appropriate changes.
msg346433 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2019-06-24 18:38
The _sqlite3 module clearly is not free of bugs in this area. Binding the Python function to the exact lifetime of the sqlite function seems less likely to be buggy than replicating sqlite's function reference management independently. sqlite3_create_function_v2 was added in 2010; I think we can depend on it. Also, won't the callback pretty much be Py_DECREF(something)?
msg346555 - (view) Author: Aleksandr Balezin (gescheit) * Date: 2019-06-25 20:18
I've pushed changes https://github.com/python/cpython/pull/14268/commits/bce7fdc952b14c23821e184e82b3944f6e10aaa9. 
Could I ask for clarification on callback and Py_DECREF(something)?
msg347769 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2019-07-13 03:16
New changeset b9a0376b0dedf16a2f82fa43d851119d1f7a2707 by Benjamin Peterson (gescheit) in branch 'master':
closes bpo-37347: Fix refcount problem in sqlite3. (GH-14268)
https://github.com/python/cpython/commit/b9a0376b0dedf16a2f82fa43d851119d1f7a2707
msg347770 - (view) Author: miss-islington (miss-islington) Date: 2019-07-13 03:33
New changeset 36101c2c5daf692d1716e17720c6c5197f28e25d by Miss Islington (bot) in branch '3.8':
closes bpo-37347: Fix refcount problem in sqlite3. (GH-14268)
https://github.com/python/cpython/commit/36101c2c5daf692d1716e17720c6c5197f28e25d
History
Date User Action Args
2019-11-28 15:42:19sir-sigurdsetpull_requests: + pull_request16894
2019-07-13 03:33:57miss-islingtonsetnosy: + miss-islington
messages: + msg347770
2019-07-13 03:16:12benjamin.petersonsetstatus: open -> closed
resolution: fixed
messages: + msg347769

stage: patch review -> resolved
2019-07-13 03:16:06miss-islingtonsetpull_requests: + pull_request14521
2019-06-25 20:18:26gescheitsetmessages: + msg346555
2019-06-24 18:38:50benjamin.petersonsetmessages: + msg346433
2019-06-24 11:16:12gescheitsetmessages: + msg346374
2019-06-22 17:07:14benjamin.petersonsetmessages: + msg346285
2019-06-21 09:09:09gescheitsetmessages: + msg346196
2019-06-21 03:12:27benjamin.petersonsetnosy: + benjamin.peterson
messages: + msg346180
2019-06-20 17:18:47xtreaksetnosy: + berker.peksag
2019-06-20 16:05:16gescheitsetpull_requests: - pull_request14092
2019-06-20 14:07:26gescheitsetpull_requests: + pull_request14094
2019-06-20 13:54:58gescheitsetkeywords: + patch
stage: patch review
pull_requests: + pull_request14092
2019-06-20 11:17:49gescheitsetfiles: + bug.py
2019-06-20 11:14:59gescheitcreate