
errors on _bsddb creation and dealloc #48135

Closed
vstinner opened this issue Sep 16, 2008 · 19 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@vstinner
Member

BPO 3885
Nosy @jcea, @amauryfa, @ncoghlan, @vstinner
Files
  • _bsddb.patch: Patch fixing described bugs
  • _bsddb_test.patch: Test to reproduce the crash
  • bsddb.patch: Patch for python 2.6.
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/jcea'
    closed_at = <Date 2008-09-23.21:29:51.096>
    created_at = <Date 2008-09-16.21:08:59.800>
    labels = ['library', 'type-crash']
    title = 'errors on _bsddb creation and dealloc'
    updated_at = <Date 2008-09-23.22:47:23.361>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2008-09-23.22:47:23.361>
    actor = 'vstinner'
    assignee = 'jcea'
    closed = True
    closed_date = <Date 2008-09-23.21:29:51.096>
    closer = 'jcea'
    components = ['Library (Lib)']
    creation = <Date 2008-09-16.21:08:59.800>
    creator = 'vstinner'
    dependencies = []
    files = ['11505', '11506', '11521']
    hgrepos = []
    issue_num = 3885
    keywords = ['patch', 'needs review']
    message_count = 19.0
    messages = ['73312', '73313', '73319', '73320', '73327', '73328', '73329', '73330', '73334', '73387', '73388', '73395', '73594', '73602', '73628', '73657', '73670', '73672', '73676']
    nosy_count = 4.0
    nosy_names = ['jcea', 'amaury.forgeotdarc', 'ncoghlan', 'vstinner']
    pr_nums = []
    priority = 'high'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue3885'
    versions = ['Python 2.6']

    @vstinner
    Member Author

I found two different bugs using Fusil the fuzzer.

(1) On db_env_create() error in newDBEnvObject(), self->db_env is not
set, so any use of this pointer may crash.

(2) DBEnv_dealloc() may raise an exception (DBEnv_close_internal()
calls makeDBError()), but the garbage collector dislikes exceptions!

    Example of (2):
    ----

    $ python
    >>> import gc, _bsddb; env=_bsddb.DBEnv(3); del env
    >>> gc.collect(); gc.collect()
    Exception bsddb.db.DBNoServerError: (-30992, 'DB_NOSERVER: Fatal 
    error, no RPC server -- No Berkeley DB RPC server environment') 
    in 'garbage collection' ignored
    Fatal Python error: unexpected exception during garbage collection
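
    The failure mode here, an exception escaping a deallocator, has a
    pure-Python analogue that is easy to demonstrate: an exception raised
    in __del__ cannot propagate, so CPython hands it to
    sys.unraisablehook (by default printing "Exception ignored in ..."),
    much like the message in the transcript above. A minimal sketch,
    using a hypothetical Handle class and the sys.unraisablehook API
    available since Python 3.8:

    ```python
    import gc
    import sys

    captured = []

    # Route "unraisable" errors (exceptions escaping __del__) to a list
    # instead of stderr so they can be inspected (Python 3.8+).
    sys.unraisablehook = lambda unraisable: captured.append(unraisable)

    class Handle:
        """Hypothetical stand-in for a DBEnv-like object."""
        def __del__(self):
            # Analogue of DBEnv_close_internal() failing in tp_dealloc:
            # this exception has nowhere to propagate.
            raise RuntimeError("close failed during deallocation")

    h = Handle()
    del h          # __del__ runs; its exception is swallowed, not raised
    gc.collect()   # the interpreter keeps running normally

    assert len(captured) == 1
    assert isinstance(captured[0].exc_value, RuntimeError)
    ```

    The interpreter survives because the finalizer's error is discarded,
    which is exactly what a C deallocator must also arrange for itself.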

    @vstinner vstinner added the stdlib Python modules in the Lib dir label Sep 16, 2008
    @vstinner
    Member Author

About bug (1): it also occurs in DBEnv_dealloc(), but DBEnv_dealloc()
is called directly from newDBEnvObject() via Py_DECREF(self);. The two
bugs can be reproduced with dummy DBEnv() arguments, e.g. "DBEnv(92)".

    Backtrace using gdb:
    -----

    $ gdb ./python
    ...
>>> import _bsddb; _bsddb.DBEnv(92)
    ...
    Program received signal SIGSEGV, Segmentation fault.

    0xb7cc8f5e in DBEnv_close_internal (self=0x83385f0, flags=0)
    at ../Modules/_bsddb.c:3989
    3989 err = self->db_env->close(self->db_env, flags);
    (gdb) print self->db_env
    $1 = (DB_ENV *) 0xfffffffe
    -----

Here the db_env value was never set.
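
    Bug (1), a finalizer touching a field the failed constructor never
    initialized, can be mimicked at the Python level. The classes below
    are hypothetical, purely for illustration; the fix mirrors the usual
    C idiom of setting self->db_env to NULL at allocation time and
    testing it in the deallocator:

    ```python
    import sys

    captured = []
    sys.unraisablehook = lambda u: captured.append(u)  # Python 3.8+

    class BrokenEnv:
        """Hypothetical analogue of the buggy newDBEnvObject(): the
        constructor fails before self.db_env is ever assigned."""
        def __init__(self):
            raise ValueError("db_env_create() failed")  # db_env never set
        def __del__(self):
            self.db_env.close()  # blows up: the attribute does not exist

    class FixedEnv:
        """The usual fix: initialize the handle first and guard the
        finalizer (in C: self->db_env = NULL, then test before close)."""
        def __init__(self):
            self.db_env = None  # safe default before anything can fail
            raise ValueError("db_env_create() failed")
        def __del__(self):
            if self.db_env is not None:  # tolerate partial construction
                self.db_env.close()

    for cls in (BrokenEnv, FixedEnv):
        try:
            cls()
        except ValueError:
            pass

    # Only the unguarded class produced an error during deallocation.
    assert len(captured) == 1
    assert captured[0].exc_type is AttributeError
    ```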

    @jcea
    Member

    jcea commented Sep 16, 2008

Good catch. I've patched my private branch. Patches for 2.6 follow; a
review from anybody would be appreciated, since we are in RC status.

I was wondering if we must protect other objects from raising
exceptions in GC. For example, DB objects.

What do you think?

(I broke my left wrist ten days ago. If you could write a testcase
showing the issue and a patch for it, that would be very nice.)

    @vstinner
    Member Author

Here is a test to reproduce the crash. I don't know the
db_env_create() function, so I don't know what counts as "invalid
arguments"; I used ~DB_RPCCLIENT. The "gc.collect()" is there to
detect bug (2).

    About other objects dealloc method...

    DBEnv_dealloc()
      `-> DBEnv_close_internal(): RETURN_IF_ERR()
    DB_dealloc()
      `-> DB_close_internal(): RETURN_IF_ERR()
    DBC_dealloc() 
      `-> DBC_close_internal(): RETURN_IF_ERR()
    DBSequence_dealloc()
      `-> DBSequence_close_internal(): RETURN_IF_ERR()
    DBTxn_dealloc()
      `-> DBTxn_abort_discard_internal(): RETURN_IF_ERR()

    DBLock_dealloc() looks ok.

    @amauryfa
    Member

At first I thought the problem occurred during the execution of
gc.collect(), but it doesn't: when configured --with-pydebug, the
exception is printed before that:

    >>> import gc, _bsddb; env=_bsddb.DBEnv(3); del env
    XXX undetected error
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    bsddb.db.DBNoServerError: (-30992, 'DB_NOSERVER: Fatal error, no RPC
    server -- No Berkeley DB RPC server environment')

    gc.collect() is just a rude way to display this "XXX undetected error".
    (Victor: does Fusil check for this? gc.collect() will not fail if there
    is another exception in-between, or in debug mode)

    Now, to the _bsddb module: in general, the following pattern is wrong:
       dummy = someFunction();
       Py_XDECREF(dummy);
because it does nothing about any exception that may have been set. If
the exception can be discarded, PyErr_Clear() must be called.

    I think there is an invariant to keep in each function: return NULL if
    and only if an exception is set. Many places in _bsddb do not respect this.

    @vstinner
    Member Author

> gc.collect() is just a rude way to display this "XXX undetected error".
> (Victor: does Fusil check for this? gc.collect() will not fail if there
> is another exception in-between, or in debug mode)

I stopped fuzzing Python with --pydebug because of a critical design
error in CPython reference counting: http://bugs.python.org/issue3299
(some modules use PyObject_DEL() instead of Py_DECREF(), and
PyObject_DEL() leaves inconsistent objects in the doubly linked object
list). Since nobody cares about this issue, I don't use pydebug
anymore.

> Now, to the _bsddb module: in general, the following pattern is wrong:
>    dummy = someFunction();
>    Py_XDECREF(dummy);
> because it does nothing about any exception that may have been set. If
> the exception can be discarded, PyErr_Clear() must be called.

Well, we have two choices: don't raise an error, or clear the
exception if one is raised.

> I think there is an invariant to keep in each function: return NULL if
> and only if an exception is set. Many places in _bsddb do not respect this.

Where? The dealloc() callback prototype is "void tp_dealloc(...)": no
result! It's not possible to tell Python that an error occurred.

    @jcea
    Member

    jcea commented Sep 17, 2008

So the right thing to do seems to be to drop the "check_error" flag
and just call "PyErr_Clear()" where necessary in the dealloc code.

This must be done in all the dealloc code: DBEnv, DB, DBSequence and
DBCursor.

Do you agree?

    @amauryfa
    Member

    Not only in these functions, but anywhere "dummy=" is followed by a
    Py_XDECREF(dummy);
    
    And code like this must also be changed (in DB_open):
        if (makeDBError(err)) {
            PyObject *dummy;
    
            dummy=DB_close_internal(self,0);
            Py_XDECREF(dummy);
            return NULL;
        }
If DB_close_internal() raises an exception, it will replace the
original error set by makeDBError(). I'm not sure this is desirable.
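
    A Python-level sketch of the concern: if cleanup runs on an error
    path, its own failure must be discarded (the analogue of calling
    PyErr_Clear() after a failed DB_close_internal()) so that the
    original error still reaches the caller. The function names here are
    hypothetical, purely for illustration:

    ```python
    def close_quietly(close):
        """Run a cleanup callable and discard any error it raises --
        the analogue of PyErr_Clear() after a failed cleanup call,
        so the original exception survives."""
        try:
            close()
        except Exception:
            pass  # deliberately dropped, like PyErr_Clear()

    def open_db():
        """Hypothetical sketch of the DB_open error path."""
        def close():
            raise OSError("close also failed")
        # The open itself failed; close the half-open handle, but do
        # not let the close error replace the one the caller should see.
        close_quietly(close)
        raise ValueError("open failed")

    try:
        open_db()
    except Exception as exc:
        seen = exc

    assert isinstance(seen, ValueError)  # the original error survived
    assert str(seen) == "open failed"
    ```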

    @amauryfa
    Member

> I stopped to fuzz Python using --pydebug because a critical design
> error in CPython reference counting

    I won't comment on this, but to get the extra checks you could arrange
    that ceval.c is compiled with CHECKEXC defined. "./configure
    BASECFLAGS=-DCHECKEXC" did the trick for me.

> Where? dealloc() callback prototype is "void tp_dealloc(...)":
> no result! It's not possible to tell Python that an error occured.

Exactly. You cannot return NULL (or -1 for other kinds of functions),
so no exception may be raised; PyErr_Clear() calls are necessary.

    @jcea
    Member

    jcea commented Sep 18, 2008

> Not only in these functions, but anywhere "dummy=" is followed by a
> Py_XDECREF(dummy);

Please clarify this. I don't see your point.

> And code like this must also be changed (in DB_open):
>     if (makeDBError(err)) {
>         PyObject *dummy;
>
>         dummy=DB_close_internal(self,0);
>         Py_XDECREF(dummy);
>         return NULL;
>     }
> if DB_close_internal() raises an exception it will replace the
> original error set by makeDBError(). I'm not sure this is desirable.

I agree this could be an issue that needs to be studied. Please post a
new bug report for it (to be addressed in 2.6.1, I think, since this
issue doesn't crash the interpreter).

I'm posting a patch for the bugs originally reported here, plus
another one (a db.verify() regression).

*Please review*. I will update the Python 2.6 svn when the review is
done. Thanks.

    @amauryfa
    Member

    > Not only in these functions, but anywhere "dummy=" is followed by a
    > Py_XDECREF(dummy);

    Please, clarify this. I don't see your point.

If dummy is NULL, a pending exception has been set (with
PyErr_SetString or a similar function). If you simply ignore this,
your function will return a valid object while PyErr_Occurred()
reports an error. This is not correct and will fail in subtle ways
(like with gc.collect()); in debug mode, a message "XXX undetected
error" is printed. You must either clear the exception or return NULL
from your function.

    @jcea
    Member

    jcea commented Sep 18, 2008

Very good point, Amaury.

I would open a new bug to track this (probably for 2.6.1 time). Can you?

In the meantime, I need a review for the immediate patch, since the
crash is 100% reproducible.

    @ncoghlan
    Contributor

    Patch comments:

    • the test suite section of the diff appears to have a number of changes
      that are unrelated to this issue
    • the purpose of the new do_not_close flag (i.e. avoiding the crash)
      could use a comment at the point where it is referenced in the code
    • aren't all those dummy=DB_close_internal(*) + Py_XDECREF calls also in
      need of the PyErr_Clear() treatment?
    • globally clearing *any* error unconditionally is typically a bad idea.
      If there are certain 'expected' errors (such as the No Server error
      Victor was triggering or other BSDDB errors) then those can be cleared
      silently, but for other errors something should at least be written to
      stderr to indicate that an error was ignored because it couldn't be
      propagated back to the eval loop (and possibly even for the BSDDB errors).
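
The policy Nick describes, clearing only anticipated errors silently and at least reporting everything else, can be sketched at the Python level as follows (DBNoServerError and the helper name are stand-ins for illustration, not the real bsddb API):

```python
import sys

class DBNoServerError(Exception):
    """Hypothetical stand-in for bsddb.db.DBNoServerError."""

def discard_during_dealloc(close, expected=(DBNoServerError,)):
    """Clear only 'expected' errors silently; anything else is at
    least reported on stderr, since a finalizer cannot propagate it."""
    try:
        close()
    except expected:
        pass  # anticipated failure, silently cleared
    except Exception as exc:
        print("error ignored during dealloc: %r" % (exc,),
              file=sys.stderr)

def expected_failure():
    raise DBNoServerError("DB_NOSERVER")

def surprising_failure():
    raise KeyError("something else entirely")

discard_during_dealloc(expected_failure)    # silent
discard_during_dealloc(surprising_failure)  # reported, not raised
```

Neither call lets an exception escape, which is the hard constraint in dealloc code; the difference is only in whether the failure is mentioned on stderr.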

    @jcea
    Member

    jcea commented Sep 22, 2008

Nick:

1. Yes, the code actually patches an unrelated regression too
  (DB.verify() crashes). I added the testcase, since the test suite
  didn't exercise "DB.verify()", so the bug was lurking there for
  months. It is solved now, and the testcase is there to protect us in
  the future.

I can split the patch if needed, nevertheless.

2. About commenting the "do_not_close" use, I agree. Done. Now the code says:

    """
    /*
    ** "do_not_close" is used to dispose all related objects in the
    ** tree, without actually releasing the "root" object.
    ** This is done, for example, because function calls like
    ** "DB.verify()" implicitly close the underlying handle. So
    ** the handle doesn't need to be closed, but related objects
    ** must be cleaned up.
    */
    """

    Similarly in "DBSequence.remove()".

I'm actually thinking about splitting the "*_close_internal()"
functions into "release related objects" + "object close". I don't
like the "do_not_close" flag. But that would be a post-2.6.0 change
that I'm still considering.

3. About "dummy=DB_close_internal() + Py_XDECREF": the crashes are
  related to raising exceptions *while deallocating objects*. These
  are solved with this patch.

Other "dummy=DB_close_internal(*) + Py_XDECREF" uses need to be
improved, but they do not crash the library, and the changes are a bit
extensive for post-RC code. They are on my TODO list, but I think they
can be left out of Python 2.6.0. See my previous post.

4. I agree with you. The problem is related to cleaning up a
  partially/incorrectly initialized object. Ideally the error should
  be raised in the object constructor. Nevertheless, in this case the
  RPC initialization involves several functions that must be called in
  sequence: the object is created as "RPC enabled", then the details
  of the remote RPC server are specified. If the handle is disposed of
  before the second step, it can be safely deallocated without showing
  any spurious error. In this concrete example, I don't think the user
  needs to know about the "phantom" error.

If you don't agree, please tell me. Would a few "fprintf()" calls be enough?

    Thanks for your time, Nick.

    @ncoghlan
    Contributor

    All of those explanations sound fair to me - with those questions
    answered, the patch looks good to me.

    @jcea
    Member

    jcea commented Sep 23, 2008

    Thanks. Committed as r66568.

    @jcea jcea closed this as completed Sep 23, 2008
    @jcea jcea added the type-crash A hard crash of the interpreter, possibly with a core dump label Sep 23, 2008
    @vstinner
    Member Author

OK, thanks. My initial bug is closed and my fuzzer is unable to find
new ones ;-) Great job.

    @jcea
    Member

    jcea commented Sep 23, 2008

It would be nice to use that fuzzer myself. Details, please :).

    @vstinner
    Member Author

> Would be nice to use that fuzzer myself. Details, please :).

This isn't the best place to explain it, so I'll try to explain it briefly:

    • install fusil 1.0: use mandriva/debian packages, or use sources
    • (create fusil user group)
    • run "sudo fusil-python --module=_bsddb"

    The best way is to use the trunk version (without installation)!
    ----
    cd <your fusil home>
    svn co http://python-ptrace.hachoir.org/svn/trunk python-ptrace
    svn co http://fusil.hachoir.org/svn/trunk fusil
    export PYTHONPATH=$PWD/python-ptrace/ptrace:$PWD/fusil:$PYTHONPATH
    sudo <your python interpreter> ./fusil/fuzzer/fusil-python --module=_bsddb
    ----
    (refer to python-ptrace/INSTALL and fusil/INSTALL)

You have to start the fuzzer as root so that it can run the child
processes as the unprivileged fusil user. Never let the fuzzed
processes run as root, or it will kill you(r computer)!

    See also http://fusil.hachoir.org/trac/wiki/Contact

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022