classification
Title: errors on _bsddb creation and dealloc
Type: crash Stage:
Components: Library (Lib) Versions: Python 2.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: jcea Nosy List: amaury.forgeotdarc, haypo, jcea, ncoghlan
Priority: high Keywords: needs review, patch

Created on 2008-09-16 21:08 by haypo, last changed 2008-09-23 22:47 by haypo. This issue is now closed.

Files
File name Uploaded Description Edit
_bsddb.patch haypo, 2008-09-16 21:08 Patch fixing described bugs
_bsddb_test.patch haypo, 2008-09-17 00:20 Test to reproduce the crash
bsddb.patch jcea, 2008-09-18 15:20 Patch for python 2.6.
Messages (19)
msg73312 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2008-09-16 21:08
I found two differents bugs using Fusil the fuzzer.

(1) On db_env_create() error in newDBEnvObject(), self->db_env is not 
set and so use of this pointer may crashs.

(2) DBEnv_dealloc() may raise an exception (DBEnv_close_internal() 
calls makeDBError()) but the garbage collector dislike exceptions!

Example of (2):
----
$ python
>>> import gc, _bsddb; env=_bsddb.DBEnv(3); del env
>>> gc.collect(); gc.collect()
Exception bsddb.db.DBNoServerError: (-30992, 'DB_NOSERVER: Fatal 
error, no RPC server -- No Berkeley DB RPC server environment') 
in 'garbage collection' ignored
Fatal Python error: unexpected exception during garbage collection
----
msg73313 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2008-09-16 21:14
About the bug (1): it also occurs in DBEnv_dealloc() but 
DBEnv_dealloc() is directly called from newDBEnvObject() with 
Py_DECREF(self);. The two bugs can be reproduces with dummy DBenv() 
arguments, eg. "DBEnv(92)".

Backtrace using gdb:
-----
$ gdb ./python
...
>>> import _bsddb; _bsddb.DBenv(92)
...
Program received signal SIGSEGV, Segmentation fault.

0xb7cc8f5e in DBEnv_close_internal (self=0x83385f0, flags=0) 
at ../Modules/_bsddb.c:3989
3989            err = self->db_env->close(self->db_env, flags);
(gdb) print self->db_env
$1 = (DB_ENV *) 0xfffffffe
-----

There db_env value is not set.
msg73319 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2008-09-16 22:48
Good catch. I've patched my private branch. Following 2.6 patches, if
anybody can provide a review, since we are in RC status.

I was wondering if we must protect other objects from raising exceptions
in GC. For example, DB objects.

What do you think?.

(I have broken my left wrist ten days ago. If you can write a testcase
showing the issue and a patch for it, would be very nice).
msg73320 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2008-09-17 00:20
Here is a test to reproduce the crash. I don't know db_env_create() 
function, so I don't know what are "invalid arguments". So I used 
~DB_RPCCLIENT. "gc.collect()" is to detect the bug (2).

About other objects dealloc method...

DBEnv_dealloc()
  `-> DBEnv_close_internal(): RETURN_IF_ERR()
DB_dealloc()
  `-> DB_close_internal(): RETURN_IF_ERR()
DBC_dealloc() 
  `-> DBC_close_internal(): RETURN_IF_ERR()
DBSequence_dealloc()
  `-> DBSequence_close_internal(): RETURN_IF_ERR()
DBTxn_dealloc()
  `-> DBTxn_abort_discard_internal(): RETURN_IF_ERR()

DBLock_dealloc() looks ok.
msg73327 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-09-17 08:53
I thought first that the problem was during the execution of
gc.collect(), but its not: when configured --with-pydebug, the exception
is printed before:

>>> import gc, _bsddb; env=_bsddb.DBEnv(3); del env
XXX undetected error
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
bsddb.db.DBNoServerError: (-30992, 'DB_NOSERVER: Fatal error, no RPC
server -- No Berkeley DB RPC server environment')

gc.collect() is just a rude way to display this "XXX undetected error".
(Victor: does Fusil check for this? gc.collect() will not fail if there
is another exception in-between, or in debug mode)

Now, to the _bsddb module: in general, the following pattern is wrong:
   dummy = someFunction();
   Py_XDECREF(dummy);
because it does nothing about an eventual exception set. If the
exception can be discarded, PyErr_Clear() must be called.

I think there is an invariant to keep in each function: return NULL if
and only if an exception is set. Many places in _bsddb do not respect this.
msg73328 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2008-09-17 09:38
> gc.collect() is just a rude way to display this "XXX undetected error".
> (Victor: does Fusil check for this? gc.collect() will not fail if there
> is another exception in-between, or in debug mode)

I stopped to fuzz Python using --pydebug because a critical design error in 
CPython reference counting: http://bugs.python.org/issue3299 (some modules 
use PyObject_DEL() instead of Py_DECREF() whereas PyObject_DEL() creates 
inconsistent objects in the double linked object list). Since nobody cares 
about this issue, I don't use pydebug anymore.

> Now, to the _bsddb module: in general, the following pattern is wrong:
>    dummy = someFunction();
>    Py_XDECREF(dummy);
> because it does nothing about an eventual exception set. If the
> exception can be discarded, PyErr_Clear() must be called.

Well, we have to choices: don't raise an error, or clear the exception if an 
exception is raised.

> I think there is an invariant to keep in each function: return NULL if
> and only if an exception is set. Many places in _bsddb do not respect this.

Where? dealloc() callback prototype is "void tp_dealloc(...)": no result! It's 
not possible to tell Python that an error occured.
msg73329 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2008-09-17 10:22
So, the right thing to do seems to drop the "check_error" flag and just
do a "PyErr_Clear()" if necessary in the dealloc code.

This must be done in all dealloc code: DBEnv, DB, DBSequence and DBCursor.

Do you agree?.
msg73330 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-09-17 11:25
Not only in these functions, but anywhere "dummy=" is followed by a
Py_XDECREF(dummy);

And code like this must also be changed (in DB_open):
    if (makeDBError(err)) {
        PyObject *dummy;

        dummy=DB_close_internal(self,0);
        Py_XDECREF(dummy);
        return NULL;
    }
if DB_close_internal() raises an exception it will replace the original
error set by makeDBError(). I'm not sure this is desirable.
msg73334 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-09-17 13:17
> I stopped to fuzz Python using --pydebug because a critical design
> error in CPython reference counting

I won't comment on this, but to get the extra checks you could arrange
that ceval.c is compiled with CHECKEXC defined. "./configure
BASECFLAGS=-DCHECKEXC" did the trick for me.

> Where? dealloc() callback prototype is "void tp_dealloc(...)": 
> no result! It's not possible to tell Python that an error occured.

Exactly. You cannot return NULL (or -1 for other kind of functions), so
no exception may be raised; PyErr_Clear() calls are necessary.
msg73387 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2008-09-18 15:20
> Not only in these functions, but anywhere "dummy=" is followed by a
> Py_XDECREF(dummy);

Please, clarify this. I don't see your point.

> And code like this must also be changed (in DB_open):
>    if (makeDBError(err)) {
>        PyObject *dummy;
>
>        dummy=DB_close_internal(self,0);
>        Py_XDECREF(dummy);
>        return NULL;
>    }
> if DB_close_internal() raises an exception it will replace the
> original error set by makeDBError(). I'm not sure this is desirable.

I agree this could be an issue that need to be studied. Please, post a
new bug report for that (to be addressed in 2.6.1, I think, since this
issue doesn't crash the interpreter).

I post a patch for the bugs originally indicated in this bug report, an
another one (db.verify() regression).

*Please Review*. I will update python 2.6 svn when the review is done.
Thanks.
msg73388 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-09-18 16:10
> > Not only in these functions, but anywhere "dummy=" is followed by a
> > Py_XDECREF(dummy);
> 
> Please, clarify this. I don't see your point.

If dummy is NULL, a pending exception has been set (with PyErr_SetString
or another similar function). If you simply ignore this, your function
will return a valid object, but PyErr_Occurred() returns True. This is
not correct and will fail in subtle ways (like with gc.collect()); in
debug mode, a message "XXX undetected error" is printed.
You must either clear the exception, or return NULL from your function.
msg73395 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2008-09-18 18:29
Very good point, Amaury.

I would open a new bug to track this (probably for 2.6.1 time). Can you?.

In the meanwhile, I need review for the inmediate patch, since the crash
is 100% reproductible.
msg73594 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-09-22 21:02
Patch comments:
- the test suite section of the diff appears to have a number of changes
that are unrelated to this issue
- the purpose of the new do_not_close flag (i.e. avoiding the crash)
could use a comment at the point where it is referenced in the code
- aren't all those dummy=DB_close_internal(*) + Py_XDECREF calls also in
need of the PyErr_Clear() treatment?
- globally clearing *any* error unconditionally is typically a bad idea.
If there are certain 'expected' errors (such as the No Server error
Victor was triggering or other BSDDB errors) then those can be cleared
silently, but for other errors something should at least be written to
stderr to indicate that an error was ignored because it couldn't be
propagated back to the eval loop (and possibly even for the BSDDB errors).
msg73602 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2008-09-22 22:20
Nick:

1. Yes, the code actually patches an unrelated regression too
(DB.verify() crashes). I added the testcase, since the testsuite didn't
exercise "DB.verify()" and so the bug was lurking there for months. It
is solved now, and the testcase is there to protect us in the future.

I can split the patch, if demanded, nevertheless.

2. About commenting "do_not_close" use, I agree. Done. Now the code says:

"""
        /*
        ** "do_not_close" is used to dispose all related objects in the
        ** tree, without actually releasing the "root" object.
        ** This is done, for example, because function calls like
        ** "DB.verify()" implicitly close the underlying handle. So
        ** the handle doesn't need to be closed, but related objects
        ** must be cleaned up.
        */
"""

Similarly in "DBSequence.remove()".

I'm actually thinking about splitting "*_close_internal()" functions in
"release related objects"+"object close". I don't like the
"do_not_close" flag. But that would be a post-2.6.0 change that I'm
still considering.

3. About "dummy=DB_close_internal(*) + Py_XDECREF", the crashes are
related to raising exceptions *while* deallocating objects. These are
solved with this patch.

Other "dummy=DB_close_internal(*) + Py_XDECREF" uses need be improved,
but they do not crash the library, and the changes are a bit extensive
for a post RC code. They are in my TO DO list, but I think they can be
left out of Python 2.6.0. See my previous post.

4. I agree with you. The problem is related to cleaning a
partially/incorrectly initialized object. Ideally the error should be
raised in the object constructor. Nevertheless in this case the RPC
initialization involves several functions that must be called in
sequence: object created as "RPC enabled"+specifying the details of
remote RPC server. If the handle is disposed before the second step, it
can be safely deallocated without showing any spurious error. In this
concrete example, I don't think the user needs to know about the
"phantom" error.

If you don't agree, please tell to me. Would be a few "fprintf()" enough?.

Thanks for your time, Nick.
msg73628 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-09-23 12:06
All of those explanations sound fair to me - with those questions
answered, the patch looks good to me.
msg73657 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2008-09-23 19:27
Thanks. Committed as r66568.
msg73670 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2008-09-23 21:49
Ok, thanks. My initial bug is closed and my fuzzer is unable to find 
new ones ;-) Great job.
msg73672 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2008-09-23 22:24
Would be nice to use that fuzzer myself. Details, please :).
msg73676 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2008-09-23 22:47
> Would be nice to use that fuzzer myself. Details, please :).

It's not the best place to explain it, so I will try explain shortly:
* install fusil 1.0: use mandriva/debian packages, or use sources
* (create fusil user group)
* run "sudo fusil-python --module=_bsddb"

The best way is to use the trunk version (without installation)!
----
cd <your fusil home>
svn co http://python-ptrace.hachoir.org/svn/trunk python-ptrace
svn co http://fusil.hachoir.org/svn/trunk fusil
export PYTHONPATH=$PWD/python-ptrace/ptrace:$PWD/fusil:$PYTHONPATH
sudo <your python interpreter> ./fusil/fuzzer/fusil-python --module=_bsddb
----
(refer to python-ptrace/INSTALL and fusil/INSTALL)

You have to run the fuzzer as root to be able to run child processes as the 
fusil user. Never run fusil as root or it will kill you(r computer)!

See also http://fusil.hachoir.org/trac/wiki/Contact
History
Date User Action Args
2008-09-23 22:47:23hayposetmessages: + msg73676
2008-09-23 22:24:13jceasetmessages: + msg73672
2008-09-23 21:49:05hayposetmessages: + msg73670
2008-09-23 21:29:51jceasetstatus: open -> closed
type: crash
resolution: fixed
2008-09-23 19:27:51jceasetmessages: + msg73657
2008-09-23 12:06:02ncoghlansetmessages: + msg73628
2008-09-22 22:20:57jceasetmessages: + msg73602
2008-09-22 21:02:04ncoghlansetnosy: + ncoghlan
messages: + msg73594
2008-09-18 18:29:09jceasetmessages: + msg73395
2008-09-18 16:10:21amaury.forgeotdarcsetmessages: + msg73388
2008-09-18 15:20:13jceasetfiles: + bsddb.patch
messages: + msg73387
2008-09-17 13:17:23amaury.forgeotdarcsetmessages: + msg73334
2008-09-17 11:25:05amaury.forgeotdarcsetmessages: + msg73330
2008-09-17 10:22:08jceasetmessages: + msg73329
2008-09-17 09:38:04hayposetmessages: + msg73328
2008-09-17 08:53:29amaury.forgeotdarcsetnosy: + amaury.forgeotdarc
messages: + msg73327
2008-09-17 00:20:58hayposetfiles: + _bsddb_test.patch
messages: + msg73320
2008-09-16 22:48:22jceasetpriority: high
keywords: + needs review
messages: + msg73319
2008-09-16 21:14:21hayposetmessages: + msg73313
2008-09-16 21:13:00benjamin.petersonsetassignee: jcea
nosy: + jcea
2008-09-16 21:08:59haypocreate