
errors on _bsddb creation and dealloc #48135

Closed
vstinner opened this issue Sep 16, 2008 · 19 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@vstinner
Member

BPO 3885
Nosy @jcea, @amauryfa, @ncoghlan, @vstinner
Files
  • _bsddb.patch: Patch fixing described bugs
  • _bsddb_test.patch: Test to reproduce the crash
  • bsddb.patch: Patch for python 2.6.
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/jcea'
    closed_at = <Date 2008-09-23.21:29:51.096>
    created_at = <Date 2008-09-16.21:08:59.800>
    labels = ['library', 'type-crash']
    title = 'errors on _bsddb creation and dealloc'
    updated_at = <Date 2008-09-23.22:47:23.361>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2008-09-23.22:47:23.361>
    actor = 'vstinner'
    assignee = 'jcea'
    closed = True
    closed_date = <Date 2008-09-23.21:29:51.096>
    closer = 'jcea'
    components = ['Library (Lib)']
    creation = <Date 2008-09-16.21:08:59.800>
    creator = 'vstinner'
    dependencies = []
    files = ['11505', '11506', '11521']
    hgrepos = []
    issue_num = 3885
    keywords = ['patch', 'needs review']
    message_count = 19.0
    messages = ['73312', '73313', '73319', '73320', '73327', '73328', '73329', '73330', '73334', '73387', '73388', '73395', '73594', '73602', '73628', '73657', '73670', '73672', '73676']
    nosy_count = 4.0
    nosy_names = ['jcea', 'amaury.forgeotdarc', 'ncoghlan', 'vstinner']
    pr_nums = []
    priority = 'high'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue3885'
    versions = ['Python 2.6']

    @vstinner
    Member Author

I found two different bugs using Fusil the fuzzer.

(1) On db_env_create() error in newDBEnvObject(), self->db_env is not
set, so any use of this pointer may crash.

(2) DBEnv_dealloc() may raise an exception (DBEnv_close_internal()
calls makeDBError()), but the garbage collector dislikes exceptions!

    Example of (2):
    ----

    $ python
    >>> import gc, _bsddb; env=_bsddb.DBEnv(3); del env
    >>> gc.collect(); gc.collect()
    Exception bsddb.db.DBNoServerError: (-30992, 'DB_NOSERVER: Fatal 
    error, no RPC server -- No Berkeley DB RPC server environment') 
    in 'garbage collection' ignored
    Fatal Python error: unexpected exception during garbage collection
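
    The failure mode here, an exception escaping a deallocator, has a
    pure-Python analogue that is easy to demonstrate: an exception raised
    in __del__ cannot propagate, so CPython hands it to
    sys.unraisablehook (by default printing "Exception ignored in ..."),
    much like the message in the transcript above. A minimal sketch,
    using a hypothetical Handle class and the sys.unraisablehook API
    available since Python 3.8:

    ```python
    import gc
    import sys

    captured = []

    # Route "unraisable" errors (exceptions escaping __del__) to a list
    # instead of stderr so they can be inspected (Python 3.8+).
    sys.unraisablehook = lambda unraisable: captured.append(unraisable)

    class Handle:
        """Hypothetical stand-in for a DBEnv-like object."""
        def __del__(self):
            # Analogue of DBEnv_close_internal() failing in tp_dealloc:
            # this exception has nowhere to propagate.
            raise RuntimeError("close failed during deallocation")

    h = Handle()
    del h          # __del__ runs; its exception is swallowed, not raised
    gc.collect()   # the interpreter keeps running normally

    assert len(captured) == 1
    assert isinstance(captured[0].exc_value, RuntimeError)
    ```

    The interpreter survives because the finalizer's error is discarded,
    which is exactly what a C deallocator must also arrange for itself.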

    @vstinner vstinner added the stdlib Python modules in the Lib dir label Sep 16, 2008
    @vstinner
    Member Author

About bug (1): it also occurs in DBEnv_dealloc(), but DBEnv_dealloc()
is called directly from newDBEnvObject() via Py_DECREF(self);. The two
bugs can be reproduced with dummy DBEnv() arguments, e.g. "DBEnv(92)".

    Backtrace using gdb:
    -----

    $ gdb ./python
    ...
>>> import _bsddb; _bsddb.DBEnv(92)
    ...
    Program received signal SIGSEGV, Segmentation fault.

    0xb7cc8f5e in DBEnv_close_internal (self=0x83385f0, flags=0)
    at ../Modules/_bsddb.c:3989
    3989 err = self->db_env->close(self->db_env, flags);
    (gdb) print self->db_env
    $1 = (DB_ENV *) 0xfffffffe
    -----

Here the db_env value was never set.
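
    Bug (1), a finalizer touching a field the failed constructor never
    initialized, can be mimicked at the Python level. The classes below
    are hypothetical, purely for illustration; the fix mirrors the usual
    C idiom of setting self->db_env to NULL at allocation time and
    testing it in the deallocator:

    ```python
    import sys

    captured = []
    sys.unraisablehook = lambda u: captured.append(u)  # Python 3.8+

    class BrokenEnv:
        """Hypothetical analogue of the buggy newDBEnvObject(): the
        constructor fails before self.db_env is ever assigned."""
        def __init__(self):
            raise ValueError("db_env_create() failed")  # db_env never set
        def __del__(self):
            self.db_env.close()  # blows up: the attribute does not exist

    class FixedEnv:
        """The usual fix: initialize the handle first and guard the
        finalizer (in C: self->db_env = NULL, then test before close)."""
        def __init__(self):
            self.db_env = None  # safe default before anything can fail
            raise ValueError("db_env_create() failed")
        def __del__(self):
            if self.db_env is not None:  # tolerate partial construction
                self.db_env.close()

    for cls in (BrokenEnv, FixedEnv):
        try:
            cls()
        except ValueError:
            pass

    # Only the unguarded class produced an error during deallocation.
    assert len(captured) == 1
    assert captured[0].exc_type is AttributeError
    ```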

    @jcea
    Member

    jcea commented Sep 16, 2008

Good catch. I've patched my private branch. Patches for 2.6 follow; a
review from anybody would be appreciated, since we are in RC status.

I was wondering if we must protect other objects from raising
exceptions in GC. For example, DB objects.

What do you think?

(I broke my left wrist ten days ago. If you could write a testcase
showing the issue and a patch for it, that would be very nice.)

    @vstinner
    Member Author

Here is a test to reproduce the crash. I don't know the
db_env_create() function, so I don't know what counts as "invalid
arguments"; I used ~DB_RPCCLIENT. The "gc.collect()" is there to
detect bug (2).

    About other objects dealloc method...

    DBEnv_dealloc()
      `-> DBEnv_close_internal(): RETURN_IF_ERR()
    DB_dealloc()
      `-> DB_close_internal(): RETURN_IF_ERR()
    DBC_dealloc() 
      `-> DBC_close_internal(): RETURN_IF_ERR()
    DBSequence_dealloc()
      `-> DBSequence_close_internal(): RETURN_IF_ERR()
    DBTxn_dealloc()
      `-> DBTxn_abort_discard_internal(): RETURN_IF_ERR()

    DBLock_dealloc() looks ok.

    @amauryfa
    Member

At first I thought the problem occurred during the execution of
gc.collect(), but it doesn't: when configured --with-pydebug, the
exception is printed before that:

    >>> import gc, _bsddb; env=_bsddb.DBEnv(3); del env
    XXX undetected error
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    bsddb.db.DBNoServerError: (-30992, 'DB_NOSERVER: Fatal error, no RPC
    server -- No Berkeley DB RPC server environment')

    gc.collect() is just a rude way to display this "XXX undetected error".
    (Victor: does Fusil check for this? gc.collect() will not fail if there
    is another exception in-between, or in debug mode)

    Now, to the _bsddb module: in general, the following pattern is wrong:
       dummy = someFunction();
       Py_XDECREF(dummy);
because it does nothing about any exception that may have been set. If
the exception can be discarded, PyErr_Clear() must be called.

    I think there is an invariant to keep in each function: return NULL if
    and only if an exception is set. Many places in _bsddb do not respect this.

    @vstinner
    Member Author

> gc.collect() is just a rude way to display this "XXX undetected error".
> (Victor: does Fusil check for this? gc.collect() will not fail if there
> is another exception in-between, or in debug mode)

I stopped fuzzing Python with --pydebug because of a critical design
error in CPython reference counting: http://bugs.python.org/issue3299
(some modules use PyObject_DEL() instead of Py_DECREF(), and
PyObject_DEL() leaves inconsistent objects in the doubly linked object
list). Since nobody cares about this issue, I don't use pydebug
anymore.

> Now, to the _bsddb module: in general, the following pattern is wrong:
>    dummy = someFunction();
>    Py_XDECREF(dummy);
> because it does nothing about any exception that may have been set. If
> the exception can be discarded, PyErr_Clear() must be called.

Well, we have two choices: don't raise an error, or clear the
exception if one is raised.

> I think there is an invariant to keep in each function: return NULL if
> and only if an exception is set. Many places in _bsddb do not respect this.

Where? The dealloc() callback prototype is "void tp_dealloc(...)": no
result! It's not possible to tell Python that an error occurred.

    @jcea
    Member

    jcea commented Sep 17, 2008

So the right thing to do seems to be to drop the "check_error" flag
and just call "PyErr_Clear()" where necessary in the dealloc code.

This must be done in all the dealloc code: DBEnv, DB, DBSequence and
DBCursor.

Do you agree?

    @amauryfa
    Member

    Not only in these functions, but anywhere "dummy=" is followed by a
    Py_XDECREF(dummy);
    
    And code like this must also be changed (in DB_open):
        if (makeDBError(err)) {
            PyObject *dummy;
    
            dummy=DB_close_internal(self,0);
            Py_XDECREF(dummy);
            return NULL;
        }
If DB_close_internal() raises an exception, it will replace the
original error set by makeDBError(). I'm not sure this is desirable.
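
    A Python-level sketch of the concern: if cleanup runs on an error
    path, its own failure must be discarded (the analogue of calling
    PyErr_Clear() after a failed DB_close_internal()) so that the
    original error still reaches the caller. The function names here are
    hypothetical, purely for illustration:

    ```python
    def close_quietly(close):
        """Run a cleanup callable and discard any error it raises --
        the analogue of PyErr_Clear() after a failed cleanup call,
        so the original exception survives."""
        try:
            close()
        except Exception:
            pass  # deliberately dropped, like PyErr_Clear()

    def open_db():
        """Hypothetical sketch of the DB_open error path."""
        def close():
            raise OSError("close also failed")
        # The open itself failed; close the half-open handle, but do
        # not let the close error replace the one the caller should see.
        close_quietly(close)
        raise ValueError("open failed")

    try:
        open_db()
    except Exception as exc:
        seen = exc

    assert isinstance(seen, ValueError)  # the original error survived
    assert str(seen) == "open failed"
    ```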

    @amauryfa
    Member

> I stopped to fuzz Python using --pydebug because a critical design
> error in CPython reference counting

    I won't comment on this, but to get the extra checks you could arrange
    that ceval.c is compiled with CHECKEXC defined. "./configure
    BASECFLAGS=-DCHECKEXC" did the trick for me.

> Where? dealloc() callback prototype is "void tp_dealloc(...)":
> no result! It's not possible to tell Python that an error occured.

Exactly. You cannot return NULL (or -1 for other kinds of functions),
so no exception may be raised; PyErr_Clear() calls are necessary.

    @jcea
    Member

    jcea commented Sep 18, 2008

> Not only in these functions, but anywhere "dummy=" is followed by a
> Py_XDECREF(dummy);

Please clarify this. I don't see your point.

> And code like this must also be changed (in DB_open):
>     if (makeDBError(err)) {
>         PyObject *dummy;
>
>         dummy=DB_close_internal(self,0);
>         Py_XDECREF(dummy);
>         return NULL;
>     }
> if DB_close_internal() raises an exception it will replace the
> original error set by makeDBError(). I'm not sure this is desirable.

I agree this could be an issue that needs to be studied. Please post a
new bug report for it (to be addressed in 2.6.1, I think, since this
issue doesn't crash the interpreter).

I'm posting a patch for the bugs originally reported here, plus
another one (a db.verify() regression).

*Please review*. I will update the Python 2.6 svn when the review is
done. Thanks.

    @amauryfa
    Member

    > Not only in these functions, but anywhere "dummy=" is followed by a
    > Py_XDECREF(dummy);

    Please, clarify this. I don't see your point.

If dummy is NULL, a pending exception has been set (with
PyErr_SetString or a similar function). If you simply ignore this,
your function will return a valid object while PyErr_Occurred()
reports an error. This is not correct and will fail in subtle ways
(like with gc.collect()); in debug mode, a message "XXX undetected
error" is printed. You must either clear the exception or return NULL
from your function.

    @jcea
    Member

    jcea commented Sep 18, 2008

Very good point, Amaury.

I would open a new bug to track this (probably for 2.6.1 time). Can you?

In the meantime, I need a review for the immediate patch, since the
crash is 100% reproducible.

    @ncoghlan
    Contributor

    Patch comments:

    • the test suite section of the diff appears to have a number of changes
      that are unrelated to this issue
    • the purpose of the new do_not_close flag (i.e. avoiding the crash)
      could use a comment at the point where it is referenced in the code
    • aren't all those dummy=DB_close_internal(*) + Py_XDECREF calls also in
      need of the PyErr_Clear() treatment?
    • globally clearing *any* error unconditionally is typically a bad idea.
      If there are certain 'expected' errors (such as the No Server error
      Victor was triggering or other BSDDB errors) then those can be cleared
      silently, but for other errors something should at least be written to
      stderr to indicate that an error was ignored because it couldn't be
      propagated back to the eval loop (and possibly even for the BSDDB errors).
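
The policy Nick describes, clearing only anticipated errors silently and at least reporting everything else, can be sketched at the Python level as follows (DBNoServerError and the helper name are stand-ins for illustration, not the real bsddb API):

```python
import sys

class DBNoServerError(Exception):
    """Hypothetical stand-in for bsddb.db.DBNoServerError."""

def discard_during_dealloc(close, expected=(DBNoServerError,)):
    """Clear only 'expected' errors silently; anything else is at
    least reported on stderr, since a finalizer cannot propagate it."""
    try:
        close()
    except expected:
        pass  # anticipated failure, silently cleared
    except Exception as exc:
        print("error ignored during dealloc: %r" % (exc,),
              file=sys.stderr)

def expected_failure():
    raise DBNoServerError("DB_NOSERVER")

def surprising_failure():
    raise KeyError("something else entirely")

discard_during_dealloc(expected_failure)    # silent
discard_during_dealloc(surprising_failure)  # reported, not raised
```

Neither call lets an exception escape, which is the hard constraint in dealloc code; the difference is only in whether the failure is mentioned on stderr.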

    @jcea
    Member

    jcea commented Sep 22, 2008

Nick:

1. Yes, the code actually patches an unrelated regression too
  (DB.verify() crashes). I added the testcase, since the test suite
  didn't exercise "DB.verify()", so the bug was lurking there for
  months. It is solved now, and the testcase is there to protect us in
  the future.

I can split the patch if needed, nevertheless.

2. About commenting the "do_not_close" use, I agree. Done. Now the code says:

    """
    /*
    ** "do_not_close" is used to dispose all related objects in the
    ** tree, without actually releasing the "root" object.
    ** This is done, for example, because function calls like
    ** "DB.verify()" implicitly close the underlying handle. So
    ** the handle doesn't need to be closed, but related objects
    ** must be cleaned up.
    */
    """

    Similarly in "DBSequence.remove()".

I'm actually thinking about splitting the "*_close_internal()"
functions into "release related objects" + "object close". I don't
like the "do_not_close" flag. But that would be a post-2.6.0 change
that I'm still considering.

3. About "dummy=DB_close_internal() + Py_XDECREF": the crashes are
  related to raising exceptions *while deallocating objects*. These
  are solved with this patch.

Other "dummy=DB_close_internal(*) + Py_XDECREF" uses need to be
improved, but they do not crash the library, and the changes are a bit
extensive for post-RC code. They are on my TODO list, but I think they
can be left out of Python 2.6.0. See my previous post.

4. I agree with you. The problem is related to cleaning up a
  partially/incorrectly initialized object. Ideally the error should
  be raised in the object constructor. Nevertheless, in this case the
  RPC initialization involves several functions that must be called in
  sequence: the object is created as "RPC enabled", then the details
  of the remote RPC server are specified. If the handle is disposed of
  before the second step, it can be safely deallocated without showing
  any spurious error. In this concrete example, I don't think the user
  needs to know about the "phantom" error.

If you don't agree, please tell me. Would a few "fprintf()" calls be enough?

    Thanks for your time, Nick.

    @ncoghlan
    Contributor

    All of those explanations sound fair to me - with those questions
    answered, the patch looks good to me.

    @jcea
    Member

    jcea commented Sep 23, 2008

    Thanks. Committed as r66568.

    @jcea jcea closed this as completed Sep 23, 2008
    @jcea jcea added the type-crash A hard crash of the interpreter, possibly with a core dump label Sep 23, 2008
    @vstinner
    Member Author

OK, thanks. My initial bug is closed and my fuzzer is unable to find
new ones ;-) Great job.

    @jcea
    Member

    jcea commented Sep 23, 2008

It would be nice to use that fuzzer myself. Details, please :).

    @vstinner
    Member Author

> Would be nice to use that fuzzer myself. Details, please :).

This isn't the best place to explain it, so I'll try to explain it briefly:

    • install fusil 1.0: use mandriva/debian packages, or use sources
    • (create fusil user group)
    • run "sudo fusil-python --module=_bsddb"

    The best way is to use the trunk version (without installation)!
    ----
    cd <your fusil home>
    svn co http://python-ptrace.hachoir.org/svn/trunk python-ptrace
    svn co http://fusil.hachoir.org/svn/trunk fusil
    export PYTHONPATH=$PWD/python-ptrace/ptrace:$PWD/fusil:$PYTHONPATH
    sudo <your python interpreter> ./fusil/fuzzer/fusil-python --module=_bsddb
    ----
    (refer to python-ptrace/INSTALL and fusil/INSTALL)

You have to start the fuzzer as root so that it can run the child
processes as the unprivileged fusil user. Never let the fuzzed
processes run as root, or it will kill you(r computer)!

    See also http://fusil.hachoir.org/trac/wiki/Contact

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022