classification
Title: bsddb databases in python 2.6 are not compatible with python 2.5
Type: crash Stage:
Components: Library (Lib) Versions: Python 2.6
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: jcea Nosy List: PeterL, guy.linton, jcea, loewis
Priority: normal Keywords:

Created on 2010-04-23 10:02 by guy.linton, last changed 2010-04-23 20:24 by loewis. This issue is now closed.

Messages (13)
msg103998 - (view) Author: Tim Lyons (guy.linton) Date: 2010-04-23 10:02
A database created under python 2.5 cannot be opened under python 2.6. It gives the error message "DB_RUNRECOVERY: Fatal error, run database recovery -- process-private: unable to find environment ", and a database created under python 2.6 cannot be opened under python 2.5 (see http://trac.macports.org/ticket/24310). (This is on Mac OS X. On Windows XP SP3, Python 2.6 can read a Python 2.5 bsddb database, but not the other way around. If you try, you will end up with a corrupt database.)

python 2.6 bsddb is very much slower than python 2.5. Specifically, in Gramps, import of a 500 person xml file takes 12 sec with python25 and 9 mins 30 secs with python26. The slowness has been observed in Mac OS X (See http://trac.macports.org/ticket/23768) and in Windows (see http://www.gramps-project.org/bugs/view.php?id=3750).

I am not sure, but I think that both systems are using the same underlying database module db46, and that the difference may be in the different interface modules: "_bsddb.so" (on Mac OS X)
msg103999 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-04-23 10:18
Jesus, any idea?
msg104007 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2010-04-23 12:25
The database compatibility is dictated by the underlying Berkeley DB library used. Reporter, please do this (assuming you are using the "bsddb" lib in the standard library, not the external project "pybsddb"):

1. Open a python2.5 shell.

2. "import bsddb"

3. "print bsddb.__version__, bsddb.db.version()"

4. Post the numbers.

5. Repeat under python2.6.
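
Steps 1 to 5 can also be run as one small script under each interpreter. This is only a sketch: the helper names `load_bsddb` and `describe` are made up for illustration, and falling back to the external "bsddb3" package when "bsddb" is missing is an assumption, not part of the request above.

```python
def load_bsddb():
    """Return the first importable Berkeley DB binding, or None."""
    for name in ("bsddb", "bsddb3"):
        try:
            return __import__(name)
        except ImportError:
            continue
    return None


def describe(module):
    """Format the binding version and the Berkeley DB library version."""
    if module is None:
        return "no bsddb binding available"
    # module.__version__ is the binding version (e.g. 4.7.3);
    # module.db.version() is the underlying Berkeley DB (e.g. (4, 7, 25)).
    return "%s %s %r" % (module.__name__, module.__version__,
                         module.db.version())


print(describe(load_bsddb()))
```

Run it once with python2.5 and once with python2.6 and compare the two lines.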

On my machine, I get:

python2.5: 4.4.5.3 (4, 5, 20)
python2.6: 4.7.3 (4, 7, 25)

So under python2.5 I would be using Berkeley DB 4.5, and under python2.6 I am using Berkeley DB 4.7.

Berkeley DB has a defined procedure to upgrade databases. This is especially important if you are using a transactional datastore. BEWARE: There is *NO* downgrade path.

http://www.oracle.com/technology/documentation/berkeley-db/db/programmer_reference/am_upgrade.html

Most of the time, the database format doesn't change from version to version, but the environment does (especially the log format). Each Berkeley DB release's documentation includes a very detailed "upgrading" section. For instance:

http://www.oracle.com/technology/documentation/berkeley-db/db/installation/upgrade_11gr2_toc.html

Anyway the details are the following:

1. A database created with Berkeley DB version X cannot be used with version Y if Y<X.

2. A database created with Berkeley DB version X can be used with version Y if Y>X, provided you first upgrade the environment/databases to the new version.
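
The two rules can be written down as a small check. A minimal sketch, assuming versions are given as tuples like the ones returned by bsddb.db.version() and that only the (major, minor) part matters, since the error messages quoted in this report mention versions such as 4.5 and 4.7. The function name `compatibility` and the "same version" case are illustrative additions.

```python
def compatibility(created_with, opened_with):
    """Classify opening data created by one BDB version with another.

    Both arguments are version tuples such as (4, 5, 20). Only the
    first two components are compared (an assumption of this sketch).
    """
    created = tuple(created_with[:2])
    opened = tuple(opened_with[:2])
    if opened < created:
        # Rule 1: there is no downgrade path.
        return "unusable: no downgrade path"
    if opened > created:
        # Rule 2: usable, but only after the documented upgrade.
        return "usable after upgrading the environment/databases"
    return "usable as-is"


print(compatibility((4, 5, 20), (4, 7, 25)))
print(compatibility((4, 7, 25), (4, 5, 20)))
```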

The documented upgrade procedure is:

http://www.oracle.com/technology/documentation/berkeley-db/db/programmer_reference/upgrade_process.html


If you try to use an old format with a new library without updating, you should get a CLEAR error message:

"""
Python 2.5.2 (r252:60911, Mar 14 2008, 19:21:46) 
[GCC 4.2.1] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import bsddb
>>> dbenv=bsddb.db.DBEnv()
>>> dbenv.open(".", bsddb.db.DB_INIT_TXN | bsddb.db.DB_INIT_MPOOL | bsddb.db.DB_INIT_LOG | bsddb.db.DB_CREATE)
>>> db=bsddb.db.DB(dbenv)
>>> db.open("file.db",flags=bsddb.db.DB_CREATE, dbtype=bsddb.db.DB_HASH)
>>> db.close()
>>> dbenv.close()
>>> 
Python 2.6.5 (r265:79063, Mar 22 2010, 12:17:26) 
[GCC 4.4.3] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import bsddb
>>> dbenv=bsddb.db.DBEnv()
>>> dbenv.open(".", bsddb.db.DB_INIT_TXN | bsddb.db.DB_INIT_MPOOL | bsddb.db.DB_INIT_LOG | bsddb.db.DB_CREATE)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
bsddb.db.DBError: (-30971, "DB_VERSION_MISMATCH: Database environment version mismatch -- Program version 4.7 doesn't match environment version 4.5")
"""

The error is pretty obvious.

If you go the other way:

"""
Python 2.6.5 (r265:79063, Mar 22 2010, 12:17:26) 
[GCC 4.4.3] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import bsddb
>>> 
>>> dbenv=bsddb.db.DBEnv()
>>> dbenv.open(".", bsddb.db.DB_INIT_TXN | bsddb.db.DB_INIT_MPOOL | bsddb.db.DB_INIT_LOG | bsddb.db.DB_CREATE)
>>> db=bsddb.db.DB(dbenv)
>>> db.open("file.db",flags=bsddb.db.DB_CREATE, dbtype=bsddb.db.DB_HASH)
>>> db.close()
>>> dbenv.close()
>>> 
Python 2.5.2 (r252:60911, Mar 14 2008, 19:21:46) 
[GCC 4.2.1] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import bsddb
>>> dbenv=bsddb.db.DBEnv()
>>> dbenv.open(".", bsddb.db.DB_INIT_TXN | bsddb.db.DB_INIT_MPOOL | bsddb.db.DB_INIT_LOG | bsddb.db.DB_CREATE)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
bsddb.db.DBError: (-30972, "DB_VERSION_MISMATCH: Database environment version mismatch -- Program version 4.5 doesn't match environment version 4.7")
"""

So, no database corruption, but a clear error message. I guess Gramps is reporting database corruption because it can't open the database for some reason, not because the DB is actually corrupted.
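
An application could tell this situation apart from real corruption by looking at the error text. A minimal sketch, assuming the message has exactly the wording shown in the sessions above; the helper name `parse_version_mismatch` is hypothetical and not part of bsddb.

```python
import re

# Matches the DB_VERSION_MISMATCH text quoted in the sessions above.
_MISMATCH = re.compile(
    r"DB_VERSION_MISMATCH: .*Program version (\d+)\.(\d+) "
    r"doesn't match environment version (\d+)\.(\d+)")


def parse_version_mismatch(message):
    """Return ((program), (environment)) version pairs, or None."""
    match = _MISMATCH.search(message)
    if match is None:
        return None
    program = (int(match.group(1)), int(match.group(2)))
    environment = (int(match.group(3)), int(match.group(4)))
    return program, environment


text = ("DB_VERSION_MISMATCH: Database environment version mismatch -- "
        "Program version 4.7 doesn't match environment version 4.5")
print(parse_version_mismatch(text))
```

A caller catching bsddb.db.DBError could pass the message part of the exception to this helper and report "wrong Berkeley DB version" instead of "corrupt database" when it returns a result.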

In your case, anyway, you are saying that you are using the same Berkeley DB both in python2.5 and 2.6, so all this explanation is not actually related. Please CONFIRM that.

If you are actually using the same BDB version, the next step is to try to open the DB manually (with a short python script like the one posted above).

Remember, ALSO, that if you are using a BDB version earlier than 4.7, you cannot simply copy an environment between machines of different endianness. For instance, moving from PowerPC to x86. I think that was solved in BDB 4.7, IIRC. Or maybe 4.8.

Look at http://forums.oracle.com/forums/thread.jspa?messageID=3725053

About the speed, if you are using the same BerkeleyDB release, the speed should be the same. So the first step would be to actually know if you are using the same BDB version.

I guess the import is doing a new transaction per imported record, flushing each one to disk. Flushing is an expensive and slow operation. On a regular hard disk, that would limit the speed to 30-120 transactions per second, maximum (depending on your filesystem). This dependency on the filesystem could explain the difference between Linux and Windows.

One approach would be to enclose ALL the imported records in a single transaction. If the import is huge you can run out of BDB resources, so enclose every 1000 records in a transaction, for instance. Or increase the BDB resource pool (shared regions).
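
Committing once per batch could look roughly like the following. This is a sketch, not code from Gramps: the function name `import_in_batches` is hypothetical, `dbenv` and `db` are assumed to be an already opened transactional DBEnv and DB, and `records` is assumed to be an iterable of (key, value) pairs. Only `txn_begin()`, `put(..., txn=...)`, `commit()` and `abort()` are used.

```python
def import_in_batches(dbenv, db, records, batch_size=1000):
    """Store (key, value) pairs, committing once per batch_size records."""
    txn = None
    pending = 0      # records written in the current transaction
    committed = 0    # records whose transaction has been committed
    try:
        for key, value in records:
            if txn is None:
                txn = dbenv.txn_begin()
            db.put(key, value, txn=txn)
            pending += 1
            if pending == batch_size:
                current, txn = txn, None
                current.commit()
                committed += pending
                pending = 0
        if txn is not None:
            # Commit the last, partially filled batch.
            current, txn = txn, None
            current.commit()
            committed += pending
    except Exception:
        # Roll back the batch that was still open, then re-raise.
        if txn is not None:
            txn.abort()
        raise
    return committed
```

With 500 records and the default batch size this issues a single commit instead of 500.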

Another option (the right approach :), and the one I would choose, is to insert each record in its own transaction, but to configure those transactions as "not flushing", so they are kept in memory as long as possible. When the last transaction is committed, do a final huge flush/checkpoint.
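
The "not flushing" variant might be sketched as below. Assumptions: `dbenv` and `db` are an already opened transactional DBEnv and DB, `records` is an iterable of (key, value) pairs, and the caller passes the no-sync flag (with the standard library module that would presumably be bsddb.db.DB_TXN_NOSYNC). The function name `import_without_sync` is hypothetical.

```python
def import_without_sync(dbenv, db, records, nosync_flag):
    """One transaction per record, without a log flush on each commit."""
    # Turn on "do not flush the log at commit" for this environment.
    dbenv.set_flags(nosync_flag, 1)
    count = 0
    for key, value in records:
        txn = dbenv.txn_begin()
        try:
            db.put(key, value, txn=txn)
        except Exception:
            txn.abort()
            raise
        txn.commit()
        count += 1
    # One final checkpoint instead of a flush per record.
    dbenv.txn_checkpoint()
    # Restore the default flushing behaviour.
    dbenv.set_flags(nosync_flag, 0)
    return count
```

The trade-off is durability: if the process or machine crashes before the final checkpoint, the most recently committed records may be lost, although the database itself should stay consistent.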

Berkeley DB is amazing, but mastering it is difficult.

Anyway, confirm that you are using the same BDB in python2.5 and 2.6, that you are not migrating from PowerPC to x86, and that you are not flushing transactions wildly (trace the process with "strace" under Linux or "dtrace" under Solaris/Mac OS X, and look out for "sync", "fsync", "fdatasync", or other related syscalls).
msg104010 - (view) Author: Peter Landgren (PeterL) Date: 2010-04-23 13:09
I could add what I have found using bsddb in Python 2.5 and 2.6 under Windows XP SP3. In my installation:
Python 2.5.4 bsddb 4.4.5.3
Python 2.6.4 bsddb 4.7.3
What I did: In Gramps I imported an XML backup file into an empty bsddb database. It took about 1 hour with 2.6.4 and 2 minutes with 2.5.4!
I have also installed bsddb3:
Python 2.6.4 bsddb3 4.8.4
and with the same import I'm back to 2 minutes.
I have pstat logs which I could provide.
msg104016 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2010-04-23 15:02
I need to know the Berkeley DB version you are using in python 2.5, 2.6, both with bsddb and pybsddb (bsddb3).

Also, I would need a testcase I can try without installing Gramps myself.
msg104019 - (view) Author: Peter Landgren (PeterL) Date: 2010-04-23 15:28
Requested data on my Windows box:
Python      Module   Module version   Berkeley DB
Python 2.5  bsddb    4.4.5.3          4.4.20
Python 2.6  bsddb    4.7.3            4.7.25
Python 2.6  bsddb3   4.8.4            4.8.26

OK?
msg104020 - (view) Author: Peter Landgren (PeterL) Date: 2010-04-23 15:31
Maybe I should add that there is no speed degradation between 2.5 and 2.6 when doing the same thing in Linux.
msg104023 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2010-04-23 15:41
Peter, and which Berkeley DB versions are used in Linux?
msg104025 - (view) Author: Peter Landgren (PeterL) Date: 2010-04-23 16:12
In Linux it is:
4.4.5.3 (4, 6, 21)

You asked for a test case. I'm not sure how I can provide one without you having Gramps installed to test it.
Do you mean the whole database environment?
msg104030 - (view) Author: Tim Lyons (guy.linton) Date: 2010-04-23 17:08
On Mac OS X, I get

tim$ python
Python 2.5.5 (r255:77872, Mar 21 2010, 22:08:39)
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import bsddb
>>> print bsddb.__version__, bsddb.db.version()
4.4.5.3 (4, 6, 21)
>>>
tim$ /opt/local/bin/python2.6
Python 2.6.5 (r265:79063, Apr  8 2010, 22:42:38)
[GCC 4.0.1 (Apple Computer, Inc. build 5370)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import bsddb
>>> print bsddb.__version__, bsddb.db.version()
4.7.3 (4, 7, 25)

So the database versions are:
python 2.5 bsddb 4.4.5.3 (4, 6, 21)
python 2.6 bsddb 4.7.3 (4, 7, 25)

On python 2.5:
Python 2.5.5 (r255:77872, Mar 21 2010, 22:08:39) 
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import bsddb
>>> dbenv=bsddb.db.DBEnv()
>>> dbenv.open(".", bsddb.db.DB_INIT_TXN | bsddb.db.DB_INIT_MPOOL | bsddb.db.DB_INIT_LOG | bsddb.db.DB_CREATE)
>>> db1=bsddb.db.DB(dbenv)
>>> db1.open("note.db",flags=bsddb.db.DB_RDONLY,dbtype=bsddb.db.DB_UNKNOWN)
>>> 

and on python 2.6:
Python 2.6.5 (r265:79063, Apr  8 2010, 22:42:38) 
[GCC 4.0.1 (Apple Computer, Inc. build 5370)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import bsddb
>>> dbenv=bsddb.db.DBEnv()
>>> dbenv.open(".", bsddb.db.DB_INIT_TXN | bsddb.db.DB_INIT_MPOOL | bsddb.db.DB_INIT_LOG | bsddb.db.DB_CREATE)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
bsddb.db.DBError: (-30971, "DB_VERSION_MISMATCH: Database environment version mismatch -- Program version 4.7 doesn't match environment version 4.6")
>>> 

The incompatibility between the two environments is therefore resolved as being due to different versions of bsddb. Thanks for all your help in determining this.

The database slowdown still remains to be resolved.
msg104033 - (view) Author: Peter Landgren (PeterL) Date: 2010-04-23 17:47
To make it 100% clear:

The versions are almost the same for Linux and Windows.
           Python 2.5            Python 2.6
Windows  4.4.5.3 (4, 6, 20)    4.7.3 (4, 7, 25)
Linux    4.4.5.3 (4, 6, 21)    4.7.3 (4, 7, 25)
msg104043 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-04-23 20:13
Peter, please stay out of this bug report unless you are certain that you have the very problem that the OP reported, namely that a database created by Python 2.5 cannot be imported in 2.6. I'm taking the performance issues out of this bug report; anybody interested in them should create a separate bug report.

One bug per bug report, please.
msg104046 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-04-23 20:23
I just noticed that Tim reports in msg104030 that the original problem is resolved. So I'm closing this report as fixed.

If you create a new one on the performance issue, please make sure to include a repeatable test case, with instructions on how to repeat it. 

Notice that Jesus suggests that the performance difference may be caused by the difference in bsddb version, in which case it wouldn't be a Python bug at all. I find that theory very plausible. Most likely, the bug would be in Gramps, for using bsddb incorrectly.
History
Date                 User        Action  Args
2010-04-23 20:24:06  loewis      set     status: open -> closed; resolution: not a bug
2010-04-23 20:23:48  loewis      set     messages: + msg104046
2010-04-23 20:13:21  loewis      set     messages: + msg104043; title: bsddb databases in python 2.6 are not compatible with python 2.5 and are slow in python 2.6 -> bsddb databases in python 2.6 are not compatible with python 2.5
2010-04-23 17:47:57  PeterL      set     messages: + msg104033
2010-04-23 17:08:56  guy.linton  set     messages: + msg104030
2010-04-23 16:12:41  PeterL      set     messages: + msg104025
2010-04-23 15:41:26  jcea        set     messages: + msg104023
2010-04-23 15:31:17  PeterL      set     messages: + msg104020
2010-04-23 15:28:47  PeterL      set     messages: + msg104019
2010-04-23 15:02:31  jcea        set     messages: + msg104016
2010-04-23 13:09:26  PeterL      set     nosy: + PeterL; messages: + msg104010
2010-04-23 12:26:00  jcea        set     messages: + msg104007
2010-04-23 10:18:07  loewis      set     assignee: jcea; messages: + msg103999; nosy: + jcea, loewis
2010-04-23 10:02:43  guy.linton  create