Author jcea
Recipients guy.linton, jcea, loewis
Date 2010-04-23.12:25:55
Message-id <1272025562.52.0.243819029256.issue8504@psf.upfronthosting.co.za>
Content
Database compatibility is dictated by the underlying Berkeley DB library in use. Reporter, please do the following (assuming you are using the "bsddb" module from the standard library, not the external "pybsddb" project):

1. Open a python2.5 shell.

2. "import bsddb"

3. "print bsddb.__version__, bsddb.db.version()"

4. Post the numbers.

5. Repeat under python2.6.

On my machine, I get:

python2.5: 4.4.5.3 (4, 5, 20)
python2.6: 4.7.3 (4, 7, 25)

So under python2.5 I would be using Berkeley DB 4.5, and under python2.6 I am using Berkeley DB 4.7.
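
Steps 1 to 5 can also be scripted and run under each interpreter. This is only a minimal sketch: the helper names "bdb_release" and "report" are made up for illustration, and it assumes the standard library "bsddb" module (Python 2.x only), reporting politely when that module is missing.

```python
from __future__ import print_function


def bdb_release(version_tuple):
    """Return 'major.minor' from a (major, minor, patch) tuple,
    such as the one returned by bsddb.db.version()."""
    major, minor = version_tuple[0], version_tuple[1]
    return "%d.%d" % (major, minor)


def report():
    """Print the bsddb binding version and the Berkeley DB release."""
    try:
        import bsddb  # standard library in Python 2.x only
    except ImportError:
        print("bsddb is not available in this interpreter")
        return None
    release = bdb_release(bsddb.db.version())
    print("bsddb", bsddb.__version__, "Berkeley DB", release)
    return release


if __name__ == "__main__":
    report()
```

Run it once with python2.5 and once with python2.6 and compare the two "Berkeley DB" values.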

Berkeley DB has a defined procedure to upgrade databases. This is especially important if you are using a transactional datastore. BEWARE: There is *NO* downgrade path.

http://www.oracle.com/technology/documentation/berkeley-db/db/programmer_reference/am_upgrade.html

Most of the time, the database format doesn't change from version to version, but the environment does (especially the log format). The documentation for each Berkeley DB release includes a very detailed "upgrading" section. For instance:

http://www.oracle.com/technology/documentation/berkeley-db/db/installation/upgrade_11gr2_toc.html

Anyway, the details are as follows:

1. A database created with Berkeley DB version X cannot be used with version Y if Y<X.

2. A database created with Berkeley DB version X can be used with version Y if Y>X, provided you first upgrade the environment/databases to the new version.
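
The two rules can be written down as a small check. This is a sketch, not anything Berkeley DB provides: the function name "can_open" is hypothetical, and comparing only the (major, minor) part of the version is my assumption, based on the error messages mentioning just "4.5" and "4.7".

```python
def can_open(created_with, library, upgraded=False):
    """Apply the two compatibility rules to version tuples such as
    (4, 5, 20) and (4, 7, 25).

    Only (major, minor) is compared; that granularity is an assumption.
    """
    created = tuple(created_with[:2])
    current = tuple(library[:2])
    if current < created:
        # Rule 1: there is no downgrade path.
        return False
    if current > created:
        # Rule 2: usable only after the documented upgrade procedure.
        return upgraded
    # Same release on both sides: nothing to upgrade.
    return True
```

For the versions above, can_open((4, 5, 20), (4, 7, 25)) is False until the environment has been upgraded, and can_open((4, 7, 25), (4, 5, 20)) is False no matter what.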

The documented upgrade procedure is:

http://www.oracle.com/technology/documentation/berkeley-db/db/programmer_reference/upgrade_process.html


If you try to use an old format with a new library without upgrading first, you should get a CLEAR error message:

"""
Python 2.5.2 (r252:60911, Mar 14 2008, 19:21:46) 
[GCC 4.2.1] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import bsddb
>>> dbenv=bsddb.db.DBEnv()
>>> dbenv.open(".", bsddb.db.DB_INIT_TXN | bsddb.db.DB_INIT_MPOOL | bsddb.db.DB_INIT_LOG | bsddb.db.DB_CREATE)
>>> db=bsddb.db.DB(dbenv)
>>> db.open("file.db",flags=bsddb.db.DB_CREATE, dbtype=bsddb.db.DB_HASH)
>>> db.close()
>>> dbenv.close()
>>> 
Python 2.6.5 (r265:79063, Mar 22 2010, 12:17:26) 
[GCC 4.4.3] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import bsddb
>>> dbenv=bsddb.db.DBEnv()
>>> dbenv.open(".", bsddb.db.DB_INIT_TXN | bsddb.db.DB_INIT_MPOOL | bsddb.db.DB_INIT_LOG | bsddb.db.DB_CREATE)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
bsddb.db.DBError: (-30971, "DB_VERSION_MISMATCH: Database environment version mismatch -- Program version 4.7 doesn't match environment version 4.5")
"""

The error is pretty obvious.

If you go the other way:

"""
Python 2.6.5 (r265:79063, Mar 22 2010, 12:17:26) 
[GCC 4.4.3] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import bsddb
>>> 
>>> dbenv=bsddb.db.DBEnv()
>>> dbenv.open(".", bsddb.db.DB_INIT_TXN | bsddb.db.DB_INIT_MPOOL | bsddb.db.DB_INIT_LOG | bsddb.db.DB_CREATE)
>>> db=bsddb.db.DB(dbenv)
>>> db.open("file.db",flags=bsddb.db.DB_CREATE, dbtype=bsddb.db.DB_HASH)
>>> db.close()
>>> dbenv.close()
>>> 
Python 2.5.2 (r252:60911, Mar 14 2008, 19:21:46) 
[GCC 4.2.1] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import bsddb
>>> dbenv=bsddb.db.DBEnv()
>>> dbenv.open(".", bsddb.db.DB_INIT_TXN | bsddb.db.DB_INIT_MPOOL | bsddb.db.DB_INIT_LOG | bsddb.db.DB_CREATE)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
bsddb.db.DBError: (-30972, "DB_VERSION_MISMATCH: Database environment version mismatch -- Program version 4.5 doesn't match environment version 4.7")
"""

So, no database corruption, but a clear error message. I guess Gram is reporting database corruption because it can't open the database for some reason, not that the DB is actually corrupted.

In your case, anyway, you say that you are using the same Berkeley DB in both python2.5 and 2.6, so none of this explanation would actually apply. Please CONFIRM that.

If you really are using the same BDB version, the next step is to try to open the DB manually (with a short python script like the one posted above).

Remember ALSO that if you are using a BDB release older than 4.7, you cannot simply copy an environment between machines of different endianness, for instance when moving from PowerPC to x86. I think that was solved in BDB 4.7, IIRC. Or maybe 4.8.

Look at http://forums.oracle.com/forums/thread.jspa?messageID=3725053
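
To rule out the endianness problem, compare the byte order of the machine that created the environment with the one that is opening it. A minimal sketch using only the standard "sys" module; the function name is made up, and the byte order of the source machine has to be supplied by you.

```python
from __future__ import print_function

import sys


def same_byte_order(source_order, target_order=sys.byteorder):
    """Return True if both machines use the same byte order.

    Each argument is 'little' or 'big', as reported by sys.byteorder.
    """
    return source_order == target_order


if __name__ == "__main__":
    # Run this on both machines and compare the printed values.
    print("This machine is", sys.byteorder, "endian")
```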

About the speed, if you are using the same BerkeleyDB release, the speed should be the same. So the first step would be to actually know if you are using the same BDB version.

I guess the import is doing a new transaction per imported record and flushing each one to disk. Flushing is an expensive and slow operation. On a regular hard disk, that would limit the speed to 30-120 transactions per second at most (depending on your filesystem). The dependency on the filesystem could explain the difference between Linux and Windows.
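
To get a feeling for what that limit means, here is the back-of-the-envelope arithmetic. It is only an estimate: it assumes one flushed commit per record and ignores every other cost, and the record count is an invented example, not a number from your report.

```python
from __future__ import print_function


def import_seconds(records, commits_per_second):
    """Estimated import time when every record costs one flushed commit."""
    return records / float(commits_per_second)


if __name__ == "__main__":
    records = 120000  # hypothetical import size
    for rate in (30, 120):
        seconds = import_seconds(records, rate)
        print("%d commits/s: about %.0f s" % (rate, seconds))
```

With 120000 records that is about 4000 seconds at 30 commits per second and about 1000 seconds at 120.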

One approach would be to enclose ALL the imported records in a single transaction. If the import is huge you can run out of BDB resources, so enclose every 1000 records in a transaction, for instance. Or increase the BDB resource pool (shared regions).
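
Grouping the records could look like the sketch below. It is written against three callables ("begin", "put", "commit") that are placeholders of mine, so it runs without Berkeley DB. With "bsddb" they would presumably map to dbenv.txn_begin(), db.put(key, value, txn=txn) and txn.commit(). Error handling (aborting the transaction on failure) is left out.

```python
def batches(records, size=1000):
    """Yield lists of at most `size` consecutive records."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Last, possibly shorter, batch.
        yield batch


def import_in_batches(records, begin, put, commit, size=1000):
    """Store (key, value) pairs using one transaction per batch.

    begin()              -> returns a transaction handle
    put(key, value, txn) -> stores one record inside that transaction
    commit(txn)          -> commits the transaction
    Returns the number of transactions committed.
    """
    committed = 0
    for batch in batches(records, size):
        txn = begin()
        for key, value in batch:
            put(key, value, txn)
        commit(txn)
        committed += 1
    return committed
```

Importing 2500 records with the default size commits three transactions instead of 2500, so the number of flushes drops by the same factor.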

Another option (the right approach :), and the one I would choose, would be to insert each record in its own transaction but configure those transactions as "not flushing", to keep them in memory as long as possible. When the last transaction is committed, do a final huge flush/checkpoint.
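
A sketch of that idea, under stated assumptions: "dbenv" and "db" are objects shaped like bsddb.db.DBEnv and bsddb.db.DB (txn_begin, put, txn_checkpoint), and "nosync_flag" is whatever "do not flush on commit" flag your binding offers, presumably bsddb.db.DB_TXN_NOSYNC. The function name is made up. Keep in mind that commits made this way can be lost if the machine crashes before the final checkpoint.

```python
def import_without_sync(dbenv, db, records, nosync_flag):
    """Commit one transaction per record without forcing a log flush,
    then checkpoint once at the end.

    dbenv, db and nosync_flag are supplied by the caller; see the
    assumptions in the text above.
    """
    count = 0
    for key, value in records:
        txn = dbenv.txn_begin()
        db.put(key, value, txn=txn)
        # The flag asks the library not to flush the log on this commit.
        txn.commit(nosync_flag)
        count += 1
    # A single flush/checkpoint after the last commit.
    dbenv.txn_checkpoint()
    return count
```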

Berkeley DB is amazing, but mastering it is difficult.

Anyway, confirm that you are using the same BDB in python2.5 and 2.6, that you are not migrating from PowerPC to x86, and that you are not flushing transactions wildly (under Linux, use "strace" and look out for "sync", "fsync", "fdatasync", or other related syscalls).