Issue 1413192: bsddb: segfault on db.associate call with Txn and large data

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/42829

classification

Title:	bsddb: segfault on db.associate call with Txn and large data
Type:		Stage:
Components:	Extension Modules	Versions:	Python 2.4

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	nnorwitz	Nosy List:	gregory.p.smith, jcea, nnorwitz, rshura
Priority:	normal	Keywords:

Created on 2006-01-23 20:35 by rshura, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description
test3.py	rshura, 2006-01-23 20:41	Txn-less code
test1413192.py	nnorwitz, 2006-01-24 06:45
1413192.patch	nnorwitz, 2006-01-24 07:03	fix attempt 1

Messages (16)
msg27331	Author: Alex Roitman (rshura)	Date: 2006-01-23 20:35
Problem confirmed on Python2.3.5/bsddb4.2.0.5 and Python2.4.2/bsddb4.3.0 on Debian sid and Ubuntu Breezy.
It appears that the associate call, necessary to create a secondary index, segfaults when:
1. There is a large amount of data.
2. The environment is transactional.
The archive at http://www.gramps-project.org/files/bsddb/testcase.tar.gz contains the example code and two databases, pm.db and pm_ok.db. Both have the same number of keys, and each data item is a pickled tuple with two elements. The secondary index is created over the unpickled data[1]. pm.db segfaults and pm_ok.db does not. The second db has much smaller data items in data[0]. If the environment is set up and opened without TXN, then pm.db is also fine.
This seems like a problem in the associate call in a TXN environment that is only seen with large enough data.
Please let me know if I can be of further assistance. This is a show-stopper issue for me; I would go out of my way to help resolve this or find a work-around.
Thanks!
Alex
P.S. I could not attach the large file, probably due to the size limit on the upload, hence a link to the testcase.
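For readers without the test case, the data layout and secondary key described in this report can be sketched in pure Python. This is a minimal sketch, not the reporter's code: `find_surname` is the callback name that appears in the traceback further down the thread, but its body here is an assumption, and `build_secondary_index` is a hypothetical stand-in for what `associate` maintains, not part of the bsddb API. The sample keys and records are made up.

```python
import pickle


def find_surname(key, data):
    """Secondary-key callback (assumed body): takes the primary key and
    the raw record, and returns element 1 of the unpickled tuple."""
    return pickle.loads(data)[1]


def build_secondary_index(primary, callback):
    """Hypothetical stand-in for associate(): map each secondary key
    to the list of primary keys that produce it."""
    index = {}
    for key, data in primary.items():
        index.setdefault(callback(key, data), []).append(key)
    return index


# Made-up records: data[0] is large in one record and small in the
# others, while data[1] is the value the secondary index is built over.
person_map = {
    b"I0001": pickle.dumps(("large blob " * 1000, "Smith")),
    b"I0002": pickle.dumps(("small", "Jones")),
    b"I0003": pickle.dumps(("small", "Smith")),
}
surnames = build_secondary_index(person_map, find_surname)
```

The size of data[0] does not change what the callback returns, which is why the report treats pm.db and pm_ok.db as structurally identical.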
msg27332	Author: Alex Roitman (rshura)	Date: 2006-01-23 20:41
Attaching test3.py, containing the same code without transactions. It works fine with either pm.db or pm_ok.db.
msg27333	Author: Neal Norwitz (nnorwitz) *	Date: 2006-01-24 06:45
I've got a much simpler test case. The problem seems to be triggered when the txn is deleted after the env (in Modules/_bsddb.c 917 vs 966). If I change the variable names in Python, I don't get the same behaviour (i.e., it doesn't crash). I removed the original data file, but if you change the_txn to txn, that might "fix" the problem. If not, try playing with different variable names and see if you can get it to not crash.
Obviously there needs to be a real fix in C code, but I'm not sure what needs to happen. It doesn't look like we keep enough info to do this properly.
msg27334	Author: Neal Norwitz (nnorwitz) *	Date: 2006-01-24 07:03
I spoke too soon. The attached patch works for me on your original test case and my pared-down version. It also passes the tests. It also fixes a potential memory leak. Let me know if this works for you.
msg27335	Author: Alex Roitman (rshura)	Date: 2006-01-24 18:50
Thanks for a quick response!
OK, first things first: your simpler testcase seems to expose yet another problem, not the one I had. In particular, your testcase segfaults for me on python2.4.2/bsddb4.3.0 but does not segfault with python2.3.5/bsddb4.2.0.5.
In my testcase, I can definitely blame the segfault on the associate call, not on open. I can demonstrate it by either commenting out the associate call (no segfault) or by inserting a print statement right before the associate. So your testcase does not seem to have the exact same problem as my testcase. In my testcase nothing seems to depend on variable names (as one would expect).
I am rebuilding python2.4 with your patch and will post results soon.
msg27336	Author: Alex Roitman (rshura)	Date: 2006-01-24 19:31
OK, built and installed all kinds of python packages with the patch. All tests are fine. Here goes:
1. Your testcase goes just fine, no segfault with the patched version.
2. Mine still segfaults.
3. I ran mine under gdb with the python2.4-dbg package; here's the output (printed "shurafine" is my addition, to make sure that the correct code is being run):
    $ gdb python2.4-dbg
    GNU gdb 6.4-debian
    Copyright 2005 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB. Type "show warranty" for details.
    This GDB was configured as "i486-linux-gnu"...Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
    (gdb) run test2.py
    Starting program: /usr/bin/python2.4-dbg test2.py
    [Thread debugging using libthread_db enabled]
    [New Thread -1210038592 (LWP 29629)]
    shurafine
    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread -1210038592 (LWP 29629)]
    0xb7b57f3e in DB_associate (self=0xb7db9f58, args=0xb7dbd3b4, kwargs=0xb7db5e94) at /home/shura/src/python2.4-2.4.2/Modules/_bsddb.c:1219
    1219 Py_DECREF(self->associateCallback);
    (gdb)
Please let me know if I can be of further assistance.
msg27337	Author: Alex Roitman (rshura)	Date: 2006-01-24 19:37
Did the same tests on another Debian sid machine, with the exact same results (up to one line number, due to my extra fprintf statement):
    (gdb) run test2.py
    Starting program: /usr/bin/python2.4-dbg test2.py
    [Thread debugging using libthread_db enabled]
    [New Thread -1210390848 (LWP 5865)]
    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread -1210390848 (LWP 5865)]
    0xb7b01eb4 in DB_associate (self=0xb7d63df0, args=0xb7d67234, kwargs=0xb7d5ee94) at /home/shura/src/python2.4-2.4.2/Modules/_bsddb.c:1218
    1218 Py_DECREF(self->associateCallback);
    (gdb)
msg27338	Author: Neal Norwitz (nnorwitz) *	Date: 2006-01-24 19:40
Could you pull the version of Modules/_bsddb.c out of SVN and then apply my patch? I believe your new problem was fixed recently. You can look in the Misc/NEWS file to find the exact patch.
msg27339	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2006-01-24 20:14
FWIW, your patch looks good. It makes sense for a DBTxn to hold a reference to its DBEnv. (I suspect there may still be problems if someone calls DBEnv.close while there are outstanding DBTxns, but doing something about that would be a lot more work if it's an actual issue.)
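The ownership rule in this comment (a transaction keeps its environment alive) can be illustrated with a pure-Python analogue. This is a sketch only: the real fix is in the C code of Modules/_bsddb.c, the `Env` and `Txn` classes here are hypothetical stand-ins for DBEnv and DBTxn, and the deterministic ordering relies on CPython's reference counting.

```python
class Env:
    """Hypothetical stand-in for DBEnv; records when it is closed."""

    def __init__(self, log):
        self.log = log
        self.closed = False

    def close(self):
        if not self.closed:
            self.closed = True
            self.log.append("env closed")

    def __del__(self):
        self.close()


class Txn:
    """Hypothetical stand-in for DBTxn; holds a strong reference to its Env."""

    def __init__(self, env):
        self.env = env  # this reference keeps the Env alive
        self.finished = False

    def abort(self):
        if not self.finished:
            self.finished = True
            state = "closed" if self.env.closed else "open"
            self.env.log.append("txn aborted (env %s)" % state)

    def __del__(self):
        self.abort()


def demo():
    log = []
    env = Env(log)
    txn = Txn(env)
    del env  # the Env survives because txn still references it
    del txn  # the Txn is finalized first, then the Env it was holding
    return log
```

On CPython, `demo()` returns `['txn aborted (env open)', 'env closed']`: the transaction is always cleaned up while its environment is still open, whatever order the caller drops the names in. As the comment notes, this does not cover an explicit `close()` on the environment while transactions are still outstanding.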
msg27340	Author: Alex Roitman (rshura)	Date: 2006-01-24 20:50
With the SVN version of _bsddb.c I no longer have a segfault with my test. Instead I have the following exception:
    Traceback (most recent call last):
      File "test2.py", line 37, in ?
        person_map.associate(surnames,find_surname,db.DB_CREATE,txn=the_txn)
    MemoryError: (12, 'Cannot allocate memory -- Lock table is out of available locks')
Now, please bear with me here if you can. It's easy to shrug it off, saying that I simply don't have enough locks for this huge txn. But the exact same code works fine with the pm_ok.db file from my testcase, and that file has the exact same number of elements and the exact same structure of both the data and the secondary index computation. So one would think that it needs the exact same number of locks, and yet it works while pm.db does not.
The only difference between the two data files is that in each data item, data[0] is much larger in pm.db and smaller in pm_ok.db.
Is it remotely possible that the actual error has nothing to do with locks but rather with the data size? What can I do to find out or fix this?
Thanks for your help!
msg27341	Author: Alex Roitman (rshura)	Date: 2006-01-24 21:12
Tried increasing locks, lockers, and locked objects to 10000 each, and that seems to help. So I guess the number of locks is data-size specific. I guess this is indeed a lock issue, so it's my problem now and not yours :-) Thanks for your help!
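The thread does not say how the three limits were raised. One possible way, sketched here as an assumption, is a DB_CONFIG file in the environment's home directory, which Berkeley DB reads when the environment is opened. The helper name `write_lock_config` and the temporary directory are hypothetical; the directive names are Berkeley DB's configuration parameters for the maximum number of locks, lockers, and locked objects.

```python
import os
import tempfile


def write_lock_config(env_home, limit=10000):
    """Write a DB_CONFIG file raising the three lock-table limits.

    Hypothetical helper: the value 10000 mirrors the one mentioned in
    the thread, applied to locks, lockers, and locked objects alike.
    """
    lines = [
        "set_lk_max_locks %d" % limit,
        "set_lk_max_lockers %d" % limit,
        "set_lk_max_objects %d" % limit,
    ]
    path = os.path.join(env_home, "DB_CONFIG")
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return path


# Stand-in for the real environment directory.
env_home = tempfile.mkdtemp()
config_path = write_lock_config(env_home)
```

These limits size the lock region when the environment is created, so an existing environment may need to be recreated before new values take effect.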
msg27342	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2006-01-25 00:35
BerkeleyDB uses page locking, so it makes sense that a database with larger data objects in it would require more locks, assuming it is internally locking each page. That kind of tuning gets into BerkeleyDB internals, where I suspect people on the comp.databases.berkeleydb newsgroup could answer things better. Glad it's working for you now.
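The page-locking argument can be made concrete with rough arithmetic. This is a crude sketch under stated assumptions: it treats the database as records packed end to end into fixed-size pages, ignores btree overhead and overflow-page handling, and assumes one lock per page touched. The record count, record sizes, and the 4096-byte page size are made-up illustration values, not figures from the test case.

```python
def estimate_pages(n_items, item_bytes, page_bytes=4096):
    """Crude estimate of the pages needed to hold n_items records of
    item_bytes each, using integer ceiling division."""
    total = n_items * item_bytes
    return -(-total // page_bytes)


# Same number of records, different record sizes (illustrative only).
small_pages = estimate_pages(10000, 100)    # small data[0]
large_pages = estimate_pages(10000, 5000)   # large data[0]
```

With these numbers, `small_pages` is 245 and `large_pages` is 12208. Under the one-lock-per-page assumption, a single transaction that walks every record would need roughly fifty times as many locks for the larger records, even though the key count is identical.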
msg27343	Author: Alex Roitman (rshura)	Date: 2006-01-25 02:21
While you guys are here, can I ask you if there's a way to return to the checkpoint made in a Txn-aware database? Specifically, is there a way to return to the latest checkpoint from within Python?
My problem is that if my data import fails in the middle, I want to undo some transactions that were committed, to have a clean import undo. A checkpoint seems like a nice way to do that, if only I could get back to it :-)
msg27344	Author: Neal Norwitz (nnorwitz) *	Date: 2006-01-25 05:23
I'm sorry, I'm not a Berkeley DB developer, I just play one on TV. :-) Seriously, I don't know anything about BDB. I was just trying to get it stable. Maybe Greg can answer your question.
msg27345	Author: Neal Norwitz (nnorwitz) *	Date: 2006-01-25 05:29
Committed revision 42177. Committed revision 42178. (2.4)
msg27346	Author: Neal Norwitz (nnorwitz) *	Date: 2006-01-25 05:31
Oh, I forgot to say thanks for the good bug report and responding back.

History
Date	User	Action	Args
2022-04-11 14:56:15	admin	set	github: 42829
2008-03-26 18:16:18	jcea	set	nosy: + jcea
2006-01-23 20:35:45	rshura	create