classification
Title: Fork + shelve causes shelve corruption and backtrace
Type: behavior
Stage: test needed
Components: Extension Modules
Versions: Python 2.6
process
Status: closed
Resolution: out of date
Dependencies:
Superseder:
Assigned To:
Nosy List: BreamoreBoy, belopolsky, calmofthestorm, jcea, r.david.murray
Priority: normal
Keywords:

Created on 2008-12-16 23:42 by calmofthestorm, last changed 2010-07-05 19:56 by belopolsky. This issue is now closed.

Files
File name: convert.py
Uploaded: calmofthestorm, 2008-12-16 23:42
Description: Non-reduced failure case
Messages (7)
msg77942 - Author: Alex Roper (calmofthestorm) Date: 2008-12-16 23:42
Hi,

I wrote a simple script (attached) to do some preprocessing of MediaWiki
XML dumps. When it has an 8 MB chunk ready to dump to disk, it forks, and
the child writes it out and (will) compress it, then exits. The parent
process continues as before. Note that the child process never touches (or
executes code that has in scope) the shelve handle.
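
The fork-and-write pattern described above can be sketched roughly as follows. This is not the attached convert.py: the function name `write_chunk_in_child` and the demo file name are hypothetical. The sketch has the child leave via `os._exit()` so that it does not run finalizers (such as the `Parser.__del__` that appears in the traceback) on objects inherited from the parent. Whether that is what triggers the corruption reported here is an assumption, not something confirmed in this report.

```python
import os
import tempfile


def write_chunk_in_child(path, data):
    """Fork once; the child writes `data` to `path`, the parent gets the child's pid."""
    pid = os.fork()
    if pid == 0:
        # Child: do the write and nothing else.
        status = 1
        try:
            with open(path, "wb") as f:
                f.write(data)
            status = 0
        finally:
            # os._exit() ends the child immediately, skipping atexit handlers
            # and object finalizers for anything inherited from the parent
            # (assumption: this is where an inherited shelve/bsddb handle
            # could otherwise be touched).
            os._exit(status)
    return pid


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    chunk_path = os.path.join(out_dir, "chunk-0000")  # hypothetical file name
    child = write_chunk_in_child(chunk_path, b"x" * 1024)
    os.waitpid(child, 0)  # reap the child so it does not linger as a zombie
    print(os.path.getsize(chunk_path))
```

The parent is expected to reap each child with `os.waitpid()`; the sketch works on Unix-like systems only, since it relies on `os.fork()`.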

The attached script, as written, will work fine on dumps (I tested it on
enwikisource-20081112-pages-articles.xml available from
http://download.wikimedia.org/enwikisource/20081112/). If you uncomment
the fork on line 40 (and the exit() on line 46 of course) and run it, it
will die after writing out about 450 megabytes with the backtrace below.

This appears to happen deterministically, at the same place all 3
times I ran it. Apologies for the size and complexity of the test; I
don't have time to reduce it further at the moment, and it looks like it
may be fairly involved. I can try to work out a reduced case later and
resubmit if no one wants to touch this as is ;)

# I ran the script with:
bzcat enwikisource-20081112-pages-articles.xml.bz2 | ./convert.py wikisource 8388608
# (after making a dir called wikisource)

Let me know if I can be of any assistance, and apologies if this is
documented somewhere and I missed it.

Using Python 2.6.1 as released from python.org.

Alex

alexr@autumn:~/projects/wikipedia$ cat enwikisource-20081112-pages-articles.xml | ./convert.py wikisource 8388608
Alexandria version 1, Copyright (C) 2008 Alex Roper
Alexandria comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to copy modify, and redistribute it
under certain conditions; see the file COPYING for details.
..........................................................
Traceback (most recent call last):
  File "./convert.py", line 100, in <module>
    sax.parse(sys.stdin, Parser(sys.argv[1], MIN_CHUNK_SIZE))
  File "/usr/lib/python2.6/xml/sax/__init__.py", line 33, in parse
    parser.parse(source)
  File "/usr/lib/python2.6/xml/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib/python2.6/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/usr/lib/python2.6/xml/sax/expatreader.py", line 207, in feed
    self._parser.Parse(data, isFinal)
  File "/usr/lib/python2.6/xml/sax/expatreader.py", line 304, in end_element
    self._cont_handler.endElement(name)
  File "./convert.py", line 61, in endElement
    s.pagehandler(s.title, s.text)
  File "./convert.py", line 68, in pagehandler
    s.index[title.encode("UTF8")] = (s.chunks, len(s.pages))
  File "/usr/lib/python2.6/shelve.py", line 133, in __setitem__
    self.dict[key] = f.getvalue()
  File "/usr/lib/python2.6/bsddb/__init__.py", line 276, in __setitem__
    _DeadlockWrap(wrapF)  # self.db[key] = value
  File "/usr/lib/python2.6/bsddb/dbutils.py", line 68, in DeadlockWrap
    return function(*_args, **_kwargs)
  File "/usr/lib/python2.6/bsddb/__init__.py", line 275, in wrapF
    self.db[key] = value
bsddb.db.DBRunRecoveryError: (-30975, 'DB_RUNRECOVERY: Fatal error, run database recovery -- PANIC: Invalid argument')
Exception bsddb.db.DBRunRecoveryError: DBRunRecoveryError(-30975, 'DB_RUNRECOVERY: Fatal error, run database recovery -- PANIC: fatal region error detected; run recovery') in <bound method Parser.__del__ of <__main__.Parser instance at 0x7f3492966d40>> ignored
Exception bsddb.db.DBRunRecoveryError: DBRunRecoveryError(-30975, 'DB_RUNRECOVERY: Fatal error, run database recovery -- PANIC: fatal region error detected; run recovery') in  ignored
Exception bsddb.db.DBRunRecoveryError: DBRunRecoveryError(-30975, 'DB_RUNRECOVERY: Fatal error, run database recovery -- PANIC: fatal region error detected; run recovery') in  ignored
msg109267 - Author: Mark Lawrence (BreamoreBoy) * Date: 2010-07-04 21:31
Hi Alex,

Looks like nothing will happen with this unless you do something yourself.
msg109270 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-04 22:10
The wikisource file in the report is no longer available, but with the latest wikisource dump and Python 2.7,

$ curl http://download.wikimedia.org/enwikisource/latest/enwikisource-latest-pages-articles.xml.bz2 | bzip2 -cd | ./python.exe convert.py /tmp 8388608

went through the first 50 MiB without an error. I am not sure I'll have the patience to run this to completion, but it looks like this is out of date.
msg109294 - Author: Alex Roper (calmofthestorm) Date: 2010-07-05 04:02
I've just been using the sq_dict module, which is a drop-in replacement for shelve written
using sqlite3. BDB is a pretty squirrelly piece of software in my experience. It may or may
not be stable on its own, but its APIs are pretty poorly documented and programmers tend
to misuse them without knowing it.

Every job I've done with it has involved major hacks such as API interception and
replacement with sqlite3, cronjobs to rebuild the database every hour, etc. It's also nice
to have databases that are platform independent, and in all the applications I use the
slight slowdown for sqlite is acceptable (I mean I /am/ using Python).

YMMV of course. Also I know at one point Python 3 was going to use sqlite. The sq_dict I 
mention is on Bugzilla somewhere, or email me if you need a copy.

Alex
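
For readers unfamiliar with the approach mentioned in this message, a shelve-like mapping on top of sqlite3 can be sketched as below. This is not the sq_dict module itself: the class name `SqliteShelf`, the table name `shelf`, and the use of pickle for values are assumptions made purely for illustration.

```python
import pickle
import sqlite3
from collections.abc import MutableMapping


class SqliteShelf(MutableMapping):
    """Hypothetical minimal shelve-like mapping stored in one SQLite table."""

    def __init__(self, path):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS shelf "
            "(key TEXT PRIMARY KEY, value BLOB NOT NULL)"
        )
        self._conn.commit()

    def __getitem__(self, key):
        row = self._conn.execute(
            "SELECT value FROM shelf WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return pickle.loads(row[0])

    def __setitem__(self, key, value):
        blob = pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL)
        # "with conn" commits on success and rolls back on error.
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO shelf (key, value) VALUES (?, ?)",
                (key, blob),
            )

    def __delitem__(self, key):
        with self._conn:
            cur = self._conn.execute("DELETE FROM shelf WHERE key = ?", (key,))
        if cur.rowcount == 0:
            raise KeyError(key)

    def __iter__(self):
        # Fetch all keys up front so no cursor stays open during iteration.
        rows = self._conn.execute("SELECT key FROM shelf").fetchall()
        return iter([key for (key,) in rows])

    def __len__(self):
        return self._conn.execute("SELECT COUNT(*) FROM shelf").fetchone()[0]

    def close(self):
        self._conn.close()
```

A call such as `index = SqliteShelf("index.db")` followed by `index["Main Page"] = (3, 17)` would then stand in for the shelve assignment seen in the traceback. Note that an open SQLite connection should not be used across a fork either, so a sketch like this would not by itself make the fork pattern safe.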

msg109295 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-05 04:20
Sorry, but I don't understand the point that you are trying to make. sq_dict is indeed being considered for inclusion in Python: see issue 3783.

For this issue, we need confirmation that the problem is present in the current version.
msg109321 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-07-05 14:25
Also note that bsddb's version was bumped in 2.7, so this bug may indeed be out of date. Alex, if you can't reproduce it (or don't have any desire to try to do so), we will close this as out of date.
msg109351 - Author: Alex Roper (calmofthestorm) Date: 2010-07-05 19:52
Go ahead
History
Date                 User            Action  Args
2010-07-05 19:56:36  belopolsky      set     status: open -> closed
2010-07-05 19:52:23  calmofthestorm  set     status: pending -> open; messages: + msg109351
2010-07-05 14:25:41  r.david.murray  set     status: open -> pending; nosy: + jcea, r.david.murray; messages: + msg109321
2010-07-05 04:20:22  belopolsky      set     messages: + msg109295
2010-07-05 04:02:40  calmofthestorm  set     status: pending -> open; messages: + msg109294
2010-07-04 22:10:56  belopolsky      set     status: open -> pending; nosy: + belopolsky; messages: + msg109270; resolution: out of date; stage: test needed
2010-07-04 21:31:55  BreamoreBoy     set     nosy: + BreamoreBoy; messages: + msg109267
2008-12-16 23:42:21  calmofthestorm  create