
Author calmofthestorm
Recipients calmofthestorm
Date 2008-12-16.23:42:16
Message-id <1229470942.41.0.397425864374.issue4679@psf.upfronthosting.co.za>
In-reply-to
Content
Hi,

I wrote a simple script (attached) to do some preprocessing of MediaWiki
XML dumps. When it has an 8 MB chunk ready to dump to disk, it forks; the
child process writes the chunk out, will (eventually) compress it, and
exits, while the parent process carries on parsing as before. Note that
the child never touches the shelve handle, nor does it execute any code
that has the handle in scope.
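For reference, the chunk hand-off described above can be sketched roughly as follows. This is not the attached convert.py: it is a minimal Python 3, POSIX-only sketch in which the function name write_chunk_in_child and its parameters are hypothetical and compression is left out. The child leaves through os._exit() rather than exit(), a common precaution with fork() so that the child runs none of the interpreter cleanup it inherited from the parent. The report does not say the script does this.

```python
import os
import tempfile


def write_chunk_in_child(path, data):
    """Fork; the child writes `data` to `path` and exits, the parent gets the child's pid.

    Hypothetical helper for illustration; compression is not included.
    """
    pid = os.fork()
    if pid == 0:
        # Child process: assume failure until the write has completed.
        status = 1
        try:
            with open(path, "wb") as f:
                f.write(data)
            status = 0
        finally:
            # os._exit() ends the child immediately, without running atexit
            # handlers or flushing stdio buffers inherited from the parent.
            os._exit(status)
    # Parent process: carry on parsing; reap the child later with os.waitpid().
    return pid


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as outdir:
        chunk_path = os.path.join(outdir, "chunk-00000")
        child = write_chunk_in_child(chunk_path, b"x" * 8388608)  # one 8 MB chunk
        _, status = os.waitpid(child, 0)  # wait for the child to finish
        print("exit status", os.WEXITSTATUS(status), "size", os.path.getsize(chunk_path))
```

The parent only pays for the fork() call; the disk write happens concurrently in the child, and the exit status tells the parent whether the chunk was written.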

The attached script, as written, works fine on these dumps (I tested it on
enwikisource-20081112-pages-articles.xml, available from
http://download.wikimedia.org/enwikisource/20081112/). If you uncomment
the fork on line 40 (and, of course, the exit() on line 46) and run it,
it dies after writing out about 450 MB, with the traceback below.
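Even though the child never touches the shelve directly, it may be worth checking what its exit() call does. Like sys.exit(), it raises SystemExit, so the forked child goes through normal interpreter shutdown and can run cleanup inherited from the parent. The traceback shows Parser.__del__ being run at shutdown; a forked child exiting normally could run such a finalizer too. os._exit() skips all of this. The Python 3, POSIX-only sketch below shows the difference with a stand-in atexit handler. The handler is a hypothetical substitute for something like closing a database handle. The sketch does not reproduce the bsddb failure, and whether this mechanism explains it is an assumption.

```python
import subprocess
import sys
import textwrap

# Program run in a fresh interpreter: it registers an exit-time cleanup
# function, forks, and lets the child leave via EXIT_CALL. The cleanup is a
# hypothetical stand-in for inherited teardown such as closing a shelve.
CHILD_PROGRAM = textwrap.dedent("""
    import atexit, os, sys

    parent_pid = os.getpid()

    def cleanup():
        role = "parent" if os.getpid() == parent_pid else "child"
        print("cleanup ran in " + role, flush=True)

    atexit.register(cleanup)

    pid = os.fork()
    if pid == 0:
        EXIT_CALL
    os.waitpid(pid, 0)
""")


def cleanup_runs_in_child(exit_call):
    """Return True if the parent's atexit cleanup also runs in the forked child."""
    source = CHILD_PROGRAM.replace("EXIT_CALL", exit_call)
    result = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True, text=True, check=True,
    )
    return "cleanup ran in child" in result.stdout


if __name__ == "__main__":
    print("sys.exit(0):", cleanup_runs_in_child("sys.exit(0)"))
    print("os._exit(0):", cleanup_runs_in_child("os._exit(0)"))
```

The demonstration runs in a separate interpreter so that the forked child can never fall back into the caller's own code.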

The failure appears to be deterministic: it happened at the same place in
all 3 of 3 runs. Apologies for the size and complexity of the test case;
I don't have time to reduce it further at the moment, and it looks like
doing so may be fairly involved. I can try to work out a reduced case
later and resubmit if no one wants to touch this as is ;)

# I ran the script with:
bzcat enwikisource-20081112-pages-articles.xml.bz2 | ./convert.py wikisource 8388608
# (after making a dir called wikisource)

Let me know if I can be of any assistance, and apologies if this is
documented somewhere and I missed it.

Using Python 2.6.1 as released from python.org.

Alex

alexr@autumn:~/projects/wikipedia$ cat enwikisource-20081112-pages-articles.xml | ./convert.py wikisource 8388608
Alexandria version 1, Copyright (C) 2008 Alex Roper
Alexandria comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to copy modify, and redistribute it
under certain conditions; see the file COPYING for details.
..........................................................Traceback (most recent call last):
  File "./convert.py", line 100, in <module>
    sax.parse(sys.stdin, Parser(sys.argv[1], MIN_CHUNK_SIZE))
  File "/usr/lib/python2.6/xml/sax/__init__.py", line 33, in parse
    parser.parse(source)
  File "/usr/lib/python2.6/xml/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib/python2.6/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/usr/lib/python2.6/xml/sax/expatreader.py", line 207, in feed
    self._parser.Parse(data, isFinal)
  File "/usr/lib/python2.6/xml/sax/expatreader.py", line 304, in end_element
    self._cont_handler.endElement(name)
  File "./convert.py", line 61, in endElement
    s.pagehandler(s.title, s.text)
  File "./convert.py", line 68, in pagehandler
    s.index[title.encode("UTF8")] = (s.chunks, len(s.pages))
  File "/usr/lib/python2.6/shelve.py", line 133, in __setitem__
    self.dict[key] = f.getvalue()
  File "/usr/lib/python2.6/bsddb/__init__.py", line 276, in __setitem__
    _DeadlockWrap(wrapF)  # self.db[key] = value
  File "/usr/lib/python2.6/bsddb/dbutils.py", line 68, in DeadlockWrap
    return function(*_args, **_kwargs)
  File "/usr/lib/python2.6/bsddb/__init__.py", line 275, in wrapF
    self.db[key] = value
bsddb.db.DBRunRecoveryError: (-30975, 'DB_RUNRECOVERY: Fatal error, run database recovery -- PANIC: Invalid argument')
Exception bsddb.db.DBRunRecoveryError: DBRunRecoveryError(-30975, 'DB_RUNRECOVERY: Fatal error, run database recovery -- PANIC: fatal region error detected; run recovery') in <bound method Parser.__del__ of <__main__.Parser instance at 0x7f3492966d40>> ignored
Exception bsddb.db.DBRunRecoveryError: DBRunRecoveryError(-30975, 'DB_RUNRECOVERY: Fatal error, run database recovery -- PANIC: fatal region error detected; run recovery') in  ignored
Exception bsddb.db.DBRunRecoveryError: DBRunRecoveryError(-30975, 'DB_RUNRECOVERY: Fatal error, run database recovery -- PANIC: fatal region error detected; run recovery') in  ignored
History
Date                 User            Action  Args
2008-12-16 23:42:22  calmofthestorm  set     recipients: + calmofthestorm
2008-12-16 23:42:22  calmofthestorm  set     messageid: <1229470942.41.0.397425864374.issue4679@psf.upfronthosting.co.za>
2008-12-16 23:42:21  calmofthestorm  link    issue4679 messages
2008-12-16 23:42:20  calmofthestorm  create