Issue4679
This issue tracker has been migrated to GitHub and is currently read-only. For more information, see the GitHub FAQs in the Python Developer's Guide.
Created on 2008-12-16 23:42 by calmofthestorm, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Files

File name | Uploaded | Description |
---|---|---|
convert.py | calmofthestorm, 2008-12-16 23:42 | Non-reduced failure case |
Messages (7)

msg77942 | Author: Alex Roper (calmofthestorm) | Date: 2008-12-16 23:42
Hi,

I wrote a simple script (attached) to do some preprocessing of MediaWiki XML dumps. When it has an 8 MB chunk ready to dump to disk, it forks; the child process writes the chunk out and (will) compress it, then exits, while the parent process continues as before. Note that the child never touches the shelve handle (nor executes code that has it in scope).

The attached script, as written, works fine on dumps (I tested it on enwikisource-20081112-pages-articles.xml, available from http://download.wikimedia.org/enwikisource/20081112/). If you uncomment the fork on line 40 (and, of course, the exit() on line 46) and run it, it dies after writing out about 450 megabytes with the backtrace below. This appears to happen deterministically at the same place: it did so all 3 times I ran it.

Apologies for the size and complexity of the test case. I don't have time to reduce it further at the moment, and it looks like it may be fairly involved. I can try to work out a reduced case later and resubmit if no one wants to touch this as is ;)

I ran the script with the following command (after making a directory called wikisource):

    bzcat enwikisource-20081112-pages-articles.xml.bz2 | ./convert.py wikisource 8388608

Let me know if I can be of any assistance, and apologies if this is documented somewhere and I missed it. I'm using Python 2.6.1 as released from python.org.

Alex

    alexr@autumn:~/projects/wikipedia$ cat enwikisource-20081112-pages-articles.xml | ./convert.py wikisource 8388608
    Alexandria version 1, Copyright (C) 2008 Alex Roper
    Alexandria comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to copy modify, and redistribute it under certain conditions; see the file COPYING for details.
    ..........................................................
    Traceback (most recent call last):
      File "./convert.py", line 100, in <module>
        sax.parse(sys.stdin, Parser(sys.argv[1], MIN_CHUNK_SIZE))
      File "/usr/lib/python2.6/xml/sax/__init__.py", line 33, in parse
        parser.parse(source)
      File "/usr/lib/python2.6/xml/sax/expatreader.py", line 107, in parse
        xmlreader.IncrementalParser.parse(self, source)
      File "/usr/lib/python2.6/xml/sax/xmlreader.py", line 123, in parse
        self.feed(buffer)
      File "/usr/lib/python2.6/xml/sax/expatreader.py", line 207, in feed
        self._parser.Parse(data, isFinal)
      File "/usr/lib/python2.6/xml/sax/expatreader.py", line 304, in end_element
        self._cont_handler.endElement(name)
      File "./convert.py", line 61, in endElement
        s.pagehandler(s.title, s.text)
      File "./convert.py", line 68, in pagehandler
        s.index[title.encode("UTF8")] = (s.chunks, len(s.pages))
      File "/usr/lib/python2.6/shelve.py", line 133, in __setitem__
        self.dict[key] = f.getvalue()
      File "/usr/lib/python2.6/bsddb/__init__.py", line 276, in __setitem__
        _DeadlockWrap(wrapF)  # self.db[key] = value
      File "/usr/lib/python2.6/bsddb/dbutils.py", line 68, in DeadlockWrap
        return function(*_args, **_kwargs)
      File "/usr/lib/python2.6/bsddb/__init__.py", line 275, in wrapF
        self.db[key] = value
    bsddb.db.DBRunRecoveryError: (-30975, 'DB_RUNRECOVERY: Fatal error, run database recovery -- PANIC: Invalid argument')
    Exception bsddb.db.DBRunRecoveryError: DBRunRecoveryError(-30975, 'DB_RUNRECOVERY: Fatal error, run database recovery -- PANIC: fatal region error detected; run recovery') in <bound method Parser.__del__ of <__main__.Parser instance at 0x7f3492966d40>> ignored
    Exception bsddb.db.DBRunRecoveryError: DBRunRecoveryError(-30975, 'DB_RUNRECOVERY: Fatal error, run database recovery -- PANIC: fatal region error detected; run recovery') in ignored
    Exception bsddb.db.DBRunRecoveryError: DBRunRecoveryError(-30975, 'DB_RUNRECOVERY: Fatal error, run database recovery -- PANIC: fatal region error detected; run recovery') in ignored
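The fork-per-chunk pattern described in this report can be sketched in modern Python 3 as follows. This is not the attached convert.py, which is not reproduced here; `write_chunk_in_child` and `wait_for_child` are hypothetical names, and gzip stands in for the compression step the report says was still to come. The thread never established the root cause of the crash, so treat this only as an illustration of the pattern. One deliberate choice: the child leaves through `os._exit()` rather than `exit()`. `exit()` raises `SystemExit`, which unwinds the child's copy of the stack and can run cleanup code inherited from the parent (the traceback shows that the script's `Parser` class defines `__del__`), whereas `os._exit()` ends the process without running cleanup handlers.

```python
import gzip
import os


def write_chunk_in_child(path, data):
    """Fork; the child gzips `data` into `path` and exits, the parent returns at once.

    Hypothetical helper: returns the child's pid to the parent.
    """
    pid = os.fork()
    if pid == 0:
        # Child process: do only the chunk I/O. Leaving through os._exit()
        # skips atexit handlers, stdio buffer flushing, and the stack
        # unwinding that could trigger __del__ methods of objects inherited
        # from the parent (such as an open shelve).
        exit_code = 1
        try:
            with gzip.open(path, "wb") as f:
                f.write(data)
            exit_code = 0
        finally:
            os._exit(exit_code)
    # Parent process: continue parsing immediately; reap the child later.
    return pid


def wait_for_child(pid):
    """Block until the child exits; return its exit code (-1 if it was killed by a signal)."""
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -1
```

In the report's setting, the parent would call `write_chunk_in_child` each time roughly 8 MB (8388608 bytes) of pages has accumulated, and reap finished children periodically so that they do not linger as zombies.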
msg109267 | Author: Mark Lawrence (BreamoreBoy) * | Date: 2010-07-04 21:31

Hi Alex,

Looks like nothing will happen with this unless you do something yourself.
msg109270 | Author: Alexander Belopolsky (belopolsky) * | Date: 2010-07-04 22:10

The wikisource file in the report is no longer available, but with the latest wikisource dump and Python 2.7,

    $ curl http://download.wikimedia.org/enwikisource/latest/enwikisource-latest-pages-articles.xml.bz2 | bzip2 -cd | ./python.exe convert.py /tmp 8388608

went through the first 50 MiB without an error. I am not sure I'll have the patience to run this to completion, but it looks like this is out of date.
msg109294 | Author: Alex Roper (calmofthestorm) | Date: 2010-07-05 04:02

I've just been using the sq_dict module, which is a drop-in replacement for shelve written using sqlite3. BDB is a pretty squirrelly piece of software in my experience. It may or may not be stable on its own, but its APIs are poorly documented and programmers tend to misuse them without knowing it. Every job I've done with it has involved major hacks, such as API interception and replacement with sqlite3, cron jobs to rebuild the database every hour, etc.

It's also nice to have databases that are platform-independent, and in all the applications I use, the slight slowdown from sqlite is acceptable (I mean, I /am/ using Python). YMMV, of course. Also, I know that at one point Python 3 was going to use sqlite.

The sq_dict I mention is on Bugzilla somewhere, or email me if you need a copy.

Alex
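For readers curious what a shelve-style mapping on top of sqlite3 can look like, here is a minimal sketch. It is not the sq_dict module mentioned above, whose code does not appear in this thread; the class name `SqliteShelf`, the table name `shelf`, and the choice to pickle values (as shelve does) are assumptions made for illustration. It also assumes string keys and a single process.

```python
import pickle
import sqlite3
from collections.abc import MutableMapping


class SqliteShelf(MutableMapping):
    """Hypothetical shelve-like mapping stored in one SQLite table (not sq_dict)."""

    def __init__(self, filename):
        self._conn = sqlite3.connect(filename)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS shelf (key TEXT PRIMARY KEY, value BLOB NOT NULL)"
        )
        self._conn.commit()

    def __getitem__(self, key):
        row = self._conn.execute(
            "SELECT value FROM shelf WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return pickle.loads(row[0])

    def __setitem__(self, key, value):
        # Values are pickled, as shelve does. Writes stay in the current
        # transaction until sync() or close() commits them.
        self._conn.execute(
            "INSERT OR REPLACE INTO shelf (key, value) VALUES (?, ?)",
            (key, pickle.dumps(value)),
        )

    def __delitem__(self, key):
        cur = self._conn.execute("DELETE FROM shelf WHERE key = ?", (key,))
        if cur.rowcount == 0:
            raise KeyError(key)

    def __iter__(self):
        # Materialize the keys first so callers may modify the shelf while iterating.
        keys = [row[0] for row in self._conn.execute("SELECT key FROM shelf")]
        return iter(keys)

    def __len__(self):
        return self._conn.execute("SELECT COUNT(*) FROM shelf").fetchone()[0]

    def sync(self):
        self._conn.commit()

    def close(self):
        self._conn.commit()
        self._conn.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()
```

Such a class could stand in for an index like the one in the traceback, e.g. `index[title] = (chunks, len(pages))`. SQLite's own documentation also warns against carrying an open database connection across `fork()`, so a forking script like the one in this report would still need to keep all database access in a single process.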
msg109295 | Author: Alexander Belopolsky (belopolsky) * | Date: 2010-07-05 04:20

Sorry, but I don't understand the point that you are trying to make. sq_dict is indeed being considered for inclusion in Python: see issue 3783. For this issue, we need confirmation that the problem is present in the current version.
msg109321 | Author: R. David Murray (r.david.murray) * | Date: 2010-07-05 14:25

Also note that bsddb's version was bumped in 2.7, so this bug may indeed be out of date. Alex, if you can't reproduce it (or don't have any desire to try to do so), we will close this as out of date.
msg109351 | Author: Alex Roper (calmofthestorm) | Date: 2010-07-05 19:52

Go ahead
History

Date | User | Action | Args |
---|---|---|---|
2022-04-11 14:56:42 | admin | set | github: 48929 |
2010-07-05 19:56:36 | belopolsky | set | status: open -> closed |
2010-07-05 19:52:23 | calmofthestorm | set | status: pending -> open messages: + msg109351 |
2010-07-05 14:25:41 | r.david.murray | set | status: open -> pending nosy: + jcea, r.david.murray messages: + msg109321 |
2010-07-05 04:20:22 | belopolsky | set | messages: + msg109295 |
2010-07-05 04:02:40 | calmofthestorm | set | status: pending -> open messages: + msg109294 |
2010-07-04 22:10:56 | belopolsky | set | status: open -> pending nosy: + belopolsky messages: + msg109270 resolution: out of date stage: test needed |
2010-07-04 21:31:55 | BreamoreBoy | set | nosy: + BreamoreBoy messages: + msg109267 |
2008-12-16 23:42:21 | calmofthestorm | create | |