Author scoder
Recipients docs@python, eric.araujo, ezio.melotti, fdrake, loewis, pitrou, scoder
Date 2011-11-29.19:02:15
Message-id <1322593336.53.0.81931729438.issue11379@psf.upfronthosting.co.za>
In-reply-to
Content
Given that the links were generally somewhat dated and used Py2.x instead of the post-PEP 393 Py3.3, here is another little benchmark. It compares the parser performance of minidom with lxml.etree (latest), ElementTree and cElementTree (stdlib) on a recent Py3.3 build (e66b7c62eec0), with everything properly optimised for my platform (Linux 64bit). I used os.fork() to start a new process after importing everything and reading the file a couple of times, but before parsing. Memory usage is measured inside the forked child using the ru_maxrss value from the resource module, so it tracks the growth of CPython's memory heap caused by parsing and gives an estimate of the maximum amount of memory used during parsing and tree building.
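
For illustration, here is a minimal sketch of the measurement approach described above. It is not the original benchmark script: the helper names (measure_in_child, run_demo) and the generated sample document are assumptions made for this example, and it only covers the two stdlib parsers. It relies on os.fork(), so it needs a Unix-like platform; on Linux, ru_maxrss is reported in kilobytes.

```python
import io
import os
import resource
import time
import xml.dom.minidom
import xml.etree.ElementTree as ET


def measure_in_child(parse, data):
    """Fork, run parse() in the child, return (seconds, maxrss_before, maxrss_after)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: measure, report through the pipe, then exit immediately.
        status = 1
        try:
            os.close(read_fd)
            before = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
            start = time.perf_counter()
            tree = parse(io.BytesIO(data))  # keep the tree alive while measuring
            elapsed = time.perf_counter() - start
            after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
            os.write(write_fd, "{} {} {}".format(elapsed, before, after).encode())
            status = 0
        finally:
            os._exit(status)
    # Parent process: read the child's report and reap it.
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        elapsed, before, after = pipe.read().split()
    os.waitpid(pid, 0)
    return float(elapsed), int(before), int(after)


def run_demo():
    # Hypothetical sample input; the original post used real files such as hamlet.xml.
    data = b"<root>" + b"<item n='1'>some text</item>" * 20000 + b"</root>"
    parsers = [
        ("xml.etree.ElementTree.parse", ET.parse),
        ("xml.dom.minidom.parse", xml.dom.minidom.parse),
    ]
    return [(name,) + measure_in_child(func, data) for name, func in parsers]


if __name__ == "__main__":
    for name, elapsed, before, after in run_demo():
        print("%s done in %.3f seconds" % (name, elapsed))
        print("Memory usage: %d (+%d)" % (after, after - before))
```

Because each parser runs in a freshly forked child, the peak memory of one run cannot leak into the numbers of the next.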

Parsing hamlet.xml in English, 274KB:

Memory usage: 7284
xml.etree.ElementTree.parse done in 0.104 seconds
Memory usage: 14240 (+6956)
xml.etree.cElementTree.parse done in 0.022 seconds
Memory usage: 9736 (+2452)
lxml.etree.parse done in 0.014 seconds
Memory usage: 11028 (+3744)
minidom tree read in 0.152 seconds
Memory usage: 30360 (+23076)

Parsing the Old Testament in English (ot.xml, 3.4MB) into memory:

Memory usage: 20444
xml.etree.ElementTree.parse done in 0.385 seconds
Memory usage: 46088 (+25644)
xml.etree.cElementTree.parse done in 0.056 seconds
Memory usage: 32628 (+12184)
lxml.etree.parse done in 0.041 seconds
Memory usage: 37500 (+17056)
minidom tree read in 0.672 seconds
Memory usage: 110428 (+89984)

A 25MB XML file with Slavic Unicode text content:

Memory usage: 57368
xml.etree.ElementTree.parse done in 3.274 seconds
Memory usage: 223720 (+166352)
xml.etree.cElementTree.parse done in 0.459 seconds
Memory usage: 154012 (+96644)
lxml.etree.parse done in 0.454 seconds
Memory usage: 135720 (+78352)
minidom tree read in 6.193 seconds
Memory usage: 604860 (+547492)

And a contrived 4.5MB XML file with a lot more structure than data:

Memory usage: 13308
xml.etree.ElementTree.parse done in 4.178 seconds
Memory usage: 222088 (+208780)
xml.etree.cElementTree.parse done in 0.478 seconds
Memory usage: 103056 (+89748)
lxml.etree.parse done in 0.199 seconds
Memory usage: 101860 (+88552)
minidom tree read in 8.705 seconds
Memory usage: 810964 (+797656)

Things to note: minidom's memory overhead is a factor of 5-10 higher than cET's, and the exact factor depends heavily on the data. Also, minidom is consistently slower by more than a factor of 10 compared to the fastest parser (apparently the one in libxml2/lxml.etree, neither of which can be said to provide fewer features than the DOM that minidom implements).
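
The factors quoted above can be rechecked from the numbers in the output. The following sketch simply copies the timings and memory deltas from the four runs; the row labels and the function name (compute_ratios) are made up for this example. In all four runs the fastest parser was lxml.etree.

```python
# Each row: (label, minidom seconds, fastest parser seconds,
#            minidom memory growth, cElementTree memory growth)
# All values are copied from the benchmark output in this message.
RESULTS = [
    ("hamlet.xml, 274KB", 0.152, 0.014, 23076, 2452),
    ("ot.xml, 3.4MB", 0.672, 0.041, 89984, 12184),
    ("Slavic text, 25MB", 6.193, 0.454, 547492, 96644),
    ("contrived, 4.5MB", 8.705, 0.199, 797656, 89748),
]


def compute_ratios(rows):
    """Return (label, time factor vs. fastest parser, memory factor vs. cET) per row."""
    return [
        (label, t_minidom / t_fastest, m_minidom / m_cet)
        for label, t_minidom, t_fastest, m_minidom, m_cet in rows
    ]


if __name__ == "__main__":
    for label, time_factor, mem_factor in compute_ratios(RESULTS):
        print("%-20s time x%.1f  memory x%.1f" % (label, time_factor, mem_factor))
```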