
Author hanselda
Recipients hanselda
Date 2008-05-11.13:32:15
Message-id <1210512738.34.0.315015103661.issue2818@psf.upfronthosting.co.za>
Content
When using the xml.dom.pulldom module to parse a large XML file, if all the
information is stored in a single XML file, the module can process it as
follows without constructing the whole DOM:

import xml.dom.pulldom

events = xml.dom.pulldom.parse('file.xml')
for (event, node) in events:
    process(event, node)  # process() is the application's own handler
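For comparison, a self-contained version of the single-file case. It assumes process() does something like reading an attribute and expanding only the current element with DOMEventStream.expandNode(); the <record> elements, the DOC string and the records() helper are hypothetical, while pulldom.parseString, expandNode and the START_ELEMENT constant are the standard library's.

```python
from xml.dom import pulldom

# Hypothetical stand-in for the contents of file.xml.
DOC = "<body><record id='1'>a</record><record id='2'>b</record></body>"


def records(xml_text):
    """Return (id, text) for every <record>, building a DOM subtree only
    for the element currently being handled."""
    events = pulldom.parseString(xml_text)
    found = []
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "record":
            # Pull the rest of this element's events into node's subtree.
            events.expandNode(node)
            found.append((node.getAttribute("id"), node.firstChild.data))
    return found


print(records(DOC))  # [('1', 'a'), ('2', 'b')]
```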

But if 'file.xml' contains some large external entities, for example:

<!DOCTYPE body [<!ENTITY file_external SYSTEM "others.xml">]>
<body>&file_external;</body>

Then the same Python snippet above uses an enormous amount of memory. I did
not run a proper benchmark, but in one case a 3 MB external XML file
consumed about 1 GB of memory. I suspect that in this case the whole DOM
structure is being constructed.
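A rough reproducer for the external-entity case, meant as a sketch rather than a measured benchmark. Assumptions: Python 3; since Python 3.7.1 the SAX parser no longer expands external general entities by default, so the sketch re-enables xml.sax.handler.feature_external_ges to get the behavior described above. The file names, the <item> payload and count_items() are hypothetical. The peak reported by tracemalloc only covers Python-level allocations and will vary by machine and Python version.

```python
import os
import tempfile
import tracemalloc
import xml.sax
import xml.sax.handler
from xml.dom import pulldom


def count_items(n_items):
    """Stream file.xml, whose <body> pulls in others.xml via an external
    entity, and return (number of <item> elements seen, peak traced bytes)."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file names mirroring the report.
        external = os.path.join(tmp, "others.xml")
        main = os.path.join(tmp, "file.xml")
        with open(external, "w", encoding="utf-8") as f:
            f.write("<item>x</item>\n" * n_items)
        with open(main, "w", encoding="utf-8") as f:
            # An absolute SYSTEM identifier avoids relying on the current
            # working directory when the entity is resolved.
            f.write(
                f'<!DOCTYPE body [<!ENTITY file_external SYSTEM "{external}">]>\n'
                "<body>&file_external;</body>\n"
            )

        parser = xml.sax.make_parser()
        # Off by default since Python 3.7.1; without it the reference to
        # file_external is not expanded at all.
        parser.setFeature(xml.sax.handler.feature_external_ges, True)

        seen = 0
        tracemalloc.start()
        with open(main, "rb") as stream:
            for event, node in pulldom.parse(stream, parser=parser):
                if event == pulldom.START_ELEMENT and node.tagName == "item":
                    seen += 1
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return seen, peak


seen, peak = count_items(10_000)
print(f"{seen} items, peak {peak / 1e6:.1f} MB traced")
```

Raising n_items until others.xml is a few megabytes, and comparing the peak with the same <item> elements written inline in file.xml, would give the concrete numbers missing above.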