Title: ElementTree memory leak
Messages (4)
msg160266 - Author: Giuseppe Attardi (Giuseppe.Attardi) Date: 2012-05-09 09:39
I confirm the presence of a serious memory leak in ElementTree, using the iterparse() function.
Memory grows disproportionately to dozens of GB when parsing a large XML file.

For further information, see discussion in:
but notice that the comments attributing the problem to the OS are quite off the mark.

To replicate the problem, try this on a Wikipedia dump:

    iterparse = ElementTree.iterparse(file)
    id = None
    for event, elem in iterparse:
        if elem.tag.endswith("title"):
            title = elem.text
        elif elem.tag.endswith("id") and not id:
            id = elem.text
        elif elem.tag.endswith("text"):
            print id, title, elem.text[:20]
msg160275 - Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012-05-09 11:39
Can you specify how you import ET? I.e. from the pure Python or the C accelerator?

Also, do you realize that the element iterparse returns should be discarded with 'clear'? [see tutorial here:]
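
For illustration, here is a minimal sketch of the discard pattern referred to above. It is written for Python 3 and is not taken from the linked tutorial. The helper name iter_pages, the assumption that each record is wrapped in a "page" element (as in a Wikipedia dump), and the choice to clear the root element after each page are all assumptions made for this example.

```python
import io
import xml.etree.ElementTree as ET


def iter_pages(source):
    """Yield (page_id, title, text_prefix) per page, freeing parsed elements.

    Hypothetical helper: assumes records are wrapped in <page> elements.
    """
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is the start of the root element; keep a reference
    # so that finished children can be detached from it later.
    _, root = next(context)

    title = page_id = None
    text = ""
    for event, elem in context:
        if event != "end":
            continue  # element content is only complete on "end"
        tag = elem.tag.rsplit("}", 1)[-1]  # drop any "{namespace}" prefix
        if tag == "title":
            title = elem.text
        elif tag == "id" and page_id is None:
            page_id = elem.text  # first <id> in the page only
        elif tag == "text":
            text = elem.text or ""
        elif tag == "page":
            yield page_id, title, text[:20]
            title = page_id = None
            text = ""
            # Drop everything parsed so far; without this the whole tree
            # stays in memory and grows with the size of the input.
            root.clear()


if __name__ == "__main__":
    sample = io.BytesIO(
        b"<mediawiki>"
        b"<page><title>A</title><id>1</id>"
        b"<revision><id>10</id><text>hello world</text></revision></page>"
        b"<page><title>B</title><id>2</id>"
        b"<revision><id>20</id><text>second</text></revision></page>"
        b"</mediawiki>"
    )
    for page_id, title, prefix in iter_pages(sample):
        print(page_id, title, prefix)
```

Calling elem.clear() on the finished page element alone would release its children and text, but the emptied page elements would still accumulate under the root; clearing the root avoids that, at the cost of assuming nothing else in the tree is needed later.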
msg160286 - Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2012-05-09 12:47
Can this be reproduced in 3.2/3.3?
msg160288 - Author: Giuseppe Attardi (Giuseppe.Attardi) Date: 2012-05-09 13:35
You are right, I should discard the elements.

Thank you.
