classification
Title: ElementTree memory leak
Type: resource usage Stage:
Components: Versions: Python 2.7
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: Giuseppe.Attardi, eli.bendersky, flox, jcea
Priority: normal Keywords:

Created on 2012-05-09 09:39 by Giuseppe.Attardi, last changed 2012-05-09 13:36 by Giuseppe.Attardi. This issue is now closed.

Messages (4)
msg160266 - Author: Giuseppe Attardi (Giuseppe.Attardi) Date: 2012-05-09 09:39
I can confirm a serious memory leak in ElementTree when using the iterparse() function.
Memory usage grows to dozens of GB when parsing a large XML file.

For further information, see the discussion at:
  http://www.gossamer-threads.com/lists/python/bugs/912164?do=post_view_threaded#912164
but note that the comments attributing the problem to the OS are quite off the mark.

To replicate the problem, try this on a Wikipedia dump:

    iterparse = ElementTree.iterparse(file)
    id = None
    for event, elem in iterparse:
        if elem.tag.endswith("title"):
            title = elem.text
        elif elem.tag.endswith("id") and not id:
            id = elem.text
        elif elem.tag.endswith("text"):
            print id, title, elem.text[:20]
msg160275 - Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012-05-09 11:39
Can you specify how you import ET, i.e. from the pure Python module or from the C accelerator?
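
For context on this question: on Python 2.7 the two implementations live in separate modules (xml.etree.ElementTree is pure Python, xml.etree.cElementTree is the C accelerator), so the import line decides which one runs. The following is only a sketch of a common import pattern, not the reporter's actual code. On Python 3.3 and later, xml.etree.ElementTree uses the accelerator automatically, and on recent versions the first import fails, so the fallback is taken.

```python
# Prefer the C accelerator where it exists as a separate module (Python 2.7),
# otherwise fall back to the standard module.
try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET
```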

Also, are you aware that each element iterparse yields should be discarded with clear() once you are done with it? (See the tutorial here: http://eli.thegreenplace.net/2012/03/15/processing-xml-in-python-with-elementtree/)
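
A minimal sketch of what discarding elements with clear() might look like for the reproduction above. It assumes a dump layout in which each record is wrapped in a <page> element containing <title>, <id> and <text>. The function name iter_pages and the namespace-stripping step are illustrative choices, not part of the original report.

```python
import xml.etree.ElementTree as ET


def iter_pages(source):
    """Yield (page_id, title, text_prefix) for each <page>, clearing it afterwards."""
    page_id = title = text = None
    for event, elem in ET.iterparse(source, events=("end",)):
        # Drop a leading "{namespace}" prefix, if any, to get the bare tag name.
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag == "title":
            title = elem.text
        elif tag == "id" and page_id is None:
            # Keep only the first <id> seen inside the current page.
            page_id = elem.text
        elif tag == "text":
            text = (elem.text or "")[:20]
        elif tag == "page":
            record = (page_id, title, text)
            page_id = title = text = None
            # Release the page's children, text and attributes so they can be
            # garbage collected instead of accumulating in the tree.
            elem.clear()
            yield record
```

Calling iter_pages() on an open file or a path yields one tuple per page. Note that clear() empties an element but does not detach it from its parent, so the root still holds one (now empty) element per page.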
msg160286 - Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2012-05-09 12:47
Can this be reproduced in 3.2/3.3?
msg160288 - Author: Giuseppe Attardi (Giuseppe.Attardi) Date: 2012-05-09 13:35
You are right, I should discard the elements.

Thank you.
History
Date                 User              Action  Args
2012-05-09 13:36:30  Giuseppe.Attardi  set     status: open -> closed; resolution: not a bug
2012-05-09 13:35:29  Giuseppe.Attardi  set     messages: + msg160288
2012-05-09 12:47:18  jcea              set     nosy: + jcea; messages: + msg160286
2012-05-09 11:39:01  eli.bendersky     set     messages: + msg160275
2012-05-09 11:18:42  pitrou            set     nosy: + eli.bendersky, flox
2012-05-09 09:39:44  Giuseppe.Attardi  create