Author jess.j
Recipients jess.j
Date 2018-12-14.19:59:00
Message-id <1544817540.85.0.788709270274.issue35502@psf.upfronthosting.co.za>
Content
When given XML that would raise a ParseError, xml.etree.ElementTree.iterparse leaks memory if iteration is stopped before the ParseError is raised.

Example:


import gc
from io import StringIO
import xml.etree.ElementTree as etree

import objgraph


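def parse_xml():
    # Malformed on purpose: the stray </ROOT> makes the parser raise ParseError.
    xml = """
      <LEVEL1>
      </LEVEL1>
    </ROOT>
    """
    parser = etree.iterparse(StringIO(initial_value=xml))
    for _, elem in parser:
        if elem.tag == 'LEVEL1':
            # Stop iterating before the pending ParseError is raised.
            break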


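def run():
    parse_xml()

    # Force a full collection so anything still alive is really unreachable by gc.
    gc.collect()
    uncollected_elems = objgraph.by_type('Element')
    print(uncollected_elems)
    # Render the reference chain that keeps the elements alive.
    objgraph.show_backrefs(uncollected_elems, max_depth=15)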


if __name__ == "__main__":
    run()


Output:
[<Element 'LEVEL1' at 0x10df712c8>]
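
For anyone without objgraph installed, the same check can probably be approximated with the standard library alone. This is only a sketch: the helper name live_elements is made up for illustration, and it assumes that scanning gc.get_objects() for Element instances is an acceptable stand-in for objgraph.by_type('Element'). Whether anything is printed may differ between interpreter versions, so no output is claimed here.

```python
import gc
from io import StringIO
import xml.etree.ElementTree as etree


def parse_xml():
    # Same malformed document as in the report above.
    xml = """
      <LEVEL1>
      </LEVEL1>
    </ROOT>
    """
    for _, elem in etree.iterparse(StringIO(xml)):
        if elem.tag == 'LEVEL1':
            # Abandon the iterator before the ParseError surfaces.
            break


def live_elements():
    # Hypothetical helper: list Element objects the collector still tracks
    # after a full collection.
    gc.collect()
    return [obj for obj in gc.get_objects() if type(obj) is etree.Element]


parse_xml()
leftover = live_elements()
print(leftover)
```

An empty list would suggest the elements were freed; a non-empty list matches the behaviour described in this report.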

Also see this gist which has an image showing the objects that are retained in memory: https://gist.github.com/grokcode/f89d5c5f1831c6bc373be6494f843de3
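
For contrast, here is a small sketch of what happens when the iterator is run to completion instead of being abandoned. The helper consume_all and the constant MALFORMED_XML are names invented for this illustration; the document text is the one from the example. It is meant to show that the LEVEL1 event is delivered first and the ParseError only surfaces afterwards, which is the window in which the early break occurs.

```python
from io import StringIO
import xml.etree.ElementTree as etree

# Same malformed document as in the report: </ROOT> has no opening tag.
MALFORMED_XML = """
  <LEVEL1>
  </LEVEL1>
</ROOT>
"""


def consume_all(xml_text):
    # Hypothetical helper: iterate until the parser finishes or fails,
    # recording the tags seen and the error, if any.
    seen_tags = []
    error = None
    try:
        for _, elem in etree.iterparse(StringIO(xml_text)):
            seen_tags.append(elem.tag)
    except etree.ParseError as exc:
        error = exc
    return seen_tags, error


seen_tags, error = consume_all(MALFORMED_XML)
print(seen_tags)
print(type(error).__name__)
```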