Author ezio.melotti
Recipients Arfrever, danielsh, einarfd, eli.bendersky, ezio.melotti, flox, georg.brandl, jcea, larry, python-dev, santoso.wijaya, skrah
Date 2013-01-11.08:44:42
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1357893883.51.0.604044509271.issue16076@psf.upfronthosting.co.za>
In-reply-to
Content
This is what I found out.
I used an easily copy/pastable one-liner that creates 3 variables: e (no children), e2 (3 children), e3 (5 children).

Original leaky code (test_xml_etree_c leaked [56, 56] references, sum=112):
>>> from xml.etree import ElementTree as ET; e = ET.Element('foo'); SAMPLE_XML = "<body><tag class='a'>text</tag><tag class='b' /><section><tag class='b' id='inner'>subtext</tag></section></body>"; e2 = ET.XML(SAMPLE_XML); SAMPLE_XML = "<body><tag class='a'>text</tag><tag class='b' /><tag class='b' /><tag class='b' /><section><tag class='b' id='inner'>subtext</tag></section></body>"; e3 = ET.XML(SAMPLE_XML)
[76773 refs]
>>> ### e has no children and leaks 1 ref:
>>> e.__getstate__()
{'tag': 'foo', 'attrib': {}, 'text': None, 'tail': None, '_children': []}
[76791 refs]
>>> e.__getstate__()
{'tag': 'foo', 'attrib': {}, 'text': None, 'tail': None, '_children': []}
[76792 refs]
>>> e.__getstate__()
{'tag': 'foo', 'attrib': {}, 'text': None, 'tail': None, '_children': []}
[76793 refs]
>>> ### e2 has 3 children and leaks 4 refs:
>>> e2.__getstate__()
{'tag': 'body', 'attrib': {}, 'text': None, 'tail': None, '_children': [<Element 'tag' at 0xb735cef4>, <Element 'tag' at 0xb7368034>, <Element 'section' at 0xb73688f4>]}
[76798 refs]
>>> e2.__getstate__()
{'tag': 'body', 'attrib': {}, 'text': None, 'tail': None, '_children': [<Element 'tag' at 0xb735cef4>, <Element 'tag' at 0xb7368034>, <Element 'section' at 0xb73688f4>]}
[76802 refs]

The leaked refs seems to be 1 for the children list + 1 for each children.
The diff I pasted in the previous *seems* to fix this (i.e. leaks in __gestate__ are gone, tests pass), but I had it crash once (couldn't reproduce after that, so it might be unrelated*), and I'm not sure it's correct.

With that patch applied we go down to test_xml_etree_c leaked [6, 6] references, sum=12.
The remaining leaks seem to be in __setstate__.
Patched code:
>>> from xml.etree import ElementTree as ET; e = ET.Element('foo'); SAMPLE_XML = "<body><tag class='a'>text</tag><tag class='b' /><section><tag class='b' id='inner'>subtext</tag></section></body>"; e2 = ET.XML(SAMPLE_XML); SAMPLE_XML = "<body><tag class='a'>text</tag><tag class='b' /><tag class='b' /><tag class='b' /><section><tag class='b' id='inner'>subtext</tag></section></body>"; e3 = ET.XML(SAMPLE_XML)
[76773 refs]
>>> ### no more leaks for getstate:
>>> p = e.__getstate__()
[76787 refs]
>>> p = e.__getstate__()
[76787 refs]
>>> ### also no leaks when there are no child:
>>> e.__setstate__(p)
[76788 refs]
>>> e.__setstate__(p)
[76788 refs]
>>> ### no more leaks for getstate with children:
>>> p2 = e2.__getstate__()
[76807 refs]
>>> p2 = e2.__getstate__()
[76807 refs]
>>> ### one ref leaked for every child in __setstate__:
>>> e2.__setstate__(p2)
[76810 refs]
>>> e2.__setstate__(p2)
[76813 refs]
>>> e2.__setstate__(p2)
[76816 refs]

I'm not working on this anymore now, so someone more familiar with the code can take a look, see if my patch is correct, and fix the remaining leaks.

* maybe I'm doing something wrong, but ISTM that ``make -j2`` doesn't always work right away, and sometimes I get different results if I do it again without touching the code.
History
Date User Action Args
2013-01-11 08:44:43ezio.melottisetrecipients: + ezio.melotti, georg.brandl, jcea, larry, Arfrever, eli.bendersky, skrah, flox, santoso.wijaya, python-dev, einarfd, danielsh
2013-01-11 08:44:43ezio.melottisetmessageid: <1357893883.51.0.604044509271.issue16076@psf.upfronthosting.co.za>
2013-01-11 08:44:43ezio.melottilinkissue16076 messages
2013-01-11 08:44:42ezio.melotticreate