xml.etree.ElementTree.Element is no longer pickleable #60280

einarfd · 2012-09-28T08:54:59Z

BPO	16076
Nosy	@birkenfeld, @jcea, @larryhastings, @ezio-melotti, @skrah, @florentx
Files	i16076-cpatch.diff i16076-v3-combined.diff i16076-v4-combined.diff i16076-v5.diff transcript-test___all__.txt i16076-v6.diff i16076-v7.diff i16076-v8.diff memleak-v1.diff

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2013-01-12.13:21:35.399>
created_at = <Date 2012-09-28.08:54:59.477>
labels = ['extension-modules', 'expert-XML', 'type-bug', 'library', 'release-blocker']
title = 'xml.etree.ElementTree.Element is no longer pickleable'
updated_at = <Date 2013-01-12.13:43:31.998>
user = 'https://bugs.python.org/einarfd'

bugs.python.org fields:

activity = <Date 2013-01-12.13:43:31.998>
actor = 'python-dev'
assignee = 'none'
closed = True
closed_date = <Date 2013-01-12.13:21:35.399>
closer = 'python-dev'
components = ['Extension Modules', 'Library (Lib)', 'XML']
creation = <Date 2012-09-28.08:54:59.477>
creator = 'einarfd'
dependencies = []
files = ['28199', '28252', '28261', '28363', '28484', '28502', '28627', '28639', '28691']
hgrepos = []
issue_num = 16076
keywords = ['patch', '3.3regression']
message_count = 50.0
messages = ['171418', '171737', '172297', '172613', '172665', '176854', '177134', '177136', '177141', '177170', '177590', '177766', '178498', '178499', '178500', '178509', '178546', '178578', '178579', '178627', '178632', '178721', '178736', '178745', '178770', '178776', '178783', '178788', '178946', '179002', '179011', '179032', '179243', '179244', '179314', '179315', '179354', '179356', '179386', '179393', '179395', '179400', '179544', '179546', '179553', '179642', '179651', '179667', '179790', '179792']
nosy_count = 12.0
nosy_names = ['georg.brandl', 'jcea', 'larry', 'ezio.melotti', 'Arfrever', 'eli.bendersky', 'skrah', 'flox', 'santoso.wijaya', 'python-dev', 'einarfd', 'danielsh']
pr_nums = []
priority = 'release blocker'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue16076'
versions = ['Python 3.3', 'Python 3.4']

einarfd · 2012-09-28T08:54:59Z

The xml.etree.ElementTree.Element class is no longer pickleable in Python 3.3.0rc3 .

This is a regression from Python 3.2 where the Element class was pickleable, while the xml.etree.cElementTree.Element was not. So this is probably related to switching the ElementTree implementation in Python 3.3.

I've looked at the "what's new" documentation in Python 3.3, and there was nothing that indicated that the ElemenTree switchover would give this kind of issues.

I've run into this issues because I use the multiprocessing module to pass parsed XML documents between processes, and when I can't pickle Element objects. This stops working.

You can reproduce the issue with the following code:
import pickle
from xml.etree.ElementTree import Element

print(pickle.dumps(Element("foo")))

serhiy-storchaka · 2012-10-01T17:52:06Z

In 3.2 repr(xml.etree.ElementTree.Element) is "<class 'xml.etree.ElementTree.Element'>".
In 3.3 repr(xml.etree.ElementTree.Element) is "<class 'Element'>".

elibendersky · 2012-10-07T13:19:27Z

Will look into it, thanks for the report.

santosowijaya · 2012-10-11T01:37:46Z

This is because, in Python 3.3, xml.etree.ElementTree.Element is overridden by the C accelerator version, _elementtree.Element.

If you remove the statement,

1708 from _elementtree import *

the behavior from Python 3.2 comes back.

santosowijaya · 2012-10-11T17:07:17Z

OTOH, xml.etree.cElementTree.Element in Python 3.2 and earlier has never been pickleable, either.

Python 3.2.3+ (3.2:24499eebbc2f, Oct 10 2012, 13:54:45) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from xml.etree.cElementTree import Element
>>> repr(Element)
'<built-in function Element>'
>>> import pickle
>>> pickle.dumps(Element('foo'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
_pickle.PicklingError: Can't pickle <class 'Element'>: attribute lookup builtins.Element failed

danielsh · 2012-12-03T16:48:11Z

Attached patch for your consideration.

I've tested pickling/unpickling and comparing the resulting object attribute by attribute (.tag, .attrib, .text, .tail for equality; and recursively .getchildren()), and 'make test' --- all seems to work.

If the approach is sound, I can submit a revised patch that also updates documentation, regression tests, and potententially updates other type objects in the same manner as this patch updates _elementtree.Element .

danielsh · 2012-12-07T22:23:23Z

TreeBuilder was also pickleable in 3.2 but now isn't so; updating summary accordingly.

Attaching a checkpoint patch. It addresses both Element and TreeBuilder, and adds tests. The added tests fail if test___all__ is included in the test run, so the patch should not be applied as-is.

Thanks to Ezio Melotti for preliminary review and suggestions on IRC.

danielsh · 2012-12-07T22:36:40Z

Reattaching without the unrelated Python-ast.c change.

elibendersky · 2012-12-08T03:15:19Z

Thanks for working on it, Daniel. Unfortunately I won't have time to look at it in the near future, but I will definitely look at the patch once I get some free time to hack on Python again.

danielsh · 2012-12-08T18:50:44Z

@eli, thanks. In the meantime, attaching a patch addressing Ezio's review; the caveats from msg 177134 still apply.

danielsh · 2012-12-16T11:58:40Z

Haven't had a chance to look at the test___all__ -related failures yet. Still planning to do so once I have some time.

danielsh · 2012-12-19T17:19:28Z

Re the problem from msg 177134 ('./python -mtest test___all__ test_xml_etree' failing), it is a a preexisting problem, in the sense that applying just the Lib/test/ part of this patch suffices to trigger it. Adding 'xml.etree' to the blacklist in test___all__.py:50 suffices to address the problem. At a guess that is because when test___all__ imports the 'xml.etree' module, it doesn't filter out the '_elementtree' symbol like test_xml_etree.py:test_main() does.

New patch attached; differences to previous version:

Removed debug prints.
Skip test_pickling when testing the Python xml.etree.
(The skips could be removed in the future, if/when the test___all__
interaction problem is resolved.)

elibendersky · 2012-12-29T15:00:01Z

Daniel, is your patch made vs. the 3.3 branch? I'll need to apply there first, and then merge up to default (3.4).

[Also, removing the 3.2 tag here. 3.2 won't be fixed to make _elementtree pickleable - it never was].

danielsh · 2012-12-29T15:04:47Z

I wrote the patch against default (3.4), but it applies cleanly to 3.3.

elibendersky · 2012-12-29T15:10:34Z

Also, could you explain what makes test___all__ start failing with this patch? What are you adding that makes that happen?

P.S. I suspect the root reason is the bad way etree tests are structured in general. See bpo-15083

danielsh · 2012-12-29T16:18:13Z

Any attempt to pickle an Element object in test_xml_etree.py causes that test to fail if test___all__ had been run before it; see attached transcript. It's against 3.4, with no changes other than the testsuite changes shown within.

I don't know what the root cause is. As noted before, adding 'xml.etree' to test___all__.AllTest.test_all.blacklist is a workaround, so I assume the root cause lies within the test framework --- a bad interaction between the import magics in test___all__ and test_xml_etree.

elibendersky · 2012-12-30T00:18:33Z

I think I understand what's going on there. Pickle, to make sure an object can be picked, looks at the module it comes from and tries to import its class, to make sure they're the same.

What happens since 3.3 is that for the Python Element class, it imports ElementTree. The harness in test_xml_etree carefully sets up to ignore _elementtree so the correct class is pulled. However, if test___all__ runs before it it just does a brute "from ElementTree import *" which places the C version in sys.modules, and this is what pickle finds.

So in the meantime, until bpo-15083 is resolved I think it's safe to put ET in the ignore list of test___all__.

python-dev · 2012-12-30T14:18:33Z

New changeset 71508fc738bb by Eli Bendersky in branch '3.3':
For Issue bpo-16076: make sure that pickling of Element objects is tested, and do
http://hg.python.org/cpython/rev/71508fc738bb

New changeset 5a38f4d7833c by Eli Bendersky in branch 'default':
For issue bpo-16076: merge 3.3
http://hg.python.org/cpython/rev/5a38f4d7833c

elibendersky · 2012-12-30T14:24:17Z

I've added some (currently pyET specific) pickling tests. Daniel, could you re-generate the patch? Note that for the C version of pickling you can now enable the pickle test. Point to consider - can elements pickled in C be unpickled in Python and vice versa? It should probably work (see test_decimal's pickling tests).

danielsh · 2012-12-30T23:57:33Z

Attached. Differences to previous version:

Avoid the test___all__ issue (bpo-16817) by manipulating sys.modules
directly
Support pickling interoperability between the C and Python
implementations

danielsh · 2012-12-31T00:54:58Z

BTW, maybe I'm just confused, but I'm a little surprised that C pickling -> Python unpickling works: the C __reduce__ returns a 3-element tuple, which according to pickle docs[1] must either be a dictionary or be passed to __setstate__; it's not a dictionary and the Python implementation doesn't have a __setstate__.

http://docs.python.org/3/library/pickle#pickle.object.\_\_reduce__

elibendersky · 2013-01-01T00:32:51Z

A couple of high-level questions about the C code (this isn't a detailed review yet):

Why did you choose to implement __reduce__ and not __getstate__?
A lot of code appears to be shared between element_setstate_from_attributes and the init method implementation, can it be refactored?
I've been thinking about the C<->Python pickle compatibility, and it may be too much trouble at this point for no very strong benefit. What could make more sense at some point is implementing a __getstate__/setstate in both the Python and C versions that are compatible (i.e. produce the same things). But this may mean changing the way the Python Element is pickled in 3.3, which may not be acceptable. So we may prefer to forego the compatibility in 3.3 and then for 3.4 make the change. I'm still not sure what is best here.

danielsh · 2013-01-01T22:56:48Z

Eli Bendersky wrote on Tue, Jan 01, 2013 at 15:54:00 +0000:

Why did you change the class name, by the way, I don't think it's
a valid change at least for 3.3 in terms of backwards compatibility.

With unmodified tip of 3.4:

    >>> import pickle, xml.etree.ElementTree as ET
    >>> pickle.dumps(ET.Element('foo'))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    _pickle.PicklingError: Can't pickle <class 'Element'>: attribute lookup builtins.Element failed

I added the "_elementtree" to the tp_name in order to bypass the above
error. Module-qualified names were in use elsewhere (including by
_elementtree._element_iterator) so it seemed reasonable. I'll defer to
you about compatibility implications of this change.

Regarding that compatibility, and even easier idea would be for the
C pickle to return the same __dict__ implicitly gathered from the
Python version, and then only one version of the unpickle is required.

That makes sense. But going forward it might be even better to define
an explicit __reduce__/getstate for the Python version too, so if
the instance dict grows new members, they won't get serialised
unintentionally. (This consideration is also why C code in the latest
patch makes some effort to notice unknown args in the dict.)

danielsh · 2013-01-01T23:11:48Z

Eli Bendersky wrote on Tue, Jan 01, 2013 at 00:32:51 +0000:

Why did you choose to implement __reduce__ and not __getstate__?

Maybe I was simply imitating what some other extension module was doing ;)

By using __reduce__ with the type as first return value, the
__setstate__ code becomes simpler because it doesn't need to address
'tag' and 'attrib'. This consideration is somewhat moot now that the
unpickle-Python-pickled-Element's code path has to think about all
instance attributes, including those settable by the constructor.

A lot of code appears to be shared between
element_setstate_from_attributes and the init method implementation,
can it be refactored?

It seems there might be room for code reuse --- both functions set
tag,text,tail,attrib (but the unpickler sets the children too). I'll
look into that in the next iteration (once the class name and pickle
output format issues are settled).

danielsh · 2013-01-01T23:43:55Z

Also, the class inheritance in the tests should be amended to follow bpo-16835.

elibendersky · 2013-01-03T14:44:02Z

On Tue, Jan 1, 2013 at 2:56 PM, Daniel Shahaf <report@bugs.python.org>wrote:

>
> Daniel Shahaf added the comment:
>
> Eli Bendersky wrote on Tue, Jan 01, 2013 at 15:54:00 +0000:
> > Why did you change the class name, by the way, I don't think it's
> > a valid change at least for 3.3 in terms of backwards compatibility.
> >
>
> With unmodified tip of 3.4:
>
>     >>> import pickle, xml.etree.ElementTree as ET
>     >>> pickle.dumps(ET.Element('foo'))
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>     _pickle.PicklingError: Can't pickle <class 'Element'>: attribute
> lookup builtins.Element failed
>
> I added the "_elementtree" to the tp_name in order to bypass the above
> error.  Module-qualified names were in use elsewhere (including by
> _elementtree._element_iterator) so it seemed reasonable.  I'll defer to
> you about compatibility implications of this change.

I asked on pydev, but this is a key point to resolve. Can
pickling/unpickling be made to work correctly without such (or similar)
change at all? How will the unpickler know which module to load when it
sees a pickled object of Element type?

If this change is required (even if we choose to name it
"xml.etree.ElementTree.Element" for Py compatibility to fix the pickling
regression, we may find ourselves in a need to change it between 3.3 and
3.3.1 and I'm not sure if that's valid. I hope my question on pydev will be
resolved conclusively.

Danial, could you investigate if such a change is absolutely required to
make pickling/unickling of Element work?

danielsh · 2013-01-04T02:09:36Z

Eli Bendersky wrote on Thu, Jan 03, 2013 at 14:44:02 +0000:

On Tue, Jan 1, 2013 at 2:56 PM, Daniel Shahaf <report@bugs.python.org>wrote:
> I added the "_elementtree" to the tp_name in order to bypass the above
> error. Module-qualified names were in use elsewhere (including by
> _elementtree._element_iterator) so it seemed reasonable. I'll defer to
> you about compatibility implications of this change.

I asked on pydev, but this is a key point to resolve. Can
pickling/unpickling be made to work correctly without such (or similar)
change at all? How will the unpickler know which module to load when it
sees a pickled object of Element type?

Is there a requirement that it loads a particular module? Would etree
users notice the difference if pickle.load() returns an instance of the
"other" Element implementation than the one they pickled?

Danial, could you investigate if such a change is absolutely required to
make pickling/unickling of Element work?

Yes, I'll look into that.

skrah · 2013-01-04T10:09:41Z

Daniel Shahaf <report@bugs.python.org> wrote:

Is there a requirement that it loads a particular module? Would etree
users notice the difference if pickle.load() returns an instance of the
"other" Element implementation than the one they pickled?

Yes: If you send an "_elementtree.Element" pickle to another machine that
doesn't have a working C version, it can't be unpickled. As far as I know
the only way around that is to pickle as "xml.etree.ElementTree.Element".

elibendersky · 2013-01-04T14:59:18Z

On Jan 4, 2013 2:09 AM, "Stefan Krah" <report@bugs.python.org> wrote:

Stefan Krah added the comment:

Daniel Shahaf <report@bugs.python.org> wrote:
> Is there a requirement that it loads a particular module? Would etree
> users notice the difference if pickle.load() returns an instance of the
> "other" Element implementation than the one they pickled?

Yes: If you send an "_elementtree.Element" pickle to another machine that
doesn't have a working C version, it can't be unpickled. As far as I know
the only way around that is to pickle as "xml.etree.ElementTree.Element".

----------

Yes this is a good point. Another thing to consider is that if both report
the same name then it will be possible to unpickle on a machine running 3.2

Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue16076\>

danielsh · 2013-01-07T04:58:09Z

Eli Bendersky wrote on Thu, Jan 03, 2013 at 14:44:02 +0000:

If this change is required (even if we choose to name it
"xml.etree.ElementTree.Element" for Py compatibility to fix the pickling
regression, we may find ourselves in a need to change it between 3.3 and
3.3.1 and I'm not sure if that's valid. I hope my question on pydev will be
resolved conclusively.

Danial, could you investigate if such a change is absolutely required to
make pickling/unickling of Element work?

There are a few options:

Change c-Element's tp_name to "xml.etree.ElementTree.Element".
Register a reduce function for c-Element's that serialises them by
constructing an equivalent py-Element and returning the latter's
.__dict__ via the third return value:
http://docs.python.org/3/library/copyreg#copyreg.pickle
Add an entry mapping "builtins.Element" to
"xml.etree.ElementTree.Element" to _compat_pickle.IMPORT_MAPPING
(which is used by Lib/pickle.py:_Pickler.find_class()).

I haven't tried to implement either of these approaches.

elibendersky · 2013-01-07T05:16:23Z

I think that changing the _elementtree's Element's name to match the Python version, and then making sure the same serialized object is returned - is a worthy option to try.

Then it will hopefully "just work". When pickle deserializes a user-defined object that says it's a xml.etree.ElementTree.Element, it will try to import Element from xml.etree.ElementTree and should get the accelerated class. On machines without acceleration it will get the Python class.

danielsh · 2013-01-08T05:30:34Z

In the meantime, unrelated question: should TreeBuilder be pickleable too, and interchangeably (pickle|unpickle)able between the Python and C implementations? Asking because the Python TreeBuilder has a _factory member which it initializes to 'Element', and I'm not quite sure how cross-pickling should handle that.

danielsh · 2013-01-08T06:31:45Z

New iteration.

Open issues:

Share code with the init method. The issue with sharing code with either element_init() or create_new_element() would be malloc+realloc: unlike either of these methods, we know both the attributes and the number of children at allocation time, so we can allocate directly the right number of children.
C<->Py Interchangeable pickling of TreeBuilder (per above msg).

Differences to previous version:

Skip C<->Py interchangeability testing of TreeBuilder --- because that one started failing with:
_pickle.UnpicklingError: state is not a dictionary
It can probably be fixed, but I'd like to address the above question about pickling the factory first.
Use __getstate__ rather than __reduce__ for Element.
Make Element pickling interchangeable between c/py Element
(set tp_name to "xml.etree.ElementTree.Element" and match the
pickled format).

elibendersky · 2013-01-08T14:39:44Z

Daniel, thanks for this patch, it looks very good. I had some comments in the code review.

As for TreeBuilder, let's split it into a separate issue - I think it's much less critical. We can discuss it later once we're done with Element. If anything, it's more important to handle ElementTree itself first - it should be very easy (much simpler than Element).

With the factory method I'm not sure how the Py pickling happens. I need to dig a bit in pickling for that, because I don't know how method reference pickling works.

elibendersky · 2013-01-08T15:00:42Z

P = import_fresh_module('xml.etree.ElementTree', blocked=['_elementtree'])
tb = P.TreeBuilder(element_factory=lambda a, b: [a, b])
print(pickle.dumps(tb))

Gives: _pickle.PicklingError: Can't pickle <class 'function'>: attribute lookup builtins.function failed

danielsh · 2013-01-08T22:51:40Z

Eli Bendersky wrote on Tue, Jan 08, 2013 at 15:00:42 +0000:

Eli Bendersky added the comment:

P = import_fresh_module('xml.etree.ElementTree', blocked=['_elementtree'])
tb = P.TreeBuilder(element_factory=lambda a, b: [a, b])
print(pickle.dumps(tb))

Gives: _pickle.PicklingError: Can't pickle <class 'function'>: attribute lookup builtins.function failed

Is that with or without the patch?

elibendersky · 2013-01-08T23:13:58Z

On Tue, Jan 8, 2013 at 2:51 PM, Daniel Shahaf <report@bugs.python.org>wrote:

Daniel Shahaf added the comment:

Eli Bendersky wrote on Tue, Jan 08, 2013 at 15:00:42 +0000:
>
> Eli Bendersky added the comment:
>
> P = import_fresh_module('xml.etree.ElementTree',
blocked=['_elementtree'])
> tb = P.TreeBuilder(element_factory=lambda a, b: [a, b])
> print(pickle.dumps(tb))
>
> Gives: _pickle.PicklingError: Can't pickle <class 'function'>: attribute
lookup builtins.function failed

Is that with or without the patch?

Without. Pickle can't handle functions that are not top-level, so bound
methods, internal functions, lambdas are all out. So pickling probably
wasn't a first priority for the Python version of TreeBuilder as well...

danielsh · 2013-01-08T23:24:44Z

Dissociating TreeBuilder from this issue per recent comments.

danielsh · 2013-01-08T23:48:34Z

v8 removes TreeBuilder handling (code + tests) and removes memcpy per review. (Outstanding issues in the review can be fixed post-v8, if needed.)

python-dev · 2013-01-10T14:07:33Z

New changeset 8d6dadfecf22 by Eli Bendersky in branch '3.3':
Issue bpo-16076: make _elementtree.Element pickle-able in a way that is compatible
http://hg.python.org/cpython/rev/8d6dadfecf22

New changeset 4c268b7c86e6 by Eli Bendersky in branch 'default':
Issue bpo-16076: make _elementtree.Element pickle-able in a way that is compatible
http://hg.python.org/cpython/rev/4c268b7c86e6

elibendersky · 2013-01-10T14:08:46Z

Fixed in 3.3 and default. Thanks for the good work, Daniel. A separate issue can be opened for TreeBuilder.

python-dev · 2013-01-10T14:36:27Z

New changeset c46054b49b6c by Eli Bendersky in branch '3.3':
Update Misc/NEWS for issue bpo-16076
http://hg.python.org/cpython/rev/c46054b49b6c

ezio-melotti · 2013-01-11T07:37:32Z

The fix introduced some refleaks:
$ ./python -m test -R3:2 test_xml_etree_c
test_xml_etree_c leaked [56, 56] references, sum=112

One seems to be in __getstate__:
diff --git a/Modules/_elementtree.c b/Modules/_elementtree.c
--- a/Modules/_elementtree.c
+++ b/Modules/_elementtree.c
@@ -859,8 +859,10 @@
                                      PICKLED_ATTRIB, self->extra->attrib,
                                      PICKLED_TEXT, self->text,
                                      PICKLED_TAIL, self->tail);
-    if (instancedict)
+    if (instancedict) {
+        Py_DECREF(children);
         return instancedict;
+    }
     else {
         for (i = 0; i < PyList_GET_SIZE(children); i++)
             Py_DECREF(PyList_GET_ITEM(children, i));

I'm still looking for the other.

ezio-melotti · 2013-01-11T08:44:42Z

This is what I found out.
I used an easily copy/pastable one-liner that creates 3 variables: e (no children), e2 (3 children), e3 (5 children).

Original leaky code (test_xml_etree_c leaked [56, 56] references, sum=112):
>>> from xml.etree import ElementTree as ET; e = ET.Element('foo'); SAMPLE_XML = "<body><tag class='a'>text</tag><tag class='b' /><section><tag class='b' id='inner'>subtext</tag></section></body>"; e2 = ET.XML(SAMPLE_XML); SAMPLE_XML = "<body><tag class='a'>text</tag><tag class='b' /><tag class='b' /><tag class='b' /><section><tag class='b' id='inner'>subtext</tag></section></body>"; e3 = ET.XML(SAMPLE_XML)
[76773 refs]
>>> ### e has no children and leaks 1 ref:
>>> e.__getstate__()
{'tag': 'foo', 'attrib': {}, 'text': None, 'tail': None, '_children': []}
[76791 refs]
>>> e.__getstate__()
{'tag': 'foo', 'attrib': {}, 'text': None, 'tail': None, '_children': []}
[76792 refs]
>>> e.__getstate__()
{'tag': 'foo', 'attrib': {}, 'text': None, 'tail': None, '_children': []}
[76793 refs]
>>> ### e2 has 3 children and leaks 4 refs:
>>> e2.__getstate__()
{'tag': 'body', 'attrib': {}, 'text': None, 'tail': None, '_children': [<Element 'tag' at 0xb735cef4>, <Element 'tag' at 0xb7368034>, <Element 'section' at 0xb73688f4>]}
[76798 refs]
>>> e2.__getstate__()
{'tag': 'body', 'attrib': {}, 'text': None, 'tail': None, '_children': [<Element 'tag' at 0xb735cef4>, <Element 'tag' at 0xb7368034>, <Element 'section' at 0xb73688f4>]}
[76802 refs]

The leaked refs seems to be 1 for the children list + 1 for each children.
The diff I pasted in the previous *seems* to fix this (i.e. leaks in __gestate__ are gone, tests pass), but I had it crash once (couldn't reproduce after that, so it might be unrelated*), and I'm not sure it's correct.

With that patch applied we go down to test_xml_etree_c leaked [6, 6] references, sum=12.
The remaining leaks seem to be in __setstate__.
Patched code:
>>> from xml.etree import ElementTree as ET; e = ET.Element('foo'); SAMPLE_XML = "<body><tag class='a'>text</tag><tag class='b' /><section><tag class='b' id='inner'>subtext</tag></section></body>"; e2 = ET.XML(SAMPLE_XML); SAMPLE_XML = "<body><tag class='a'>text</tag><tag class='b' /><tag class='b' /><tag class='b' /><section><tag class='b' id='inner'>subtext</tag></section></body>"; e3 = ET.XML(SAMPLE_XML)
[76773 refs]
>>> ### no more leaks for getstate:
>>> p = e.__getstate__()
[76787 refs]
>>> p = e.__getstate__()
[76787 refs]
>>> ### also no leaks when there are no child:
>>> e.__setstate__(p)
[76788 refs]
>>> e.__setstate__(p)
[76788 refs]
>>> ### no more leaks for getstate with children:
>>> p2 = e2.__getstate__()
[76807 refs]
>>> p2 = e2.__getstate__()
[76807 refs]
>>> ### one ref leaked for every child in __setstate__:
>>> e2.__setstate__(p2)
[76810 refs]
>>> e2.__setstate__(p2)
[76813 refs]
>>> e2.__setstate__(p2)
[76816 refs]

I'm not working on this anymore now, so someone more familiar with the code can take a look, see if my patch is correct, and fix the remaining leaks.

maybe I'm doing something wrong, but ISTM that make -j2 doesn't always work right away, and sometimes I get different results if I do it again without touching the code.

danielsh · 2013-01-11T11:07:02Z

Ezio Melotti wrote on Fri, Jan 11, 2013 at 08:44:43 +0000:
> >>> ### one ref leaked for every child in __setstate__:
> >>> e2.__setstate__(p2)
> [76810 refs]
> >>> e2.__setstate__(p2)
> [76813 refs]
> >>> e2.__setstate__(p2)
> [76816 refs]
> 
> I'm not working on this anymore now, so someone more familiar with the
> code can take a look, see if my patch is correct, and fix the
> remaining leaks.
>

Thank for narrowing it down so much. Attached patch incorporates your
getstate fix and a fix for setstate. It passes the -R 3:2 tests.

python-dev · 2013-01-12T13:21:34Z

New changeset 5b36768b9a11 by Eli Bendersky in branch '3.3':
Issue bpo-16076: fix refleak in pickling of Element.
http://hg.python.org/cpython/rev/5b36768b9a11

New changeset 848738d3c40f by Eli Bendersky in branch 'default':
Close bpo-16076: fix refleak in pickling of Element.
http://hg.python.org/cpython/rev/848738d3c40f

python-dev · 2013-01-12T13:43:32Z

New changeset 4501813ea676 by Eli Bendersky in branch '3.3':
Issue bpo-16076: check for return value of PyTuple_New for args (following
http://hg.python.org/cpython/rev/4501813ea676

New changeset 7313096e0bad by Eli Bendersky in branch 'default':
Issue bpo-16076: check for return value of PyTuple_New for args (following
http://hg.python.org/cpython/rev/7313096e0bad

einarfd mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Sep 28, 2012

serhiy-storchaka added the topic-XML label Sep 28, 2012

einarfd mannequin removed the stdlib Python modules in the Lib dir label Sep 28, 2012

santosowijaya mannequin added the stdlib Python modules in the Lib dir label Oct 11, 2012

danielsh mannequin added the extension-modules C modules in the Modules dir label Dec 3, 2012

danielsh mannequin changed the title ~~xml.etree.ElementTree.Element is no longer pickleable~~ xml.etree.ElementTree.Element and xml.etree.ElementTree.TreeBuilder are no longer pickleable Dec 7, 2012

Arfrever mannequin added the release-blocker label Dec 7, 2012

danielsh mannequin changed the title ~~xml.etree.ElementTree.Element and xml.etree.ElementTree.TreeBuilder are no longer pickleable~~ xml.etree.ElementTree.Element is no longer pickleable Jan 8, 2013

elibendersky mannequin closed this as completed Jan 10, 2013

ezio-melotti reopened this Jan 11, 2013

python-dev mannequin closed this as completed Jan 12, 2013

ezio-melotti transferred this issue from another repository Apr 10, 2022

Navigation Menu

xml.etree.ElementTree.Element is no longer pickleable #60280

xml.etree.ElementTree.Element is no longer pickleable #60280

Comments

einarfd mannequin commented Sep 28, 2012

einarfd mannequin commented Sep 28, 2012

serhiy-storchaka commented Oct 1, 2012

elibendersky mannequin commented Oct 7, 2012

santosowijaya mannequin commented Oct 11, 2012

santosowijaya mannequin commented Oct 11, 2012

danielsh mannequin commented Dec 3, 2012

danielsh mannequin commented Dec 7, 2012

danielsh mannequin commented Dec 7, 2012

elibendersky mannequin commented Dec 8, 2012

danielsh mannequin commented Dec 8, 2012

danielsh mannequin commented Dec 16, 2012

danielsh mannequin commented Dec 19, 2012

elibendersky mannequin commented Dec 29, 2012

danielsh mannequin commented Dec 29, 2012

elibendersky mannequin commented Dec 29, 2012

danielsh mannequin commented Dec 29, 2012

elibendersky mannequin commented Dec 30, 2012

python-dev mannequin commented Dec 30, 2012

elibendersky mannequin commented Dec 30, 2012

danielsh mannequin commented Dec 30, 2012

danielsh mannequin commented Dec 31, 2012

elibendersky mannequin commented Jan 1, 2013

danielsh mannequin commented Jan 1, 2013

danielsh mannequin commented Jan 1, 2013

danielsh mannequin commented Jan 1, 2013

elibendersky mannequin commented Jan 3, 2013

danielsh mannequin commented Jan 4, 2013

skrah mannequin commented Jan 4, 2013

elibendersky mannequin commented Jan 4, 2013

danielsh mannequin commented Jan 7, 2013

elibendersky mannequin commented Jan 7, 2013

danielsh mannequin commented Jan 8, 2013

danielsh mannequin commented Jan 8, 2013

elibendersky mannequin commented Jan 8, 2013

elibendersky mannequin commented Jan 8, 2013

danielsh mannequin commented Jan 8, 2013

elibendersky mannequin commented Jan 8, 2013

danielsh mannequin commented Jan 8, 2013

danielsh mannequin commented Jan 8, 2013

python-dev mannequin commented Jan 10, 2013

elibendersky mannequin commented Jan 10, 2013

python-dev mannequin commented Jan 10, 2013

ezio-melotti commented Jan 11, 2013

ezio-melotti commented Jan 11, 2013

danielsh mannequin commented Jan 11, 2013

python-dev mannequin commented Jan 12, 2013

python-dev mannequin commented Jan 12, 2013