classification
Title: xml.etree.ElementTree.Element is no longer pickleable
Type: behavior Stage: resolved
Components: Extension Modules, Library (Lib), XML Versions: Python 3.4, Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Arfrever, danielsh, einarfd, eli.bendersky, ezio.melotti, flox, georg.brandl, jcea, larry, python-dev, santoso.wijaya, skrah
Priority: release blocker Keywords: 3.3regression, patch

Created on 2012-09-28 08:54 by einarfd, last changed 2013-01-12 13:43 by python-dev. This issue is now closed.

Files
File name Uploaded Description Edit
i16076-cpatch.diff danielsh, 2012-12-03 16:48 review
i16076-v3-combined.diff danielsh, 2012-12-07 22:36 review
i16076-v4-combined.diff danielsh, 2012-12-08 18:50 review
i16076-v5.diff danielsh, 2012-12-19 17:19 review
transcript-test___all__.txt danielsh, 2012-12-29 16:18 review
i16076-v6.diff danielsh, 2012-12-30 23:57 review
i16076-v7.diff danielsh, 2013-01-08 06:31 review
i16076-v8.diff danielsh, 2013-01-08 23:48 review
memleak-v1.diff danielsh, 2013-01-11 11:07 review
Messages (50)
msg171418 - (view) Author: Einar Fløystad Dørum (einarfd) Date: 2012-09-28 08:54
The xml.etree.ElementTree.Element class is no longer pickleable in Python 3.3.0rc3 . 

This is a regression from Python 3.2 where the Element class was pickleable, while the xml.etree.cElementTree.Element was not. So this is probably related to switching the ElementTree implementation in Python 3.3.

I've looked at the "what's new" documentation in Python 3.3, and there was nothing that indicated that the ElemenTree switchover would give this kind of issues.  

I've run into this issues because I use the multiprocessing module to pass parsed XML documents between processes, and when I can't pickle Element objects. This stops working. 

You can reproduce the issue with the following code:
import pickle
from xml.etree.ElementTree import Element

print(pickle.dumps(Element("foo")))
msg171737 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-01 17:52
In 3.2 repr(xml.etree.ElementTree.Element) is "<class 'xml.etree.ElementTree.Element'>".
In 3.3 repr(xml.etree.ElementTree.Element) is "<class 'Element'>".
msg172297 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012-10-07 13:19
Will look into it, thanks for the report.
msg172613 - (view) Author: Santoso Wijaya (santoso.wijaya) * Date: 2012-10-11 01:37
This is because, in Python 3.3, xml.etree.ElementTree.Element is overridden by the C accelerator version, _elementtree.Element.

If you remove the statement, 

1708    from _elementtree import *

the behavior from Python 3.2 comes back.
msg172665 - (view) Author: Santoso Wijaya (santoso.wijaya) * Date: 2012-10-11 17:07
OTOH, xml.etree.cElementTree.Element in Python 3.2 and earlier has never been pickleable, either.

Python 3.2.3+ (3.2:24499eebbc2f, Oct 10 2012, 13:54:45) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from xml.etree.cElementTree import Element
>>> repr(Element)
'<built-in function Element>'
>>> import pickle
>>> pickle.dumps(Element('foo'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
_pickle.PicklingError: Can't pickle <class 'Element'>: attribute lookup builtins.Element failed
msg176854 - (view) Author: Daniel Shahaf (danielsh) Date: 2012-12-03 16:48
Attached patch for your consideration.

I've tested pickling/unpickling and comparing the resulting object attribute by attribute (.tag, .attrib, .text, .tail for equality; and recursively .getchildren()), and 'make test' --- all seems to work.

If the approach is sound, I can submit a revised patch that also updates documentation, regression tests, and potententially updates other type objects in the same manner as this patch updates _elementtree.Element .
msg177134 - (view) Author: Daniel Shahaf (danielsh) Date: 2012-12-07 22:23
TreeBuilder was also pickleable in 3.2 but now isn't so; updating summary accordingly.

Attaching a checkpoint patch.  It addresses both Element and TreeBuilder, and adds tests.  The added tests fail if test___all__ is included in the test run, so the patch should not be applied as-is.

Thanks to Ezio Melotti for preliminary review and suggestions on IRC.
msg177136 - (view) Author: Daniel Shahaf (danielsh) Date: 2012-12-07 22:36
Reattaching without the unrelated Python-ast.c change.
msg177141 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012-12-08 03:15
Thanks for working on it, Daniel. Unfortunately I won't have time to look at it in the near future, but I will definitely look at the patch once I get some free time to hack on Python again.
msg177170 - (view) Author: Daniel Shahaf (danielsh) Date: 2012-12-08 18:50
@Eli, thanks.  In the meantime, attaching a patch addressing Ezio's review; the caveats from msg 177134 still apply.
msg177590 - (view) Author: Daniel Shahaf (danielsh) Date: 2012-12-16 11:58
Haven't had a chance to look at the test___all__ -related failures yet.  Still planning to do so once I have some time.
msg177766 - (view) Author: Daniel Shahaf (danielsh) Date: 2012-12-19 17:19
Re the problem from msg 177134 ('./python -mtest test___all__ test_xml_etree' failing), it is a a preexisting problem, in the sense that applying just the Lib/test/ part of this patch suffices to trigger it.  Adding 'xml.etree' to the blacklist in test___all__.py:50 suffices to address the problem.  At a guess that is because when test___all__ imports the 'xml.etree' module, it doesn't filter out the '_elementtree' symbol like test_xml_etree.py:test_main() does.

New patch attached; differences to previous version:

- Removed debug prints.
- Skip test_pickling when testing the Python xml.etree.
  (The skips could be removed in the future, if/when the test___all__
  interaction problem is resolved.)
msg178498 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012-12-29 15:00
Daniel, is your patch made vs. the 3.3 branch? I'll need to apply there first, and then merge up to default (3.4).

[Also, removing the 3.2 tag here. 3.2 won't be fixed to make _elementtree pickleable - it never was].
msg178499 - (view) Author: Daniel Shahaf (danielsh) Date: 2012-12-29 15:04
I wrote the patch against default (3.4), but it applies cleanly to 3.3.
msg178500 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012-12-29 15:10
Also, could you explain what makes test___all__ start failing with this patch? What are you adding that makes that happen?

P.S. I suspect the root reason is the bad way etree tests are structured in general. See issue 15083
msg178509 - (view) Author: Daniel Shahaf (danielsh) Date: 2012-12-29 16:18
Any attempt to pickle an Element object in test_xml_etree.py causes that test to fail if test___all__ had been run before it; see attached transcript.  It's against 3.4, with no changes other than the testsuite changes shown within.

I don't know what the root cause is.  As noted before, adding 'xml.etree' to test___all__.AllTest.test_all.blacklist is a workaround, so I assume the root cause lies within the test framework --- a bad interaction between the import magics in test___all__ and test_xml_etree.
msg178546 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012-12-30 00:18
I think I understand what's going on there. Pickle, to make sure an object can be picked, looks at the module it comes from and tries to import its class, to make sure they're the same.

What happens since 3.3 is that for the Python Element class, it imports ElementTree. The harness in test_xml_etree carefully sets up to ignore _elementtree so the correct class is pulled. However, if test___all__ runs before it it just does a brute "from ElementTree import *" which places the C version in sys.modules, and this is what pickle finds.

So in the meantime, until issue 15083 is resolved I think it's safe to put ET in the ignore list of test___all__.
msg178578 - (view) Author: Roundup Robot (python-dev) Date: 2012-12-30 14:18
New changeset 71508fc738bb by Eli Bendersky in branch '3.3':
For Issue #16076: make sure that pickling of Element objects is tested, and do
http://hg.python.org/cpython/rev/71508fc738bb

New changeset 5a38f4d7833c by Eli Bendersky in branch 'default':
For issue #16076: merge 3.3
http://hg.python.org/cpython/rev/5a38f4d7833c
msg178579 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012-12-30 14:24
I've added some (currently pyET specific) pickling tests. Daniel, could you re-generate the patch? Note that for the C version of pickling you can now enable the pickle test. Point to consider - can elements pickled in C be unpickled in Python and vice versa? It should probably work (see test_decimal's pickling tests).
msg178627 - (view) Author: Daniel Shahaf (danielsh) Date: 2012-12-30 23:57
Attached.  Differences to previous version:

- Avoid the test___all__ issue (#16817) by manipulating sys.modules
  directly

- Support pickling interoperability between the C and Python
  implementations
msg178632 - (view) Author: Daniel Shahaf (danielsh) Date: 2012-12-31 00:54
BTW, maybe I'm just confused, but I'm a little surprised that C pickling -> Python unpickling works: the C __reduce__ returns a 3-element tuple, which according to pickle docs[1] must either be a dictionary or be passed to __setstate__; it's not a dictionary and the Python implementation doesn't have a __setstate__.

http://docs.python.org/3/library/pickle#pickle.object.__reduce__
msg178721 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-01-01 00:32
A couple of high-level questions about the C code (this isn't a detailed review yet):

1. Why did you choose to implement __reduce__ and not __getstate__?
2. A lot of code appears to be shared between element_setstate_from_attributes and the init method implementation, can it be refactored?
3. I've been thinking about the C<->Python pickle compatibility, and it may be too much trouble at this point for no very strong benefit. What could make more sense at some point is implementing a __getstate__/__setstate__ in both the Python and C versions that are compatible (i.e. produce the same things). But this may mean changing the way the Python Element is pickled in 3.3, which may not be acceptable. So we may prefer to forego the compatibility in 3.3 and then for 3.4 make the change. I'm still not sure what is best here.
msg178736 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-01-01 15:53
Other thoughts.

I'm not sure why you're surprised the C->Python pickle/unpickle works. You've changed the type name from Element to _elementtree.Element, so I would guess Python always uses the C version to unpickle as well. Can you debug to verify what actually goes on under the hood? Why did you change the class name, by the way, I don't think it's a valid change at least for 3.3 in terms of backwards compatibility.

Regarding that compatibility, and even easier idea would be for the C pickle to return the same __dict__ implicitly gathered from the Python version, and then only one version of the unpickle is required.
msg178745 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2013-01-01 18:42
Yes, currently the C version is also used for unpickling. Actually
this problem was one of the reasons why _decimal sets its name to
"decimal".
 


from test.support import import_fresh_module
import pickle, sys

C = import_fresh_module('xml.etree.ElementTree', fresh=['_elementtree'])
P = import_fresh_module('xml.etree.ElementTree', blocked=['_elementtree'])
e = C.Element('foo', bar=42)
e.text = "text goes here"
e.tail = "opposite of head"
C.SubElement(e, 'child').append(C.Element('grandchild'))
e.append(C.Element('child'))
e.findall('.//grandchild')[0].set('attr', 'other value')
sys.modules['xml.etree.ElementTree'] = C 
s = pickle.dumps(e)
s

b'\x80\x03c_elementtree\nElement\nq\x00X\x03\x00\x00\x00fooq\x01}q\x02X\x03\x00\x00\x00barq\x03K*s\x86q\x04Rq\x05X
\x0e\x00\x00\x00text goes hereq\x06X\x10\x00\x00\x00opposite of headq\x07h\x00X\x05\x00\x00\x00childq\x08\x85q\tRq
\nNNh\x00X\n\x00\x00\x00grandchildq\x0b}q\x0cX\x04\x00\x00\x00attrq\rX\x0b\x00\x00\x00other valueq\x0es\x86q\x0fRq
\x10NNN\x87q\x11b\x85q\x12\x87q\x13bh\x00h\x08\x85q\x14Rq\x15NNN\x87q\x16b\x86q\x17\x87q\x18b.'

sys.modules['xml.etree.ElementTree'] = P
x = pickle.loads(s)
type(x)
<class '_elementtree.Element'>
msg178770 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-01-01 22:40
Thank you for the input Stefan. I was actually glancing at _decimal as an example of implementing pickling and inter-compatibility between the C and Py versions of pickles.

You've chosen compatibility by having the same class name and __reduce__ returning the exact same tuple for both implementation. This is a pretty good idea.

Unfortunately, in the case of Element it's more difficult, because the existing Py implementation does not have __reduce__, so pickle does its thing by looking at __dict__. Changing the Py version to have a __reduce__ seems risky as it may break compatibility between the different Py versions of Element (say, from 3.2) and this is bad.
msg178776 - (view) Author: Daniel Shahaf (danielsh) Date: 2013-01-01 22:56
Eli Bendersky wrote on Tue, Jan 01, 2013 at 15:54:00 +0000:
> Why did you change the class name, by the way, I don't think it's
> a valid change at least for 3.3 in terms of backwards compatibility.
> 

With unmodified tip of 3.4:

    >>> import pickle, xml.etree.ElementTree as ET
    >>> pickle.dumps(ET.Element('foo'))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    _pickle.PicklingError: Can't pickle <class 'Element'>: attribute lookup builtins.Element failed

I added the "_elementtree" to the tp_name in order to bypass the above
error.  Module-qualified names were in use elsewhere (including by
_elementtree._element_iterator) so it seemed reasonable.  I'll defer to
you about compatibility implications of this change.

> Regarding that compatibility, and even easier idea would be for the
> C pickle to return the same __dict__ implicitly gathered from the
> Python version, and then only one version of the unpickle is required.

That makes sense.  But going forward it might be even better to define
an explicit __reduce__/__getstate__ for the Python version too, so if
the instance dict grows new members, they won't get serialised
unintentionally.  (This consideration is also why C code in the latest
patch makes some effort to notice unknown args in the dict.)
msg178783 - (view) Author: Daniel Shahaf (danielsh) Date: 2013-01-01 23:11
Eli Bendersky wrote on Tue, Jan 01, 2013 at 00:32:51 +0000:
> 1. Why did you choose to implement __reduce__ and not __getstate__?

Maybe I was simply imitating what some other extension module was doing ;)

By using __reduce__ with the type as first return value, the
__setstate__ code becomes simpler because it doesn't need to address
'tag' and 'attrib'.  This consideration is somewhat moot now that the
unpickle-Python-pickled-Element's code path has to think about all
instance attributes, including those settable by the constructor.

> 2. A lot of code appears to be shared between
> element_setstate_from_attributes and the init method implementation,
> can it be refactored?

It seems there might be room for code reuse --- both functions set
tag,text,tail,attrib (but the unpickler sets the children too).  I'll
look into that in the next iteration (once the class name and pickle
output format issues are settled).
msg178788 - (view) Author: Daniel Shahaf (danielsh) Date: 2013-01-01 23:43
Also, the class inheritance in the tests should be amended to follow #16835.
msg178946 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-01-03 14:44
On Tue, Jan 1, 2013 at 2:56 PM, Daniel Shahaf <report@bugs.python.org>wrote:

>
> Daniel Shahaf added the comment:
>
> Eli Bendersky wrote on Tue, Jan 01, 2013 at 15:54:00 +0000:
> > Why did you change the class name, by the way, I don't think it's
> > a valid change at least for 3.3 in terms of backwards compatibility.
> >
>
> With unmodified tip of 3.4:
>
>     >>> import pickle, xml.etree.ElementTree as ET
>     >>> pickle.dumps(ET.Element('foo'))
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>     _pickle.PicklingError: Can't pickle <class 'Element'>: attribute
> lookup builtins.Element failed
>
> I added the "_elementtree" to the tp_name in order to bypass the above
> error.  Module-qualified names were in use elsewhere (including by
> _elementtree._element_iterator) so it seemed reasonable.  I'll defer to
> you about compatibility implications of this change.

I asked on pydev, but this is a key point to resolve. Can
pickling/unpickling be made to work correctly without such (or similar)
change at all? How will the unpickler know which module to load when it
sees a pickled object of Element type?

If this change is required (even if we choose to name it
"xml.etree.ElementTree.Element" for Py compatibility to fix the pickling
regression, we may find ourselves in a need to change it between 3.3 and
3.3.1 and I'm not sure if that's valid. I hope my question on pydev will be
resolved conclusively.

Danial, could you investigate if such a change is absolutely required to
make pickling/unickling of Element work?
msg179002 - (view) Author: Daniel Shahaf (danielsh) Date: 2013-01-04 02:09
Eli Bendersky wrote on Thu, Jan 03, 2013 at 14:44:02 +0000:
> On Tue, Jan 1, 2013 at 2:56 PM, Daniel Shahaf <report@bugs.python.org>wrote:
> > I added the "_elementtree" to the tp_name in order to bypass the above
> > error.  Module-qualified names were in use elsewhere (including by
> > _elementtree._element_iterator) so it seemed reasonable.  I'll defer to
> > you about compatibility implications of this change.
> 
> I asked on pydev, but this is a key point to resolve. Can
> pickling/unpickling be made to work correctly without such (or similar)
> change at all? How will the unpickler know which module to load when it
> sees a pickled object of Element type?

Is there a requirement that it loads a particular module?  Would etree
users notice the difference if pickle.load() returns an instance of the
"other" Element implementation than the one they pickled?

> Danial, could you investigate if such a change is absolutely required to
> make pickling/unickling of Element work?

Yes, I'll look into that.
msg179011 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2013-01-04 10:09
Daniel Shahaf <report@bugs.python.org> wrote:
> Is there a requirement that it loads a particular module?  Would etree
> users notice the difference if pickle.load() returns an instance of the
> "other" Element implementation than the one they pickled?

Yes: If you send an "_elementtree.Element" pickle to another machine that
doesn't have a working C version, it can't be unpickled. As far as I know
the only way around that is to pickle as "xml.etree.ElementTree.Element".
msg179032 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-01-04 14:59
On Jan 4, 2013 2:09 AM, "Stefan Krah" <report@bugs.python.org> wrote:
>
>
> Stefan Krah added the comment:
>
> Daniel Shahaf <report@bugs.python.org> wrote:
> > Is there a requirement that it loads a particular module?  Would etree
> > users notice the difference if pickle.load() returns an instance of the
> > "other" Element implementation than the one they pickled?
>
> Yes: If you send an "_elementtree.Element" pickle to another machine that
> doesn't have a working C version, it can't be unpickled. As far as I know
> the only way around that is to pickle as "xml.etree.ElementTree.Element".
>
> ----------
>

Yes this is a good point. Another thing to consider is that if both report
the same name then it will be possible to unpickle on a machine running 3.2

> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue16076>
> _______________________________________
msg179243 - (view) Author: Daniel Shahaf (danielsh) Date: 2013-01-07 04:58
Eli Bendersky wrote on Thu, Jan 03, 2013 at 14:44:02 +0000:
> If this change is required (even if we choose to name it
> "xml.etree.ElementTree.Element" for Py compatibility to fix the pickling
> regression, we may find ourselves in a need to change it between 3.3 and
> 3.3.1 and I'm not sure if that's valid. I hope my question on pydev will be
> resolved conclusively.
> 
> Danial, could you investigate if such a change is absolutely required to
> make pickling/unickling of Element work?

There are a few options:

- Change c-Element's tp_name to "xml.etree.ElementTree.Element".

- Register a reduce function for c-Element's that serialises them by
  constructing an equivalent py-Element and returning the latter's
  .__dict__ via the third return value:
  http://docs.python.org/3/library/copyreg#copyreg.pickle

- Add an entry mapping "builtins.Element" to
  "xml.etree.ElementTree.Element" to _compat_pickle.IMPORT_MAPPING
  (which is used by Lib/pickle.py:_Pickler.find_class()).

I haven't tried to implement either of these approaches.
msg179244 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-01-07 05:16
I think that changing the _elementtree's Element's name to match the Python version, and then making sure the same serialized object is returned - is a worthy option to try.

Then it will hopefully "just work". When pickle deserializes a user-defined object that says it's a xml.etree.ElementTree.Element, it will try to import Element from xml.etree.ElementTree and should get the accelerated class. On machines without acceleration it will get the Python class.
msg179314 - (view) Author: Daniel Shahaf (danielsh) Date: 2013-01-08 05:30
In the meantime, unrelated question: should TreeBuilder be pickleable too, and interchangeably (pickle|unpickle)able between the Python and C implementations?  Asking because the Python TreeBuilder has a _factory member which it initializes to 'Element', and I'm not quite sure how cross-pickling should handle that.
msg179315 - (view) Author: Daniel Shahaf (danielsh) Date: 2013-01-08 06:31
New iteration.

Open issues:

- Share code with the init method.  The issue with sharing code with either element_init() or create_new_element() would be malloc+realloc: unlike either of these methods, we know both the attributes and the number of children at allocation time, so we can allocate directly the right number of children.

- C<->Py Interchangeable pickling of TreeBuilder (per above msg).

Differences to previous version:

- Skip C<->Py interchangeability testing of TreeBuilder --- because that one started failing with:
  _pickle.UnpicklingError: state is not a dictionary
  It can probably be fixed, but I'd like to address the above question about pickling the factory first.

- Use __getstate__ rather than __reduce__ for Element.

- Make Element pickling interchangeable between c/py Element
  (set tp_name to "xml.etree.ElementTree.Element" and match the
  pickled format).
msg179354 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-01-08 14:39
Daniel, thanks for this patch, it looks very good. I had some comments in the code review.

As for TreeBuilder, let's split it into a separate issue - I think it's much less critical. We can discuss it later once we're done with Element. If anything, it's more important to handle ElementTree itself first - it should be very easy (much simpler than Element).

With the factory method I'm not sure how the Py pickling happens. I need to dig a bit in pickling for that, because I don't know how method reference pickling works.
msg179356 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-01-08 15:00
P = import_fresh_module('xml.etree.ElementTree', blocked=['_elementtree'])
tb = P.TreeBuilder(element_factory=lambda a, b: [a, b])
print(pickle.dumps(tb))

Gives: _pickle.PicklingError: Can't pickle <class 'function'>: attribute lookup builtins.function failed
msg179386 - (view) Author: Daniel Shahaf (danielsh) Date: 2013-01-08 22:51
Eli Bendersky wrote on Tue, Jan 08, 2013 at 15:00:42 +0000:
> 
> Eli Bendersky added the comment:
> 
> P = import_fresh_module('xml.etree.ElementTree', blocked=['_elementtree'])
> tb = P.TreeBuilder(element_factory=lambda a, b: [a, b])
> print(pickle.dumps(tb))
> 
> Gives: _pickle.PicklingError: Can't pickle <class 'function'>: attribute lookup builtins.function failed

Is that with or without the patch?
msg179393 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-01-08 23:13
On Tue, Jan 8, 2013 at 2:51 PM, Daniel Shahaf <report@bugs.python.org>wrote:

>
> Daniel Shahaf added the comment:
>
> Eli Bendersky wrote on Tue, Jan 08, 2013 at 15:00:42 +0000:
> >
> > Eli Bendersky added the comment:
> >
> > P = import_fresh_module('xml.etree.ElementTree',
> blocked=['_elementtree'])
> > tb = P.TreeBuilder(element_factory=lambda a, b: [a, b])
> > print(pickle.dumps(tb))
> >
> > Gives: _pickle.PicklingError: Can't pickle <class 'function'>: attribute
> lookup builtins.function failed
>
> Is that with or without the patch?
>

Without. Pickle can't handle functions that are not top-level, so bound
methods, internal functions, lambdas are all out. So pickling probably
wasn't a first priority for the Python version of TreeBuilder as well...
msg179395 - (view) Author: Daniel Shahaf (danielsh) Date: 2013-01-08 23:24
Dissociating TreeBuilder from this issue per recent comments.
msg179400 - (view) Author: Daniel Shahaf (danielsh) Date: 2013-01-08 23:48
v8 removes TreeBuilder handling (code + tests) and removes memcpy per review.  (Outstanding issues in the review can be fixed post-v8, if needed.)
msg179544 - (view) Author: Roundup Robot (python-dev) Date: 2013-01-10 14:07
New changeset 8d6dadfecf22 by Eli Bendersky in branch '3.3':
Issue #16076: make _elementtree.Element pickle-able in a way that is compatible
http://hg.python.org/cpython/rev/8d6dadfecf22

New changeset 4c268b7c86e6 by Eli Bendersky in branch 'default':
Issue #16076: make _elementtree.Element pickle-able in a way that is compatible
http://hg.python.org/cpython/rev/4c268b7c86e6
msg179546 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-01-10 14:08
Fixed in 3.3 and default. Thanks for the good work, Daniel. A separate issue can be opened for TreeBuilder.
msg179553 - (view) Author: Roundup Robot (python-dev) Date: 2013-01-10 14:36
New changeset c46054b49b6c by Eli Bendersky in branch '3.3':
Update Misc/NEWS for issue #16076
http://hg.python.org/cpython/rev/c46054b49b6c
msg179642 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2013-01-11 07:37
The fix introduced some refleaks: 
$ ./python -m test -R3:2 test_xml_etree_c
test_xml_etree_c leaked [56, 56] references, sum=112

One seems to be in __getstate__:
diff --git a/Modules/_elementtree.c b/Modules/_elementtree.c
--- a/Modules/_elementtree.c
+++ b/Modules/_elementtree.c
@@ -859,8 +859,10 @@
                                      PICKLED_ATTRIB, self->extra->attrib,
                                      PICKLED_TEXT, self->text,
                                      PICKLED_TAIL, self->tail);
-    if (instancedict)
+    if (instancedict) {
+        Py_DECREF(children);
         return instancedict;
+    }
     else {
         for (i = 0; i < PyList_GET_SIZE(children); i++)
             Py_DECREF(PyList_GET_ITEM(children, i));

I'm still looking for the other.
msg179651 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2013-01-11 08:44
This is what I found out.
I used an easily copy/pastable one-liner that creates 3 variables: e (no children), e2 (3 children), e3 (5 children).

Original leaky code (test_xml_etree_c leaked [56, 56] references, sum=112):
>>> from xml.etree import ElementTree as ET; e = ET.Element('foo'); SAMPLE_XML = "<body><tag class='a'>text</tag><tag class='b' /><section><tag class='b' id='inner'>subtext</tag></section></body>"; e2 = ET.XML(SAMPLE_XML); SAMPLE_XML = "<body><tag class='a'>text</tag><tag class='b' /><tag class='b' /><tag class='b' /><section><tag class='b' id='inner'>subtext</tag></section></body>"; e3 = ET.XML(SAMPLE_XML)
[76773 refs]
>>> ### e has no children and leaks 1 ref:
>>> e.__getstate__()
{'tag': 'foo', 'attrib': {}, 'text': None, 'tail': None, '_children': []}
[76791 refs]
>>> e.__getstate__()
{'tag': 'foo', 'attrib': {}, 'text': None, 'tail': None, '_children': []}
[76792 refs]
>>> e.__getstate__()
{'tag': 'foo', 'attrib': {}, 'text': None, 'tail': None, '_children': []}
[76793 refs]
>>> ### e2 has 3 children and leaks 4 refs:
>>> e2.__getstate__()
{'tag': 'body', 'attrib': {}, 'text': None, 'tail': None, '_children': [<Element 'tag' at 0xb735cef4>, <Element 'tag' at 0xb7368034>, <Element 'section' at 0xb73688f4>]}
[76798 refs]
>>> e2.__getstate__()
{'tag': 'body', 'attrib': {}, 'text': None, 'tail': None, '_children': [<Element 'tag' at 0xb735cef4>, <Element 'tag' at 0xb7368034>, <Element 'section' at 0xb73688f4>]}
[76802 refs]

The leaked refs seems to be 1 for the children list + 1 for each children.
The diff I pasted in the previous *seems* to fix this (i.e. leaks in __gestate__ are gone, tests pass), but I had it crash once (couldn't reproduce after that, so it might be unrelated*), and I'm not sure it's correct.

With that patch applied we go down to test_xml_etree_c leaked [6, 6] references, sum=12.
The remaining leaks seem to be in __setstate__.
Patched code:
>>> from xml.etree import ElementTree as ET; e = ET.Element('foo'); SAMPLE_XML = "<body><tag class='a'>text</tag><tag class='b' /><section><tag class='b' id='inner'>subtext</tag></section></body>"; e2 = ET.XML(SAMPLE_XML); SAMPLE_XML = "<body><tag class='a'>text</tag><tag class='b' /><tag class='b' /><tag class='b' /><section><tag class='b' id='inner'>subtext</tag></section></body>"; e3 = ET.XML(SAMPLE_XML)
[76773 refs]
>>> ### no more leaks for getstate:
>>> p = e.__getstate__()
[76787 refs]
>>> p = e.__getstate__()
[76787 refs]
>>> ### also no leaks when there are no child:
>>> e.__setstate__(p)
[76788 refs]
>>> e.__setstate__(p)
[76788 refs]
>>> ### no more leaks for getstate with children:
>>> p2 = e2.__getstate__()
[76807 refs]
>>> p2 = e2.__getstate__()
[76807 refs]
>>> ### one ref leaked for every child in __setstate__:
>>> e2.__setstate__(p2)
[76810 refs]
>>> e2.__setstate__(p2)
[76813 refs]
>>> e2.__setstate__(p2)
[76816 refs]

I'm not working on this anymore now, so someone more familiar with the code can take a look, see if my patch is correct, and fix the remaining leaks.

* maybe I'm doing something wrong, but ISTM that ``make -j2`` doesn't always work right away, and sometimes I get different results if I do it again without touching the code.
msg179667 - (view) Author: Daniel Shahaf (danielsh) Date: 2013-01-11 11:07
Ezio Melotti wrote on Fri, Jan 11, 2013 at 08:44:43 +0000:
> >>> ### one ref leaked for every child in __setstate__:
> >>> e2.__setstate__(p2)
> [76810 refs]
> >>> e2.__setstate__(p2)
> [76813 refs]
> >>> e2.__setstate__(p2)
> [76816 refs]
> 
> I'm not working on this anymore now, so someone more familiar with the
> code can take a look, see if my patch is correct, and fix the
> remaining leaks.
> 

Thank for narrowing it down so much.  Attached patch incorporates your
getstate fix and a fix for setstate.  It passes the -R 3:2 tests.
msg179790 - (view) Author: Roundup Robot (python-dev) Date: 2013-01-12 13:21
New changeset 5b36768b9a11 by Eli Bendersky in branch '3.3':
Issue #16076: fix refleak in pickling of Element.
http://hg.python.org/cpython/rev/5b36768b9a11

New changeset 848738d3c40f by Eli Bendersky in branch 'default':
Close #16076: fix refleak in pickling of Element.
http://hg.python.org/cpython/rev/848738d3c40f
msg179792 - (view) Author: Roundup Robot (python-dev) Date: 2013-01-12 13:43
New changeset 4501813ea676 by Eli Bendersky in branch '3.3':
Issue #16076: check for return value of PyTuple_New for args (following
http://hg.python.org/cpython/rev/4501813ea676

New changeset 7313096e0bad by Eli Bendersky in branch 'default':
Issue #16076: check for return value of PyTuple_New for args (following
http://hg.python.org/cpython/rev/7313096e0bad
History
Date User Action Args
2013-01-12 13:43:32python-devsetmessages: + msg179792
2013-01-12 13:21:35python-devsetstatus: open -> closed

messages: + msg179790
stage: needs patch -> resolved
2013-01-11 11:07:02danielshsetfiles: + memleak-v1.diff

messages: + msg179667
2013-01-11 08:44:43ezio.melottisetstatus: closed -> open

messages: + msg179651
stage: resolved -> needs patch
2013-01-11 07:37:32ezio.melottisetmessages: + msg179642
2013-01-10 14:36:27python-devsetmessages: + msg179553
2013-01-10 14:08:46eli.benderskysetstatus: open -> closed
resolution: fixed
messages: + msg179546

stage: needs patch -> resolved
2013-01-10 14:07:34python-devsetmessages: + msg179544
2013-01-08 23:48:35danielshsetfiles: + i16076-v8.diff

messages: + msg179400
2013-01-08 23:24:44danielshsetmessages: + msg179395
title: xml.etree.ElementTree.Element and xml.etree.ElementTree.TreeBuilder are no longer pickleable -> xml.etree.ElementTree.Element is no longer pickleable
2013-01-08 23:13:59eli.benderskysetmessages: + msg179393
2013-01-08 22:51:40danielshsetmessages: + msg179386
2013-01-08 15:00:42eli.benderskysetmessages: + msg179356
2013-01-08 14:39:43eli.benderskysetmessages: + msg179354
2013-01-08 06:31:49danielshsetfiles: + i16076-v7.diff

messages: + msg179315
2013-01-08 05:30:34danielshsetmessages: + msg179314
2013-01-07 05:16:22eli.benderskysetmessages: + msg179244
2013-01-07 04:58:09danielshsetmessages: + msg179243
2013-01-04 14:59:18eli.benderskysetmessages: + msg179032
2013-01-04 10:09:41skrahsetmessages: + msg179011
2013-01-04 02:09:37danielshsetmessages: + msg179002
2013-01-03 14:44:02eli.benderskysetmessages: + msg178946
2013-01-01 23:43:54danielshsetmessages: + msg178788
2013-01-01 23:11:48danielshsetmessages: + msg178783
2013-01-01 22:56:48danielshsetmessages: + msg178776
2013-01-01 22:40:00eli.benderskysetmessages: + msg178770
2013-01-01 18:42:36skrahsetnosy: + skrah
messages: + msg178745
2013-01-01 15:53:59eli.benderskysetmessages: + msg178736
2013-01-01 00:32:50eli.benderskysetmessages: + msg178721
2012-12-31 00:54:58danielshsetmessages: + msg178632
2012-12-30 23:57:34danielshsetfiles: + i16076-v6.diff

messages: + msg178627
2012-12-30 14:24:17eli.benderskysetmessages: + msg178579
2012-12-30 14:18:33python-devsetnosy: + python-dev
messages: + msg178578
2012-12-30 00:30:55floxsetnosy: + flox
2012-12-30 00:18:33eli.benderskysetmessages: + msg178546
2012-12-29 16:18:13danielshsetfiles: + transcript-test___all__.txt

messages: + msg178509
2012-12-29 15:10:33eli.benderskysetmessages: + msg178500
2012-12-29 15:04:46danielshsetmessages: + msg178499
2012-12-29 15:00:01eli.benderskysetmessages: + msg178498
versions: - Python 3.2
2012-12-19 17:19:39danielshsetfiles: + i16076-v5.diff

messages: + msg177766
2012-12-16 11:58:39danielshsetmessages: + msg177590
2012-12-08 18:50:45danielshsetfiles: + i16076-v4-combined.diff

messages: + msg177170
2012-12-08 03:15:19eli.benderskysetmessages: + msg177141
2012-12-07 22:42:12Arfreversetpriority: high -> release blocker
nosy: + larry, georg.brandl
2012-12-07 22:40:15danielshsetfiles: - i16076-v2-combined.diff
2012-12-07 22:36:39danielshsetfiles: + i16076-v3-combined.diff

messages: + msg177136
2012-12-07 22:23:23danielshsetfiles: + i16076-v2-combined.diff

messages: + msg177134
title: xml.etree.ElementTree.Element is no longer pickleable -> xml.etree.ElementTree.Element and xml.etree.ElementTree.TreeBuilder are no longer pickleable
2012-12-03 16:48:11danielshsetfiles: + i16076-cpatch.diff

nosy: + danielsh
messages: + msg176854

components: + Extension Modules
keywords: + patch
2012-11-01 20:44:12serhiy.storchakasetnosy: - serhiy.storchaka
2012-10-11 17:07:17santoso.wijayasetmessages: + msg172665
components: + Library (Lib)
versions: + Python 3.2
2012-10-11 01:37:46santoso.wijayasetnosy: + santoso.wijaya
messages: + msg172613
2012-10-07 13:19:26eli.benderskysetmessages: + msg172297
2012-10-03 02:35:07jceasetpriority: normal -> high
stage: needs patch
2012-10-03 02:34:30jceasetpriority: high -> normal
nosy: + jcea

stage: needs patch -> (no value)
2012-10-01 17:52:06serhiy.storchakasetmessages: + msg171737
versions: + Python 3.4
2012-10-01 16:38:41pitrousetpriority: normal -> high
stage: needs patch
2012-09-30 22:38:21Arfreversetnosy: + Arfrever
2012-09-30 16:46:21ezio.melottisetnosy: + ezio.melotti
2012-09-28 10:52:14einarfdsetcomponents: - Library (Lib)
2012-09-28 09:59:15serhiy.storchakasetnosy: + eli.bendersky, serhiy.storchaka
2012-09-28 09:58:32serhiy.storchakasetkeywords: + 3.3regression
components: + XML
2012-09-28 08:54:59einarfdcreate