Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xml.etree.ElementTree.Element is no longer pickleable #60280

Closed
einarfd mannequin opened this issue Sep 28, 2012 · 50 comments
Closed

xml.etree.ElementTree.Element is no longer pickleable #60280

einarfd mannequin opened this issue Sep 28, 2012 · 50 comments
Labels
extension-modules C modules in the Modules dir release-blocker stdlib Python modules in the Lib dir topic-XML type-bug An unexpected behavior, bug, or error

Comments

@einarfd
Copy link
Mannequin

einarfd mannequin commented Sep 28, 2012

BPO 16076
Nosy @birkenfeld, @jcea, @larryhastings, @ezio-melotti, @skrah, @florentx
Files
  • i16076-cpatch.diff
  • i16076-v3-combined.diff
  • i16076-v4-combined.diff
  • i16076-v5.diff
  • transcript-test___all__.txt
  • i16076-v6.diff
  • i16076-v7.diff
  • i16076-v8.diff
  • memleak-v1.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2013-01-12.13:21:35.399>
    created_at = <Date 2012-09-28.08:54:59.477>
    labels = ['extension-modules', 'expert-XML', 'type-bug', 'library', 'release-blocker']
    title = 'xml.etree.ElementTree.Element is no longer pickleable'
    updated_at = <Date 2013-01-12.13:43:31.998>
    user = 'https://bugs.python.org/einarfd'

    bugs.python.org fields:

    activity = <Date 2013-01-12.13:43:31.998>
    actor = 'python-dev'
    assignee = 'none'
    closed = True
    closed_date = <Date 2013-01-12.13:21:35.399>
    closer = 'python-dev'
    components = ['Extension Modules', 'Library (Lib)', 'XML']
    creation = <Date 2012-09-28.08:54:59.477>
    creator = 'einarfd'
    dependencies = []
    files = ['28199', '28252', '28261', '28363', '28484', '28502', '28627', '28639', '28691']
    hgrepos = []
    issue_num = 16076
    keywords = ['patch', '3.3regression']
    message_count = 50.0
    messages = ['171418', '171737', '172297', '172613', '172665', '176854', '177134', '177136', '177141', '177170', '177590', '177766', '178498', '178499', '178500', '178509', '178546', '178578', '178579', '178627', '178632', '178721', '178736', '178745', '178770', '178776', '178783', '178788', '178946', '179002', '179011', '179032', '179243', '179244', '179314', '179315', '179354', '179356', '179386', '179393', '179395', '179400', '179544', '179546', '179553', '179642', '179651', '179667', '179790', '179792']
    nosy_count = 12.0
    nosy_names = ['georg.brandl', 'jcea', 'larry', 'ezio.melotti', 'Arfrever', 'eli.bendersky', 'skrah', 'flox', 'santoso.wijaya', 'python-dev', 'einarfd', 'danielsh']
    pr_nums = []
    priority = 'release blocker'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue16076'
    versions = ['Python 3.3', 'Python 3.4']

    @einarfd
    Copy link
    Mannequin Author

    einarfd mannequin commented Sep 28, 2012

    The xml.etree.ElementTree.Element class is no longer pickleable in Python 3.3.0rc3 .

    This is a regression from Python 3.2 where the Element class was pickleable, while the xml.etree.cElementTree.Element was not. So this is probably related to switching the ElementTree implementation in Python 3.3.

    I've looked at the "what's new" documentation in Python 3.3, and there was nothing that indicated that the ElemenTree switchover would give this kind of issues.

    I've run into this issues because I use the multiprocessing module to pass parsed XML documents between processes, and when I can't pickle Element objects. This stops working.

    You can reproduce the issue with the following code:
    import pickle
    from xml.etree.ElementTree import Element

    print(pickle.dumps(Element("foo")))

    @einarfd einarfd mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Sep 28, 2012
    @einarfd einarfd mannequin removed the stdlib Python modules in the Lib dir label Sep 28, 2012
    @serhiy-storchaka
    Copy link
    Member

    In 3.2 repr(xml.etree.ElementTree.Element) is "<class 'xml.etree.ElementTree.Element'>".
    In 3.3 repr(xml.etree.ElementTree.Element) is "<class 'Element'>".

    @elibendersky
    Copy link
    Mannequin

    elibendersky mannequin commented Oct 7, 2012

    Will look into it, thanks for the report.

    @santosowijaya
    Copy link
    Mannequin

    santosowijaya mannequin commented Oct 11, 2012

    This is because, in Python 3.3, xml.etree.ElementTree.Element is overridden by the C accelerator version, _elementtree.Element.

    If you remove the statement,

    1708 from _elementtree import *

    the behavior from Python 3.2 comes back.

    @santosowijaya
    Copy link
    Mannequin

    santosowijaya mannequin commented Oct 11, 2012

    OTOH, xml.etree.cElementTree.Element in Python 3.2 and earlier has never been pickleable, either.

    Python 3.2.3+ (3.2:24499eebbc2f, Oct 10 2012, 13:54:45) 
    [GCC 4.6.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from xml.etree.cElementTree import Element
    >>> repr(Element)
    '<built-in function Element>'
    >>> import pickle
    >>> pickle.dumps(Element('foo'))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    _pickle.PicklingError: Can't pickle <class 'Element'>: attribute lookup builtins.Element failed

    @santosowijaya santosowijaya mannequin added the stdlib Python modules in the Lib dir label Oct 11, 2012
    @danielsh
    Copy link
    Mannequin

    danielsh mannequin commented Dec 3, 2012

    Attached patch for your consideration.

    I've tested pickling/unpickling and comparing the resulting object attribute by attribute (.tag, .attrib, .text, .tail for equality; and recursively .getchildren()), and 'make test' --- all seems to work.

    If the approach is sound, I can submit a revised patch that also updates documentation, regression tests, and potententially updates other type objects in the same manner as this patch updates _elementtree.Element .

    @danielsh danielsh mannequin added the extension-modules C modules in the Modules dir label Dec 3, 2012
    @danielsh
    Copy link
    Mannequin

    danielsh mannequin commented Dec 7, 2012

    TreeBuilder was also pickleable in 3.2 but now isn't so; updating summary accordingly.

    Attaching a checkpoint patch. It addresses both Element and TreeBuilder, and adds tests. The added tests fail if test___all__ is included in the test run, so the patch should not be applied as-is.

    Thanks to Ezio Melotti for preliminary review and suggestions on IRC.

    @danielsh danielsh mannequin changed the title xml.etree.ElementTree.Element is no longer pickleable xml.etree.ElementTree.Element and xml.etree.ElementTree.TreeBuilder are no longer pickleable Dec 7, 2012
    @danielsh
    Copy link
    Mannequin

    danielsh mannequin commented Dec 7, 2012

    Reattaching without the unrelated Python-ast.c change.

    @Arfrever Arfrever mannequin added the release-blocker label Dec 7, 2012
    @elibendersky
    Copy link
    Mannequin

    elibendersky mannequin commented Dec 8, 2012

    Thanks for working on it, Daniel. Unfortunately I won't have time to look at it in the near future, but I will definitely look at the patch once I get some free time to hack on Python again.

    @danielsh
    Copy link
    Mannequin

    danielsh mannequin commented Dec 8, 2012

    @eli, thanks. In the meantime, attaching a patch addressing Ezio's review; the caveats from msg 177134 still apply.

    @danielsh
    Copy link
    Mannequin

    danielsh mannequin commented Dec 16, 2012

    Haven't had a chance to look at the test___all__ -related failures yet. Still planning to do so once I have some time.

    @danielsh
    Copy link
    Mannequin

    danielsh mannequin commented Dec 19, 2012

    Re the problem from msg 177134 ('./python -mtest test___all__ test_xml_etree' failing), it is a a preexisting problem, in the sense that applying just the Lib/test/ part of this patch suffices to trigger it. Adding 'xml.etree' to the blacklist in test___all__.py:50 suffices to address the problem. At a guess that is because when test___all__ imports the 'xml.etree' module, it doesn't filter out the '_elementtree' symbol like test_xml_etree.py:test_main() does.

    New patch attached; differences to previous version:

    • Removed debug prints.
    • Skip test_pickling when testing the Python xml.etree.
      (The skips could be removed in the future, if/when the test___all__
      interaction problem is resolved.)

    @elibendersky
    Copy link
    Mannequin

    elibendersky mannequin commented Dec 29, 2012

    Daniel, is your patch made vs. the 3.3 branch? I'll need to apply there first, and then merge up to default (3.4).

    [Also, removing the 3.2 tag here. 3.2 won't be fixed to make _elementtree pickleable - it never was].

    @danielsh
    Copy link
    Mannequin

    danielsh mannequin commented Dec 29, 2012

    I wrote the patch against default (3.4), but it applies cleanly to 3.3.

    @elibendersky
    Copy link
    Mannequin

    elibendersky mannequin commented Dec 29, 2012

    Also, could you explain what makes test___all__ start failing with this patch? What are you adding that makes that happen?

    P.S. I suspect the root reason is the bad way etree tests are structured in general. See bpo-15083

    @danielsh
    Copy link
    Mannequin

    danielsh mannequin commented Dec 29, 2012

    Any attempt to pickle an Element object in test_xml_etree.py causes that test to fail if test___all__ had been run before it; see attached transcript. It's against 3.4, with no changes other than the testsuite changes shown within.

    I don't know what the root cause is. As noted before, adding 'xml.etree' to test___all__.AllTest.test_all.blacklist is a workaround, so I assume the root cause lies within the test framework --- a bad interaction between the import magics in test___all__ and test_xml_etree.

    @elibendersky
    Copy link
    Mannequin

    elibendersky mannequin commented Dec 30, 2012

    I think I understand what's going on there. Pickle, to make sure an object can be picked, looks at the module it comes from and tries to import its class, to make sure they're the same.

    What happens since 3.3 is that for the Python Element class, it imports ElementTree. The harness in test_xml_etree carefully sets up to ignore _elementtree so the correct class is pulled. However, if test___all__ runs before it it just does a brute "from ElementTree import *" which places the C version in sys.modules, and this is what pickle finds.

    So in the meantime, until bpo-15083 is resolved I think it's safe to put ET in the ignore list of test___all__.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Dec 30, 2012

    New changeset 71508fc738bb by Eli Bendersky in branch '3.3':
    For Issue bpo-16076: make sure that pickling of Element objects is tested, and do
    http://hg.python.org/cpython/rev/71508fc738bb

    New changeset 5a38f4d7833c by Eli Bendersky in branch 'default':
    For issue bpo-16076: merge 3.3
    http://hg.python.org/cpython/rev/5a38f4d7833c

    @elibendersky
    Copy link
    Mannequin

    elibendersky mannequin commented Dec 30, 2012

    I've added some (currently pyET specific) pickling tests. Daniel, could you re-generate the patch? Note that for the C version of pickling you can now enable the pickle test. Point to consider - can elements pickled in C be unpickled in Python and vice versa? It should probably work (see test_decimal's pickling tests).

    @danielsh
    Copy link
    Mannequin

    danielsh mannequin commented Dec 30, 2012

    Attached. Differences to previous version:

    • Avoid the test___all__ issue (bpo-16817) by manipulating sys.modules
      directly

    • Support pickling interoperability between the C and Python
      implementations

    @danielsh
    Copy link
    Mannequin

    danielsh mannequin commented Dec 31, 2012

    BTW, maybe I'm just confused, but I'm a little surprised that C pickling -> Python unpickling works: the C __reduce__ returns a 3-element tuple, which according to pickle docs[1] must either be a dictionary or be passed to __setstate__; it's not a dictionary and the Python implementation doesn't have a __setstate__.

    http://docs.python.org/3/library/pickle#pickle.object.\_\_reduce__

    @elibendersky
    Copy link
    Mannequin

    elibendersky mannequin commented Jan 1, 2013

    A couple of high-level questions about the C code (this isn't a detailed review yet):

    1. Why did you choose to implement __reduce__ and not __getstate__?
    2. A lot of code appears to be shared between element_setstate_from_attributes and the init method implementation, can it be refactored?
    3. I've been thinking about the C<->Python pickle compatibility, and it may be too much trouble at this point for no very strong benefit. What could make more sense at some point is implementing a __getstate__/setstate in both the Python and C versions that are compatible (i.e. produce the same things). But this may mean changing the way the Python Element is pickled in 3.3, which may not be acceptable. So we may prefer to forego the compatibility in 3.3 and then for 3.4 make the change. I'm still not sure what is best here.

    @danielsh
    Copy link
    Mannequin

    danielsh mannequin commented Jan 1, 2013

    Eli Bendersky wrote on Tue, Jan 01, 2013 at 15:54:00 +0000:

    Why did you change the class name, by the way, I don't think it's
    a valid change at least for 3.3 in terms of backwards compatibility.

    With unmodified tip of 3.4:

        >>> import pickle, xml.etree.ElementTree as ET
        >>> pickle.dumps(ET.Element('foo'))
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        _pickle.PicklingError: Can't pickle <class 'Element'>: attribute lookup builtins.Element failed

    I added the "_elementtree" to the tp_name in order to bypass the above
    error. Module-qualified names were in use elsewhere (including by
    _elementtree._element_iterator) so it seemed reasonable. I'll defer to
    you about compatibility implications of this change.

    Regarding that compatibility, and even easier idea would be for the
    C pickle to return the same __dict__ implicitly gathered from the
    Python version, and then only one version of the unpickle is required.

    That makes sense. But going forward it might be even better to define
    an explicit __reduce__/getstate for the Python version too, so if
    the instance dict grows new members, they won't get serialised
    unintentionally. (This consideration is also why C code in the latest
    patch makes some effort to notice unknown args in the dict.)

    @danielsh
    Copy link
    Mannequin

    danielsh mannequin commented Jan 1, 2013

    Eli Bendersky wrote on Tue, Jan 01, 2013 at 00:32:51 +0000:

    1. Why did you choose to implement __reduce__ and not __getstate__?

    Maybe I was simply imitating what some other extension module was doing ;)

    By using __reduce__ with the type as first return value, the
    __setstate__ code becomes simpler because it doesn't need to address
    'tag' and 'attrib'. This consideration is somewhat moot now that the
    unpickle-Python-pickled-Element's code path has to think about all
    instance attributes, including those settable by the constructor.

    1. A lot of code appears to be shared between
      element_setstate_from_attributes and the init method implementation,
      can it be refactored?

    It seems there might be room for code reuse --- both functions set
    tag,text,tail,attrib (but the unpickler sets the children too). I'll
    look into that in the next iteration (once the class name and pickle
    output format issues are settled).

    @danielsh
    Copy link
    Mannequin

    danielsh mannequin commented Jan 1, 2013

    Also, the class inheritance in the tests should be amended to follow bpo-16835.

    @elibendersky
    Copy link
    Mannequin

    elibendersky mannequin commented Jan 3, 2013

    On Tue, Jan 1, 2013 at 2:56 PM, Daniel Shahaf <report@bugs.python.org>wrote:

    >
    > Daniel Shahaf added the comment:
    >
    > Eli Bendersky wrote on Tue, Jan 01, 2013 at 15:54:00 +0000:
    > > Why did you change the class name, by the way, I don't think it's
    > > a valid change at least for 3.3 in terms of backwards compatibility.
    > >
    >
    > With unmodified tip of 3.4:
    >
    >     >>> import pickle, xml.etree.ElementTree as ET
    >     >>> pickle.dumps(ET.Element('foo'))
    >     Traceback (most recent call last):
    >       File "<stdin>", line 1, in <module>
    >     _pickle.PicklingError: Can't pickle <class 'Element'>: attribute
    > lookup builtins.Element failed
    >
    > I added the "_elementtree" to the tp_name in order to bypass the above
    > error.  Module-qualified names were in use elsewhere (including by
    > _elementtree._element_iterator) so it seemed reasonable.  I'll defer to
    > you about compatibility implications of this change.

    I asked on pydev, but this is a key point to resolve. Can
    pickling/unpickling be made to work correctly without such (or similar)
    change at all? How will the unpickler know which module to load when it
    sees a pickled object of Element type?

    If this change is required (even if we choose to name it
    "xml.etree.ElementTree.Element" for Py compatibility to fix the pickling
    regression, we may find ourselves in a need to change it between 3.3 and
    3.3.1 and I'm not sure if that's valid. I hope my question on pydev will be
    resolved conclusively.

    Danial, could you investigate if such a change is absolutely required to
    make pickling/unickling of Element work?

    @danielsh
    Copy link
    Mannequin

    danielsh mannequin commented Jan 4, 2013

    Eli Bendersky wrote on Thu, Jan 03, 2013 at 14:44:02 +0000:

    On Tue, Jan 1, 2013 at 2:56 PM, Daniel Shahaf <report@bugs.python.org>wrote:
    > I added the "_elementtree" to the tp_name in order to bypass the above
    > error. Module-qualified names were in use elsewhere (including by
    > _elementtree._element_iterator) so it seemed reasonable. I'll defer to
    > you about compatibility implications of this change.

    I asked on pydev, but this is a key point to resolve. Can
    pickling/unpickling be made to work correctly without such (or similar)
    change at all? How will the unpickler know which module to load when it
    sees a pickled object of Element type?

    Is there a requirement that it loads a particular module? Would etree
    users notice the difference if pickle.load() returns an instance of the
    "other" Element implementation than the one they pickled?

    Danial, could you investigate if such a change is absolutely required to
    make pickling/unickling of Element work?

    Yes, I'll look into that.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jan 4, 2013

    Daniel Shahaf <report@bugs.python.org> wrote:

    Is there a requirement that it loads a particular module? Would etree
    users notice the difference if pickle.load() returns an instance of the
    "other" Element implementation than the one they pickled?

    Yes: If you send an "_elementtree.Element" pickle to another machine that
    doesn't have a working C version, it can't be unpickled. As far as I know
    the only way around that is to pickle as "xml.etree.ElementTree.Element".

    @elibendersky
    Copy link
    Mannequin

    elibendersky mannequin commented Jan 4, 2013

    On Jan 4, 2013 2:09 AM, "Stefan Krah" <report@bugs.python.org> wrote:

    Stefan Krah added the comment:

    Daniel Shahaf <report@bugs.python.org> wrote:
    > Is there a requirement that it loads a particular module? Would etree
    > users notice the difference if pickle.load() returns an instance of the
    > "other" Element implementation than the one they pickled?

    Yes: If you send an "_elementtree.Element" pickle to another machine that
    doesn't have a working C version, it can't be unpickled. As far as I know
    the only way around that is to pickle as "xml.etree.ElementTree.Element".

    ----------

    Yes this is a good point. Another thing to consider is that if both report
    the same name then it will be possible to unpickle on a machine running 3.2


    Python tracker <report@bugs.python.org>
    <http://bugs.python.org/issue16076\>


    @danielsh
    Copy link
    Mannequin

    danielsh mannequin commented Jan 7, 2013

    Eli Bendersky wrote on Thu, Jan 03, 2013 at 14:44:02 +0000:

    If this change is required (even if we choose to name it
    "xml.etree.ElementTree.Element" for Py compatibility to fix the pickling
    regression, we may find ourselves in a need to change it between 3.3 and
    3.3.1 and I'm not sure if that's valid. I hope my question on pydev will be
    resolved conclusively.

    Danial, could you investigate if such a change is absolutely required to
    make pickling/unickling of Element work?

    There are a few options:

    • Change c-Element's tp_name to "xml.etree.ElementTree.Element".

    • Register a reduce function for c-Element's that serialises them by
      constructing an equivalent py-Element and returning the latter's
      .__dict__ via the third return value:
      http://docs.python.org/3/library/copyreg#copyreg.pickle

    • Add an entry mapping "builtins.Element" to
      "xml.etree.ElementTree.Element" to _compat_pickle.IMPORT_MAPPING
      (which is used by Lib/pickle.py:_Pickler.find_class()).

    I haven't tried to implement either of these approaches.

    @elibendersky
    Copy link
    Mannequin

    elibendersky mannequin commented Jan 7, 2013

    I think that changing the _elementtree's Element's name to match the Python version, and then making sure the same serialized object is returned - is a worthy option to try.

    Then it will hopefully "just work". When pickle deserializes a user-defined object that says it's a xml.etree.ElementTree.Element, it will try to import Element from xml.etree.ElementTree and should get the accelerated class. On machines without acceleration it will get the Python class.

    @danielsh
    Copy link
    Mannequin

    danielsh mannequin commented Jan 8, 2013

    In the meantime, unrelated question: should TreeBuilder be pickleable too, and interchangeably (pickle|unpickle)able between the Python and C implementations? Asking because the Python TreeBuilder has a _factory member which it initializes to 'Element', and I'm not quite sure how cross-pickling should handle that.

    @danielsh
    Copy link
    Mannequin

    danielsh mannequin commented Jan 8, 2013

    New iteration.

    Open issues:

    • Share code with the init method. The issue with sharing code with either element_init() or create_new_element() would be malloc+realloc: unlike either of these methods, we know both the attributes and the number of children at allocation time, so we can allocate directly the right number of children.

    • C<->Py Interchangeable pickling of TreeBuilder (per above msg).

    Differences to previous version:

    • Skip C<->Py interchangeability testing of TreeBuilder --- because that one started failing with:
      _pickle.UnpicklingError: state is not a dictionary
      It can probably be fixed, but I'd like to address the above question about pickling the factory first.

    • Use __getstate__ rather than __reduce__ for Element.

    • Make Element pickling interchangeable between c/py Element
      (set tp_name to "xml.etree.ElementTree.Element" and match the
      pickled format).

    @elibendersky
    Copy link
    Mannequin

    elibendersky mannequin commented Jan 8, 2013

    Daniel, thanks for this patch, it looks very good. I had some comments in the code review.

    As for TreeBuilder, let's split it into a separate issue - I think it's much less critical. We can discuss it later once we're done with Element. If anything, it's more important to handle ElementTree itself first - it should be very easy (much simpler than Element).

    With the factory method I'm not sure how the Py pickling happens. I need to dig a bit in pickling for that, because I don't know how method reference pickling works.

    @elibendersky
    Copy link
    Mannequin

    elibendersky mannequin commented Jan 8, 2013

    P = import_fresh_module('xml.etree.ElementTree', blocked=['_elementtree'])
    tb = P.TreeBuilder(element_factory=lambda a, b: [a, b])
    print(pickle.dumps(tb))

    Gives: _pickle.PicklingError: Can't pickle <class 'function'>: attribute lookup builtins.function failed

    @danielsh
    Copy link
    Mannequin

    danielsh mannequin commented Jan 8, 2013

    Eli Bendersky wrote on Tue, Jan 08, 2013 at 15:00:42 +0000:

    Eli Bendersky added the comment:

    P = import_fresh_module('xml.etree.ElementTree', blocked=['_elementtree'])
    tb = P.TreeBuilder(element_factory=lambda a, b: [a, b])
    print(pickle.dumps(tb))

    Gives: _pickle.PicklingError: Can't pickle <class 'function'>: attribute lookup builtins.function failed

    Is that with or without the patch?

    @elibendersky
    Copy link
    Mannequin

    elibendersky mannequin commented Jan 8, 2013

    On Tue, Jan 8, 2013 at 2:51 PM, Daniel Shahaf <report@bugs.python.org>wrote:

    Daniel Shahaf added the comment:

    Eli Bendersky wrote on Tue, Jan 08, 2013 at 15:00:42 +0000:
    >
    > Eli Bendersky added the comment:
    >
    > P = import_fresh_module('xml.etree.ElementTree',
    blocked=['_elementtree'])
    > tb = P.TreeBuilder(element_factory=lambda a, b: [a, b])
    > print(pickle.dumps(tb))
    >
    > Gives: _pickle.PicklingError: Can't pickle <class 'function'>: attribute
    lookup builtins.function failed

    Is that with or without the patch?

    Without. Pickle can't handle functions that are not top-level, so bound
    methods, internal functions, lambdas are all out. So pickling probably
    wasn't a first priority for the Python version of TreeBuilder as well...

    @danielsh
    Copy link
    Mannequin

    danielsh mannequin commented Jan 8, 2013

    Dissociating TreeBuilder from this issue per recent comments.

    @danielsh danielsh mannequin changed the title xml.etree.ElementTree.Element and xml.etree.ElementTree.TreeBuilder are no longer pickleable xml.etree.ElementTree.Element is no longer pickleable Jan 8, 2013
    @danielsh
    Copy link
    Mannequin

    danielsh mannequin commented Jan 8, 2013

    v8 removes TreeBuilder handling (code + tests) and removes memcpy per review. (Outstanding issues in the review can be fixed post-v8, if needed.)

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jan 10, 2013

    New changeset 8d6dadfecf22 by Eli Bendersky in branch '3.3':
    Issue bpo-16076: make _elementtree.Element pickle-able in a way that is compatible
    http://hg.python.org/cpython/rev/8d6dadfecf22

    New changeset 4c268b7c86e6 by Eli Bendersky in branch 'default':
    Issue bpo-16076: make _elementtree.Element pickle-able in a way that is compatible
    http://hg.python.org/cpython/rev/4c268b7c86e6

    @elibendersky
    Copy link
    Mannequin

    elibendersky mannequin commented Jan 10, 2013

    Fixed in 3.3 and default. Thanks for the good work, Daniel. A separate issue can be opened for TreeBuilder.

    @elibendersky elibendersky mannequin closed this as completed Jan 10, 2013
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jan 10, 2013

    New changeset c46054b49b6c by Eli Bendersky in branch '3.3':
    Update Misc/NEWS for issue bpo-16076
    http://hg.python.org/cpython/rev/c46054b49b6c

    @ezio-melotti
    Copy link
    Member

    The fix introduced some refleaks:
    $ ./python -m test -R3:2 test_xml_etree_c
    test_xml_etree_c leaked [56, 56] references, sum=112

    One seems to be in __getstate__:
    diff --git a/Modules/_elementtree.c b/Modules/_elementtree.c
    --- a/Modules/_elementtree.c
    +++ b/Modules/_elementtree.c
    @@ -859,8 +859,10 @@
                                          PICKLED_ATTRIB, self->extra->attrib,
                                          PICKLED_TEXT, self->text,
                                          PICKLED_TAIL, self->tail);
    -    if (instancedict)
    +    if (instancedict) {
    +        Py_DECREF(children);
             return instancedict;
    +    }
         else {
             for (i = 0; i < PyList_GET_SIZE(children); i++)
                 Py_DECREF(PyList_GET_ITEM(children, i));

    I'm still looking for the other.

    @ezio-melotti
    Copy link
    Member

    This is what I found out.
    I used an easily copy/pastable one-liner that creates 3 variables: e (no children), e2 (3 children), e3 (5 children).

    Original leaky code (test_xml_etree_c leaked [56, 56] references, sum=112):
    >>> from xml.etree import ElementTree as ET; e = ET.Element('foo'); SAMPLE_XML = "<body><tag class='a'>text</tag><tag class='b' /><section><tag class='b' id='inner'>subtext</tag></section></body>"; e2 = ET.XML(SAMPLE_XML); SAMPLE_XML = "<body><tag class='a'>text</tag><tag class='b' /><tag class='b' /><tag class='b' /><section><tag class='b' id='inner'>subtext</tag></section></body>"; e3 = ET.XML(SAMPLE_XML)
    [76773 refs]
    >>> ### e has no children and leaks 1 ref:
    >>> e.__getstate__()
    {'tag': 'foo', 'attrib': {}, 'text': None, 'tail': None, '_children': []}
    [76791 refs]
    >>> e.__getstate__()
    {'tag': 'foo', 'attrib': {}, 'text': None, 'tail': None, '_children': []}
    [76792 refs]
    >>> e.__getstate__()
    {'tag': 'foo', 'attrib': {}, 'text': None, 'tail': None, '_children': []}
    [76793 refs]
    >>> ### e2 has 3 children and leaks 4 refs:
    >>> e2.__getstate__()
    {'tag': 'body', 'attrib': {}, 'text': None, 'tail': None, '_children': [<Element 'tag' at 0xb735cef4>, <Element 'tag' at 0xb7368034>, <Element 'section' at 0xb73688f4>]}
    [76798 refs]
    >>> e2.__getstate__()
    {'tag': 'body', 'attrib': {}, 'text': None, 'tail': None, '_children': [<Element 'tag' at 0xb735cef4>, <Element 'tag' at 0xb7368034>, <Element 'section' at 0xb73688f4>]}
    [76802 refs]

    The leaked refs seems to be 1 for the children list + 1 for each children.
    The diff I pasted in the previous *seems* to fix this (i.e. leaks in __gestate__ are gone, tests pass), but I had it crash once (couldn't reproduce after that, so it might be unrelated*), and I'm not sure it's correct.

    With that patch applied we go down to test_xml_etree_c leaked [6, 6] references, sum=12.
    The remaining leaks seem to be in __setstate__.
    Patched code:
    >>> from xml.etree import ElementTree as ET; e = ET.Element('foo'); SAMPLE_XML = "<body><tag class='a'>text</tag><tag class='b' /><section><tag class='b' id='inner'>subtext</tag></section></body>"; e2 = ET.XML(SAMPLE_XML); SAMPLE_XML = "<body><tag class='a'>text</tag><tag class='b' /><tag class='b' /><tag class='b' /><section><tag class='b' id='inner'>subtext</tag></section></body>"; e3 = ET.XML(SAMPLE_XML)
    [76773 refs]
    >>> ### no more leaks for getstate:
    >>> p = e.__getstate__()
    [76787 refs]
    >>> p = e.__getstate__()
    [76787 refs]
    >>> ### also no leaks when there are no child:
    >>> e.__setstate__(p)
    [76788 refs]
    >>> e.__setstate__(p)
    [76788 refs]
    >>> ### no more leaks for getstate with children:
    >>> p2 = e2.__getstate__()
    [76807 refs]
    >>> p2 = e2.__getstate__()
    [76807 refs]
    >>> ### one ref leaked for every child in __setstate__:
    >>> e2.__setstate__(p2)
    [76810 refs]
    >>> e2.__setstate__(p2)
    [76813 refs]
    >>> e2.__setstate__(p2)
    [76816 refs]

    I'm not working on this anymore now, so someone more familiar with the code can take a look, see if my patch is correct, and fix the remaining leaks.

    • maybe I'm doing something wrong, but ISTM that make -j2 doesn't always work right away, and sometimes I get different results if I do it again without touching the code.

    @ezio-melotti ezio-melotti reopened this Jan 11, 2013
    @danielsh
    Copy link
    Mannequin

    danielsh mannequin commented Jan 11, 2013

    Ezio Melotti wrote on Fri, Jan 11, 2013 at 08:44:43 +0000:
    > >>> ### one ref leaked for every child in __setstate__:
    > >>> e2.__setstate__(p2)
    > [76810 refs]
    > >>> e2.__setstate__(p2)
    > [76813 refs]
    > >>> e2.__setstate__(p2)
    > [76816 refs]
    > 
    > I'm not working on this anymore now, so someone more familiar with the
    > code can take a look, see if my patch is correct, and fix the
    > remaining leaks.
    > 

    Thank for narrowing it down so much. Attached patch incorporates your
    getstate fix and a fix for setstate. It passes the -R 3:2 tests.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jan 12, 2013

    New changeset 5b36768b9a11 by Eli Bendersky in branch '3.3':
    Issue bpo-16076: fix refleak in pickling of Element.
    http://hg.python.org/cpython/rev/5b36768b9a11

    New changeset 848738d3c40f by Eli Bendersky in branch 'default':
    Close bpo-16076: fix refleak in pickling of Element.
    http://hg.python.org/cpython/rev/848738d3c40f

    @python-dev python-dev mannequin closed this as completed Jan 12, 2013
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jan 12, 2013

    New changeset 4501813ea676 by Eli Bendersky in branch '3.3':
    Issue bpo-16076: check for return value of PyTuple_New for args (following
    http://hg.python.org/cpython/rev/4501813ea676

    New changeset 7313096e0bad by Eli Bendersky in branch 'default':
    Issue bpo-16076: check for return value of PyTuple_New for args (following
    http://hg.python.org/cpython/rev/7313096e0bad

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    extension-modules C modules in the Modules dir release-blocker stdlib Python modules in the Lib dir topic-XML type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants