classification
Title: Document that _elementtree C API cannot use custom TreeBuilder for iterparse or IncrementalParser
Type: behavior Stage: resolved
Components: Documentation, XML Versions: Python 3.3, Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: docs@python Nosy List: Aaron.Oakley, docs@python, eli.bendersky, python-dev
Priority: normal Keywords: patch

Created on 2013-05-03 20:56 by Aaron.Oakley, last changed 2013-08-27 03:41 by eli.bendersky. This issue is now closed.

Files
File name Uploaded Description
elementtree.rst-340a0.patch Aaron.Oakley, 2013-05-03 20:56 xml.etree.ElementTree Documentation patch
Messages (7)
msg188329 - (view) Author: Aaron Oakley (Aaron.Oakley) * Date: 2013-05-03 20:56
It would really help to document that the C API can only use the default xml.etree.ElementTree.TreeBuilder for targets with iterparse (and by extension, IncrementalParser).

I got a nice surprise about that when I went from 3.2 to 3.3 and started getting "TypeError: event handling only supported for ElementTree.TreeBuilder targets".

I included a patch to add notes to iterparse and IncrementalParser, but I'm not sure what to refer to the C module as since xml.etree.cElementTree is deprecated.
msg189562 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-05-18 23:18
Aaron, could you please sign the PSF CLA (http://www.python.org/psf/contrib/contrib-form/)? This will make accepting patches from you easier.

Other than that, I agree it's a legit patch. The alternative would be to fix _elementtree to actually allow arbitrary TreeBuilders there, although I'm not sure it's worth the effort.
msg191362 - (view) Author: Aaron Oakley (Aaron.Oakley) * Date: 2013-06-17 19:04
So sorry, I just found the emails from the bug tracker in my spam folder. Anyhow, I've now signed the CLA.
msg194323 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-08-04 01:55
New changeset a5a5ba4f71ad by Eli Bendersky in branch '3.3':
Issue #17902: Clarify doc of ElementTree.iterparse
http://hg.python.org/cpython/rev/a5a5ba4f71ad

New changeset 96f45011957e by Eli Bendersky in branch 'default':
Issue #17902: Clarify doc of ElementTree.iterparse and IncrementalParser
http://hg.python.org/cpython/rev/96f45011957e
msg196256 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-08-27 01:21
Aaron - could you describe your use case of passing a custom parser into iterparse? We're currently considering deprecating the feature of passing a parser into iterparse in a future release (this is being discussed in issue 17741).
msg196259 - (view) Author: Aaron Oakley (Aaron.Oakley) * Date: 2013-08-27 02:11
From memory, the use case at the time was using a custom TreeBuilder sub-class fed into a builtin XMLParser object. The code would construct a builder separately and keep a reference to it around. The builder would delegate calls to start(), data(), end(), and close() to super and save the completed tree when its close() was called.

    my_builder = CustomTreeBuilder()
    et_parser = ET.XMLParser(target=my_builder)

    for (evt, elem) in ET.iterparse("...", events, parser=et_parser):
        pass  # Do first processing

    tree = my_builder.root  # Saved tree

It was done like this initially so that some data (I can't recall exactly what) from the XML input could be processed first very conveniently using the parse events from iterparse while allowing the whole tree to be retrieved afterwards.
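
As a rough illustration of the builder described above, here is a minimal sketch. It is not the original project's code: the class name `RootSavingTreeBuilder`, the attribute `saved_root`, and the sample XML are all hypothetical. It drives the parser directly with `feed()` and `close()` instead of going through `iterparse`, so it does not depend on the `parser` argument discussed in this issue.

```python
import xml.etree.ElementTree as ET


class RootSavingTreeBuilder(ET.TreeBuilder):
    """Hypothetical TreeBuilder subclass that keeps the finished tree."""

    def __init__(self):
        super().__init__()
        self.saved_root = None  # filled in when close() is called

    def close(self):
        # Delegate to the stock builder, then remember the root element.
        self.saved_root = super().close()
        return self.saved_root


builder = RootSavingTreeBuilder()
parser = ET.XMLParser(target=builder)

# Sample input, made up for this sketch.
parser.feed("<root><item>a</item><item>b</item></root>")
parser.close()  # calls builder.close(), which stores the root

item_texts = [child.text for child in builder.saved_root]
```

Only `close()` is overridden here; `start()`, `data()`, and `end()` are inherited unchanged, which has the same effect as delegating them to super.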

That said, the project later moved to using lxml for various features not contained in xml.etree.ElementTree, and I don't think the process I described is still being used.
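
For reference, the same goal (process events first, then retrieve the whole tree) can probably be met without a custom parser. The ElementTree documentation states that the iterator returned by `iterparse` has a `root` attribute that references the root element once the source is fully read. The sketch below assumes that behavior; the sample XML and variable names are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Sample input, made up for this sketch.
source = io.BytesIO(b"<root><item>a</item><item>b</item></root>")

events = ET.iterparse(source, events=("end",))

texts = []
for event, elem in events:
    # First pass: handle elements as their end tags are seen.
    if elem.tag == "item":
        texts.append(elem.text)

# After the iterator is exhausted, the complete tree is available.
whole_tree_root = events.root
```

The loop has to run to completion before `events.root` is read, because the attribute is only set after the whole source has been parsed.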
msg196261 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-08-27 03:41
Thanks for the information, Aaron; much appreciated.
History
Date User Action Args
2013-08-27 03:41:41 eli.bendersky set messages: + msg196261
2013-08-27 02:11:20 Aaron.Oakley set messages: + msg196259
2013-08-27 01:21:36 eli.bendersky set messages: + msg196256
2013-08-04 01:55:45 eli.bendersky set status: open -> closed; resolution: fixed; stage: patch review -> resolved
2013-08-04 01:55:30 python-dev set nosy: + python-dev; messages: + msg194323
2013-06-17 19:04:32 Aaron.Oakley set messages: + msg191362
2013-05-18 23:18:18 eli.bendersky set messages: + msg189562
2013-05-03 23:24:55 pitrou set nosy: + eli.bendersky; stage: patch review; versions: + Python 3.3
2013-05-03 20:56:18 Aaron.Oakley create