classification
Title: ElementTree objects should support all the same methods as Element objects
Type: enhancement
Stage:
Components: Library (Lib)
Versions: Python 3.5
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: eli.bendersky, martin.panter, rhettinger, scoder
Priority: normal
Keywords:

Created on 2014-03-22 22:47 by rhettinger, last changed 2014-04-17 01:26 by rhettinger. This issue is now closed.

Messages (10)
msg214521 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-03-22 22:47
The inner objects are Elements, which have a great deal of flexibility (for example, they can be iterated over directly). The outermost object is an ElementTree, which lacks those capabilities (it only supports findall).

For example in a catalog of books:

    catalog = xml.etree.ElementTree.parse('books.xml')

    # This succeeds
    for book in catalog.findall('book'):
        print(book.tag)

    # This fails:
    for book in catalog:
        print(book.tag)

    # But for inner elements, we have more options
    book = catalog.find("book[@id='bk101']")
    for subelement in book:
        print(subelement.tag)

Here are the differences between the APIs of ElementTree and Element:

In [9]: set(dir(book)) - set(dir(catalog))
Out[9]: 
{'__delitem__',
 '__getitem__',
 '__len__',
 '__nonzero__',
 '__setitem__',
 '_children',
 'append',
 'attrib',
 'clear',
 'copy',
 'extend',
 'get',
 'getchildren',
 'insert',
 'items',
 'itertext',
 'keys',
 'makeelement',
 'remove',
 'set',
 'tag',
 'tail',
 'text'}

In [10]: set(dir(catalog)) - set(dir(book))
Out[10]: {'_root', '_setroot', 'getroot', 'parse', 'write', 'write_c14n'}

Note, the XML data model requires that the outermost element have some capabilities that inner elements don't have (such as comments and processing instructions).  That said, the outer element shouldn't have fewer capabilities than the inner elements.
msg214550 - Author: Stefan Behnel (scoder) * (Python committer) Date: 2014-03-23 07:34
    catalog = xml.etree.ElementTree.parse('books.xml')

    # This succeeds
    for book in catalog.findall('book'):
        print(book.tag)

This is a bit of a convenience break in the API. The "normal" way to do it would be either catalog.getroot().findall('book') or catalog.findall('/catalog/book'). There is not much use in requiring the root element to be included in the path expression, so it is allowed to be left out. Note that you can't use absolute path expressions on Elements; this is a difference from the ElementTree object.
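
As a minimal sketch of that shortcut, the snippet below uses an inline document instead of books.xml (the XML content and the ids are invented for illustration). It only checks that the relative path on the tree finds the same elements as the explicit route through getroot().

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for books.xml; the content is made up.
XML = """<catalog>
  <book id="bk101"><title>First</title></book>
  <book id="bk102"><title>Second</title></book>
</catalog>"""

tree = ET.ElementTree(ET.fromstring(XML))

# Relative path on the tree: evaluated against the root element.
via_tree = tree.findall('book')

# The explicit spelling through the root element.
via_root = tree.getroot().findall('book')

# Both calls return the very same Element objects.
same_elements = (
    len(via_tree) == len(via_root)
    and all(a is b for a, b in zip(via_tree, via_root))
)
ids = [book.get('id') for book in via_tree]
print(same_elements, ids)
```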


    # This fails:
    for book in catalog:
        print(book.tag)

Iterating over an ElementTree? What would that even mean? Why would you expect it to iterate over the children of the root Element, and not, say, all Elements in the document? I think that ambiguity is a good reason to not make ElementTree objects iterable.
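
To make the ambiguity concrete, here is a small sketch of the two readings, again on an invented inline document. Neither reading is what ElementTree does today; both are spelled out through existing calls.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this example.
XML = """<catalog>
  <book id="bk101"><title>First</title></book>
  <book id="bk102"><title>Second</title></book>
</catalog>"""

tree = ET.ElementTree(ET.fromstring(XML))
root = tree.getroot()

# Reading 1: only the direct children of the root element.
children_only = [el.tag for el in root]

# Reading 2: every element in the document, root included, in document order.
whole_document = [el.tag for el in tree.iter()]

print(children_only)
print(whole_document)
```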


    # But for inner elements, we have more options
    book = catalog.find("book[@id='bk101']")
    for subelement in book:
        print(subelement.tag)


> Note, the XML data model requires that the outermost element have some capabilities that inner elements don't have (such as comments and processing instructions).  That said, the outer element shouldn't have fewer capabilities than the inner elements.

ISTM that you are misinterpreting the ElementTree object as representing the document root whereas it actually represents the document. The root Element is given by tree.getroot().
msg215238 - Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2014-03-31 13:39
Raymond, you are right that the APIs presented by Element and ElementTree are somewhat different. As Stefan mentioned, they were really meant to represent different things, but with time some "convenience" features crept in and made the difference somewhat more moot.

Note that some methods/functions in ET give you the root element directly, rather than the tree. For example the XML function, or fromstring function.
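
A short sketch of that difference, assuming a one-line inline document in place of a real file (io.StringIO stands in for the file object):

```python
import io
import xml.etree.ElementTree as ET

XML = "<catalog><book id='bk101'/></catalog>"

# fromstring() and XML() hand back the root Element directly.
root_a = ET.fromstring(XML)
root_b = ET.XML(XML)

# parse() hands back an ElementTree that wraps the root.
tree = ET.parse(io.StringIO(XML))

kinds = (
    isinstance(root_a, ET.Element),
    isinstance(root_b, ET.Element),
    isinstance(tree, ET.ElementTree),
    isinstance(tree, ET.Element),
)
print(kinds)
```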

Also, the tree implements the iter() method, which is morally equivalent to Element.iter() on the root node. However, the tree (unlike Element) is not iterable. Element implements __getitem__, the tree does not.
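
The points above can be checked with a few lines. This is only an illustrative sketch; the helper raises_type_error is made up for the example and is not part of the library.

```python
import xml.etree.ElementTree as ET

tree = ET.ElementTree(ET.fromstring(
    "<catalog><book id='bk101'/><book id='bk102'/></catalog>"
))
root = tree.getroot()

# tree.iter() walks the same elements as root.iter().
same_walk = [el.tag for el in tree.iter()] == [el.tag for el in root.iter()]

# Element supports indexing.
first_id = root[0].get('id')


def raises_type_error(func):
    """Hypothetical helper: True if calling func() raises TypeError."""
    try:
        func()
    except TypeError:
        return True
    return False


# The tree itself is neither iterable nor indexable.
tree_not_iterable = raises_type_error(lambda: iter(tree))
tree_not_indexable = raises_type_error(lambda: tree[0])
print(same_walk, first_id, tree_not_iterable, tree_not_indexable)
```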

Currently, the first code snippet in the official documentation shows:

  import xml.etree.ElementTree as ET
  tree = ET.parse('country_data.xml')
  root = tree.getroot()

This makes the distinction between the tree and its root explicit.

Whether this is a great API (making tree and root distinct), I can't say, but we can't change it now. Do you have concrete suggestions? Make the tree iterable? Add all element methods to the tree, implicitly forwarding to the root? Improve documentation?
msg215259 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-03-31 19:49
> Do you have concrete suggestions? Make the tree iterable? 
> Add all element methods to the tree, implicitly forwarding to the root? 

Yes, that is the feature request.  Add all the element methods to the elementtree object.

Implicitly forwarding to the root would be a reasonable way to do it, but that is just an implementation detail.
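
For illustration only, one way such forwarding could look is sketched below. The class name ForwardingTree is hypothetical and nothing like it exists in xml.etree; the sketch just shows that plain attributes can be forwarded through __getattr__, while special methods such as __iter__ need explicit definitions because they are looked up on the type.

```python
import xml.etree.ElementTree as ET


class ForwardingTree(ET.ElementTree):
    """Hypothetical subclass; not part of xml.etree."""

    def __getattr__(self, name):
        # Called only when normal lookup fails, so ElementTree's own
        # methods (find, iter, write, ...) still take precedence.
        if name.startswith('_'):
            # Never forward private names; this also avoids recursion
            # if the root has not been set yet.
            raise AttributeError(name)
        return getattr(self.getroot(), name)

    # Special methods bypass __getattr__, so forward them explicitly.
    def __iter__(self):
        return iter(self.getroot())

    def __len__(self):
        return len(self.getroot())

    def __getitem__(self, index):
        return self.getroot()[index]


tree = ForwardingTree(ET.fromstring(
    "<catalog kind='demo'><book id='bk101'/><book id='bk102'/></catalog>"
))
ids = [book.get('id') for book in tree]
print(tree.tag, tree.get('kind'), len(tree), ids)
```

In this sketch, iterating the tree yields the children of the root, which is only one of the possible meanings.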
msg215358 - Author: Stefan Behnel (scoder) * (Python committer) Date: 2014-04-02 06:01
> Add all the element methods to the elementtree object.

Ok, but why? An ElementTree object *is not* an Element. It's a
representation of a document that *has* a root Element.

It makes sense for a document to allow searches over its content, and the
ElementTree class currently supports that, using the find*() or iter()
methods. They are "deep" or "global" content accessor shortcuts, in
addition to the path through the normal getroot() method.

But I can't see how making ElementTree objects look and behave like their
own root Element improves anything. Instead, it would just make the
distinction between the two completely unclear, and would also lead to
quirks like the question why iterating over a document yields the second
level of children. Or the question what the "attrib" property of a document
could mean.

Instead of blurring it, would you have an idea what we could improve in the
documentation to make this distinction clearer?
msg215360 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-04-02 07:22
> ElementTree class currently supports that, 
> using the find*() or iter() methods. 

That would be great except that ElementTree doesn't actually have an __iter__ method.

> Ok, but why?

The short answer is that every time I conduct Python training, people routinely trip over this issue.  The whole point of the ElementTree package was to have a more pythonic interface than DOM or SAX.  I'm sure there are people that argue that the requests module isn't great because it conflates requesting with authentication and password management, but the beauty of requests is that its API matches how people try to use it.  The outer ElementTree object is awkward in this regard.

I don't see any benefit from having this code fail:


    from xml.etree.ElementTree import parse

    catalog = parse('books.xml')
    for book in catalog:
        print(book.get('id'))
msg215361 - Author: Stefan Behnel (scoder) * (Python committer) Date: 2014-04-02 07:45
> I don't see any benefit from having this code fail:
> 
>     from xml.etree.ElementTree import parse
> 
>     catalog = parse('books.xml')
>     for book in catalog:
>         print(book.get('id'))

Why would you expect it to work? And how?

Why would it only iterate over the *children* of the root Element that it
wraps, and not yield the root Element itself, and maybe any preceding or
following processing instructions or comments, the doctype declaration, etc.?
msg215382 - Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2014-04-02 13:32
> Do you have concrete suggestions? Make the tree iterable?

>  > Add all element methods to the tree, implicitly forwarding to the root?
>
> Yes, that is the feature request.  Add all the element methods to the
> elementtree object.
>
> Implicitly forwarding to the root would be a reasonable way to do it, but
> that is just an implementation detail.

Porting over all methods of Element to ElementTree sounds like overkill
to me. How about just making a sensibly-behaving __iter__ for ElementTree?
This should be easy because ElementTree already has an iter() method that
behaves as needed (goes over all elements including root). Would iteration
+ perhaps clearer documentation solve most of the problem?
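
A rough sketch of what this proposal could amount to, written as a subclass so it can be tried without touching the library. IterableTree is a hypothetical name, not an existing class.

```python
import xml.etree.ElementTree as ET


class IterableTree(ET.ElementTree):
    """Hypothetical subclass sketching the proposal; not part of xml.etree."""

    def __iter__(self):
        # Delegate to the existing iter() method: all elements, root first.
        return self.iter()


tree = IterableTree(ET.fromstring("<catalog><book><title/></book></catalog>"))
tags = [el.tag for el in tree]
print(tags)
```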
msg215383 - Author: Stefan Behnel (scoder) * (Python committer) Date: 2014-04-02 14:02
> How about just making a sensibly-behaving __iter__ for ElementTree?

Well, the problem is to determine what "sensibly-behaving" is. I can see
three options.

1) tree.iter()  ==  tree.getroot().iter()

2) iter(tree.getroot())

3) iter([tree.getroot()])

The second option feels plain wrong to me.

The last one would allow the extension towards PI/comment siblings, as I
described before. There isn't currently a way to get at them (which doesn't
hurt, because ET doesn't currently even pass them through from its parser,
as discussed in issue 9521). Once there is a way in ET to parse them in (as
in lxml), making ElementTree objects iterable would nicely solve the issue
of how to process them afterwards.

It's not the only solution for that problem, though; adding a
".gettoplevel()" method would similarly work. Thus, either 1) or 3) would
fit the API, with the downside of 1) being that it's just completely
redundant functionality and I don't consider saving 7 simple characters
worth the increase in API overhead.

That leaves 3) as an option. It's nice because the iteration then works on
the same axis as for Elements, so x.iter() and iter(x) would behave in the
same way for both classes.
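
To compare the three options side by side, the sketch below spells out what each would yield for a small invented document. The helper tags() is made up for the example.

```python
import xml.etree.ElementTree as ET

tree = ET.ElementTree(ET.fromstring(
    "<catalog><book><title/></book><book><title/></book></catalog>"
))
root = tree.getroot()


def tags(iterable):
    """Hypothetical helper: collect the tag of every element yielded."""
    return [el.tag for el in iterable]


option_1 = tags(tree.iter())    # 1) whole document, root first
option_2 = tags(iter(root))     # 2) children of the root element
option_3 = tags(iter([root]))   # 3) top level only: the root itself

print(option_1)
print(option_2)
print(option_3)
```

Under option 3, iter(x) would step through one level and x.iter() through the whole subtree, for trees and elements alike.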
msg216634 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-04-17 01:26
Given Stefan's concerns, I withdraw this feature request.
History
Date                 User           Action  Args
2014-04-17 01:26:21  rhettinger     set     status: open -> closed; resolution: not a bug; messages: + msg216634
2014-04-02 14:02:06  scoder         set     messages: + msg215383
2014-04-02 13:32:10  eli.bendersky  set     messages: + msg215382
2014-04-02 07:45:49  scoder         set     messages: + msg215361
2014-04-02 07:22:19  rhettinger     set     messages: + msg215360
2014-04-02 06:01:59  scoder         set     messages: + msg215358
2014-03-31 19:49:46  rhettinger     set     messages: + msg215259
2014-03-31 13:39:28  eli.bendersky  set     messages: + msg215238
2014-03-23 07:34:21  scoder         set     nosy: + scoder, eli.bendersky; messages: + msg214550
2014-03-23 03:53:28  martin.panter  set     nosy: + martin.panter; title: ElementTree objects should support all the same methods are Element objects -> ElementTree objects should support all the same methods as Element objects
2014-03-22 22:47:56  rhettinger     create