Issue 21028: ElementTree objects should support all the same methods as Element objects

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/65227

classification

Title:	ElementTree objects should support all the same methods as Element objects
Type:	enhancement	Stage:
Components:	Library (Lib)	Versions:	Python 3.5

process

Status:	closed	Resolution:	not a bug
Dependencies:		Superseder:
Assigned To:		Nosy List:	eli.bendersky, martin.panter, rhettinger, scoder
Priority:	normal	Keywords:

Created on 2014-03-22 22:47 by rhettinger, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (10)
msg214521 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2014-03-22 22:47
The inner objects are Elements, which have a great deal of flexibility (for example, they can be iterated over directly). The outermost object is an ElementTree, which lacks those capabilities (it only supports findall).

For example, in a catalog of books:

    catalog = xml.etree.ElementTree.parse('books.xml')

    # This succeeds
    for book in catalog.findall('book'):
        print(book.tag)

    # This fails:
    for book in catalog:
        print(book.tag)

    # But for inner elements, we have more options
    book = catalog.find('bk101')
    for subelement in book:
        print(subelement.tag)

Here are the differences between the API for ElementTree and Element:

    In [9]: set(dir(book)) - set(dir(catalog))
    Out[9]: {'__delitem__', '__getitem__', '__len__', '__nonzero__', '__setitem__', '_children', 'append', 'attrib', 'clear', 'copy', 'extend', 'get', 'getchildren', 'insert', 'items', 'itertext', 'keys', 'makeelement', 'remove', 'set', 'tag', 'tail', 'text'}

    In [10]: set(dir(catalog)) - set(dir(book))
    Out[10]: {'_root', '_setroot', 'getroot', 'parse', 'write', 'write_c14n'}

Note, the XML data model requires that the outermost element have some capabilities that inner elements don't have (such as comments and processing instructions). That said, the outer element shouldn't have fewer capabilities than the inner elements.
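A minimal, self-contained sketch of the behaviour described in this message. The inline XML string and the variable names are assumptions made for illustration (the original `books.xml` is not shown in the thread), and the tree is parsed from an in-memory buffer rather than a file.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for books.xml; the real file is not part of the thread.
XML = b"""<catalog>
  <book id="bk101"><title>First</title></book>
  <book id="bk102"><title>Second</title></book>
</catalog>"""

tree = ET.parse(io.BytesIO(XML))   # ElementTree: represents the document
root = tree.getroot()              # Element: the <catalog> root

# Searching through the tree works.
tags_via_findall = [book.tag for book in tree.findall("book")]

# Iterating over an Element yields its direct children.
tags_via_root = [book.tag for book in root]

# Iterating over the ElementTree itself raises TypeError.
try:
    iter(tree)
    tree_is_iterable = True
except TypeError:
    tree_is_iterable = False

print(tags_via_findall, tags_via_root, tree_is_iterable)
```

The usual workaround is to call `getroot()` once and work with the returned Element from then on.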
msg214550 - (view)	Author: Stefan Behnel (scoder) *	Date: 2014-03-23 07:34
    catalog = xml.etree.ElementTree.parse('books.xml')

    # This succeeds
    for book in catalog.findall('book'):
        print(book.tag)

This is a bit of a convenience break in the API. The "normal" way to do it would be either catalog.getroot().findall('book') or catalog.findall('/catalog/book'). There is not much use in requiring the root element to be included in the path expression, so it is allowed to leave it out. Note that you can't use absolute path expressions on Elements; this is a difference from the ElementTree object.

    # This fails:
    for book in catalog:
        print(book.tag)

Iterating over an ElementTree? What would that even mean? Why would you expect it to iterate over the children of the root Element, and not, say, all Elements in the document? I think that ambiguity is a good reason to not make ElementTree objects iterable.

    # But for inner elements, we have more options
    book = catalog.find('bk101')
    for subelement in book:
        print(subelement.tag)

> Note, the XML data model requires that the outermost element have some capabilities that inner elements don't have (such as comments and processing instructions). That said, the outer element shouldn't have fewer capabilities than the inner elements.

ISTM that you are misinterpreting the ElementTree object as representing the document root whereas it actually represents the document. The root Element is given by tree.getroot().
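The following sketch illustrates two of the points above with the standard-library module: a relative path gives the same result on the tree and on its root, and an Element rejects a path that starts with "/". The sample XML and variable names are assumptions for illustration, and the sketch deliberately does not exercise absolute paths on the tree object.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document standing in for books.xml.
root = ET.fromstring(
    "<catalog><book id='bk101'/><book id='bk102'/></catalog>"
)
tree = ET.ElementTree(root)

# A relative path gives the same Elements whether asked of the tree or the root.
via_tree = tree.findall("book")
via_root = tree.getroot().findall("book")
same_elements = len(via_tree) == len(via_root) and all(
    a is b for a, b in zip(via_tree, via_root)
)

# An Element refuses a path that starts with "/".
try:
    root.findall("/catalog/book")
    element_rejects_absolute = False
except SyntaxError:
    element_rejects_absolute = True

print(same_elements, element_rejects_absolute)
```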
msg215238 - (view)	Author: Eli Bendersky (eli.bendersky) *	Date: 2014-03-31 13:39
Raymond, you are right that the APIs presented by Element and ElementTree are somewhat different. As Stefan mentioned, they were really meant to represent different things, but with time some "convenience" features crept in and made the difference somewhat more moot.

Note that some methods/functions in ET give you the root element directly, rather than the tree. For example, the XML function or the fromstring function. Also, the tree implements the iter() method, which is morally equivalent to Element.iter() on the root node. However, the tree (unlike Element) is not iterable. Element implements __getitem__, the tree does not.

Currently, the first code snippet in the official documentation shows:

    import xml.etree.ElementTree as ET
    tree = ET.parse('country_data.xml')
    root = tree.getroot()

This makes the distinction between the tree and its root. Whether this is a great API (making tree and root distinct), I can't say, but we can't change it now.

Do you have concrete suggestions? Make the tree iterable? Add all element methods to the tree, implicitly forwarding to the root? Improve documentation?
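As a rough illustration of the differences listed in this message, the sketch below uses an inline document (an assumption; `country_data.xml` itself is not reproduced here) to show that `fromstring` returns an Element, that `tree.iter()` visits the same elements as `root.iter()`, and that indexing works on the Element but not on the tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical data loosely modelled on the documentation's country example.
root = ET.fromstring(
    "<data><country name='A'><rank>1</rank></country><country name='B'/></data>"
)
tree = ET.ElementTree(root)

# fromstring() hands back the root Element, not a tree.
root_is_element = ET.iselement(root)
tree_is_element = ET.iselement(tree)

# tree.iter() walks the same elements, in the same order, as root.iter().
tags_from_tree = [e.tag for e in tree.iter()]
tags_from_root = [e.tag for e in root.iter()]

# Element supports indexing; ElementTree does not.
first_country = root[0].get("name")
try:
    tree[0]
    tree_is_subscriptable = True
except TypeError:
    tree_is_subscriptable = False

print(tags_from_tree, first_country, tree_is_subscriptable)
```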
msg215259 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2014-03-31 19:49
> Do you have concrete suggestions? Make the tree iterable?
> Add all element methods to the tree, implicitly forwarding to the root?

Yes, that is the feature request. Add all the element methods to the elementtree object.

Implicitly forwarding to the root would be a reasonable way to do it, but that is just an implementation detail.
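To make the request concrete, here is one way the forwarding could look in user code. This is only a sketch: `ForwardingTree` is a hypothetical subclass invented for illustration, not something the standard library provides, and the request was ultimately withdrawn.

```python
import xml.etree.ElementTree as ET


class ForwardingTree(ET.ElementTree):
    """Hypothetical ElementTree subclass that forwards Element API to its root."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so ElementTree's own
        # methods (getroot, write, find, ...) are never shadowed.
        root = self.__dict__.get("_root")
        if root is None:
            raise AttributeError(name)
        return getattr(root, name)

    # Special methods are looked up on the type, bypassing __getattr__,
    # so they have to be forwarded explicitly.
    def __iter__(self):
        return iter(self.getroot())

    def __len__(self):
        return len(self.getroot())

    def __getitem__(self, index):
        return self.getroot()[index]


root = ET.fromstring("<catalog><book id='bk101'/><book id='bk102'/></catalog>")
catalog = ForwardingTree(root)

ids = [book.get("id") for book in catalog]   # iterates the root's children
print(ids, catalog.tag, len(catalog), catalog[0].get("id"))
```

Note that with this wrapper `catalog.tag` and `catalog.attrib` silently describe the root element rather than the document, which is the kind of blurring questioned elsewhere in the thread.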
msg215358 - (view)	Author: Stefan Behnel (scoder) *	Date: 2014-04-02 06:01
> Add all the element methods to the elementtree object.

Ok, but why? An ElementTree object is not an Element. It's a representation of a document that has a root Element. It makes sense for a document to allow searches over its content, and the ElementTree class currently supports that, using the find*() or iter() methods. They are "deep" or "global" content accessor shortcuts, in addition to the path through the normal getroot() method.

But I can't see how making ElementTree objects look and behave like their own root Element improves anything. Instead, it would just make the distinction between the two completely unclear, and would also lead to quirks like the question of why iterating over a document yields the second level of children. Or the question of what the "attrib" property of a document could mean.

Instead of blurring it, would you have an idea what we could improve in the documentation to make this distinction clearer?
msg215360 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2014-04-02 07:22
> ElementTree class currently supports that,
> using the find*() or iter() methods.

That would be great except that ElementTree doesn't actually have an __iter__ method.

> Ok, but why?

The short answer is that every time I conduct Python training, people routinely trip over this issue. The whole point of the ElementTree package was to have a more pythonic interface than DOM or SAX.

I'm sure there are people that argue that the requests module isn't great because it conflates requesting with authentication and password management, but the beauty of requests is that its API matches how people try to use it. The outer ElementTree object is awkward in this regard.

I don't see any benefit from having this code fail:

    from xml.etree.ElementTree import parse

    catalog = parse('books.xml')
    for book in catalog:
        print book.get('id')
msg215361 - (view)	Author: Stefan Behnel (scoder) *	Date: 2014-04-02 07:45
> I don't see any benefit from having this code fail:
>
>     from xml.etree.ElementTree import parse
>
>     catalog = parse('books.xml')
>     for book in catalog:
>         print book.get('id')

Why would you expect it to work? And how? Why would it only iterate over the children of the root Element that it wraps, and not yield the root Element itself, and maybe any preceding or following processing instructions or comments, the doctype declaration, etc.?
msg215382 - (view)	Author: Eli Bendersky (eli.bendersky) *	Date: 2014-04-02 13:32
> > Do you have concrete suggestions? Make the tree iterable?
> > Add all element methods to the tree, implicitly forwarding to the root?
>
> Yes, that is the feature request. Add all the element methods to the
> elementtree object.
>
> Implicitly forwarding to the root would be a reasonable way to do it, but
> that is just an implementation detail.

Porting over all methods of Element to ElementTree sounds like overkill to me. How about just making a sensibly-behaving __iter__ for ElementTree? This should be easy because ElementTree already has an iter() method that behaves as needed (goes over all elements including root). Would iteration + perhaps clearer documentation solve most of the problem?
msg215383 - (view)	Author: Stefan Behnel (scoder) *	Date: 2014-04-02 14:02
> How about just making a sensibly-behaving __iter__ for ElementTree?

Well, the problem is to determine what "sensibly-behaving" is. I can see three options.

1) tree.iter() == tree.getroot().iter()
2) iter(tree.getroot())
3) iter([tree.getroot()])

The second option feels plain wrong to me.

The last one would allow the extension towards PI/comment siblings, as I described before. There isn't currently a way to get at them (which doesn't hurt, because ET doesn't currently even pass them through from its parser, as discussed in issue 9521). Once there is a way in ET to parse them in (as in lxml), making ElementTree objects iterable would nicely solve the issue of how to process them afterwards. It's not the only solution for that problem, though; adding a ".gettoplevel()" method would work similarly.

Thus, either 1) or 3) would fit the API, with the downside of 1) being that it's just completely redundant functionality, and I don't consider saving 7 simple characters worth the increase in API overhead.

That leaves 3) as an option. It's nice because the iteration then works on the same axis as for Elements, so x.iter() and iter(x) would behave in the same way for both classes.
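The three candidates can be compared side by side by evaluating each expression on a small tree. This sketch does not add `__iter__` to anything; it only shows what each option would yield, using a sample document and variable names that are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-level document: root, two children, one grandchild.
root = ET.fromstring("<catalog><book><title/></book><book/></catalog>")
tree = ET.ElementTree(root)

# Option 1: every element in document order, root included.
option1 = [e.tag for e in tree.getroot().iter()]

# Option 2: only the direct children of the root.
option2 = [e.tag for e in iter(tree.getroot())]

# Option 3: only the top level of the document, i.e. the root itself.
option3 = [e.tag for e in iter([tree.getroot()])]

print(option1)
print(option2)
print(option3)
```

Under option 3 an ElementTree would iterate over its top-level nodes just as an Element iterates over its children, which is the "same axis" argument made in this message.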
msg216634 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2014-04-17 01:26
Given Stefan's concerns, I withdraw this feature request.

History
Date	User	Action	Args
2022-04-11 14:58:00	admin	set	github: 65227
2014-04-17 01:26:21	rhettinger	set	status: open -> closed resolution: not a bug messages: + msg216634
2014-04-02 14:02:06	scoder	set	messages: + msg215383
2014-04-02 13:32:10	eli.bendersky	set	messages: + msg215382
2014-04-02 07:45:49	scoder	set	messages: + msg215361
2014-04-02 07:22:19	rhettinger	set	messages: + msg215360
2014-04-02 06:01:59	scoder	set	messages: + msg215358
2014-03-31 19:49:46	rhettinger	set	messages: + msg215259
2014-03-31 13:39:28	eli.bendersky	set	messages: + msg215238
2014-03-23 07:34:21	scoder	set	nosy: + scoder, eli.bendersky messages: + msg214550
2014-03-23 03:53:28	martin.panter	set	nosy: + martin.panter title: ElementTree objects should support all the same methods are Element objects -> ElementTree objects should support all the same methods as Element objects
2014-03-22 22:47:56	rhettinger	create