classification
Title: Element.findall documentation misleading
Type: enhancement Stage: resolved
Components: Documentation Versions: Python 3.6, Python 3.4, Python 3.5, Python 2.7
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: Eric S, eli.bendersky, martin.panter, rhettinger, scoder
Priority: normal Keywords: patch

Created on 2015-07-26 03:35 by Eric S, last changed 2019-08-23 00:19 by rhettinger. This issue is now closed.

Files
File name Uploaded Description Edit
etree_clarify.diff rhettinger, 2015-07-26 05:42
Messages (11)
msg247404 - (view) Author: Eric S (Eric S) Date: 2015-07-26 03:35
Documentation states:
Element.findall() finds only elements with a tag which are direct children of the current element.

More accurate to say "direct descendents" as "direct children" implies only one generation below whereas function goes down to all g...children. For-looping findall with inner recursion to rebuild hierarchy as dict, etc. can cause very large tree.
msg247405 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2015-07-26 04:58
The phrase occurs in https://docs.python.org/3/library/xml.etree.elementtree.html#finding-interesting-elements

I think it should say, "In this example, Element.findall() finds only elements with a tag which are direct children of the current element."

Later it accurately says that, "More sophisticated specification of which elements to look for is possible by using XPath."
msg247457 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2015-07-27 09:43
Agreed that it's misleading. Note that the section refers to the *following* example, not the one that readers just went through.

I would actually start that paragraph with the reference to XPath (because that's the difference to el.iter()), and then just present the example as the simplest case of such an expression.
msg247700 - (view) Author: Eric S (Eric S) Date: 2015-07-30 22:10
Pointing to XPath and clarifying the example reference are good ideas. 

For me though, the phrase "direct children" would still lead me to believe that findall() would only give me the first generation right below the element (e.g. only the countries in the example), not grandchildren, greatgrandchildren, etc. That's why I think "direct descendents" or "all descendents" would give a clearer picture.
msg247717 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-07-31 00:47
How about something like:

'''
:meth:`Element.findall` finds all elements matching a path expression; in this example, all <country> children of *root*. meth:`Element.find` finds the *first* matching element, in this example the first <rank> child. . . .
'''

Also, perhaps it would be good to change findall() to iterfind(), which is a newer API.
msg247738 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2015-07-31 06:21
Here's a complete drop-in replacement.

"""
More specific constraints on which elements to look for and where to look for them in the tree can be expressed in :ref:`XPath <elementtree-xpath>`.  :meth:`Element.iterfind` iterates over all elements matching such a path expression.  In the following example, it finds all direct <country> children of *root*.  :meth:`Element.find` provides a shortcut to find only the *first* matching element, in this example the first <rank> child.  Once an element is found, :attr:`Element.text` accesses the element's immediate text content and :meth:`Element.get` accesses the element's attributes.

::

   >>> for country in root.iterfind('country'):
   ...   rank = country.find('rank').text
   ...   name = country.get('name')
   ...   print(name, rank)
   ...
   Liechtenstein 1
   Singapore 4
   Panama 68
"""

Note that the reviewed doc patch in issue 24079 also hasn't been applied yet. It would help here.
msg247764 - (view) Author: Eric S (Eric S) Date: 2015-07-31 19:19
To my preference, the drop-in is verbose and I got a little confused on first read. The current documentation and example seem mostly OK to me. 

If we leave "children" as in "all <country> children of *root*", it doesn't illuminate the fact that if root had a great-great-grandchild also named <country>, then it would return that as well.

The only way I can think to simply clarify it is to use "direct descendents" or "all direct descendents".

Here's how the phrase "direct children" slipped me up when I first read the docs re:findall():

I thought I could rebuild the XML tree like this:

def rebuild_XML_as_whatever(parent)
    for child in parent.findall()
       code_that_attaches_child
       rebuild_XML_as_whatever(child)

instead I had to do this:

def rebuild_XML_as_whatever(parent)
    descendns = parent.findall("./")
    for child_n in range(len(descendns)):
        child = descendns[child_n]

        code_that_attaches_child
        rebuild_XML_as_whatever(child)        

RE:iterfind() if it fits a new format that's fine but renaming would, of course, interfere with backwards compatibility.
msg247767 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2015-07-31 19:57
Whenever you feel a need for using range(len(...)), you are almost always doing something wrong. The problem in your code is that you are modifying the tree while iterating over it. However, please ask this question on the mailing list as the bug tracker is not the right place to discuss this.

> If we leave "children" as in "all <country> children of *root*", it doesn't illuminate the fact that if root had a great-great-grandchild also named <country>, then it would return that as well.

No, it wouldn't. It behaves exactly as described. Thus the reference to the XPath section for further explanations.
msg247785 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-07-31 23:11
Eric: Calling findall("country") does _not_ return grandchidren nor further descendants. Also, your pseudocode calling findall() with no arguments does not work, so I am left wondering where you got the wrong impression about grandchildren. The usual way to get a list of direct children is to call list(parent):

>>> root = XML('''<data><country name="Leichtenstein">
...         <country name="Singapore"><country name="Panama"/></country>
...     </country></data>''')
>>> root.findall()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Required argument 'path' (pos 1) not found
>>> list(root)  # List of direct children
[<Element 'country' at 0xb6ebe324>]
>>> root.findall("country")  # List of direct <country> children
[<Element 'country' at 0xb6ebe324>]
>>> root.findall(".//country")  # List of all <country> descendants
[<Element 'country' at 0xb6ebe324>, <Element 'country' at 0xb6ebe284>, <Element 'country' at 0xb6ebe0cc>]
msg247833 - (view) Author: Eric S (Eric S) Date: 2015-08-01 22:08
Code was intended as example, not request for help to correct, but rushed so example was flawed, but still, I tested and you both are right. Must've had other error in code to cause the xml to dict to have every element map to under every node. Debugger also showed findall and list returns with every descendent under every item as well so that influenced my triage.

Issue is closed as far as I'm concerned but it sounds like you all found something else here. 

Sorry for using up your time. Will be more careful and clear if there is a next time.
msg350243 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-08-23 00:19
> Issue is closed as far as I'm concerned 

Me too.
History
Date User Action Args
2019-08-23 00:19:44rhettingersetstatus: open -> closed
resolution: not a bug
messages: + msg350243

stage: resolved
2015-08-01 22:08:44Eric Ssetmessages: + msg247833
2015-07-31 23:11:22martin.pantersetmessages: + msg247785
2015-07-31 19:57:59scodersetmessages: + msg247767
2015-07-31 19:19:30Eric Ssetmessages: + msg247764
2015-07-31 06:21:24scodersettype: enhancement
messages: + msg247738
versions: + Python 2.7, Python 3.4, Python 3.5, Python 3.6
2015-07-31 00:47:36martin.pantersetnosy: + martin.panter
messages: + msg247717
2015-07-30 22:10:02Eric Ssetmessages: + msg247700
2015-07-27 09:43:28scodersetmessages: + msg247457
2015-07-26 05:42:14rhettingersetfiles: + etree_clarify.diff
keywords: + patch
2015-07-26 04:59:25rhettingersetnosy: + scoder, eli.bendersky
components: + Documentation
2015-07-26 04:58:52rhettingersetassignee: rhettinger

messages: + msg247405
nosy: + rhettinger
2015-07-26 03:35:13Eric Screate