classification
Title: In xml.etree.ElementTree findall() can't search all elements in a namespace
Type: behavior
Stage: needs patch
Components: Library (Lib), XML
Versions: Python 3.8
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: eli.bendersky, py.user, scoder, serhiy.storchaka
Priority: normal
Keywords:

Created on 2016-09-21 12:38 by py.user, last changed 2019-04-15 18:51 by scoder.

Messages (2)
msg277130 - Author: py.user (py.user) Date: 2016-09-21 12:38
The example document below uses two namespaces, but findall() cannot select all elements that belong to only one of them:

>>> import xml.etree.ElementTree as etree
>>>
>>> s = '<feed xmlns="http://def" xmlns:x="http://x"><a/><x:b/></feed>'
>>>
>>> root = etree.fromstring(s)
>>>
>>> root.findall('*')
[<Element '{http://def}a' at 0xb73961bc>, <Element '{http://x}b' at 0xb7396c34>]
>>>
>>> root.findall('{http://def}*')
[]
>>>


The same attempt with the third-party lxml package works as expected:

>>> import lxml.etree as etree
>>>
>>> s = '<feed xmlns="http://def" xmlns:x="http://x"><a/><x:b/></feed>'
>>>
>>> root = etree.fromstring(s)
>>>
>>> root.findall('*')
[<Element {http://def}a at 0xb70ab11c>, <Element {http://x}b at 0xb70ab144>]
>>>
>>> root.findall('{http://def}*')
[<Element {http://def}a at 0xb70ab11c>]
>>>
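
For ElementTree versions where the namespace wildcard returns an empty list, as in the first session above, one possible workaround is to filter the children by their '{namespace}' tag prefix. This is only a minimal sketch: the helper name children_in_namespace is hypothetical and not part of ElementTree, and it looks at direct children only, like findall('*').

```python
import xml.etree.ElementTree as ET


def children_in_namespace(parent, namespace):
    """Return the direct children of `parent` whose tag is in `namespace`.

    Hypothetical helper (not an ElementTree API). ElementTree stores
    qualified tags as '{namespace}localname', so a prefix test is enough.
    """
    prefix = '{%s}' % namespace
    return [
        child for child in parent
        # Comments and processing instructions have a non-string tag.
        if isinstance(child.tag, str) and child.tag.startswith(prefix)
    ]


s = '<feed xmlns="http://def" xmlns:x="http://x"><a/><x:b/></feed>'
root = ET.fromstring(s)

tags = [el.tag for el in children_in_namespace(root, 'http://def')]
print(tags)  # ['{http://def}a']
```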
msg340301 - Author: Stefan Behnel (scoder) (Python committer) Date: 2019-04-15 18:51
lxml has a couple of nice features here:

- all tags in a namespace: "{namespace}*"
- a local name 'tag' in any (or no) namespace: "{*}tag"
- a tag without namespace: "{}tag"
- all tags without namespace: "{}*"
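
The four patterns listed above can be sketched as a small pure-Python tag comparison. This is an illustration of the matching rules as described in this message, not lxml's or ElementTree's actual code; the names split_tag and tag_matches are hypothetical.

```python
def split_tag(tag):
    """Split '{ns}local' into (ns, local); ns is '' for an unqualified tag."""
    if tag.startswith('{'):
        ns, _, local = tag[1:].partition('}')
        return ns, local
    return '', tag


def tag_matches(pattern, tag):
    """Return True if `tag` matches the wildcard `pattern`.

    Hypothetical helper. A '*' in the namespace or local-name part of the
    pattern matches anything; an empty namespace ('{}') matches only tags
    without a namespace.
    """
    if pattern == '*':
        # A bare '*' keeps its existing meaning: every tag.
        return True
    pat_ns, pat_local = split_tag(pattern)
    ns, local = split_tag(tag)
    ns_ok = pat_ns == '*' or pat_ns == ns
    local_ok = pat_local == '*' or pat_local == local
    return ns_ok and local_ok


print(tag_matches('{http://def}*', '{http://def}a'))  # True
print(tag_matches('{http://def}*', '{http://x}b'))    # False
print(tag_matches('{*}a', '{http://def}a'))           # True
print(tag_matches('{}a', '{http://def}a'))            # False
print(tag_matches('{}*', 'a'))                        # True
```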

"{*}*" is also accepted but is the same as "*". Note that "*" is actually allowed as an XML tag name by the spec, but rare enough to hijack it for this purpose. I've actually never seen it used anywhere in the wild.

lxml's implementation isn't applicable to ElementTree (searching has been subject to excessive optimisation), but it shouldn't be hard to extend the one in ET's ElementPath.py module, as well as Element.iter() in ElementTree.py, to support this kind of tag comparison.
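
To illustrate what such an extension of Element.iter() might look like from the outside, here is a rough sketch of a wrapper generator that walks the subtree and applies the namespace-aware comparison. It is an assumption-laden illustration, not a proposed patch: iter_wildcard is a hypothetical name, and the sketch filters the results of the stock iter() rather than changing ElementTree itself.

```python
import xml.etree.ElementTree as ET


def iter_wildcard(element, pattern):
    """Yield `element` and its descendants whose tags match `pattern`.

    Hypothetical wrapper around Element.iter(). Supports '{ns}*',
    '{*}tag', '{}tag' and '{}*' in addition to plain tags and '*'.
    """
    if pattern in ('*', '{*}*'):
        yield from element.iter()
        return
    if not pattern.startswith('{'):
        # Plain tag name: defer to the stock exact-match behaviour.
        yield from element.iter(pattern)
        return
    pat_ns, _, pat_local = pattern[1:].partition('}')
    for el in element.iter():
        tag = el.tag
        if not isinstance(tag, str):
            continue  # skip comments and processing instructions
        if tag.startswith('{'):
            ns, _, local = tag[1:].partition('}')
        else:
            ns, local = '', tag
        if pat_ns in ('*', ns) and pat_local in ('*', local):
            yield el


doc = ('<feed xmlns="http://def" xmlns:x="http://x">'
       '<a><x:a/></a><x:b/><plain xmlns=""/></feed>')
root = ET.fromstring(doc)

for pattern in ('{http://def}*', '{*}a', '{}*'):
    print(pattern, [el.tag for el in iter_wildcard(root, pattern)])
```

Filtering on top of iter() keeps the sketch short; an implementation inside ElementTree would more likely do the comparison during traversal.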

PR welcome.

lxml's tests are here (and in the following test methods):

https://github.com/lxml/lxml/blob/359f693b972c2e6b0d83d26a329d2d20b7581c48/src/lxml/tests/test_etree.py#L2911

Note that they actually test the deprecated .getiterator() method for historical reasons; these days they should probably call .iter() instead. lxml's ElementPath implementation is in src/lxml/_elementpath.py, but the tag comparison itself is done elsewhere, in Cython code (here, in case it matters):

https://github.com/lxml/lxml/blob/359f693b972c2e6b0d83d26a329d2d20b7581c48/src/lxml/apihelpers.pxi#L921-L1048
History
Date                 User     Action  Args
2019-04-15 18:51:42  scoder   set     messages: + msg340301; stage: needs patch
2019-04-15 16:00:07  xtreak   set     nosy: + scoder, eli.bendersky, serhiy.storchaka; versions: + Python 3.8, - Python 3.6
2016-09-21 12:38:12  py.user  create