In xml.etree.ElementTree findall() can't search all elements in a namespace #72425

py-user · 2016-09-21T12:38:13Z

BPO	28238
Nosy	@scoder, @py-user, @serhiy-storchaka
PRs	bpo-28238: Implement "{*}tag" and "{ns}*" wildcard tag selection support for ElementPath #12997

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = 'https://github.com/scoder'
closed_at = <Date 2019-05-03.18:59:05.111>
created_at = <Date 2016-09-21.12:38:12.968>
labels = ['expert-XML', '3.8', 'type-feature', 'library']
title = "In xml.etree.ElementTree findall() can't search all elements in a namespace"
updated_at = <Date 2019-05-03.18:59:05.110>
user = 'https://github.com/py-user'

bugs.python.org fields:

activity = <Date 2019-05-03.18:59:05.110>
actor = 'scoder'
assignee = 'scoder'
closed = True
closed_date = <Date 2019-05-03.18:59:05.111>
closer = 'scoder'
components = ['Library (Lib)', 'XML']
creation = <Date 2016-09-21.12:38:12.968>
creator = 'py.user'
dependencies = []
files = []
hgrepos = []
issue_num = 28238
keywords = ['patch']
message_count = 5.0
messages = ['277130', '340301', '341030', '341043', '341351']
nosy_count = 4.0
nosy_names = ['scoder', 'eli.bendersky', 'py.user', 'serhiy.storchaka']
pr_nums = ['12997']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue28238'
versions = ['Python 3.8']

py-user · 2016-09-21T12:38:13Z

In the example below there are two namespaces in one document, but it is impossible to search for all elements in only one of them:

>>> import xml.etree.ElementTree as etree
>>>
>>> s = '<feed xmlns="http://def" xmlns:x="http://x"><a/><x:b/></feed>'
>>>
>>> root = etree.fromstring(s)
>>>
>>> root.findall('*')
[<Element '{http://def}a' at 0xb73961bc>, <Element '{http://x}b' at 0xb7396c34>]
>>>
>>> root.findall('{http://def}*')
[]
>>>

The same attempt with the third-party lxml package works as expected:

>>> import lxml.etree as etree
>>>
>>> s = '<feed xmlns="http://def" xmlns:x="http://x"><a/><x:b/></feed>'
>>>
>>> root = etree.fromstring(s)
>>>
>>> root.findall('*')
[<Element {http://def}a at 0xb70ab11c>, <Element {http://x}b at 0xb70ab144>]
>>>
>>> root.findall('{http://def}*')
[<Element {http://def}a at 0xb70ab11c>]
>>>
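On ElementTree versions without namespace wildcards, one workaround is to filter the children by hand: ElementTree stores a namespaced tag as "{uri}local", so a prefix test on the tag string selects a single namespace. A minimal sketch; children_in_namespace is a hypothetical helper, not part of the library:

```python
import xml.etree.ElementTree as etree

def children_in_namespace(elem, uri):
    """Hypothetical helper: direct children of elem whose tag is in namespace uri.

    ElementTree spells namespaced tags as '{uri}local', so a prefix test is
    enough. Comments and processing instructions have non-string tags and
    are skipped.
    """
    prefix = '{%s}' % uri
    return [child for child in elem
            if isinstance(child.tag, str) and child.tag.startswith(prefix)]

s = '<feed xmlns="http://def" xmlns:x="http://x"><a/><x:b/></feed>'
root = etree.fromstring(s)
print([child.tag for child in children_in_namespace(root, 'http://def')])
# ['{http://def}a']
```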

scoder · 2019-04-15T18:51:42Z

lxml has a couple of nice features here:

all tags in a namespace: "{namespace}*"
a local name 'tag' in any (or no) namespace: "{*}tag"
a tag without namespace: "{}tag"
all tags without namespace: "{}*"

"{*}*" is also accepted but is the same as "*". Note that "*" is actually allowed as an XML tag name by the spec, but rare enough to hijack it for this purpose. I've actually never seen it used anywhere in the wild.
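In the standard library, these forms became available in ElementTree's findall() and related methods with Python 3.8. A short sketch of what each pattern selects, using a made-up document with two namespaces; the expected results assume Python 3.8 or later:

```python
import xml.etree.ElementTree as ET  # wildcard tags need Python 3.8 or later

# One element in namespace d, two in namespace x, two without a namespace.
s = ('<feed xmlns:d="http://def" xmlns:x="http://x">'
     '<d:a/><x:a/><a/><x:b/><c/></feed>')
root = ET.fromstring(s)

def tags(pattern):
    """Tags of the direct children of root selected by pattern."""
    return [e.tag for e in root.findall(pattern)]

print(tags('{http://x}*'))  # ['{http://x}a', '{http://x}b']
print(tags('{*}a'))         # ['{http://def}a', '{http://x}a', 'a']
print(tags('{}a'))          # ['a']
print(tags('{}*'))          # ['a', 'c']
```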

lxml's implementation isn't applicable to ElementTree (searching has been subject to excessive optimisation), but it shouldn't be hard to extend the one in ET's ElementPath.py module, as well as Element.iter() in ElementTree.py, to support this kind of tag comparison.
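The kind of tag comparison described here can be sketched as a standalone predicate and applied on top of Element.iter(). This is an illustration under assumptions, not ElementTree's actual code: tag_matches and iter_matching are hypothetical names, and plain patterns are compared literally.

```python
import xml.etree.ElementTree as ET

def tag_matches(pattern, tag):
    """Hypothetical matcher for '{ns}*', '{*}name', '{}name', '{}*', '{*}*'
    and plain tags. Comments and PIs (non-string tags) never match."""
    if not isinstance(tag, str):
        return False
    if pattern == '{*}*':
        return True                      # any element
    if pattern == '{}*':
        return not tag.startswith('{')   # any element without a namespace
    if pattern.startswith('{*}'):
        local = pattern[3:]
        # Local name in any (or no) namespace.
        return tag == local or tag.endswith('}' + local)
    if pattern.endswith('}*'):
        return tag.startswith(pattern[:-1])  # any tag in the given namespace
    if pattern.startswith('{}'):
        pattern = pattern[2:]            # '{}name' is the same as 'name'
    return tag == pattern

def iter_matching(elem, pattern):
    """Hypothetical helper: walk the subtree like Element.iter() and yield
    only the elements whose tag matches the wildcard pattern."""
    for e in elem.iter():
        if tag_matches(pattern, e.tag):
            yield e

root = ET.fromstring('<r xmlns:x="http://x"><x:a><a/></x:a><b/></r>')
print([e.tag for e in iter_matching(root, '{*}a')])  # ['{http://x}a', 'a']
```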

PR welcome.

lxml's tests are here (and in the following test methods):

https://github.com/lxml/lxml/blob/359f693b972c2e6b0d83d26a329d2d20b7581c48/src/lxml/tests/test_etree.py#L2911

Note that they actually test the deprecated .getiterator() method for historical reasons. They should probably call .iter() instead these days. lxml's ElementPath implementation is under src/lxml/_elementpath.py, but the tag comparison itself is done elsewhere in Cython code (here, in case it matters):

https://github.com/lxml/lxml/blob/359f693b972c2e6b0d83d26a329d2d20b7581c48/src/lxml/apihelpers.pxi#L921-L1048

scoder · 2019-04-28T18:15:08Z

PR submitted, feedback welcome.

scoder · 2019-04-29T05:31:03Z

BTW, I found that lxml and ET differ in their behaviour when searching for '*'. ET takes it as meaning "any tree node", whereas lxml interprets it as "any Element". Since ET's parser does not create comments and processing instructions by default, this does not make a difference in most cases, but when the tree contains comments or PIs, then they will be found by '*' in ET but not in lxml.

At least for "{*}*", they now both return only Elements. Changing either behaviour for '*' is probably not a good idea at this point.
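The difference can be reproduced by keeping comments and processing instructions in the tree. A sketch assuming Python 3.8 or later, where TreeBuilder accepts insert_comments and insert_pis: '*' returns them along with the elements, while '{*}*' keeps only the elements.

```python
import xml.etree.ElementTree as ET  # TreeBuilder(insert_comments=...) needs 3.8+

s = '<root><!-- a comment --><a/><?pi data?><b/></root>'

# By default the parser drops comments and PIs; ask the tree builder to keep them.
builder = ET.TreeBuilder(insert_comments=True, insert_pis=True)
root = ET.fromstring(s, parser=ET.XMLParser(target=builder))

print(len(root.findall('*')))                 # 4: comment, a, PI, b
print([e.tag for e in root.findall('{*}*')])  # ['a', 'b']
```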

scoder · 2019-05-03T18:58:21Z

New changeset 4754168 by Stefan Behnel in branch 'master':
bpo-28238: Implement "{*}tag" and "{ns}*" wildcard tag selection support for ElementPath, and extend the surrounding tests and docs. (GH-12997)

sleepyhollo · 2023-03-22T17:08:57Z

Is this an issue with Python's xml module or something specific to CPython? I am having this with the xml module right now.

scoder · 2023-03-24T07:53:36Z

> Is this an issue with Python's xml module or something specific to CPython? I am having this with the xml module right now.

This feature was added to the xml.etree package in the standard library of Python 3.8. It's not specific to CPython; all Python implementations that use the same (3.8) standard library module should have the same features.

py-user mannequin added the type-bug, stdlib and topic-XML labels Sep 21, 2016

tirkarthi added the 3.8 label Apr 15, 2019

scoder self-assigned this Apr 28, 2019

scoder added the type-feature label and removed the type-bug label Apr 28, 2019

scoder closed this as completed May 3, 2019

ezio-melotti transferred this issue from another repository Apr 10, 2022

scoder mentioned this issue in #62504, "ElementTree -- provide a way to ignore namespace in tags and searches" (closed), Sep 29, 2023
