Issue42893
Created on 2021-01-11 17:12 by robpats, last changed 2021-01-15 01:53 by robpats.
Messages (3) | |||
---|---|---|---|
msg384851 - (view) | Author: (robpats) | Date: 2021-01-11 17:12 | |
Python 3.6.8 / 3.7.9 / 3.8.7
>>> import xml.etree.ElementTree
>>> e = xml.etree.ElementTree.fromstring('<html><div class="row"/><hr/><div/><hr/><div class="row"/><button/></html>')
>>> list(e)
[<Element 'div' at 0x00000000024CD220>, <Element 'hr' at 0x00000000024CD2C0>, <Element 'div' at 0x00000000024F90E0>, <Element 'hr' at 0x00000000024F9130>, <Element 'div' at 0x00000000024F9180>, <Element 'button' at 0x00000000024F91D0>]
>>> e.find("./div[1]")
<Element 'div' at 0x00000000024CD220>
>>> e.find("./div[2]")
<Element 'div' at 0x00000000024F90E0>
>>> e.find("./div[3]")
<Element 'div' at 0x00000000024F9180>
>>> e.find("./hr[1]")
<Element 'hr' at 0x00000000024CD2C0>
>>> e.find("./hr[2]")
<Element 'hr' at 0x00000000024F9130>
# The following differs from the XPath implementation in Firefox
# https://developer.mozilla.org/en-US/docs/Web/XPath/Snippets
>>> list(e.iterfind("./*"))
[<Element 'div' at 0x00000000024CD220>, <Element 'hr' at 0x00000000024CD2C0>, <Element 'div' at 0x00000000024F90E0>, <Element 'hr' at 0x00000000024F9130>, <Element 'div' at 0x00000000024F9180>, <Element 'button' at 0x00000000024F91D0>]
>>> e.find("./*[1]")
<Element 'div' at 0x00000000024CD220>
>>> e.find("./*[2]")
<Element 'div' at 0x00000000024F90E0>   <-- should be 'hr', same as e.find("./div[2]") instead of e[2]
>>> e.find("./*[3]")
<Element 'div' at 0x00000000024F9180>   <-- same as e.find("./div[3]") instead of e[3]
>>> e.find("./*[4]")
>>> list(e.iterfind("./*[@class='row']"))
[<Element 'div' at 0x00000000024CD220>, <Element 'div' at 0x00000000024F9180>]
>>> e.find("./*[@class='row'][1]")
<Element 'div' at 0x00000000024CD220>
>>> e.find("./*[@class='row'][2]")
>>> e.find("./*[@class='row'][3]")
<Element 'div' at 0x00000000024F9180>   <-- cannot find element at [2] but found at [3] |
|||
msg385011 - (view) | Author: Christian Heimes (christian.heimes) * | Date: 2021-01-13 10:32 | |
etree's find method supports a limited subset of XPath: https://docs.python.org/3/library/xml.etree.elementtree.html#supported-xpath-syntax

e.find("./*[2]") seems to trigger undefined behavior. The limited XPath syntax for positions is documented as "position predicates must be preceded by a tag name".

lxml behaves the same. Its find() method returns the same value, and its xpath() method returns your expected value:

>>> import lxml.etree
>>> e = lxml.etree.fromstring('<html><div class="row"/><hr/><div/><hr/><div class="row"/><button/></html>')
>>> e.find("./*[2]")
<Element div at 0x7fe4d777b6c0>
>>> e.xpath("./*[2]")
[<Element hr at 0x7fe4d777b2c0>] |
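Given the limitation described above, one way to get "the Nth child regardless of tag" with only the standard library is to skip the position predicate and index in Python instead. The sketch below is illustrative rather than an official recommendation; it relies only on elements behaving as sequences of their children and on findall() returning matches in document order. The variable names are made up for this example.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<html><div class="row"/><hr/><div/><hr/>'
    '<div class="row"/><button/></html>'
)

# XPath positions are 1-based while Python indexing is 0-based,
# so what full XPath calls "./*[2]" is root[1] here.
second_child = root[1]

# For "the 2nd element matching a predicate", collect the matches
# first and index the resulting list instead of chaining [N].
rows = root.findall("./*[@class='row']")
second_row = rows[1]

print(second_child.tag)
print(second_row.tag, second_row.get("class"))
```

Indexing a list of matches raises IndexError when there are too few of them, which is easier to notice than a silent None.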
|||
msg385094 - (view) | Author: (robpats) | Date: 2021-01-15 01:53 | |
Thanks for the pointer. I didn't notice this paragraph. xml.etree.ElementTree.Element.find currently returns None if the XPath expression is invalid or unsupported. I think it should also return None if a position predicate is not preceded by a tag name. It would be even better to emit warnings or raise exceptions to indicate any errors. |
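Until something like the suggestion above exists in the library, a caller-side guard can approximate it. The following is a rough sketch only: `strict_find` and `_POSITION` are hypothetical names, not part of xml.etree.ElementTree, and the regular expression is a heuristic that would also flag a bracketed number inside a quoted attribute value.

```python
import re
import xml.etree.ElementTree as ET

# Heuristic: a positional predicate such as [2], [last()] or [last()-1].
_POSITION = re.compile(r"\[\s*(?:\d+|last\(\)(?:\s*-\s*\d+)?)\s*\]")


def strict_find(element, path):
    """Hypothetical wrapper around Element.find().

    Raises ValueError when a position predicate does not directly
    follow a tag name, instead of returning a surprising result.
    """
    for match in _POSITION.finditer(path):
        before = path[:match.start()].rstrip()
        # A wildcard, another predicate, or a bare step separator
        # right before [N] means there is no tag name to count against.
        if not before or before[-1] in "*]/":
            raise ValueError(
                f"position predicate {match.group(0)!r} in {path!r} "
                "must be preceded by a tag name"
            )
    return element.find(path)


root = ET.fromstring(
    '<html><div class="row"/><hr/><div/><hr/>'
    '<div class="row"/><button/></html>'
)

second_div = strict_find(root, "./div[2]")  # supported form, passes through

try:
    strict_find(root, "./*[2]")
except ValueError as exc:
    print("rejected:", exc)
```

Raising rather than returning None keeps an unsupported expression from being mistaken for "no match".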
History | |||
---|---|---|---|
Date | User | Action | Args |
2021-01-15 01:53:49 | robpats | set | messages: + msg385094 |
2021-01-13 10:32:02 | christian.heimes | set | nosy: + christian.heimes, scoder; messages: + msg385011 |
2021-01-11 17:12:45 | robpats | create |