Issue 46798: xml.etree.ElementTree: get() doesn't return default value, always ATTLIST value


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/90954

classification

Title:	xml.etree.ElementTree: get() doesn't return default value, always ATTLIST value
Type:	behavior	Stage:	resolved
Components:	XML	Versions:	Python 3.8

process

Status:	closed	Resolution:	not a bug
Dependencies:		Superseder:
Assigned To:		Nosy List:	eli.bendersky, jacobtylerwalls, padremayi, scoder
Priority:	normal	Keywords:

Created on 2022-02-19 11:31 by padremayi, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (6)
msg413543 - (view)	Author: (padremayi)	Date: 2022-02-19 11:31
XML test file:

<?xml version="1.0"?>
<!DOCTYPE main [
<!ELEMENT main (object+)>
<!ELEMENT object (description, year, manufacturer)>
<!ATTLIST object name CDATA #REQUIRED>
<!ATTLIST object works (yes|no) "yes">
<!ELEMENT description (#PCDATA)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT manufacturer (#PCDATA)>
]>
<main>
    <object name="My object">
        <description>This is a simple object</description>
        <year>2022</year>
        <manufacturer>Myself</manufacturer>
    </object>
</main>

Python code:

import xml.etree.ElementTree

try:
    xml_data = xml.etree.ElementTree.iterparse("test.xml", events=("start", "end"))
    for event, xml_tag in xml_data:
        if event == "end" and xml_tag.tag == "object":
            object_name = xml_tag.get("name")
            object_description = xml_tag.find("description").text
            works = xml_tag.get("works", default="foo")
            print("works value: " + str(works))
            xml_tag.clear()
    print("Done!")
except (NameError, xml.etree.ElementTree.ParseError):
    print("XML error!")

Output:

works value: yes
Done!

Expected behaviour:

works value: foo
Done!
msg413706 - (view)	Author: Stefan Behnel (scoder) *	Date: 2022-02-22 13:26
The question here is simply, which is considered more important: the default provided by the document, or the default provided by Python. I don't think it's a clear choice, but the way it is now does not seem unreasonable. Changing it would mean deliberate breakage of existing code that relies on the existing behaviour, and I do not see a reason to do that.
msg413780 - (view)	Author: (padremayi)	Date: 2022-02-23 09:51
IMHO if the developer doesn't manage the XML itself it is VERY unreasonable to use the document value and not the developer one. At the moment the developer must predict future changes to the XML structure. From my point of view, if an attribute is not present, get() must return None (or the default value passed by the developer) AND the document default, by adding an optional parameter to the get() call: if True, return both values; otherwise return the document one (current behaviour). This way old code continues to work.
msg413782 - (view)	Author: (padremayi)	Date: 2022-02-23 10:34
Now:
def get(self, key, default=None)

Future:
def get(self, key, default=None, double_value=False)

No code break
msg413785 - (view)	Author: Stefan Behnel (scoder) *	Date: 2022-02-23 11:49
> IMHO if the developer doesn't manage the XML itself it is VERY unreasonable to use the document value and not the developer one.

I disagree. If the document says "this is the default if no explicit value is given", then I consider that just as good as providing a value each time. Meaning, the attribute is in fact present, just not explicitly spelled out on the element.

I would specifically like to avoid adding a new option just to override the way the document distributes its attribute value spelling across DTD and document structure. In particular, the .get() method is the wrong place to deal with this. You can probably configure the parser to ignore the internal DTD subset, if that's what you want.
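To illustrate the point that the defaulted attribute is effectively present on the element, here is a minimal sketch using a shortened version of the reporter's document, inlined as a string (the name XML_TEXT and the reduced DTD are assumptions made for this example). It suggests that get() only falls back to its default argument when the key is absent from attrib, which is not the case for an attribute the DTD supplies.

```python
import xml.etree.ElementTree as ET

# Shortened version of the reporter's test file, inlined for illustration.
XML_TEXT = """
<!DOCTYPE main [
<!ELEMENT main (object+)>
<!ELEMENT object (description)>
<!ATTLIST object name CDATA #REQUIRED>
<!ATTLIST object works (yes|no) "yes">
<!ELEMENT description (#PCDATA)>
]>
<main>
<object name="My object"><description>This is a simple object</description></object>
</main>
""".strip()

root = ET.fromstring(XML_TEXT)
obj = root.find("object")

# The DTD default is reported by the parser like any other attribute,
# so it ends up in the element's attribute dictionary.
print(obj.attrib)

# "works" is present in attrib, so the Python-side default is not used.
print(obj.get("works", "foo"))    # yes

# A key that neither the element nor the DTD provides does use the default.
print(obj.get("missing", "foo"))  # foo
```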
msg414556 - (view)	Author: Jacob Walls (jacobtylerwalls) *	Date: 2022-03-05 01:24
I agree not a bug. To ignore the document default you can set `specified_attributes` on the parser as documented: https://docs.python.org/3/library/pyexpat.html#xml.parsers.expat.xmlparser.specified_attributes Also, this was explicitly worked on recently in bpo-42151, so hard to imagine reversing course so soon. I suggest the issue be re-closed.
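As a rough sketch of the `specified_attributes` suggestion: the example below creates a pyexpat parser directly and feeds its events into an ElementTree TreeBuilder, rather than relying on ElementTree's XMLParser exposing the underlying expat object, which may not hold for every implementation. The helper name parse_specified_only and the inlined XML_TEXT are hypothetical, introduced only for this illustration, and the sketch ignores namespaces, comments and processing instructions.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat

# Shortened version of the reporter's test file, inlined for illustration.
XML_TEXT = """
<!DOCTYPE main [
<!ELEMENT main (object+)>
<!ELEMENT object (description)>
<!ATTLIST object name CDATA #REQUIRED>
<!ATTLIST object works (yes|no) "yes">
<!ELEMENT description (#PCDATA)>
]>
<main>
<object name="My object"><description>This is a simple object</description></object>
</main>
""".strip()


def parse_specified_only(xml_text):
    """Hypothetical helper: build a tree that omits DTD-defaulted attributes."""
    builder = ET.TreeBuilder()
    parser = expat.ParserCreate()
    # Report only attributes written in the document instance itself,
    # not those derived from ATTLIST declarations.
    parser.specified_attributes = True
    # Forward the basic expat events to the tree builder.
    parser.StartElementHandler = builder.start
    parser.EndElementHandler = builder.end
    parser.CharacterDataHandler = builder.data
    parser.Parse(xml_text, True)
    return builder.close()


root = parse_specified_only(XML_TEXT)
obj = root.find("object")
print("works value: " + str(obj.get("works", "foo")))
```

With this setup the "works" attribute should be absent from the element, so get() falls back to the default passed by the caller, which is the behaviour the reporter expected.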

History
Date	User	Action	Args
2022-04-11 14:59:56	admin	set	github: 90954
2022-03-05 12:28:49	scoder	set	status: open -> closed
2022-03-05 01:24:29	jacobtylerwalls	set	nosy: + jacobtylerwalls messages: + msg414556
2022-02-23 11:49:25	scoder	set	messages: + msg413785
2022-02-23 10:34:17	padremayi	set	messages: + msg413782
2022-02-23 09:53:43	padremayi	set	status: closed -> open
2022-02-23 09:51:57	padremayi	set	messages: + msg413780
2022-02-22 13:26:21	scoder	set	status: open -> closed resolution: not a bug messages: + msg413706 stage: resolved
2022-02-21 21:06:24	ned.deily	set	nosy: + scoder, eli.bendersky
2022-02-19 11:31:31	padremayi	create