This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: xml.etree.ElementTree: get() doesn't return default value, always ATTLIST value
Type: behavior
Stage: resolved
Components: XML
Versions: Python 3.8
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: eli.bendersky, jacobtylerwalls, padremayi, scoder
Priority: normal
Keywords:

Created on 2022-02-19 11:31 by padremayi, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (6)
msg413543 - Author: padremayi Date: 2022-02-19 11:31
XML test file:

<?xml version="1.0"?>
<!DOCTYPE main [
<!ELEMENT main (object+)>
    <!ELEMENT object (description, year, manufacturer)>
        <!ATTLIST object name CDATA #REQUIRED>
        <!ATTLIST object works (yes|no) "yes">
        <!ELEMENT description (#PCDATA)>
        <!ELEMENT year (#PCDATA)>
        <!ELEMENT manufacturer (#PCDATA)>
]>

<main>
    <object name="My object">
        <description>This is a simple object</description>
        <year>2022</year>
        <manufacturer>Myself</manufacturer>
    </object>
</main>


Python code:
import xml.etree.ElementTree


try:
    xml_data = xml.etree.ElementTree.iterparse("test.xml", events=("start", "end"))

    for event, xml_tag in xml_data:
        if event == "end" and xml_tag.tag == "object":
            object_name = xml_tag.get("name")
            object_description = xml_tag.find("description").text
            works = xml_tag.get("works", default="foo")

            print("works value: " + str(works))

            xml_tag.clear()

    print("Done!")

except (NameError, xml.etree.ElementTree.ParseError):
    print("XML error!")


Output:
works value: yes
Done!


Expected behaviour:
works value: foo
Done!
msg413706 - Author: Stefan Behnel (scoder) (Python committer) Date: 2022-02-22 13:26
The question here is simply which is considered more important: the default provided by the document, or the default provided by Python. I don't think it's a clear choice, but the way it is now does not seem unreasonable. Changing it would mean deliberately breaking existing code that relies on the current behaviour, and I do not see a reason to do that.
msg413780 - Author: padremayi Date: 2022-02-23 09:51
IMHO if the developer doesn't manage the XML itself it is VERY unreasonable to use the document value and not the developer one. At the moment the developer must predict future changes to the XML structure.

From my point of view, if an attribute is not present on the element, get() should return None (or the default value passed by the developer). An optional parameter could be added to the get() call: if True, get() returns two values, the developer's default AND the document default; otherwise it returns the document default (current behaviour).

This way, old code continues to work.
msg413782 - Author: padremayi Date: 2022-02-23 10:34
Now:
def get(self, key, default=None)

Future:
def get(self, key, default=None, double_value=False)

No existing code breaks.
msg413785 - Author: Stefan Behnel (scoder) (Python committer) Date: 2022-02-23 11:49
> IMHO if the developer doesn't manage the XML itself it is VERY unreasonable to use the document value and not the developer one.

I disagree. If the document says "this is the default if no explicit value is given", then I consider that just as good as providing a value each time. Meaning, the attribute *is* in fact present, just not explicitly spelled out on the element.

I would specifically like to avoid adding a new option just to override the way the document distributes its attribute value spelling across DTD and document structure. In particular, the .get() method is the wrong place to deal with this.

You can probably configure the parser to ignore the internal DTD subset, if that's what you want.
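
To illustrate the point that the attribute "is in fact present", here is a minimal sketch using a cut-down version of the reporter's test file (child elements omitted). The attribute name `colour` is hypothetical and only serves to show a name that the DTD does not declare.

```python
import xml.etree.ElementTree as ET

# Cut-down version of the test file from the report (child elements omitted).
XML_DOC = """<?xml version="1.0"?>
<!DOCTYPE main [
<!ELEMENT main (object+)>
<!ELEMENT object EMPTY>
<!ATTLIST object name CDATA #REQUIRED>
<!ATTLIST object works (yes|no) "yes">
]>
<main>
    <object name="My object"/>
</main>
"""

root = ET.fromstring(XML_DOC)
obj = root.find("object")

# The DTD default is merged into the element's attributes while parsing,
# so "works" shows up in attrib even though the element never spells it out.
print(obj.attrib)

# Because "works" is present, the Python-side default is never consulted.
print(obj.get("works", default="foo"))

# A name the DTD does not declare (hypothetical) still falls back to the
# Python-side default.
print(obj.get("colour", default="foo"))
```

In other words, get() behaves like a plain dictionary lookup on attributes that the parser has already filled in; it has no way of telling a DTD-supplied value from an explicitly written one.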
msg414556 - Author: Jacob Walls (jacobtylerwalls) Date: 2022-03-05 01:24
I agree this is not a bug. To ignore the document default you can set `specified_attributes` on the parser as documented:

https://docs.python.org/3/library/pyexpat.html#xml.parsers.expat.xmlparser.specified_attributes

Also, this was explicitly worked on recently in bpo-42151, so hard to imagine reversing course so soon. I suggest the issue be re-closed.
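
A minimal sketch of what `specified_attributes` does, assuming the same cut-down document as the report. It drives `xml.parsers.expat` directly rather than going through ElementTree, because the thread does not establish how (or whether) ElementTree exposes its underlying expat parser. The helper name `collect_attributes` is hypothetical.

```python
import xml.parsers.expat

# Cut-down version of the test file from the report (child elements omitted).
XML_DOC = b"""<?xml version="1.0"?>
<!DOCTYPE main [
<!ELEMENT main (object+)>
<!ELEMENT object EMPTY>
<!ATTLIST object name CDATA #REQUIRED>
<!ATTLIST object works (yes|no) "yes">
]>
<main>
    <object name="My object"/>
</main>
"""


def collect_attributes(data, specified_only):
    """Parse data and return {tag: attributes} as reported by expat."""
    seen = {}

    def on_start(tag, attrs):
        # attrs is a dict because ordered_attributes is left at its default.
        seen[tag] = dict(attrs)

    parser = xml.parsers.expat.ParserCreate()
    # When true, expat reports only attributes written in the document
    # instance, not those derived from ATTLIST declarations.
    parser.specified_attributes = specified_only
    parser.StartElementHandler = on_start
    parser.Parse(data, True)
    return seen


with_defaults = collect_attributes(XML_DOC, specified_only=False)
without_defaults = collect_attributes(XML_DOC, specified_only=True)

print(with_defaults["object"])
print(without_defaults["object"])

# With DTD defaults suppressed, a Python-side default applies again.
print(without_defaults["object"].get("works", "foo"))
```

With the flag set, the `works` attribute is simply absent from what the parser reports, so a caller-supplied default such as "foo" takes effect without any change to get().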
History
Date                 User             Action  Args
2022-04-11 14:59:56  admin            set     github: 90954
2022-03-05 12:28:49  scoder           set     status: open -> closed
2022-03-05 01:24:29  jacobtylerwalls  set     nosy: + jacobtylerwalls; messages: + msg414556
2022-02-23 11:49:25  scoder           set     messages: + msg413785
2022-02-23 10:34:17  padremayi        set     messages: + msg413782
2022-02-23 09:53:43  padremayi        set     status: closed -> open
2022-02-23 09:51:57  padremayi        set     messages: + msg413780
2022-02-22 13:26:21  scoder           set     status: open -> closed; resolution: not a bug; messages: + msg413706; stage: resolved
2022-02-21 21:06:24  ned.deily        set     nosy: + scoder, eli.bendersky
2022-02-19 11:31:31  padremayi        create