classification
Title: xml.etree.ElementTree.Element.__eq__ does compare only objects identity
Type:
Stage: resolved
Components: Library (Lib)
Versions: Python 3.9
process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To:
Nosy List: Marco Sulla, eli.bendersky, rhettinger, scoder, serhiy.storchaka
Priority: normal
Keywords:

Created on 2019-08-08 09:57 by Marco Sulla, last changed 2019-08-22 23:30 by Marco Sulla. This issue is now closed.

Messages (10)
msg349230 - Author: Marco Sulla (Marco Sulla) Date: 2019-08-08 09:57
Currently, even if two `Element`s elem1 and elem2 are different objects but their trees are identical, elem1 == elem2 returns False. The only effective way to compare two `Element`s is

ElementTree.tostring(elem1) == ElementTree.tostring(elem2)

Furthermore, from 3.8 this may no longer work, since the insertion order of attributes is preserved. So if I write the same tag with two identical attributes but in a different order, the trick will not work anymore.
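
A minimal sketch of the behaviour described above, assuming Python 3.8 or later. The tag and attribute names are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Two elements with the same attributes, written in a different order.
elem1 = ET.fromstring('<root a="1" b="2"/>')
elem2 = ET.fromstring('<root b="2" a="1"/>')

# Element does not define __eq__, so == falls back to object identity.
print(elem1 == elem2)  # False

# Since 3.8 the serializer keeps attribute insertion order,
# so the serialized forms differ as well (False on Python 3.8+).
print(ET.tostring(elem1) == ET.tostring(elem2))
```

Before 3.8 the serializer sorted attributes by name, which is why the `tostring` comparison used to hide the difference in attribute order.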

Is it so complicated to implement an __eq__ for `Element` that traverses its tree?

PS: some random remarks about xml.etree.ElementTree module:

1. Why are `fromstring` and `fromstringlist` separate functions? `fromstring` could use duck typing for its main argument, and `fromstringlist` could be deprecated.

2. `SubElement`: why is the initial a capital letter? It looks like the constructor of a different class, while it's a factory function. I would rename it to `subElement` and deprecate `SubElement`.
msg349246 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-08-08 18:34
By default, all objects compare based solely on identity.

Are you making a feature request for Element objects to grow a recursive equality test that includes attributes regardless of order and disregards processing instructions and comments?

What is your principal use case?
msg349348 - Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-08-10 18:31
FWIW, deep traversing an XML tree on an operation as simple as "==" seems excessive. To me, object identity comparison seems the most sensible behaviour of "==" on Element objects.

(It's not "complicated to implement", but rather can be very expensive to execute.)

Regarding your other questions (and note that this is a bug tracker, so discussing unrelated questions in a ticket is inappropriate – use the Python mailing list instead if you want):

"SubElement" suggests a constructor, yes. It kind-of makes sense, given what it does, and resembles "Element", which is the constructor for a (non-sub) Element. It might seem funny, sure, but on the other hand, why should users be bothered with the implementation detail that it is a function? :-)
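
To illustrate the point about `SubElement`, here is a small sketch (the tag names and attributes are made up) showing that it returns an ordinary `Element` that is already appended to its parent:

```python
import xml.etree.ElementTree as ET

parent = ET.Element("parent")

# SubElement creates the child and appends it to the parent in one call.
child = ET.SubElement(parent, "child", {"id": "1"})

# The equivalent two-step form: construct an Element, then append it.
other = ET.Element("child", {"id": "2"})
parent.append(other)

print(isinstance(child, ET.Element))  # True
print(parent[0] is child and parent[1] is other)  # True
```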

"fromstringlist()" matches "tostringlist()", API-wise. Both are probably not very widely used, but I don't see much value in removing them. It always breaks someone's code out there.
msg349351 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-08-10 18:38
In some applications the order of attributes matters, and in others it does not. So the equality check is application dependent.
msg349352 - Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-08-10 18:46
Right. If you want to compare XML trees for equality, either write your own deep-tree comparison function, or use something like doctestcompare, or use a C14N serialisation (which is now part of Py3.8). Whichever suits your needs.

https://github.com/lxml/lxml/blob/master/src/lxml/doctestcompare.py

https://docs.python.org/3.8/library/xml.etree.elementtree.html#xml.etree.ElementTree.canonicalize
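
A minimal sketch of the C14N approach, assuming Python 3.8 or later. The helper name `c14n_equal` is hypothetical; `canonicalize()` and its `strip_text` option come from the documentation linked above:

```python
import xml.etree.ElementTree as ET

def c14n_equal(elem1, elem2, **options):
    """Compare two elements by their C14N 2.0 serialization.

    Hypothetical helper, not part of ElementTree. Keyword options are
    passed through to ET.canonicalize(), e.g. strip_text=True.
    """
    xml1 = ET.tostring(elem1, encoding="unicode")
    xml2 = ET.tostring(elem2, encoding="unicode")
    return ET.canonicalize(xml1, **options) == ET.canonicalize(xml2, **options)

a = ET.fromstring('<root a="1" b="2"><child>x</child></root>')
b = ET.fromstring('<root b="2" a="1">\n  <child>x</child>\n</root>')

# Attribute order is normalized, but whitespace is significant by default.
print(c14n_equal(a, b))                   # False
print(c14n_equal(a, b, strip_text=True))  # True
```

Serializing and re-parsing each element is not cheap, so this is better suited to tests than to hot code paths.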
msg350115 - Author: Marco Sulla (Marco Sulla) Date: 2019-08-21 21:51
@scoder: 

1. the fact that == does not traverse the Element is IMHO unpythonic and non-standard. A trivial example:

>>> a = {1: {2: 3}}
>>> b = {1: {2: 3}}
>>> a == b
True

You can have dictionaries as complicated as you want, but if they have the same structure and content, the two dictionaries will always be equal, even if their ids are not.

I think that no one could propose removing this dictionary feature and simply checking the ids, leaving the deep comparison to the user, without raising a rebellion :)

2. The fact that SubElement looks like a constructor is not an implementation detail. It's misleading and confusing, since a programmer expects it to return an object of type SubElement, while there's no SubElement class. This is peculiar. I may be wrong, but I have never encountered such bizarre naming in the standard library. IMHO SubElement should be deprecated and should call `subElement()`, a simple copy of the old SubElement.

3. I'm not suggesting removing fromstringlist and tostringlist, but they could be deprecated and simply call fromstring and tostring, which would use duck typing to do what fromstringlist and tostringlist did.
msg350118 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-08-21 22:12
Marco, we appreciate your sentiments. Please consider that this module has been around for a long time and that others aren't reacting to the API the same way you are. Deprecating SubElement, fromstringlist() and tostringlist() because you don't like them will just cause disruption to existing, deployed code. You're about 15 years too late for a design discussion ;-) That ship has sailed.

Your proposal to add a new feature for comparing elements is in the realm of the possible. That said, the other respondents made a reasonable case that different people would want to do it differently. Despite your insistence that only your way makes sense, we do have to consider other users as well. The other respondents provided you with other ways to meet your needs. Their disinclination to add this feature is backed up by years of experience with this module and with lxml.

Communication note: Please do not go down the path of making yourself the arbiter of what is Pythonic or standard. The other core devs in this conversation are highly experienced. Insulting them or Fredrik Lundh's existing API won't help matters.

We should add a clear note to the docs:  "Comparing the string serialized XML should not be used to establish semantic equality.  The preferred ways are to use C14N canonicalization or to write a tree walker where the notion of equivalence can be customized to include/exclude attribute order, to include/exclude comments or processing instructions, to include/exclude whitespace at non-leaf nodes".
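
A minimal sketch of such a tree walker, under stated assumptions: `elements_equal` and its keyword options are hypothetical names, not part of `xml.etree.ElementTree`, and comments and processing instructions are not handled specially (the default parser drops them anyway).

```python
import xml.etree.ElementTree as ET

def elements_equal(e1, e2, *, ordered_attrs=False, ignore_whitespace=True):
    """Recursively compare two elements (hypothetical helper)."""
    def norm(text):
        # Treat a missing text/tail like an empty string.
        text = text or ""
        return text.strip() if ignore_whitespace else text

    if e1.tag != e2.tag:
        return False
    if ordered_attrs:
        # Compare attributes including their insertion order.
        if list(e1.attrib.items()) != list(e2.attrib.items()):
            return False
    elif e1.attrib != e2.attrib:
        # Plain dict comparison ignores order.
        return False
    if norm(e1.text) != norm(e2.text) or norm(e1.tail) != norm(e2.tail):
        return False
    if len(e1) != len(e2):
        return False
    return all(
        elements_equal(c1, c2, ordered_attrs=ordered_attrs,
                       ignore_whitespace=ignore_whitespace)
        for c1, c2 in zip(e1, e2)
    )

a = ET.fromstring('<root a="1" b="2"><child>x</child></root>')
b = ET.fromstring('<root b="2" a="1">\n  <child>x</child>\n</root>')

print(elements_equal(a, b))                           # True
print(elements_equal(a, b, ordered_attrs=True))       # False
print(elements_equal(a, b, ignore_whitespace=False))  # False
```

The walker is recursive, so a very deeply nested tree could hit Python's recursion limit; an explicit stack would avoid that at the cost of some readability.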
msg350160 - Author: Marco Sulla (Marco Sulla) Date: 2019-08-22 06:04
@rhettinger:

"Deprecating [...] just cause disruption to existing, deployed code"

How? Deprecation is used precisely to keep the already existing code intact...

"Please do not go down the path of making yourself the arbiter of what is Pythonic or standard. The other core devs in this conversation are highly experienced. Insulting them or Fredrik Lundh's existing API won't help matters"

I'm not insulting anyone, I just said *IMHO* it's not pythonic. 

I think the example of a tree built from a simple dictionary is a clear signal that Python, in Guido's mind, was created with the intention that equality should check the content of objects and not just their ids, as Java, for example, does, even for objects that must be traversed to see whether they are equal to another one.

The fact you can check if two objects are equal using simply == is, _IMHO_, more elegant, simple and useful. The fact that == checks the ids is not useful at all, since I can do it with id(elem1) == id(elem2). So what's the purpose of == ?
msg350167 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-08-22 07:10
Closed since this issue contains several unrelated propositions, most of which have been rejected.

If you want to add helper functions for comparing Elements (shallow and deep, with and without taking the order of attributes into account, with and without ignoring whitespace, etc.), feel free to open a separate issue.
msg350242 - Author: Marco Sulla (Marco Sulla) Date: 2019-08-22 23:30
Thanks, but to tell the truth:

1. I just don't use SubElement, even if it's more convenient. I just create an Element and append it to the parent one. It's much clearer IMHO.

2. I do not use `fromstring` and all its friends. It was just a suggestion

3. I already copy/pasted from SO a function that serializes the Element. I do not want to waste time on something that will not be used as the `Element.__eq__()` implementation, as IMHO it should be.

See ya.
History
Date                 User              Action  Args
2019-08-22 23:30:48  Marco Sulla       set     messages: + msg350242
2019-08-22 07:10:25  serhiy.storchaka  set     status: open -> closed; resolution: rejected; messages: + msg350167; stage: resolved
2019-08-22 06:04:02  Marco Sulla       set     messages: + msg350160
2019-08-21 22:12:20  rhettinger        set     messages: + msg350118
2019-08-21 21:51:45  Marco Sulla       set     messages: + msg350115
2019-08-10 18:46:32  scoder            set     messages: + msg349352
2019-08-10 18:38:34  serhiy.storchaka  set     nosy: + serhiy.storchaka; messages: + msg349351
2019-08-10 18:31:26  scoder            set     messages: + msg349348
2019-08-08 18:34:00  rhettinger        set     nosy: + rhettinger; messages: + msg349246
2019-08-08 10:34:13  xtreak            set     nosy: + scoder, eli.bendersky
2019-08-08 09:57:04  Marco Sulla       create