Title: xml.etree.ElementTree.Element inconsistent warning for bool
Type: behavior Stage: needs patch
Components: Library (Lib), XML Versions: Python 3.9
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: effbot, gvanrossum, philiprowlands, rhettinger, scoder, serhiy.storchaka
Priority: normal Keywords:

Created on 2019-11-29 13:21 by philiprowlands, last changed 2019-12-03 00:21 by gvanrossum.

Messages (9)
msg357642 - (view) Author: Philip Rowlands (philiprowlands) Date: 2019-11-29 13:21
Steps to reproduce:

$ python3.7
Python 3.7.2 (default, May 13 2019, 13:52:56)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import xml.etree.ElementTree as ET
>>> bool(ET.fromstring("<a><b><c></c></b></a>").find(".//c"))

$ python3.7
Python 3.7.2 (default, May 13 2019, 13:52:56)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.modules['_elementtree'] = None
>>> import xml.etree.ElementTree as ET
>>> bool(ET.fromstring("<a><b><c></c></b></a>").find(".//c"))
__main__:1: FutureWarning: The behavior of this method will change in future versions.  Use specific 'len(elem)' or 'elem is not None' test instead.

This is (almost) the smallest test case, but in real code what I was trying to write was:
# check the result code is ok
if response.find("./result[@code='ok']"):
    return True

The unintuitive bool() behaviour was surprising, compared to other typical True / False determinations on objects, and I think what the FutureWarning is trying to avoid.

Please implement the same warning for bool(Element) in _elementtree.c as exists in bpo29204 was making similar alignments between the versions, but didn't consider this FutureWarning.
msg357644 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-11-29 17:00
Agreed that both should behave the same.

And, we should finally decide whether this should really be changed or not. The current behaviour is counter-intuitive, but it's been like that forever and lots of code depends on it in one way or another, so …
msg357645 - (view) Author: Philip Rowlands (philiprowlands) Date: 2019-11-29 17:14
It's easier to justify a change in behaviour if the warning is emitted. With no legacy concerns, I would be happy for bool() to change, but I'm not the one who would receive the grumbly tickets.

How about emitting the warning in the next release, then assessing the behaviour change for version n+2?
msg357662 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-11-30 21:05
> And, we should finally decide whether this should really be 
> changed or not. The current behaviour is counter-intuitive,
> but it's been like that forever and lots of code depends on 
> it in one way or another, so …

You make a good case to just leave it be.

+1 from me to leave as-is.
msg357680 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-12-01 14:15
Not so quick. :) You're probably aware of the details, but still, let me state clearly what this is about, to make sure that we're all on the same page here.
(I'm also asking in Serhiy, because he did quite some work on ET in the past and has experience with it.)

The Element class in ElementTree represents an XML tag, but it also mimics a sequence container of child elements. Its behaviour is quite list-like, in that it supports indexing and slicing of its children "list", and also has an "append()" method to add one at the end. All of this is a nice and simple API, so what's the catch?

The gotcha is the behaviour of "__bool__". Following the sequence protocol, the truth value of an Element depends on its children. No children – false. Has children, true. While this makes sense from the sequence protocol side, I don't think it is what users expect. In Python, it is very common to say "if something: …" in order to test if "something" is "a thing". When doing this with Elements, users usually don't think of the element's children but the Element itself, often because they got it back from from kind of tree search, as in the OP's example using ".find()".

I would argue that this is brittle, because it works when testing with Elements that happen to have children, but it fails when the Element that was found has no children. The intention of the "FutureWarning" was to allow changing the behaviour at some point to always return true as the truth value, i.e. to remove the "__bool__" implementation all together.

1) fixes the behaviour for broken code that intended to test if it "is a thing"
2) does not change the behaviour when testing Elements (for whatever reason) that have children
3) returns true instead of false when truth-testing Elements that have no children, whether the code originally expected this or not.

The main question now is: what is more common? Code that only accidentally works because it didn't get in touch with child-free (leaf-)elements yet? Or code that would start failing with this change because it really wants to find out if an Element has children by truth-testing it.

I'm sure that both exist, because I have seen both, and my very limited *feeling* from what I've seen is that there's more brittle code out there than intentional code that truth-tests Elements, and that future users are also more likely to get it wrong until they get bitten by it and learn the actual meaning of the truth-test.

But, we don't know, and this isn't something to easily find out via a code search. I'm personally more leaning towards changing the behaviour to help new users, but lacking enough evidence, not changing something long-standing is always the safer choice. I'm aware that I will have to take the "final" decision in the end, but would be happy to hear some more intuitions, ideas or even data points first.
msg357682 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-12-01 17:32
Stefan's arguments looked reasonable to me. I supported deprecating Element.__bool__. This was an old plan, although we failed with deprecation in the C implementation (this is not the first oversight in deprecations in ElementTree).

But there is a problem: removing __bool__ will not change the value of bool(element). It will still return False for an Element without children, because bool() falls back to __len__ if __bool__ is not defined, and Element.__len__ is defined. So we will need either to define Element.__bool__ always returning True, and this will be the first case when __bool__ and __len__ return inconsistent values, or to deprecate __len__ (and perhaps __getitem__) that goes too far from the initial plan.

So now I am not sure. Making Element.__bool__ always returning True or raising an exception could help to fix some bugs in user code. But this differs from common behavior of sequence-like types in Python. On other hand, we have a precedence: bool(NotImplemented) raises an exception now (this also can save from common errors).

When I started to write this message I was -0 for changing Element.__bool__. Now I am +0, after considering all arguments that came in my mind, but still have doubts. Can we ask some of the language ideologues?
msg357688 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2019-12-01 19:20
I'm +1 on adding the deprecation and changing it to an error in the future.

I think it's an anti-pattern that objects with complex behavior that includes some sequence behavior allow the fallback behavior for checking the empty sequence. I recall being bitten by this nearly 20 years ago (at Zope) when some database class used this pattern -- somehow the app ended up thinking there was no database at all when in fact there was one but it was empty, and it reinitialized the database with bad results.

The fallback from truth testing to __len__() is built into the language and cannot be changed, but we *can* override __bool__() to raise an exception, and I think that's what we should do. (Clearly that's also what we were planning to do, hence the warning in the Python version.)
msg357737 - (view) Author: Philip Rowlands (philiprowlands) Date: 2019-12-03 00:05
I went digging through the archives, made more interesting as elementtree was imported into the standard library.

AFAICT, the FutureWarning for __bool__ (or __nonzero__ in py2) appeared circa 2007-06 in version 1.3a2:

- Added future warnings for "if e" (use explicit len(e) or is None
  test) and "e.getchildren()" (use list(e) or iteration).

The docs are still there, warning about the pitfalls and suggesting "This behaviour is likely to change somewhat in ElementTree 1.3."

12 years on, it has not, but what was the intended change (+effbot for possible context)??

I assumed on first reading that the plan was to switch bool(Element) to return True, but others have pointed out the inconsistency with len(), or having __bool__() raise an exception.

My 2c would be to steer devs away from using bool(Element) as a final step, i.e. keep the existing True/False definition, warts and all, but raise a SyntaxWarning (RuntimeWarning?) with a similar message to the current FutureWarning.
msg357740 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2019-12-03 00:21
I doubt effbot will respond. :-)

I think the first order of business is to add the same FutureWarning to the C version of the code, and to reset the clock on the deprecation. 

In terms of the final stages, why wouldn't we want it to just always raise? Just because there's so much code that might break?

It shouldn't be a SyntaxWarning, since this is most definitely not a syntax issue. I'd be okay with RuntimeError (or RuntimeWarning if we decide to keep the old outcome with a warning forever).

(Huh, does FutureWarning count as a deprecation?)
Date User Action Args
2019-12-03 00:21:11gvanrossumsetmessages: + msg357740
2019-12-03 00:05:20philiprowlandssetnosy: + effbot
messages: + msg357737
2019-12-01 19:20:28gvanrossumsetmessages: + msg357688
2019-12-01 17:32:09serhiy.storchakasetnosy: + gvanrossum
messages: + msg357682
2019-12-01 14:15:33scodersetnosy: + serhiy.storchaka
messages: + msg357680
2019-11-30 21:05:39rhettingersetnosy: + rhettinger
messages: + msg357662
2019-11-29 17:14:26philiprowlandssetmessages: + msg357645
2019-11-29 17:00:53scodersetstage: needs patch
messages: + msg357644
versions: + Python 3.9
2019-11-29 13:47:07xtreaksetnosy: + scoder
components: + XML
2019-11-29 13:21:05philiprowlandscreate