Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xml.etree.ElementTree.Element inconsistent warning for bool #83122

Closed
philiprowlands mannequin opened this issue Nov 29, 2019 · 15 comments · Fixed by #31149
Closed

xml.etree.ElementTree.Element inconsistent warning for bool #83122

philiprowlands mannequin opened this issue Nov 29, 2019 · 15 comments · Fixed by #31149
Labels
3.12 bugs and security fixes extension-modules C modules in the Modules dir stdlib Python modules in the Lib dir topic-XML type-bug An unexpected behavior, bug, or error

Comments

@philiprowlands
Copy link
Mannequin

philiprowlands mannequin commented Nov 29, 2019

BPO 38941
Nosy @rhettinger, @scoder, @serhiy-storchaka, @JelleZijlstra, @jacobtylerwalls
PRs
  • gh-83122: Deprecate testing element truth values in ElementTree #31149
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2019-11-29.13:21:05.659>
    labels = ['expert-XML', 'type-bug', 'library', '3.11']
    title = 'xml.etree.ElementTree.Element inconsistent warning for bool'
    updated_at = <Date 2022-03-23.03:14:02.864>
    user = 'https://bugs.python.org/philiprowlands'

    bugs.python.org fields:

    activity = <Date 2022-03-23.03:14:02.864>
    actor = 'JelleZijlstra'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)', 'XML']
    creation = <Date 2019-11-29.13:21:05.659>
    creator = 'philiprowlands'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 38941
    keywords = ['patch']
    message_count = 10.0
    messages = ['357642', '357644', '357645', '357662', '357680', '357682', '357688', '357737', '357740', '415853']
    nosy_count = 7.0
    nosy_names = ['effbot', 'rhettinger', 'scoder', 'serhiy.storchaka', 'JelleZijlstra', 'philiprowlands', 'jacobtylerwalls']
    pr_nums = ['31149']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue38941'
    versions = ['Python 3.11']

    @philiprowlands
    Copy link
    Mannequin Author

    philiprowlands mannequin commented Nov 29, 2019

    Steps to reproduce:

    $ python3.7
    Python 3.7.2 (default, May 13 2019, 13:52:56)
    [GCC 6.3.0 20170516] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import xml.etree.ElementTree as ET
    >>> bool(ET.fromstring("<a><b><c></c></b></a>").find(".//c"))
    False
    
    
    $ python3.7
    Python 3.7.2 (default, May 13 2019, 13:52:56)
    [GCC 6.3.0 20170516] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys
    >>> sys.modules['_elementtree'] = None
    >>> import xml.etree.ElementTree as ET
    >>> bool(ET.fromstring("<a><b><c></c></b></a>").find(".//c"))
    __main__:1: FutureWarning: The behavior of this method will change in future versions.  Use specific 'len(elem)' or 'elem is not None' test instead.

    This is (almost) the smallest test case, but in real code what I was trying to write was:

    # check the result code is ok
    if response.find("./result[@code='ok']"):
        return True
    

    The unintuitive bool() behaviour was surprising, compared to other typical True / False determinations on objects, and I think what the FutureWarning is trying to avoid.

    Please implement the same warning for bool(Element) in _elementtree.c as exists in ElementTree.py. bpo-29204 was making similar alignments between the versions, but didn't consider this FutureWarning.

    @philiprowlands philiprowlands mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Nov 29, 2019
    @scoder
    Copy link
    Contributor

    scoder commented Nov 29, 2019

    Agreed that both should behave the same.

    And, we should finally decide whether this should really be changed or not. The current behaviour is counter-intuitive, but it's been like that forever and lots of code depends on it in one way or another, so …

    @scoder scoder added the 3.9 only security fixes label Nov 29, 2019
    @philiprowlands
    Copy link
    Mannequin Author

    philiprowlands mannequin commented Nov 29, 2019

    It's easier to justify a change in behaviour if the warning is emitted. With no legacy concerns, I would be happy for bool() to change, but I'm not the one who would receive the grumbly tickets.

    How about emitting the warning in the next release, then assessing the behaviour change for version n+2?

    @rhettinger
    Copy link
    Contributor

    And, we should finally decide whether this should really be
    changed or not. The current behaviour is counter-intuitive,
    but it's been like that forever and lots of code depends on
    it in one way or another, so …

    You make a good case to just leave it be.

    +1 from me to leave as-is.

    @scoder
    Copy link
    Contributor

    scoder commented Dec 1, 2019

    Not so quick. :) You're probably aware of the details, but still, let me state clearly what this is about, to make sure that we're all on the same page here.
    (I'm also asking in Serhiy, because he did quite some work on ET in the past and has experience with it.)

    The Element class in ElementTree represents an XML tag, but it also mimics a sequence container of child elements. Its behaviour is quite list-like, in that it supports indexing and slicing of its children "list", and also has an "append()" method to add one at the end. All of this is a nice and simple API, so what's the catch?

    The gotcha is the behaviour of "__bool__". Following the sequence protocol, the truth value of an Element depends on its children. No children – false. Has children, true. While this makes sense from the sequence protocol side, I don't think it is what users expect. In Python, it is very common to say "if something: …" in order to test if "something" is "a thing". When doing this with Elements, users usually don't think of the element's children but the Element itself, often because they got it back from from kind of tree search, as in the OP's example using ".find()".

    I would argue that this is brittle, because it works when testing with Elements that happen to have children, but it fails when the Element that was found has no children. The intention of the "FutureWarning" was to allow changing the behaviour at some point to always return true as the truth value, i.e. to remove the "__bool__" implementation all together.

    This

    1. fixes the behaviour for broken code that intended to test if it "is a thing"
    2. does not change the behaviour when testing Elements (for whatever reason) that have children
    3. returns true instead of false when truth-testing Elements that have no children, whether the code originally expected this or not.

    The main question now is: what is more common? Code that only accidentally works because it didn't get in touch with child-free (leaf-)elements yet? Or code that would start failing with this change because it really wants to find out if an Element has children by truth-testing it.

    I'm sure that both exist, because I have seen both, and my very limited *feeling* from what I've seen is that there's more brittle code out there than intentional code that truth-tests Elements, and that future users are also more likely to get it wrong until they get bitten by it and learn the actual meaning of the truth-test.

    But, we don't know, and this isn't something to easily find out via a code search. I'm personally more leaning towards changing the behaviour to help new users, but lacking enough evidence, not changing something long-standing is always the safer choice. I'm aware that I will have to take the "final" decision in the end, but would be happy to hear some more intuitions, ideas or even data points first.

    @serhiy-storchaka
    Copy link
    Member

    Stefan's arguments looked reasonable to me. I supported deprecating Element.__bool__. This was an old plan, although we failed with deprecation in the C implementation (this is not the first oversight in deprecations in ElementTree).

    But there is a problem: removing __bool__ will not change the value of bool(element). It will still return False for an Element without children, because bool() falls back to __len__ if __bool__ is not defined, and Element.__len__ is defined. So we will need either to define Element.__bool__ always returning True, and this will be the first case when __bool__ and __len__ return inconsistent values, or to deprecate __len__ (and perhaps __getitem__) that goes too far from the initial plan.

    So now I am not sure. Making Element.__bool__ always returning True or raising an exception could help to fix some bugs in user code. But this differs from common behavior of sequence-like types in Python. On other hand, we have a precedence: bool(NotImplemented) raises an exception now (this also can save from common errors).

    When I started to write this message I was -0 for changing Element.__bool__. Now I am +0, after considering all arguments that came in my mind, but still have doubts. Can we ask some of the language ideologues?

    @gvanrossum
    Copy link
    Member

    I'm +1 on adding the deprecation and changing it to an error in the future.

    I think it's an anti-pattern that objects with complex behavior that includes some sequence behavior allow the fallback behavior for checking the empty sequence. I recall being bitten by this nearly 20 years ago (at Zope) when some database class used this pattern -- somehow the app ended up thinking there was no database at all when in fact there was one but it was empty, and it reinitialized the database with bad results.

    The fallback from truth testing to __len__() is built into the language and cannot be changed, but we *can* override __bool__() to raise an exception, and I think that's what we should do. (Clearly that's also what we were planning to do, hence the warning in the Python version.)

    @philiprowlands
    Copy link
    Mannequin Author

    philiprowlands mannequin commented Dec 3, 2019

    I went digging through the archives, made more interesting as elementtree was imported into the standard library.

    AFAICT, the FutureWarning for __bool__ (or __nonzero__ in py2) appeared circa 2007-06 in version 1.3a2:
    http://svn.effbot.org/public/tags/elementtree-1.3a3-20070912/CHANGES

    • Added future warnings for "if e" (use explicit len(e) or is None
      test) and "e.getchildren()" (use list(e) or iteration).

    The docs are still there, warning about the pitfalls and suggesting "This behaviour is likely to change somewhat in ElementTree 1.3."
    https://effbot.org/zone/element.htm#truth-testing

    12 years on, it has not, but what was the intended change (+effbot for possible context)??

    I assumed on first reading that the plan was to switch bool(Element) to return True, but others have pointed out the inconsistency with len(), or having __bool__() raise an exception.

    My 2c would be to steer devs away from using bool(Element) as a final step, i.e. keep the existing True/False definition, warts and all, but raise a SyntaxWarning (RuntimeWarning?) with a similar message to the current FutureWarning.

    @gvanrossum
    Copy link
    Member

    I doubt effbot will respond. :-)

    I think the first order of business is to add the same FutureWarning to the C version of the code, and to reset the clock on the deprecation.

    In terms of the final stages, why wouldn't we want it to just always raise? Just because there's so much code that might break?

    It shouldn't be a SyntaxWarning, since this is most definitely not a syntax issue. I'd be okay with RuntimeError (or RuntimeWarning if we decide to keep the old outcome with a warning forever).

    (Huh, does FutureWarning count as a deprecation?)

    @JelleZijlstra
    Copy link
    Member

    I was close to merging the linked PR but there's some renewed concerns about the proposed behavior here: Do we really want to change this behavior and potentially force a lot of people to change their working code?

    In addition, the current code emits FutureWarning. I don't really understand what that warning is supposed to be used for (https://docs.python.org/3/library/exceptions.html#FutureWarning), but at least it means we plan to change the behavior later. So we should have a plan for how we'll do that. And whatever we do, the Python and C versions of the code should behave consistently.

    Here are two ways forward:

    (1) We change both the Python and the C versions of ElementTree to emit a DeprecationWarning on using __bool__ in 3.11. In 3.13, we change the behavior to make __bool__ always return True. (Or perhaps, make it raise an exception.)

    (2) We give up on changing this behavior and remove the existing FutureWarning from the Python code. But in that case we should document the gotcha in the boolean behavior of nodes.

    I don't have enough experience with ElementTree to recommend either option, but I'm happy to help implement it once we come to a decision.

    @JelleZijlstra JelleZijlstra added 3.11 only security fixes and removed 3.9 only security fixes labels Mar 23, 2022
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @gpshead gpshead added 3.12 bugs and security fixes extension-modules C modules in the Modules dir and removed 3.11 only security fixes labels Jan 11, 2023
    @gpshead
    Copy link
    Member

    gpshead commented Jan 11, 2023

    The FutureWarning existing only on a single (unused by CPython) codepath and not being a DeprecationWarning surprised me. So I'd lean towards @JelleZijlstra 's (1) myself.

    I agree that the purpose of FutureWarning even existing is... unclear. A behavior changing in the future should be called a Deprecation.

    Before deciding on (1) vs (2) though, an experiment with a warning surfaced loudly to see what potential scale of cleanup required across a pile of projects or in a large codebase like we have at work would be useful data.

    @gvanrossum
    Copy link
    Member

    The docs for FutureWarning state that it's for users of applications, whereas DeprecationWarning is for other Python developers. Not sure that that's a very helpful distinction in practice though.

    Note that the PR is stalled: #31149

    @gpshead
    Copy link
    Member

    gpshead commented Jan 12, 2023

    Yeah I don't find FutureWarning super useful, but as documented it effectively means DeprecationWarning is what should've been used. The audience for it is developers, not application end users who should be assumed to know nothing about code and APIs.

    @serhiy-storchaka
    Copy link
    Member

    I find FutureWarning useful, although there are less use cases for it then for DeprecationWarning. The difference between DeprecationWarning and FutureWarning is that the former is used for changes which will cause raising an exception in future, while the latter is used when the future behavior change will be more silent -- just return different result or have different side effects. You can ignore DeprecationWarning for a while, because it is difficult to miss an exception in future versions. You should be more careful with FutureWarning, because a silent change can lead to subtle bugs in future releases. This is why FutureWarning is more visible by default.

    Emitting FutureWarning in __bool__ is appropriate if we want to return a different value in future. Emitting DeprecationWarning in __bool__ is appropriate if we want to raise an exception in future. Raising an exception in __bool__ is unexpected, and currently only NotImplemented.__bool__() emits a DeprecationWarning (for good reasons). So there are three options:

    1. Make __bool__ emitting FutureWarning in both implementations. In future versions make it returning True.
    2. Remove any warnings.
    3. Make __bool__ emitting DeprecationWarning. In future versions make it raising an exception.

    There are variants of 1 and 3 which are less destructive for current code which checks that "it is a thing". If elements in your code always have children, you have not need to change it. If they can be childless, your code have a bug in any case (it will be automatically fixed by 1a, and 3a will force you to fix it).

    1. a. Only emit FutureWarning for childless element.
    2. a. Only emit DeprecationWarning for childless element. In future versions only raise an exception for childless element.

    @jacobtylerwalls
    Copy link
    Contributor

    I've updated #31149 to raise DeprecationWarning.

    I don't think raising an exception is unexpected; the docs have warned since 2.7 that this is an ambiguous pattern. I also think changing the behavior based on whether the element has children is very surprising. In that case, you might never see the DeprecationWarning while testing, and then get bitten in production when encountering childless elements "in the wild".

    gpshead pushed a commit that referenced this issue Jan 23, 2023
    …1149)
    
    When testing element truth values, emit a DeprecationWarning in all implementations.
    
    This had emitted a FutureWarning in the rarely used python-only implementation since ~2.7 and has always been documented as a behavior not to rely on.
    
    Matching an element in a tree search but having it test False can be unexpected. Raising the warning enables making the choice to finally raise an exception for this ambiguous behavior in the future.
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.12 bugs and security fixes extension-modules C modules in the Modules dir stdlib Python modules in the Lib dir topic-XML type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    Successfully merging a pull request may close this issue.

    8 participants