Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-informative error message in index() and remove() functions #57558

Closed
akheron opened this issue Nov 5, 2011 · 46 comments
Closed

Non-informative error message in index() and remove() functions #57558

akheron opened this issue Nov 5, 2011 · 46 comments
Labels
3.7 (EOL) end of life extension-modules C modules in the Modules dir interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@akheron
Copy link
Member

akheron commented Nov 5, 2011

BPO 13349
Nosy @brettcannon, @rhettinger, @terryjreedy, @ncoghlan, @vstinner, @ezio-melotti, @merwok, @Julian, @akheron, @serhiy-storchaka, @DimitrisJim, @eamanu
PRs
  • bpo-13349: Fix error reporting for index and remove methods #876
  • bpo-13349: Improve the error message for tuple.index #1309
  • bpo-13349: repr object in tuple.index ValueError message #1820
  • Dependencies
  • bpo-7330: PyUnicode_FromFormat: implement width and precision for %s, %S, %R, %V, %U, %A
  • Files
  • issue-13349.patch
  • issue13349.patch.1: First update to issue patch.
  • issue13349.patch.2
  • issue13349.patch.3
  • issue13349.patch.4
  • issue13349.patch.6
  • issue13349.patch.7
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2017-05-27.21:57:54.212>
    created_at = <Date 2011-11-05.20:49:37.320>
    labels = ['extension-modules', 'interpreter-core', 'type-bug', '3.7']
    title = 'Non-informative error message in index() and remove() functions'
    updated_at = <Date 2017-05-28.15:33:09.523>
    user = 'https://github.com/akheron'

    bugs.python.org fields:

    activity = <Date 2017-05-28.15:33:09.523>
    actor = 'Jim Fasarakis-Hilliard'
    assignee = 'none'
    closed = True
    closed_date = <Date 2017-05-27.21:57:54.212>
    closer = 'rhettinger'
    components = ['Extension Modules', 'Interpreter Core']
    creation = <Date 2011-11-05.20:49:37.320>
    creator = 'petri.lehtinen'
    dependencies = ['7330']
    files = ['27884', '27907', '27940', '27949', '27975', '28265', '28267']
    hgrepos = []
    issue_num = 13349
    keywords = ['patch']
    message_count = 46.0
    messages = ['147109', '147215', '147237', '147238', '147431', '147546', '147795', '147824', '147834', '147836', '147837', '147853', '147859', '174666', '174716', '174744', '174759', '174820', '174821', '174958', '175134', '175265', '175266', '175319', '175325', '175328', '175489', '175491', '177198', '177203', '188598', '289856', '290020', '290492', '290742', '290745', '290747', '290748', '294552', '294563', '294603', '294609', '294613', '294627', '294631', '294649']
    nosy_count = 13.0
    nosy_names = ['brett.cannon', 'rhettinger', 'terry.reedy', 'ncoghlan', 'vstinner', 'ezio.melotti', 'eric.araujo', 'Julian', 'petri.lehtinen', 'serhiy.storchaka', 'Sean.Ochoa', 'Jim Fasarakis-Hilliard', 'eamanu']
    pr_nums = ['876', '1309', '1820']
    priority = 'normal'
    resolution = 'rejected'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue13349'
    versions = ['Python 3.7']

    @akheron
    Copy link
    Member Author

    akheron commented Nov 5, 2011

    For example:

    >>> (1, 2, 3).index(4)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: tuple.index(x): x not in tuple

    The "x not in tuple" error message should be replaced with "4 is not in tuple". list.index() already does this (see bpo-7252):

    >>> [1, 2, 3].index(4)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: 4 is not in list

    Although in bpo-7252 it was claimed that no other place in stdlib has this error message, I found many occurences:

    Modules/_collectionsmodule.c: PyErr_SetString(PyExc_ValueError, "deque.remove(x): x not in deque"
    Modules/_elementtree.c: "list.remove(x): x not in list"
    Modules/_elementtree.c: "list.remove(x): x not in list"
    Modules/arraymodule.c: PyErr_SetString(PyExc_ValueError, "array.index(x): x not in list");
    Modules/arraymodule.c: PyErr_SetString(PyExc_ValueError, "array.remove(x): x not in list");
    Objects/abstract.c: "sequence.index(x): x not in sequence");
    Objects/listobject.c: PyErr_SetString(PyExc_ValueError, "list.remove(x): x not in list");
    Objects/tupleobject.c: PyErr_SetString(PyExc_ValueError, "tuple.index(x): x not in tuple");

    There's also documentation and tests that depend on this actual error message:

    Doc/library/doctest.rst: ValueError: list.remove(x): x not in list
    Lib/test/test_xml_etree.py: ValueError: list.remove(x): x not in list

    bpo-7252 was fixed in r76058, and it's quite a lot of code. I think it could be done more easily using PyUnicode_FromFormat() and the %R format.

    @akheron akheron added extension-modules C modules in the Modules dir interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error labels Nov 5, 2011
    @akheron
    Copy link
    Member Author

    akheron commented Nov 7, 2011

    The good thing about this is ease of debugging. You can see which is the offending value that was not found.

    On the other hand, the repr of a value might be very long:

    >>> [].index(list(range(1000)))
    ValueError: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, ...
    (many lines of numbers)
    997, 998, 999] is not in list

    Also, all values don't have a very informal repr:

    >>> class Foo: pass
    ...
    >>> [].index(Foo())
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: <__main__.Foo object at 0xb736f92c> is not in list

    @merwok
    Copy link
    Member

    merwok commented Nov 7, 2011

    The good thing about this is ease of debugging.
    Exactly! +1 for the idea.

    On the other hand, the repr of a value might be very long:
    You can restrict the length with % formats.

    Also, all values don't have a very informal repr:
    Not your problem. This change will still be much more useful than the current 'x'. Some reprs are very helpful, other ones give the ID so can be used in debugging, other ones are not helpful at all so authors will have to make them more helpful or debug their code otherwise.

    @merwok
    Copy link
    Member

    merwok commented Nov 7, 2011

    There's also documentation and tests that depend on this actual error message:
    Doc/library/doctest.rst: ValueError: list.remove(x): x not in list
    Lib/test>/test_xml_etree.py: ValueError: list.remove(x): x not in list
    That’s a well-known doctest problem. Just update the doc. Writing robust doctests is an art:

    >>> str(someobject)
    'output that can change'
    >>> 'something I want' in str(someobject)  # more robust, but less useful if it fails
    True
    
    >>> something.index(spam)
    traceback blah:
    ValueError: output that can change
    >>> try: something.index(spam)
    ... except ValueError: print('spam not in something')  # more robust, but ugly
    spam not in something

    @ncoghlan
    Copy link
    Contributor

    doctests by their very nature tend to overspecify things - that's why actual regression tests should be written with unittest, while doctest is kept for its originally intended purpose of testing examples included in docstrings and other documentation.

    Still, there's also a reason why IGNORE_EXCEPTION_DETAIL is an available doctest option.

    @terryjreedy
    Copy link
    Member

    Improving error messages has been a long, slow process as people are irritated enough to make a change. Please go ahead.

    Guido has explicitly excluded exception detail from the language spec multiple times. Test that depend on details are broken. Doc examples are updated as such details change. (So if you do upgrade the message, change the tests and examples ;-).

    @akheron
    Copy link
    Member Author

    akheron commented Nov 17, 2011

    Éric Araujo wrote:

    > On the other hand, the repr of a value might be very long:
    You can restrict the length with % formats.

    Seems you can't with %R. I'd also like to somehow indicate that
    the output is truncated.

    @terryjreedy
    Copy link
    Member

    The test suite has code (functions) that restrict the length on AssertEqual (and other) failure messages. (I do not know the location though.) Perhaps that can be reused. This almost seems like something that should be more easily available.

    @akheron
    Copy link
    Member Author

    akheron commented Nov 18, 2011

    I found safe_repr() from Lib/unittest/util.py. We would require a similar function, just implemented in C.

    What is a good place to define such C helpers that could be used everywhere?

    @ezio-melotti
    Copy link
    Member

    I found safe_repr() from Lib/unittest/util.py.

    The functions in Lib/unittest/util.py shouldn't be used outside unittest.

    We would require a similar function, just implemented in C.
    What is a good place to define such C helpers that could be used everywhere?

    You could start by adding it in the file where you need it. If it starts becoming useful elsewhere too, it can then be moved somewhere else. I would expect such function to be private, so we are free to move it whenever we need it.

    @akheron
    Copy link
    Member Author

    akheron commented Nov 18, 2011

    Ezio Melotti wrote:

    You could start by adding it in the file where you need it. If it
    starts becoming useful elsewhere too, it can then be moved somewhere
    else. I would expect such function to be private, so we are free to
    move it whenever we need it.

    Well, msg147109 already lists 6 files that would use this function.
    Some of them are under Objects&, some are under Modules/.

    @ncoghlan
    Copy link
    Contributor

    Please don't stress too much about providing an indication that the repr has been truncated - it's an error message, not part of the normal program output. Besides, the lack of a closing ')', ']', '}' or '>' will usually indicate something is amiss in long reprs.

    More useful would be to raise a separate feature request about the lack of width and precision handling for string formatting in PyUnicode_FromFormatV - the common format code handling means it *accepts* the width and precision information for string codes, but then proceeds to completely ignore it. It should be using them to pad short strings (width) and truncate long ones (precision), just like printf() (only in terms of code points rather than bytes, since those codes work with Unicode text).

    @akheron
    Copy link
    Member Author

    akheron commented Nov 18, 2011

    Nick Coghlan wrote:

    Please don't stress too much about providing an indication that the
    repr has been truncated - it's an error message, not part of the
    normal program output. Besides, the lack of a closing ')', ']', '}'
    or '>' will usually indicate something is amiss in long reprs.

    Ok. This makes things easier.

    More useful would be to raise a separate feature request about the
    lack of width and precision handling for string formatting in
    PyUnicode_FromFormatV - the common format code handling means it
    *accepts* the width and precision information for string codes, but
    then proceeds to completely ignore it. It should be using them to
    pad short strings (width) and truncate long ones (precision), just
    like printf() (only in terms of code points rather than bytes, since
    those codes work with Unicode text).

    Created bpo-13428.

    @merwok merwok changed the title Uninformal error message in index() and remove() functions Non-informative error message in index() and remove() functions Nov 3, 2012
    @SeanOchoa
    Copy link
    Mannequin

    SeanOchoa mannequin commented Nov 3, 2012

    Working on issue as part of Python Bug Day, Oct 2012.

    @SeanOchoa
    Copy link
    Mannequin

    SeanOchoa mannequin commented Nov 3, 2012

    It seems that this has been fixed. Using simple tests from msg147215:

    ~/hg/cpython/working $ env-py3.3/bin/python
    Python 3.3.0 (default, Nov  3 2012, 15:28:29) 
    [GCC 4.7.2] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> [].index(list(range(1000)))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: [0, 1, 2, 3, 4, 5, ... 995, 996, 997, 998, 999] is not in list
    [60649 refs]
    >>> class Foo: pass
    ... 
    [60679 refs]
    >>> [].index(Foo())
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: <__main__.Foo object at 0x7f7fad31bc30> is not in list
    [60677 refs]

    @terryjreedy
    Copy link
    Member

    The original example was about tuples and displaying 'x' instead of '4'. Have that and *all* the others been fixed?

    @SeanOchoa
    Copy link
    Mannequin

    SeanOchoa mannequin commented Nov 4, 2012

    After discussing with folks on the #python-dev tonight, I learned that I was testing with a list and I should've been using a set. I'm working on a patch now, and I'm almost ready to have it reviewed.

    @SeanOchoa
    Copy link
    Mannequin

    SeanOchoa mannequin commented Nov 4, 2012

    Ready for review.

    @ezio-melotti
    Copy link
    Member

    You should use assertRaises/assertRaisesRegex to check the error message and add more tests to cover all the changes. I also noticed an inconsistence between 'element not in container' and 'element is not in container' (note the extra 'is'). FWIW I prefer the version without 'is'.

    @SeanOchoa
    Copy link
    Mannequin

    SeanOchoa mannequin commented Nov 6, 2012

    Tests updated for coverage and to use assertRaisesRegex.

    @SeanOchoa
    Copy link
    Mannequin

    SeanOchoa mannequin commented Nov 8, 2012

    From Taggnostr on #python-dev:

    1.) Use assertRaises+assertIn instead of assertRaisesRegex
    2.) Add or change an existing test that you've already updated to use elements with a repr longer than 100 chars.

    @SeanOchoa
    Copy link
    Mannequin

    SeanOchoa mannequin commented Nov 10, 2012

    Truncating repr strings to 100chars will require the patch from bpo-7330. After applying the patch from that issue, my tests work fine.

    @SeanOchoa
    Copy link
    Mannequin

    SeanOchoa mannequin commented Nov 10, 2012

    Updated patch after taking into account Ezio's (aka Taggnostr on #python-dev) latest feedback.

    @SeanOchoa
    Copy link
    Mannequin

    SeanOchoa mannequin commented Nov 11, 2012

    Updates after feedback from Serhiy.

    @Julian
    Copy link
    Mannequin

    Julian mannequin commented Nov 11, 2012

    test_issuewhatever is a bit indescriptive.

    It'd be easier for someone glancing at the test to see what it claims to be testing if there were a more descriptive name given, like perhaps test_exception_message (or something even more verbose).

    @serhiy-storchaka
    Copy link
    Member

    Or test_index()/test_remove(). In this case positive test cases (and may be other exception types) should be added.

    @SeanOchoa
    Copy link
    Mannequin

    SeanOchoa mannequin commented Nov 13, 2012

    Lib/test/test_array.py
    -- Moved index test (for this issue) to test_index (existing test method).
    -- Added remove test (for this issue) to test_remove (existing test method)

    Lib/test/test_deque.py
    -- Moved remove test (for this issue) to test_index (existing test method).

    Lib/test/test_list.py
    -- Added test_remove instead of test_issue13349, and test_index as a new test method; both with positive tests and a negative test to for this issue.

    Lib/test/test_tuple.py
    -- Added test_remove instead of test_issue13349 as a new test method with a positive test and a negative test case for this issue to cover both items longer than 100chars and shorter items.

    Lib/test/test_xml_etree.py
    -- Added test_remove as a new test method to TreeBasicOperationTest testcase class instead of test_issue13349, with a positive test case and the negative test to cover this issue.

    @serhiy-storchaka
    Copy link
    Member

    See also my comments to previous patch about repr() and error message checking.

    @SeanOchoa
    Copy link
    Mannequin

    SeanOchoa mannequin commented Dec 9, 2012

    Update based on Taggnostr's (Ezio on IRC, if I recall correctly) feedback from 11/12/2012 around 11:57 PST.

    • Added check for index result in positive tests.
    • Added assertIn check for remove result in positive tests.
    • Removed extra whitespace.
    • Formatted comments to be more concise.

    @SeanOchoa
    Copy link
    Mannequin

    SeanOchoa mannequin commented Dec 9, 2012

    Confirmed with Ezio that issue bpo-7330 will need to be fixed/approved before this issue can be closed.

    @vstinner
    Copy link
    Member

    vstinner commented May 6, 2013

    Confirmed with Ezio that issue bpo-7330 will need to be fixed/approved before this issue can be closed.

    FYI I just fixed this issue. PyUnicode_FromFormat() now supports width and precision in the format string.

    @DimitrisJim
    Copy link
    Mannequin

    DimitrisJim mannequin commented Mar 19, 2017

    @sean Ochoa, do you want to make this into a PR? The only tweak I would suggest would be to change all error messages to either be:

    "object.method(repr(x)): element not in object" 
    

    or:

    "repr(x) not in object"
    

    also, this probably needs to be changed to version 3.7 now.

    @eamanu
    Copy link
    Mannequin

    eamanu mannequin commented Mar 23, 2017

    I agree with Jim Fasarakis-Hilliard this message may be change in new 3.7 version

    @DimitrisJim
    Copy link
    Mannequin

    DimitrisJim mannequin commented Mar 25, 2017

    Additional instances of this:

    • function indexOf of operator.py.
    • function _PySequence_IterSearch of abstract.c

    @serhiy-storchaka
    Copy link
    Member

    >>> [1, 2, 3].index("'")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: ""'"" not in list

    Aren't too much quotes?

    @DimitrisJim
    Copy link
    Mannequin

    DimitrisJim mannequin commented Mar 28, 2017

    Yes, that does look like too much. My rationale for adding quotes around the value was in order to make it more clear in cases where the repr exceeds 100 characters.

    Instead of:

       Traceback (most recent call last):
         File "<stdin>", line 1, in <module>
       ValueError: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 2 not in list

    Have it clearly distinguished by using "":

       Traceback (most recent call last):
         File "<stdin>", line 1, in <module>
       ValueError: "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 2" not in list

    I'm not sure if there's a trivial way to not display so many quotes in the case you supplied, you're better suited to decide if this can be done somehow.

    @DimitrisJim DimitrisJim mannequin added the 3.7 (EOL) end of life label Mar 28, 2017
    @DimitrisJim
    Copy link
    Mannequin

    DimitrisJim mannequin commented Mar 28, 2017

    Could use %.100S instead of %.100R but I'm not sure of the downsides that might entail.

    @serhiy-storchaka
    Copy link
    Member

    Error messages should use repr(). For example str() of bytes object emits a warning or error.

    See bpo-26090 for the issue with truncated reprs.

    @rhettinger
    Copy link
    Contributor

    See http://bugs.python.org/issue30477 for why I think this shouldn't be done. In short, the IndexError may be part of control flow (looping over successive matches) rather than an error (this patch adds a time/space cost with no benefit in those cases). The repr itself may be large resulting in an unusable error message. The repr may be expensive to compute for some objects.

    @brettcannon
    Copy link
    Member

    A potential compromise would be if ValueError gained a 'value' attribute. That would allow for the exception message to be static but still provide the information for introspection later if desired to figure out exactly what object led to the cause of the ValueError. If I remember correctly previous objections to this idea was worry about memory, though, so it seems attaching more information about the trigger of a ValueError will inevitably lead to more cost somehow.

    @rhettinger
    Copy link
    Contributor

    Thinking back over my 17 years of using Python, I really don't think this is needed at all. I can't think of a single situation where it would have helped.

    In some cases, the change would be detrimental to readability, introducing clutter into an otherwise clear error message. The OP's example of "ValueError: 4 is not in list" is atypical. A more typical example would be "ValueError: <main.TensorElementPlaceholder object at 0x106489e48> is not in list".

    As a Python instructor, I've become acutely aware that tracebacks already have usability issues due to visual clutter and an unpleasant overall appearance. IMO, this proposal would sometimes make it worse by adding more noise for the reader to filter out.

    Brett's idea is an improvement to the original proposal. By attaching the object to the exception rather than squeezing its representation into the string, you avoid making the error message hard to read and avoid the speed/space costs from computing the __repr__ on arbitrary objects. That said, I don't think there is any programmatic benefit to having the object in the exception. When you call "a.remove(b)", you already have "b" and it would be pointless to re-extract it with "except ValueError as e: b = e.args[-1]".

    I'm also concerned that the proposed changes sweep across the core affecting code that has been deployed and stable for almost two decades.

    Given the dubious benefits and potential detriments, I vote for declining this proposal and favoring the stable deployed code that has worked fine for many years.

    @brettcannon
    Copy link
    Member

    I'm fine with not changing this, but I did want to say that I don't think you're guaranteed to have access to the object that triggered the ValueError. If the value that is searched for is in some extension module where it's calculated using C code then you have no way of accessing that value unless the extension module was nice enough to provide the object in the exception, give the repr in the exception message, or document how exactly the transformation from input to what you searched for is.

    @rhettinger
    Copy link
    Contributor

    I'm fine with not changing this

    Thanks Brett. I'm going to mark this as closed/rejected. After five years, it is time to put it to rest and let it stop consuming our mental clock cycles.

    I don't think you're guaranteed to have access to the object
    that triggered the ValueError.

    I'm not sure I follow what you're saying. To call index() or remove() you already have to have the object to search for. The ValueError is raised when the searched for object is not found, not by some unknown object inside the container.

    @ncoghlan
    Copy link
    Contributor

    While I agree it's reasonable to keep arbitrary value repr's out of these messages by default, I do think it would be desirable for someone sufficiently motivated to file a separate RFE about adding the value as a structured exception attribute so that custom sys.excepthook implementations and other "unhandled runtime exception" loggers may choose to handle the situation differently.

    This wouldn't be about the cases where code is handling the remove() or index() call directly, but rather about those where third party library code isn't handling missing values correctly, so the debugging scenario a developer is faced with looks more like this than it does the examples in the OP here:

    >>> some_library_function()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 2, in some_library_function
      File "<stdin>", line 2, in some_other_library_function
    ValueError: list.remove(x): x not in list
    

    Getting that kind of traceback in the logs for a production failure in a distributed system is all kinds of frustrating since you *know* the interpreter knew what "x" was when the error happened, it just isn't passing that information along to the developer that's trying to figure out how and why it broke.

    @serhiy-storchaka
    Copy link
    Member

    I concur with Raymond.

    But shouldn't we change error messages that contains the repr of not found value? Affected collections are list, deque and range.

    >>> [].index('spam')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: 'spam' is not in list
    >>> import collections
    >>> collections.deque().index('spam')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: 'spam' is not in deque

    range.index() raises different error messages depending on the type of the value:

    >>> range(10).index('spam')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: sequence.index(x): x not in sequence
    >>> range(10).index(42)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: 42 is not in range

    @DimitrisJim
    Copy link
    Mannequin

    DimitrisJim mannequin commented May 28, 2017

    Thanks, I understand why this isn't the best idea now.

    shouldn't we change error messages that contains the repr of not found value?

    That is what I was thinking too, apart from removing the repr from other messages it will also unify reporting for all these methods.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life extension-modules C modules in the Modules dir interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    9 participants