Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Offer suggestions on AttributeError and NameError #82711

Closed
pablogsal opened this issue Oct 19, 2019 · 40 comments
Closed

Offer suggestions on AttributeError and NameError #82711

pablogsal opened this issue Oct 19, 2019 · 40 comments
Labels
3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@pablogsal
Copy link
Member

BPO 38530
Nosy @vstinner, @aroberge, @serhiy-storchaka, @1st1, @pablogsal, @tirkarthi, @isidentical, @sweeneyde
PRs
  • bpo-38530: Offer suggestions on AttributeError #16850
  • bpo-38530: Offer suggestions on AttributeError #16856
  • bpo-38530: Offer suggestions on NameError #25397
  • bpo-38530: Clean exceptions if dir() fails when making suggestions for AttributeError #25408
  • bpo-38530: Optimize the calculation of string sizes when offering suggestions #25412
  • bpo-38530: Match exactly AttributeError and NameError when offering suggestions #25443
  • bpo-38530: Properly extend UnboundLocalError from NameError #25444
  • bpo-38530: Include builtins in NameError suggestions #25460
  • bpo-38530: Cover more error paths in error suggestion functions #25462
  • bpo-38530: Surround suggestions by quotes #25473
  • bpo-38530: Require 50% similarity in NameError suggestions #25584
  • bpo-38530: Refactor AttributeError suggestions #25776
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2021-04-14.23:32:49.056>
    created_at = <Date 2019-10-19.18:57:31.623>
    labels = ['interpreter-core', 'type-feature', '3.9']
    title = 'Offer suggestions on AttributeError and NameError'
    updated_at = <Date 2021-05-03.15:47:35.739>
    user = 'https://github.com/pablogsal'

    bugs.python.org fields:

    activity = <Date 2021-05-03.15:47:35.739>
    actor = 'pablogsal'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-04-14.23:32:49.056>
    closer = 'pablogsal'
    components = ['Interpreter Core']
    creation = <Date 2019-10-19.18:57:31.623>
    creator = 'pablogsal'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 38530
    keywords = ['patch']
    message_count = 40.0
    messages = ['354954', '354955', '354956', '354957', '354958', '354959', '354960', '354964', '354965', '354967', '354969', '354970', '354971', '354972', '354973', '354975', '354979', '354981', '355499', '355503', '355527', '355530', '355531', '355533', '359977', '391026', '391080', '391090', '391108', '391123', '391219', '391224', '391312', '391316', '391834', '392007', '392513', '392543', '392581', '392817']
    nosy_count = 9.0
    nosy_names = ['vstinner', 'aroberge', 'serhiy.storchaka', 'yselivanov', 'james', 'pablogsal', 'xtreak', 'BTaskaya', 'Dennis Sweeney']
    pr_nums = ['16850', '16856', '25397', '25408', '25412', '25443', '25444', '25460', '25462', '25473', '25584', '25776']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue38530'
    versions = ['Python 3.9']

    @pablogsal
    Copy link
    Member Author

    To improve the debugging experience in both interactive and non-interactive code, I propose to offer suggestions when attribute access fails. For example:

    >>> class A: foo = None
    ... 
    >>> A.fou
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: type object 'A' has no attribute 'fou'

    Did you mean: foo?

    This also applies to imports from modules and other situations:

    >>> import collections
    >>> collections.NamedTuple
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: module 'collections' has no attribute 'NamedTuple'

    Did you mean: namedtuple?

    @pablogsal pablogsal added 3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels Oct 19, 2019
    @pablogsal
    Copy link
    Member Author

    PR 16850 shows an initial prototype for the idea

    @isidentical
    Copy link
    Sponsor Member

    It already exists as a 3rd party module and it would be really cool to have this in core level.

    https://github.com/dutc/didyoumean (by James Powell)

    @pablogsal
    Copy link
    Member Author

    I am not super convinced that this is a great idea because it has some performance cost (although somewhat controlled) but I want to open a discussion.

    @tirkarthi
    Copy link
    Member

    Ruby has it integrated into the core : https://bugs.ruby-lang.org/issues/11252 . It was initially a gem that got merged into core.

    methosd
    undefined local variable or method methosd' for main:Object Did you mean? methods method (repl):1:in

    '

    @pablogsal
    Copy link
    Member Author

    Idea: we could only do this in interactive mode if we consider that is expensive enough.

    @pablogsal
    Copy link
    Member Author

    I am running pyperformance to check the performance cost of this.

    @pablogsal
    Copy link
    Member Author

    Slower (27):

    • pathlib: 25.2 ms +- 0.4 ms -> 105 ms +- 2 ms: 4.18x slower (+318%)
    • sympy_str: 315 ms +- 3 ms -> 500 ms +- 3 ms: 1.59x slower (+59%)
    • sympy_sum: 203 ms +- 2 ms -> 286 ms +- 2 ms: 1.41x slower (+41%)
    • sqlalchemy_imperative: 36.5 ms +- 0.9 ms -> 41.8 ms +- 0.8 ms: 1.14x slower (+14%)
    • python_startup: 12.4 ms +- 0.1 ms -> 13.9 ms +- 0.0 ms: 1.12x slower (+12%)
    • python_startup_no_site: 9.19 ms +- 0.04 ms -> 10.3 ms +- 0.0 ms: 1.12x slower (+12%)
    • sympy_integrate: 24.9 ms +- 0.1 ms -> 27.6 ms +- 0.3 ms: 1.11x slower (+11%)
    • scimark_sparse_mat_mult: 5.21 ms +- 0.05 ms -> 5.49 ms +- 0.06 ms: 1.05x slower (+5%)
    • unpickle_list: 5.78 us +- 0.07 us -> 6.08 us +- 0.07 us: 1.05x slower (+5%)
    • 2to3: 392 ms +- 2 ms -> 411 ms +- 10 ms: 1.05x slower (+5%)
    • nbody: 155 ms +- 2 ms -> 160 ms +- 2 ms: 1.03x slower (+3%)
    • scimark_fft: 423 ms +- 2 ms -> 432 ms +- 3 ms: 1.02x slower (+2%)
    • float: 142 ms +- 2 ms -> 145 ms +- 2 ms: 1.02x slower (+2%)
    • unpack_sequence: 68.6 ns +- 2.5 ns -> 69.9 ns +- 2.1 ns: 1.02x slower (+2%)
    • pickle_list: 5.65 us +- 0.05 us -> 5.76 us +- 0.07 us: 1.02x slower (+2%)
    • fannkuch: 569 ms +- 3 ms -> 579 ms +- 6 ms: 1.02x slower (+2%)
    • xml_etree_parse: 196 ms +- 2 ms -> 199 ms +- 2 ms: 1.02x slower (+2%)
    • sqlalchemy_declarative: 210 ms +- 2 ms -> 213 ms +- 3 ms: 1.01x slower (+1%)
    • unpickle: 16.9 us +- 0.1 us -> 17.2 us +- 0.7 us: 1.01x slower (+1%)
    • regex_effbot: 3.93 ms +- 0.12 ms -> 3.97 ms +- 0.04 ms: 1.01x slower (+1%)
    • scimark_monte_carlo: 128 ms +- 2 ms -> 129 ms +- 2 ms: 1.01x slower (+1%)
    • scimark_lu: 186 ms +- 5 ms -> 188 ms +- 6 ms: 1.01x slower (+1%)
    • regex_dna: 272 ms +- 1 ms -> 274 ms +- 1 ms: 1.01x slower (+1%)
    • xml_etree_iterparse: 130 ms +- 2 ms -> 131 ms +- 3 ms: 1.01x slower (+1%)
    • genshi_xml: 74.3 ms +- 0.8 ms -> 74.8 ms +- 0.8 ms: 1.01x slower (+1%)
    • regex_v8: 29.7 ms +- 0.3 ms -> 29.9 ms +- 0.2 ms: 1.01x slower (+1%)
    • mako: 19.3 ms +- 0.3 ms -> 19.4 ms +- 0.2 ms: 1.01x slower (+1%)

    The current approach is too expensive, so I'm closing PR 16850.

    @serhiy-storchaka
    Copy link
    Member

    AFAIK there is existing issue for this idea.

    I have doubts about performance. I added _PyObject_LookupAttr in particularly to avoid an overhead of raising and silencing an AttributeError. I believe most performance sensitive code in the core now uses it and will not be affected, but there is other code which silences it (_PyObject_LookupAttr itself silences an AttributeError raised in called functions), third-party code which uses PyObject_HasAttr or PyObject_GetAttr can be affected.

    It might be simpler to implement it in sys.excepthook or sys.displayhook, but at that point we do not have attribute name and a reference to the object. There is an issues and a PEP about adding references to AttributeError. It could help to implement this feature.

    @serhiy-storchaka
    Copy link
    Member

    I am surprised that it was SO expensive.

    Pathlib would largely benefit from cached_property if it be compatible with slots.

    @pablogsal
    Copy link
    Member Author

    Serhiy, do you think we could attach the object and the name to some private fields of the AttributeError and check that in sys.excepthook if they are present?

    @pablogsal
    Copy link
    Member Author

    I will also repeat the pyperformance results locally just in case something was off on the speed.python.org server.

    @serhiy-storchaka
    Copy link
    Member

    Why private? They should be public.

    But the problem is that making a reference to the object we can prolong its lifetime and even create a reference loop. There was a proposition to create a weak reference, but not all types support weak reference.

    @pablogsal
    Copy link
    Member Author

    Why private? They should be public.

    I was suggesting orivate for now until the/a PEP to modify the exception is approved. In this way we could try to implement the feature that way.

    ------

    On the other hand do you see any way to make the current approach not that slow? Maybe activating it only on interactive mode?

    @serhiy-storchaka
    Copy link
    Member

    See bpo-18156, bpo-22716 and PEP-473.

    @pablogsal
    Copy link
    Member Author

    But the problem is that making a reference to the object we can prolong its lifetime and even create a reference loop.

    If I'm not mistaken, as long as the traceback is alive, the object is alive beacuse the frames will contain it. The other case is if the exception is not propagated, but in that case it should just die unless explicitly captured.

    The cycle only happens if the object has a reference to the exception, and that should not happen in the general case.

    @pablogsal
    Copy link
    Member Author

    I have opened PR 16856 adding fields to the AttributeError and implementing the feature in PyErr_Display.

    @pablogsal
    Copy link
    Member Author

    With the new approach, there is no measurable different in performance:

    venv ❯ pyperf compare_to json/2019-10-19_20-01-master-24dc2f8c5669.json.gz json/2019-10-20_01-32-suggestions2-21404456383b.json.gz -G

    Slower (3):

    • 2to3: 400 ms +- 2 ms -> 405 ms +- 3 ms: 1.01x slower (+1%)
    • sqlite_synth: 3.46 us +- 0.08 us -> 3.50 us +- 0.08 us: 1.01x slower (+1%)
    • nqueens: 121 ms +- 1 ms -> 121 ms +- 1 ms: 1.01x slower (+1%)

    Faster(4):

    • hexiom: 11.9 ms +- 0.1 ms -> 11.8 ms +- 0.1 ms: 1.01x faster (-1%)
    • xml_etree_generate: 113 ms +- 2 ms -> 113 ms +- 2 ms: 1.01x faster (-1%)
    • python_startup_no_site: 9.25 ms +- 0.06 ms -> 9.20 ms +- 0.01 ms: 1.01x faster (-1%)
    • go: 317 ms +- 4 ms -> 315 ms +- 3 ms: 1.01x faster (-1%)
    • sympy_integrate: 25.1 ms +- 0.2 ms -> 25.0 ms +- 0.1 ms: 1.01x faster (-1%)

    Benchmark hidden because not significant (57): chameleon, chaos, crypto_pyaes, deltablue, dulwich_log, fannkuch, float, genshi_text, genshi_xml, json_dumps, json_loads, logging_format, logging_silent, logging_simple, mako, meteor_contest, nbody, pathlib, pickle, pickle_dict, pickle_list, pickle_pure_python, pidigits, python_startup, raytrace, regex_compile, regex_dna, regex_effbot, regex_v8, richards, scimark_fft, scimark_lu, scimark_monte_carlo, scimark_sor, scimark_sparse_mat_mult, spectral_norm, sqlalchemy_declarative, sqlalchemy_imperative, sympy_expand, sympy_sum, sympy_str, telco, unpack_sequence, unpickle, unpickle_list, unpickle_pure_python, xml_etree_parse, xml_etree_iterparse, xml_etree_process

    @pablogsal
    Copy link
    Member Author

    I think I am going to proceed to modify PR 16856 by adding the name and the object to the AttributeError exceptions.

    This should not extend the lifetime of the object more than the current exception is doing as the exception keeps alive the whole frame stack in the __traceback__ attribute. Consider this code for example:

    class Target:
        def __del__(self):
            print("The object is dead!")
    
    def f():
        g()
    
    def g():
        h()
    
    def h():
        theobj = Target()
        theobj.thevalue

    try:
    f()
    except Exception as e:
    print(e.__traceback__.tb_next.tb_next.tb_next.tb_frame.f_locals["theobj"])
    print("End of except")
    print("Begining of the next line")

    This code prints:

    <main.Target object at 0x7f064adbfe10>
    End of except
    The object is dead!
    Beginning of the next line

    We can notice two things:

    • The target objects are reachable from the exception.
    • When the exception dies, the target object dies.

    Adding another reference to the target object to the exception won't change the current lifetime, neither will create reference cycles as the target object will not have (unless explicitly created) a reference to the exception. We can conclude that this change should be safe.

    In the resolution email for PEP-473 the council stated that "Discussions about adding more attributes to built-in exceptions can continue on the issue tracker on a per-exception basis". As I think this will be very beneficial for the feature discussed in the issue, I will proceed as indicated.

    Notice that if we want to change all internal CPython code that raises AttributeError to use the new fields, this should be done in a different PR to keep this minimal. For this feature, we still need to intercept AttributeError raised by the user without these fields added.

    @serhiy-storchaka
    Copy link
    Member

    Interesting, why locals are not cleared when an exception leaves a frame?

    @vstinner
    Copy link
    Member

    Helping the developer to suggest a fix introduces a minor but non-zero overhead, I would prefer to only enable it as an opt-in option. Maybe enable it using in the development mode (-X dev/PYTHONDEVMODE=1)?

    https://github.com/dutc/didyoumean (by James Powell)

    This project hooks into PyObject_GetAttr() by modifying PyObject_GetAttr() machine code, which is definitely not a portable approach.

    Maybe one approach would be to add a way to install a hook to customize AttributeError exceptions?

    Can this issue be implemented using sys.excepthook?

    --

    This issue is specific to AttributeError, but I vaguely recall that Yury Selivanov told me that he wanted to something but for any exception. Detect the most common mistakes and propose a solution. I don't think that he ever sent anything in public sadly.

    I add Yury in the nosy list.

    --

    Here is another project which also catch NameError, AttributeError, ImportError, TypeError, ValueError, SyntaxError, MemoryError, OverflowError, OSError, RuntimeError, etc. :
    https://github.com/SylvainDe/DidYouMean-Python

    It is implemented with sys.excepthook, but it is also compatible with IPython "custom exception handler" (call get_ipython().set_custom_exc()).

    By the way, does IPython have a feature like this?

    In short, https://github.com/SylvainDe/DidYouMean-Python seems to already implement this issue in the proper way, no?

    --

    Similar project for Ruby:
    https://github.com/yuki24/did_you_mean

    @pablogsal
    Copy link
    Member Author

    By the way, does IPython have a feature like this?

    Not that I know of.

    In short, https://github.com/SylvainDe/DidYouMean-Python seems to already implement this issue in the proper way, no?

    I briefly checked the project. My current approach exposes the object and the name directly on the exception, so the display hook can directly using them instead of fiddling with frames and other things.

    Additionally, I think it would be very beneficial to have it in the core, as many people learning python for example cannot install packages.

    @pablogsal
    Copy link
    Member Author

    Helping the developer to suggest a fix introduces a minor but non-zero overhead, I would prefer to only enable it as an opt-in option. Maybe enable it using in the development mode (-X dev/PYTHONDEVMODE=1)?

    I think doing that would make it lose almost completely its value.

    @vstinner
    Copy link
    Member

    Ruby has it integrated into the core : https://bugs.ruby-lang.org/issues/11252 . It was initially a gem that got merged into core.

    GCC also provides more and more hints. Example:

       int main() { int hello = 1; return helo; }

    GCC:

    error: 'helo' undeclared (first use in this function); did you mean 'hello'?

    @vstinner
    Copy link
    Member

    Related issue: PEP-534 -- Improved Errors for Missing Standard Library Modules
    https://www.python.org/dev/peps/pep-0534/

    @pablogsal
    Copy link
    Member Author

    New changeset 37494b4 by Pablo Galindo in branch 'master':
    bpo-38530: Offer suggestions on AttributeError (bpo-16856)
    37494b4

    @pablogsal pablogsal changed the title Offer suggestions on AttributeError Offer suggestions on AttributeError and NameError Apr 14, 2021
    @pablogsal
    Copy link
    Member Author

    New changeset 5bf8bf2 by Pablo Galindo in branch 'master':
    bpo-38530: Offer suggestions on NameError (GH-25397)
    5bf8bf2

    @pablogsal
    Copy link
    Member Author

    New changeset e07f4ab by Pablo Galindo in branch 'master':
    bpo-38530: Make sure that failing to generate suggestions on failure will not propagate exceptions (GH-25408)
    e07f4ab

    @pablogsal
    Copy link
    Member Author

    New changeset 3fc65b9 by Pablo Galindo in branch 'master':
    bpo-38530: Optimize the calculation of string sizes when offering suggestions (GH-25412)
    3fc65b9

    @vstinner
    Copy link
    Member

    I like the https://docs.python.org/dev/whatsnew/3.10.html#better-error-messages section: well done, thanks ;-)

    @pablogsal
    Copy link
    Member Author

    New changeset 3b82cae by Pablo Galindo in branch 'master':
    bpo-38530: Properly extend UnboundLocalError from NameError (GH-25444)
    3b82cae

    @pablogsal
    Copy link
    Member Author

    New changeset 0ad81d4 by Pablo Galindo in branch 'master':
    bpo-38530: Match exactly AttributeError and NameError when offering suggestions (GH-25443)
    0ad81d4

    @pablogsal
    Copy link
    Member Author

    New changeset 3ab4bea by Pablo Galindo in branch 'master':
    bpo-38530: Include builtins in NameError suggestions (GH-25460)
    3ab4bea

    @pablogsal
    Copy link
    Member Author

    New changeset 0b1c169 by Pablo Galindo in branch 'master':
    bpo-38530: Cover more error paths in error suggestion functions (GH-25462)
    0b1c169

    @sweeneyde
    Copy link
    Member

    I opened PR 25584 to fix this current behavior:

    >>> v
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name 'v' is not defined. Did you mean: 'id'?
    >>> vv
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name 'vv' is not defined. Did you mean: 'id'?
    >>> vvv
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name 'vvv' is not defined. Did you mean: 'abs'?

    @pablogsal
    Copy link
    Member Author

    New changeset 284c52d by Dennis Sweeney in branch 'master':
    bpo-38530: Require 50% similarity in NameError and AttributeError suggestions (GH-25584)
    284c52d

    @sweeneyde
    Copy link
    Member

    Some research of other projects:

    LLVM [1][2]
    -----------

    • Compute Levenshtein
      • Using O(n) memory rather than O(n^2)
    • Uses UpperBound = (len(typo) + 2) // 3

    GCC [3]
    -------

    • Uses Damerau-Levenshtein distance
      • Counts transpositions like "abcd" <-> "bacd" as one move
    • Swapping Case as in "a" <-> "A" counts as half a move
    • cutoff = (longer + 2) // 3 if longer - shorter >= 2 else max(longer // 3, 1)

    Rust [4]
    --------

    • "maximum allowable edit distance defaults to one-third of the given word."
    • First checks for exact case-insensitive match, then check for Levenshtein distance small enough, then check if sorted(a.split("_")) == sorted(b.split("_"))

    Ruby [5]
    --------

    • Quickly filter out words with bad Jaro–Winkler distance
      • threshold = input.length > 3 ? 0.834 : 0.77
    • Only compute Levenshtein for words that remain
      • threshold = (input.length * 0.25).ceil
      • Output all good enough words
    • If no word was good enough then output the closest match.

    I think there are some good ideas here.

    [1] https://github.com/llvm/llvm-project/blob/d480f968ad8b56d3ee4a6b6df5532d485b0ad01e/llvm/include/llvm/ADT/edit_distance.h#L42
    [2] https://github.com/llvm/llvm-project/blob/e2b3b89bf1ce74bf889897e0353a3e3fa93e4452/clang/lib/Sema/SemaLookup.cpp#L4263
    [3] https://github.com/gcc-mirror/gcc/blob/16e2427f50c208dfe07d07f18009969502c25dc8/gcc/spellcheck.c
    [4] https://github.com/rust-lang/rust/blob/673d0db5e393e9c64897005b470bfeb6d5aec61b/compiler/rustc_span/src/lev_distance.rs#L44
    [5] https://github.com/ruby/ruby/blob/48b94b791997881929c739c64f95ac30f3fd0bb9/lib/did_you_mean/spell_checker.rb

    @pablogsal
    Copy link
    Member Author

    Hi Dennis, this is a fantastic investigation!

    I think I really like GCC approach here. We may want to invest into porting some of their ideas into our solution.

    @sweeneyde
    Copy link
    Member

    PR 25776 is a work in progress for what it might look like to do a few things:

    • Make case-swaps half the cost of any other edit
    • Refactor Levenshtein code to not use memory allocator, and to bail early on no match.
    • Add comments to Levenshtein distance code
    • Add test cases for Levenshtein distance behind a debug macro
    • Set threshold to (name_size + item_size + 3) * MOVE_COST / 6.
      • Reasoning: similar to difflib.SequenceMatcher.ratio() >= 2/3:
        "Multiset Jaccard similarity" >= 2/3
        matching letters / total letters >= 2/3
        (name_size - distance + item_size - distance) / (name_size + item_size) >= 2/3
        1 - (2*distance) / (name_size + item_size) >= 2/3
        1/3 >= (2*distance) / (name_size + item_size)
        (name_size + item_size) / 6 >= distance
        With rounding:
        (name_size + item_size + 3) // 6 >= distance

    Re: Damerau-Levenshtein (transpositions as single edits), if that were to get implemented, I don't see a way to do that without using a buffer of at least 3x the size, storing the most recent 3 rows of the matrix.

    @pablogsal
    Copy link
    Member Author

    New changeset 80a2a4e by Dennis Sweeney in branch 'master':
    bpo-38530: Refactor and improve AttributeError suggestions (GH-25776)
    80a2a4e

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants