Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IDLE: Fix pyparse.find_good_parse_start #77170

Closed
csabella opened this issue Mar 3, 2018 · 16 comments
Closed

IDLE: Fix pyparse.find_good_parse_start #77170

csabella opened this issue Mar 3, 2018 · 16 comments
Assignees
Labels
3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes topic-IDLE type-feature A feature request or enhancement

Comments

@csabella
Copy link
Contributor

csabella commented Mar 3, 2018

BPO 32989
Nosy @tim-one, @terryjreedy, @csabella, @miss-islington
PRs
  • bpo-32989: IDLE - fix bad editor call of pyparse method #5968
  • [3.8] bpo-32989: IDLE - fix bad editor call of pyparse method (GH-5968) #18096
  • [3.7] bpo-32989: IDLE - fix bad editor call of pyparse method (GH-5968) #18097
  • bpo-32989: IDLE - remove unneeded parameter  #18138
  • [3.8] bpo-32989: IDLE - remove unneeded parameter (GH-18138) #18139
  • [3.7] bpo-32989: IDLE - remove unneeded parameter (GH-18138) #18140
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/terryjreedy'
    closed_at = <Date 2020-01-23.07:11:51.918>
    created_at = <Date 2018-03-03.00:44:36.519>
    labels = ['3.8', 'expert-IDLE', 'type-feature', '3.7', '3.9']
    title = 'IDLE: Fix pyparse.find_good_parse_start'
    updated_at = <Date 2020-01-23.07:11:51.917>
    user = 'https://github.com/csabella'

    bugs.python.org fields:

    activity = <Date 2020-01-23.07:11:51.917>
    actor = 'terry.reedy'
    assignee = 'terry.reedy'
    closed = True
    closed_date = <Date 2020-01-23.07:11:51.918>
    closer = 'terry.reedy'
    components = ['IDLE']
    creation = <Date 2018-03-03.00:44:36.519>
    creator = 'cheryl.sabella'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 32989
    keywords = ['patch']
    message_count = 16.0
    messages = ['313170', '313178', '313179', '313180', '313195', '313205', '313231', '360361', '360380', '360384', '360385', '360539', '360540', '360542', '360543', '360544']
    nosy_count = 4.0
    nosy_names = ['tim.peters', 'terry.reedy', 'cheryl.sabella', 'miss-islington']
    pr_nums = ['5968', '18096', '18097', '18138', '18139', '18140']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue32989'
    versions = ['Python 3.7', 'Python 3.8', 'Python 3.9']

    @csabella
    Copy link
    Contributor Author

    csabella commented Mar 3, 2018

    From msg312726 on bpo-32880.

    The call to find_good_parse_start:

        bod = y.find_good_parse_start(self.context_use_ps1, 
                             self._build_char_in_string_func(startatindex))

    sends 3 parameters. And in pyparse.find_good_parse_start(), the signature allows 3. However, the signature is:

        def find_good_parse_start(self, is_char_in_string=None,
                                  _synchre=_synchre):

    This means that the False value in self.use_context_ps1 is the first value instead of the function, so pyparse is always executing:

        if not is_char_in_string:
            # no clue -- make the caller pass everything
            return None

    Here's the commit that changed the signature:
    b175445

    @csabella csabella added 3.7 (EOL) end of life 3.8 only security fixes labels Mar 3, 2018
    @csabella csabella added topic-IDLE type-feature A feature request or enhancement labels Mar 3, 2018
    @terryjreedy
    Copy link
    Member

    To be clear, the signature that got changed in 2005 is the signature for find_good_parse_start ('fgps'), which was previously

    def find_good_parse_start(self, use_ps1, is_char_in_string=None,
     _synchre=_synchre)

    When the use_ps1 parameter was removed, the 'if use_ps1' code was moved to the else: branch of the new 'if not use_ps1: ... else: ' in the editor method, but the call in question, moved into the 'if not use_ps1' branch, was not changed. The immediate fix is to remove the extra argument.

    The similar call in the then new hyperparser module is correct.

     bod = parser.find_good_parse_start(
         editwin._build_char_in_string_func(startatindex))

    The erroneous call has not been detected in execution because of this bug:
    not is_char_in_string
    is too general. It should have been specifically
    is_char_in_string is None
    so False does not trigger the early return, but gets called, raising TypeError. We should add a test that find_good_parse_start(False, lambda: True) does so, with reference to this issue.

    Both calls to fgps (editor and hyperparser) pass _build_char_in_string_func(initial_start), which unconditionally returns a function. So I think we can delete '=None' and the early return, rather than changing the early return condition.

    @terryjreedy
    Copy link
    Member

    If fgps never returns 0, then returning 0 instead of None would allow simplification of

                    if bod is not None or startat == 1:
                        break
                parser.set_lo(bod or 0)

    to
    if bod or startat == 1:
    break
    parser.set_lo(bod)

    If it can (or should) ever return 0, separate from None, I would like to see a test case for that. We could then think about whether or not the loop should break on 0 as well as None.

    Perhaps separate issue: the 'if use_ps1' statements in editor and hyperparser, and a couple of lines before, is nearly identical, and could be factored into a separate editor method that returns a parser instance ready for analysis. It could then be tested in isolation. The method should return a parser instance ready for analysis.

    Both blocks have an explicit set_lo(0) call, which does nothing, and could be removed.

    @terryjreedy
    Copy link
    Member

    Since _synchre is never passed, it should not be a parameter either. I think we should either limit this to fixing the call, with no unit test added, or expand to 'fix find_good_parse_start and buggy call', with revised tests.

    It might be interesting to verify that the time it takes to find a better start point is less that the time saved in later analysis.

    @csabella
    Copy link
    Contributor Author

    csabella commented Mar 4, 2018

    I didn't incorporate all the suggestions into the first PR for this so that it could focus on the parameter change and the tests. newline_and_indent_event does a lot, so I want to make sure the tests look good. I was also going to add tests for the other indent methods (some are called from newline_and_indent_event), but, again, wanted to focus on the bug fix.

    Oddly enough, I don't know how to show in the tests that the bug fix makes any difference. Finding a "good parse start" is more about parsing faster, but, as the tests show from running all the tests with the context_use_ps1 set to True and False, the results are the same. Of course, with one or two lines of code, they would be because there isn't much text to parse and ignore. I did have some tests that had longer text to show that bod was being calculated, but I wasn't sure if I should include those for the main goal of testing the newline and indent.

    Both calls to fgps (editor and hyperparser) pass _build_char_in_string_func(initial_start), which unconditionally returns a function. So I think we can delete '=None' and the early return, rather than changing the early return condition.

    I'd like to address the on another PR that focuses more on refactoring pyparse than this one.

    If fgps never returns 0

    It can return 0, separately from None, as some of the tests on this PR show.

    Perhaps separate issue: the 'if use_ps1' statements in editor and hyperparser, and a couple of lines before, is nearly identical, and could be factored into a separate editor method that returns a parser instance ready for analysis. It could then be tested in isolation. The method should return a parser instance ready for analysis.

    I agree. The duplicated code is bugging me. :-)

    ---------
    One addtional note: I think that all the indent-related methods in EditorWindow can be factored out into their own class. It will help make EditorWindow a little more manageable.

    @terryjreedy
    Copy link
    Member

    I agree with limiting the scope to the None bug and the faulty call. However, we should think of the None fix as primary, and the new test thereof as the primary test. Fixing the None check exposes the call bug, which we also fix. I change the title here and on PR.

    As you noted, the new editor TestCase is not directly relevant to testing the double fix except to show that there is no change in indent. The way to do that is to pass None instead of the in-string function. I did that (temporarily!) and the test passes, meaning that the indents are the same. (I do however think some of them are dubious, and I want to mark those cases.)

    We could have made editor tests that initially failed by exposing bod as an instance attribute (as we have done before), and including a longer test case for which bod is a positive int. However, bod should remain local.

    As an alternative, for experimentation, I added print(bod). The values for the patch are 0, None, None, None, None, None, 0, 0, None. I added ' a\n' to 'Block opener - indents +1 level.' and changed the mark and the 5th 'None' became '4'.

    The fact that passing None and _build_char_in_string_func(startatindex) result in the same indents raises the question of whether the call has any benefit in reducing net time after the followup call to get_continuation_type(). Maybe tomorrow I will try to write a good timeit test.

    In the meanwhile, to get some idea of how well find_good_parse_start finds good parse starts, I restarted IDLE in a console with the print still added, loaded editor.py, and hit RETURN followed by UNDO, in various places. The first non-zero bod, 812, comes with the cursor at the end of 'def _sphinx_version():' 812 is probably the beginning of the line. After "if __name__ == '__main__':" near the end, 1416. After the final "run(_editor_window)", 1654. The highest value I got in about 10 tries past the middle, 1931. To me, this is pathetically bad.

    I tried turning on CodeContext and got the same results where I tested. bod should just be the beginning of the last context line. I am not optimistic about timing results.

    @terryjreedy terryjreedy changed the title IDLE: Incorrect signature in call from editor to pyparse.find_good_parse_start IDLE: Fix pyparse.find_good_parse_start and its bad editor call Mar 4, 2018
    @csabella
    Copy link
    Contributor Author

    csabella commented Mar 5, 2018

    In the meanwhile, to get some idea of how well find_good_parse_start finds good parse starts, I restarted IDLE in a console with the print still added, loaded editor.py, and hit RETURN followed by UNDO, in various places. The first non-zero bod, 812, comes with the cursor at the end of 'def _sphinx_version():' 812 is probably the beginning of the line. After "if __name__ == '__main__':" near the end, 1416. After the final "run(_editor_window)", 1654. The highest value I got in about 10 tries past the middle, 1931. To me, this is pathetically bad.

    Print startat too. num_context_lines isn't CodeContext (although I thought it was too); that's just unfortunate naming. num_context_lines is just a list = [50, 500, 5000]. First it looks at the code going back 50 lines from the current line and it only sends the text to find_good_parse_start() from this line onward. bod is calculated from that point, not from the beginning of the file. Because the start point is always 50 lines back, bod seems to always be in a similar range once you get to line 50 or higher in the code.

    It seems that the purpose of the parsing is to apply the translate, etc to as few lines as possible. So, it tries to make sure it includes the openers (':' ending lines) and closers (return, pass, etc) and the beginning of the brackets and continuation lines. The big thing is that it wants to make sure it's not in a string or comment. So, I think the program almost overcompensates for the idea of a 'large string'. It is very complex and very hard to figure out exactly what it is trying to accomplish, even with the comments. Maybe modern computing power (compared to 2000) has made it such that translating a whole source file is quick enough to not need fancy parsing. :-)

    @terryjreedy
    Copy link
    Member

    After updating the patch, I noticed that deletion of self.context_use_ps1 from the editor call to y.find_good_parse_start was no longer part of the diff. This is because 3.8 changeset 6bdc4de for bpo-35610 (backported only to 3.7) already did so. The remaining substantive change other than new tests is the removal of the unused default value None in the signature of the function and unused code triggered by 'if not arg'.

    I don't remember how well I tested that never exiting find...start early left indentations unchanged. This would require comparing behavior to 3.6. It would make more sense to check that the behavior and new tests conform to PEP-8.

    @terryjreedy terryjreedy changed the title IDLE: Fix pyparse.find_good_parse_start and its bad editor call IDLE: Fix pyparse.find_good_parse_start Jan 21, 2020
    @terryjreedy
    Copy link
    Member

    New changeset ec64640 by Terry Jan Reedy (Cheryl Sabella) in branch 'master':
    bpo-32989: IDLE - fix bad editor call of pyparse method (GH-5968)
    ec64640

    @miss-islington
    Copy link
    Contributor

    New changeset f3d3a3c by Miss Islington (bot) in branch '3.7':
    bpo-32989: IDLE - fix bad editor call of pyparse method (GH-5968)
    f3d3a3c

    @miss-islington
    Copy link
    Contributor

    New changeset 060ad2f by Miss Islington (bot) in branch '3.8':
    bpo-32989: IDLE - fix bad editor call of pyparse method (GH-5968)
    060ad2f

    @terryjreedy
    Copy link
    Member

    Cheryl, sorry to take so long. I will look at and probably remove the _synchre parameter, on this issue. Feel free to pursue any of the other possible followups.

    @terryjreedy terryjreedy added the 3.9 only security fixes label Jan 23, 2020
    @terryjreedy
    Copy link
    Member

    Tim, idlelib.pyparse has this definition:

    # Find what looks like the start of a popular statement.
    _synchre = re.compile(r"""
        ^
        [ \t]*
        (?: while
        |   else
        |   def
        |   return
        |   assert
        |   break
        |   class
        |   continue
        |   elif
        |   try
        |   except
        |   raise
        |   import
        |   yield
        )
        \b
    """, re.VERBOSE | re.MULTILINE).search

    You are credited with adding 'yield' to David Sherer's original list:
    "Taught IDLE's autoident parser that "yield" is a keyword that begins
    a stmt." --tim_one (found via git blame)

    Do you know if there is any reason to not add 'if', 'for', and now 'with'?

    @terryjreedy
    Copy link
    Member

    New changeset f9e07e1 by Terry Jan Reedy in branch 'master':
    bpo-32989: IDLE - remove unneeded parameter (GH-18138)
    f9e07e1

    @miss-islington
    Copy link
    Contributor

    New changeset 36968c1 by Miss Islington (bot) in branch '3.7':
    bpo-32989: IDLE - remove unneeded parameter (GH-18138)
    36968c1

    @miss-islington
    Copy link
    Contributor

    New changeset 545fc51 by Miss Islington (bot) in branch '3.8':
    bpo-32989: IDLE - remove unneeded parameter (GH-18138)
    545fc51

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes topic-IDLE type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants