classification
Title: IDLE: Fix pyparse.find_good_parse_start
Type: enhancement Stage: resolved
Components: IDLE Versions: Python 3.9, Python 3.8, Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: terry.reedy Nosy List: cheryl.sabella, miss-islington, terry.reedy, tim.peters
Priority: normal Keywords: patch

Created on 2018-03-03 00:44 by cheryl.sabella, last changed 2020-01-23 07:11 by terry.reedy. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 5968 merged cheryl.sabella, 2018-03-03 23:39
PR 18096 merged miss-islington, 2020-01-21 10:11
PR 18097 merged miss-islington, 2020-01-21 10:12
PR 18138 merged terry.reedy, 2020-01-23 04:34
PR 18139 merged miss-islington, 2020-01-23 04:55
PR 18140 merged miss-islington, 2020-01-23 04:55
Messages (16)
msg313170 - (view) Author: Cheryl Sabella (cheryl.sabella) * (Python committer) Date: 2018-03-03 00:44
From msg312726 on issue32880.

The call to find_good_parse_start:

    bod = y.find_good_parse_start(self.context_use_ps1, 
                         self._build_char_in_string_func(startatindex))

sends 3 parameters.  And in pyparse.find_good_parse_start(), the signature allows 3.  However, the signature is:

    def find_good_parse_start(self, is_char_in_string=None,
                              _synchre=_synchre):


This means that the `False` value in `self.use_context_ps1` is the first value instead of the function, so pyparse is always executing:

    if not is_char_in_string:
        # no clue -- make the caller pass everything
        return None


Here's the commit that changed the signature:
https://github.com/python/cpython/commit/b17544551fc8dfd1304d5679c6e444cad4d34d97
msg313178 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2018-03-03 03:56
To be clear, the signature that got changed in 2005 is the signature for find_good_parse_start ('fgps'), which was previously

def find_good_parse_start(self, use_ps1, is_char_in_string=None,
 _synchre=_synchre)

When the use_ps1 parameter was removed, the 'if use_ps1' code was moved to the else: branch of the new 'if not use_ps1: ... else: ' in the editor method, but the call in question, moved into the 'if not use_ps1' branch, was not changed.  The immediate fix is to remove the extra argument.

The similar call in the then new hyperparser module is correct.

 bod = parser.find_good_parse_start(
     editwin._build_char_in_string_func(startatindex))

The erroneous call has not been detected in execution because of this bug:
  not is_char_in_string
is too general.  It should have been specifically
  is_char_in_string is None
so False does not trigger the early return, but gets called, raising TypeError.  We should add a test that find_good_parse_start(False, lambda: True) does so, with reference to this issue.

Both calls to fgps (editor and hyperparser) pass _build_char_in_string_func(initial_start), which unconditionally returns a function.  So I think we can delete '=None' and the early return, rather than changing the early return condition.
msg313179 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2018-03-03 04:15
If fgps never returns 0, then returning 0 instead of None would allow simplification of

                if bod is not None or startat == 1:
                    break
            parser.set_lo(bod or 0)

to
                if bod or startat == 1:
                    break
            parser.set_lo(bod)

If it can (or should) ever return 0, separate from None, I would like to see a test case for that.  We could then think about whether or not the loop should break on 0 as well as None.

Perhaps separate issue: the 'if use_ps1' statements in editor and hyperparser, and a couple of lines before, is nearly identical, and could be factored into a separate editor method that returns a parser instance ready for analysis.  It could then be tested in isolation.  The method should return a parser instance ready for analysis.

Both blocks have an explicit set_lo(0) call, which does nothing, and could be removed.
msg313180 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2018-03-03 04:27
Since _synchre is never passed, it should not be a parameter either.  I think we should either limit this to fixing the call, with no unit test added, or expand to 'fix find_good_parse_start and buggy call', with revised tests.

It might be interesting to verify that the time it takes to find a better start point is less that the time saved in later analysis.
msg313195 - (view) Author: Cheryl Sabella (cheryl.sabella) * (Python committer) Date: 2018-03-04 00:01
I didn't incorporate all the suggestions into the first PR for this so that it could focus on the parameter change and the tests.  `newline_and_indent_event` does a lot, so I want to make sure the tests look good.  I was also going to add tests for the other indent methods (some are called from `newline_and_indent_event`), but, again, wanted to focus on the bug fix.

Oddly enough, I don't know how to show in the tests that the bug fix makes any difference.  Finding a "good parse start" is more about parsing faster, but, as the tests show from running all the tests with the `context_use_ps1` set to True and False, the results are the same.  Of course, with one or two lines of code, they would be because there isn't much text to parse and ignore.  I did have some tests that had longer text to show that `bod` was being calculated, but I wasn't sure if I should include those for the main goal of testing  the newline and indent.

>  Both calls to fgps (editor and hyperparser) pass _build_char_in_string_func(initial_start), which unconditionally returns a function.  So I think we can delete '=None' and the early return, rather than changing the early return condition.

I'd like to address the on another PR that focuses more on refactoring pyparse than this one.

> If fgps never returns 0

It can return 0, separately from None, as some of the tests on this PR show.

> Perhaps separate issue: the 'if use_ps1' statements in editor and hyperparser, and a couple of lines before, is nearly identical, and could be factored into a separate editor method that returns a parser instance ready for analysis.  It could then be tested in isolation.  The method should return a parser instance ready for analysis.

I agree.  The duplicated code is bugging me.  :-)

---------
One addtional note:  I think that all the indent-related methods in EditorWindow can be factored out into their own class.   It will help make EditorWindow a little more manageable.
msg313205 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2018-03-04 08:38
I agree with limiting the scope to the None bug and the faulty call.  However, we should think of the None fix as primary, and the new test thereof as the primary test.  Fixing the None check exposes the call bug, which we also fix.  I change the title here and on PR.

As you noted, the new editor TestCase is not directly relevant to testing the double fix except to show that there is no change in indent.  The way to do that is to pass None instead of the in-string function.  I did that (temporarily!) and the test passes, meaning that the indents are the same.  (I do however think some of them are dubious, and I want to mark those cases.)

We could have made editor tests that initially failed by exposing bod as an instance attribute (as we have done before), and including a longer test case for which bod is a positive int.  However, bod should remain local.

As an alternative, for experimentation, I added print(bod).  The values for the patch are 0, None, None, None, None, None, 0, 0, None.  I added '  a\n' to 'Block opener - indents +1 level.' and changed the mark and the 5th 'None' became '4'.

The fact that passing None and _build_char_in_string_func(startatindex) result in the same indents raises the question of whether the call has any benefit in reducing net time after the followup call to get_continuation_type().  Maybe tomorrow I will try to write a good timeit test.

In the meanwhile, to get some idea of how well find_good_parse_start finds good parse starts, I restarted IDLE in a console with the print still added, loaded editor.py, and hit RETURN followed by UNDO, in various places. The first non-zero bod, 812, comes with the cursor at the end of 'def _sphinx_version():'  812 is probably the beginning of the line.  After "if __name__ == '__main__':" near the end, 1416.  After the final "run(_editor_window)", 1654.  The highest value I got in about 10 tries past the middle, 1931.  To me, this is pathetically bad.

I tried turning on CodeContext and got the same results where I tested.  bod should just be the beginning of the last context line.  I am not optimistic about timing results.
msg313231 - (view) Author: Cheryl Sabella (cheryl.sabella) * (Python committer) Date: 2018-03-05 00:34
> In the meanwhile, to get some idea of how well find_good_parse_start finds good parse starts, I restarted IDLE in a console with the print still added, loaded editor.py, and hit RETURN followed by UNDO, in various places. The first non-zero bod, 812, comes with the cursor at the end of 'def _sphinx_version():'  812 is probably the beginning of the line.  After "if __name__ == '__main__':" near the end, 1416.  After the final "run(_editor_window)", 1654.  The highest value I got in about 10 tries past the middle, 1931.  To me, this is pathetically bad.

Print `startat` too.  `num_context_lines` isn't CodeContext (although I thought it was too); that's just unfortunate naming.  `num_context_lines` is just a list = [50, 500, 5000]. First it looks at the code going back 50 lines from the current line and it only sends the text to `find_good_parse_start()` from this line onward.  `bod` is calculated from that point, not from the beginning of the file.  Because the start point is always 50 lines back, `bod` seems to always be in a similar range once you get to line 50 or higher in the code.

It seems that the purpose of the parsing is to apply the translate, etc to as few lines as possible.  So, it tries to make sure it includes the openers (':' ending lines) and closers (return, pass, etc) and the beginning of the brackets and continuation lines.  The big thing is that it wants to make sure it's not in a string or comment.  So, I think the program almost overcompensates for the idea of a 'large string'.  It is very complex and very hard to figure out exactly what it is trying to accomplish, even with the comments.  Maybe modern computing power (compared to 2000) has made it such that translating a whole source file is quick enough to not need fancy parsing.  :-)
msg360361 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-01-21 07:53
After updating the patch, I noticed that deletion of self.context_use_ps1 from the editor call to y.find_good_parse_start was no longer part of the diff.  This is because 3.8 changeset 6bdc4dee01788599808c7858e2fe9fdd72cf6792 for #35610 (backported only to 3.7) already did so. The remaining substantive change other than new tests is the removal of the unused default value None in the signature of the function and unused code triggered by 'if not arg'. 

I don't remember how well I tested that never exiting find...start early left indentations unchanged.  This would require comparing behavior to 3.6.  It would make more sense to check that the behavior and new tests conform to PEP8.
msg360380 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-01-21 10:11
New changeset ec64640a2c5236d7a5d5470d759172a3d93eab0b by Terry Jan Reedy (Cheryl Sabella) in branch 'master':
bpo-32989: IDLE - fix bad editor call of pyparse method (GH-5968)
https://github.com/python/cpython/commit/ec64640a2c5236d7a5d5470d759172a3d93eab0b
msg360384 - (view) Author: miss-islington (miss-islington) Date: 2020-01-21 10:28
New changeset f3d3a3cc114ed30829544c3613b73e4fa6dd5599 by Miss Islington (bot) in branch '3.7':
bpo-32989: IDLE - fix bad editor call of pyparse method (GH-5968)
https://github.com/python/cpython/commit/f3d3a3cc114ed30829544c3613b73e4fa6dd5599
msg360385 - (view) Author: miss-islington (miss-islington) Date: 2020-01-21 10:29
New changeset 060ad2fc1535adc76f96be8269b4af0f14429161 by Miss Islington (bot) in branch '3.8':
bpo-32989: IDLE - fix bad editor call of pyparse method (GH-5968)
https://github.com/python/cpython/commit/060ad2fc1535adc76f96be8269b4af0f14429161
msg360539 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-01-23 04:20
Cheryl, sorry to take so long.  I will look at and probably remove the _synchre parameter, on this issue.  Feel free to pursue any of the other possible followups.
msg360540 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-01-23 04:30
Tim, idlelib.pyparse has this definition:

# Find what looks like the start of a popular statement.
_synchre = re.compile(r"""
    ^
    [ \t]*
    (?: while
    |   else
    |   def
    |   return
    |   assert
    |   break
    |   class
    |   continue
    |   elif
    |   try
    |   except
    |   raise
    |   import
    |   yield
    )
    \b
""", re.VERBOSE | re.MULTILINE).search

You are credited with adding 'yield' to David Sherer's original list:
  "Taught IDLE's autoident parser that "yield" is a keyword that begins
   a stmt." --tim_one (found via git blame)

Do you know if there is any reason to not add 'if', 'for', and now 'with'?
msg360542 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-01-23 04:55
New changeset f9e07e116c32b6dc4561d0bdeb452ccde13b0e7c by Terry Jan Reedy in branch 'master':
bpo-32989: IDLE - remove unneeded parameter  (GH-18138)
https://github.com/python/cpython/commit/f9e07e116c32b6dc4561d0bdeb452ccde13b0e7c
msg360543 - (view) Author: miss-islington (miss-islington) Date: 2020-01-23 05:13
New changeset 36968c13cb9800559dbb90686933da7daf52c788 by Miss Islington (bot) in branch '3.7':
bpo-32989: IDLE - remove unneeded parameter  (GH-18138)
https://github.com/python/cpython/commit/36968c13cb9800559dbb90686933da7daf52c788
msg360544 - (view) Author: miss-islington (miss-islington) Date: 2020-01-23 05:13
New changeset 545fc51d950558ecec9ff64cb2f9c11469051524 by Miss Islington (bot) in branch '3.8':
bpo-32989: IDLE - remove unneeded parameter  (GH-18138)
https://github.com/python/cpython/commit/545fc51d950558ecec9ff64cb2f9c11469051524
History
Date User Action Args
2020-01-23 07:11:51terry.reedysetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2020-01-23 05:13:44miss-islingtonsetmessages: + msg360544
2020-01-23 05:13:04miss-islingtonsetmessages: + msg360543
2020-01-23 04:55:26miss-islingtonsetpull_requests: + pull_request17526
2020-01-23 04:55:20miss-islingtonsetpull_requests: + pull_request17525
2020-01-23 04:55:11terry.reedysetmessages: + msg360542
2020-01-23 04:34:06terry.reedysetpull_requests: + pull_request17524
2020-01-23 04:30:00terry.reedysetnosy: + tim.peters
messages: + msg360540
2020-01-23 04:20:01terry.reedysetmessages: + msg360539
versions: + Python 3.9, - Python 3.6
2020-01-21 10:29:42miss-islingtonsetmessages: + msg360385
2020-01-21 10:28:51miss-islingtonsetnosy: + miss-islington
messages: + msg360384
2020-01-21 10:12:01miss-islingtonsetpull_requests: + pull_request17487
2020-01-21 10:11:53miss-islingtonsetstage: test needed -> patch review
pull_requests: + pull_request17486
2020-01-21 10:11:37terry.reedysetmessages: + msg360380
2020-01-21 07:53:20terry.reedysetmessages: + msg360361
title: IDLE: Fix pyparse.find_good_parse_start and its bad editor call -> IDLE: Fix pyparse.find_good_parse_start
2018-12-29 06:00:16terry.reedyunlinkissue34055 dependencies
2018-12-25 21:25:16cheryl.sabellalinkissue34055 dependencies
2018-03-05 00:34:04cheryl.sabellasetmessages: + msg313231
2018-03-04 08:38:05terry.reedysetmessages: + msg313205
title: IDLE: Incorrect signature in call from editor to pyparse.find_good_parse_start -> IDLE: Fix pyparse.find_good_parse_start and its bad editor call
2018-03-04 00:01:48cheryl.sabellasetmessages: + msg313195
stage: patch review -> test needed
2018-03-03 23:39:13cheryl.sabellasetkeywords: + patch
stage: test needed -> patch review
pull_requests: + pull_request5734
2018-03-03 04:27:05terry.reedysetmessages: + msg313180
2018-03-03 04:15:02terry.reedysetmessages: + msg313179
2018-03-03 03:56:57terry.reedysetmessages: + msg313178
stage: test needed
2018-03-03 00:44:56cheryl.sabellalinkissue32880 dependencies
2018-03-03 00:44:36cheryl.sabellacreate