IDLE: Fix pyparse.find_good_parse_start #77170

csabella · 2018-03-03T00:44:37Z

BPO	32989
Nosy	@tim-one, @terryjreedy, @csabella, @miss-islington
PRs	bpo-32989: IDLE - fix bad editor call of pyparse method #5968 [3.8] bpo-32989: IDLE - fix bad editor call of pyparse method (GH-5968) #18096 [3.7] bpo-32989: IDLE - fix bad editor call of pyparse method (GH-5968) #18097 bpo-32989: IDLE - remove unneeded parameter #18138 [3.8] bpo-32989: IDLE - remove unneeded parameter (GH-18138) #18139 [3.7] bpo-32989: IDLE - remove unneeded parameter (GH-18138) #18140

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/terryjreedy'
closed_at = <Date 2020-01-23.07:11:51.918>
created_at = <Date 2018-03-03.00:44:36.519>
labels = ['3.8', 'expert-IDLE', 'type-feature', '3.7', '3.9']
title = 'IDLE: Fix pyparse.find_good_parse_start'
updated_at = <Date 2020-01-23.07:11:51.917>
user = 'https://github.com/csabella'

bugs.python.org fields:

activity = <Date 2020-01-23.07:11:51.917>
actor = 'terry.reedy'
assignee = 'terry.reedy'
closed = True
closed_date = <Date 2020-01-23.07:11:51.918>
closer = 'terry.reedy'
components = ['IDLE']
creation = <Date 2018-03-03.00:44:36.519>
creator = 'cheryl.sabella'
dependencies = []
files = []
hgrepos = []
issue_num = 32989
keywords = ['patch']
message_count = 16.0
messages = ['313170', '313178', '313179', '313180', '313195', '313205', '313231', '360361', '360380', '360384', '360385', '360539', '360540', '360542', '360543', '360544']
nosy_count = 4.0
nosy_names = ['tim.peters', 'terry.reedy', 'cheryl.sabella', 'miss-islington']
pr_nums = ['5968', '18096', '18097', '18138', '18139', '18140']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue32989'
versions = ['Python 3.7', 'Python 3.8', 'Python 3.9']

csabella · 2018-03-03T00:44:36Z

From msg312726 on bpo-32880.

The call to find_good_parse_start:

    bod = y.find_good_parse_start(self.context_use_ps1, 
                         self._build_char_in_string_func(startatindex))

sends 3 parameters. And in pyparse.find_good_parse_start(), the signature allows 3. However, the signature is:

    def find_good_parse_start(self, is_char_in_string=None,
                              _synchre=_synchre):

This means that the False value in self.use_context_ps1 is the first value instead of the function, so pyparse is always executing:

    if not is_char_in_string:
        # no clue -- make the caller pass everything
        return None

Here's the commit that changed the signature:
b175445

terryjreedy · 2018-03-03T03:56:57Z

To be clear, the signature that got changed in 2005 is the signature for find_good_parse_start ('fgps'), which was previously

def find_good_parse_start(self, use_ps1, is_char_in_string=None,
 _synchre=_synchre)

When the use_ps1 parameter was removed, the 'if use_ps1' code was moved to the else: branch of the new 'if not use_ps1: ... else: ' in the editor method, but the call in question, moved into the 'if not use_ps1' branch, was not changed. The immediate fix is to remove the extra argument.

The similar call in the then new hyperparser module is correct.

 bod = parser.find_good_parse_start(
     editwin._build_char_in_string_func(startatindex))

The erroneous call has not been detected in execution because of this bug:
not is_char_in_string
is too general. It should have been specifically
is_char_in_string is None
so False does not trigger the early return, but gets called, raising TypeError. We should add a test that find_good_parse_start(False, lambda: True) does so, with reference to this issue.

Both calls to fgps (editor and hyperparser) pass _build_char_in_string_func(initial_start), which unconditionally returns a function. So I think we can delete '=None' and the early return, rather than changing the early return condition.

terryjreedy · 2018-03-03T04:15:02Z

If fgps never returns 0, then returning 0 instead of None would allow simplification of

                if bod is not None or startat == 1:
                    break
            parser.set_lo(bod or 0)

to
if bod or startat == 1:
break
parser.set_lo(bod)

If it can (or should) ever return 0, separate from None, I would like to see a test case for that. We could then think about whether or not the loop should break on 0 as well as None.

Perhaps separate issue: the 'if use_ps1' statements in editor and hyperparser, and a couple of lines before, is nearly identical, and could be factored into a separate editor method that returns a parser instance ready for analysis. It could then be tested in isolation. The method should return a parser instance ready for analysis.

Both blocks have an explicit set_lo(0) call, which does nothing, and could be removed.

terryjreedy · 2018-03-03T04:27:05Z

Since _synchre is never passed, it should not be a parameter either. I think we should either limit this to fixing the call, with no unit test added, or expand to 'fix find_good_parse_start and buggy call', with revised tests.

It might be interesting to verify that the time it takes to find a better start point is less that the time saved in later analysis.

csabella · 2018-03-04T00:01:48Z

I didn't incorporate all the suggestions into the first PR for this so that it could focus on the parameter change and the tests. newline_and_indent_event does a lot, so I want to make sure the tests look good. I was also going to add tests for the other indent methods (some are called from newline_and_indent_event), but, again, wanted to focus on the bug fix.

Oddly enough, I don't know how to show in the tests that the bug fix makes any difference. Finding a "good parse start" is more about parsing faster, but, as the tests show from running all the tests with the context_use_ps1 set to True and False, the results are the same. Of course, with one or two lines of code, they would be because there isn't much text to parse and ignore. I did have some tests that had longer text to show that bod was being calculated, but I wasn't sure if I should include those for the main goal of testing the newline and indent.

Both calls to fgps (editor and hyperparser) pass _build_char_in_string_func(initial_start), which unconditionally returns a function. So I think we can delete '=None' and the early return, rather than changing the early return condition.

I'd like to address the on another PR that focuses more on refactoring pyparse than this one.

If fgps never returns 0

It can return 0, separately from None, as some of the tests on this PR show.

Perhaps separate issue: the 'if use_ps1' statements in editor and hyperparser, and a couple of lines before, is nearly identical, and could be factored into a separate editor method that returns a parser instance ready for analysis. It could then be tested in isolation. The method should return a parser instance ready for analysis.

I agree. The duplicated code is bugging me. :-)

---------
One addtional note: I think that all the indent-related methods in EditorWindow can be factored out into their own class. It will help make EditorWindow a little more manageable.

terryjreedy · 2018-03-04T08:38:04Z

I agree with limiting the scope to the None bug and the faulty call. However, we should think of the None fix as primary, and the new test thereof as the primary test. Fixing the None check exposes the call bug, which we also fix. I change the title here and on PR.

As you noted, the new editor TestCase is not directly relevant to testing the double fix except to show that there is no change in indent. The way to do that is to pass None instead of the in-string function. I did that (temporarily!) and the test passes, meaning that the indents are the same. (I do however think some of them are dubious, and I want to mark those cases.)

We could have made editor tests that initially failed by exposing bod as an instance attribute (as we have done before), and including a longer test case for which bod is a positive int. However, bod should remain local.

As an alternative, for experimentation, I added print(bod). The values for the patch are 0, None, None, None, None, None, 0, 0, None. I added ' a\n' to 'Block opener - indents +1 level.' and changed the mark and the 5th 'None' became '4'.

The fact that passing None and _build_char_in_string_func(startatindex) result in the same indents raises the question of whether the call has any benefit in reducing net time after the followup call to get_continuation_type(). Maybe tomorrow I will try to write a good timeit test.

In the meanwhile, to get some idea of how well find_good_parse_start finds good parse starts, I restarted IDLE in a console with the print still added, loaded editor.py, and hit RETURN followed by UNDO, in various places. The first non-zero bod, 812, comes with the cursor at the end of 'def _sphinx_version():' 812 is probably the beginning of the line. After "if __name__ == '__main__':" near the end, 1416. After the final "run(_editor_window)", 1654. The highest value I got in about 10 tries past the middle, 1931. To me, this is pathetically bad.

I tried turning on CodeContext and got the same results where I tested. bod should just be the beginning of the last context line. I am not optimistic about timing results.

csabella · 2018-03-05T00:34:04Z

In the meanwhile, to get some idea of how well find_good_parse_start finds good parse starts, I restarted IDLE in a console with the print still added, loaded editor.py, and hit RETURN followed by UNDO, in various places. The first non-zero bod, 812, comes with the cursor at the end of 'def _sphinx_version():' 812 is probably the beginning of the line. After "if __name__ == '__main__':" near the end, 1416. After the final "run(_editor_window)", 1654. The highest value I got in about 10 tries past the middle, 1931. To me, this is pathetically bad.

Print startat too. num_context_lines isn't CodeContext (although I thought it was too); that's just unfortunate naming. num_context_lines is just a list = [50, 500, 5000]. First it looks at the code going back 50 lines from the current line and it only sends the text to find_good_parse_start() from this line onward. bod is calculated from that point, not from the beginning of the file. Because the start point is always 50 lines back, bod seems to always be in a similar range once you get to line 50 or higher in the code.

It seems that the purpose of the parsing is to apply the translate, etc to as few lines as possible. So, it tries to make sure it includes the openers (':' ending lines) and closers (return, pass, etc) and the beginning of the brackets and continuation lines. The big thing is that it wants to make sure it's not in a string or comment. So, I think the program almost overcompensates for the idea of a 'large string'. It is very complex and very hard to figure out exactly what it is trying to accomplish, even with the comments. Maybe modern computing power (compared to 2000) has made it such that translating a whole source file is quick enough to not need fancy parsing. :-)

terryjreedy · 2020-01-21T07:53:21Z

After updating the patch, I noticed that deletion of self.context_use_ps1 from the editor call to y.find_good_parse_start was no longer part of the diff. This is because 3.8 changeset 6bdc4de for bpo-35610 (backported only to 3.7) already did so. The remaining substantive change other than new tests is the removal of the unused default value None in the signature of the function and unused code triggered by 'if not arg'.

I don't remember how well I tested that never exiting find...start early left indentations unchanged. This would require comparing behavior to 3.6. It would make more sense to check that the behavior and new tests conform to PEP-8.

terryjreedy · 2020-01-21T10:11:38Z

New changeset ec64640 by Terry Jan Reedy (Cheryl Sabella) in branch 'master':
bpo-32989: IDLE - fix bad editor call of pyparse method (GH-5968)
ec64640

miss-islington · 2020-01-21T10:28:52Z

New changeset f3d3a3c by Miss Islington (bot) in branch '3.7':
bpo-32989: IDLE - fix bad editor call of pyparse method (GH-5968)
f3d3a3c

miss-islington · 2020-01-21T10:29:43Z

New changeset 060ad2f by Miss Islington (bot) in branch '3.8':
bpo-32989: IDLE - fix bad editor call of pyparse method (GH-5968)
060ad2f

terryjreedy · 2020-01-23T04:20:02Z

Cheryl, sorry to take so long. I will look at and probably remove the _synchre parameter, on this issue. Feel free to pursue any of the other possible followups.

terryjreedy · 2020-01-23T04:30:00Z

Tim, idlelib.pyparse has this definition:

# Find what looks like the start of a popular statement.
_synchre = re.compile(r"""
    ^
    [ \t]*
    (?: while
    |   else
    |   def
    |   return
    |   assert
    |   break
    |   class
    |   continue
    |   elif
    |   try
    |   except
    |   raise
    |   import
    |   yield
    )
    \b
""", re.VERBOSE | re.MULTILINE).search

You are credited with adding 'yield' to David Sherer's original list:
"Taught IDLE's autoident parser that "yield" is a keyword that begins
a stmt." --tim_one (found via git blame)

Do you know if there is any reason to not add 'if', 'for', and now 'with'?

terryjreedy · 2020-01-23T04:55:12Z

New changeset f9e07e1 by Terry Jan Reedy in branch 'master':
bpo-32989: IDLE - remove unneeded parameter (GH-18138)
f9e07e1

miss-islington · 2020-01-23T05:13:04Z

New changeset 36968c1 by Miss Islington (bot) in branch '3.7':
bpo-32989: IDLE - remove unneeded parameter (GH-18138)
36968c1

miss-islington · 2020-01-23T05:13:45Z

New changeset 545fc51 by Miss Islington (bot) in branch '3.8':
bpo-32989: IDLE - remove unneeded parameter (GH-18138)
545fc51

csabella added 3.7 (EOL) end of life 3.8 only security fixes labels Mar 3, 2018

csabella assigned terryjreedy Mar 3, 2018

csabella added topic-IDLE type-feature A feature request or enhancement labels Mar 3, 2018

terryjreedy changed the title ~~IDLE: Incorrect signature in call from editor to pyparse.find_good_parse_start~~ IDLE: Fix pyparse.find_good_parse_start and its bad editor call Mar 4, 2018

terryjreedy changed the title ~~IDLE: Fix pyparse.find_good_parse_start and its bad editor call~~ IDLE: Fix pyparse.find_good_parse_start Jan 21, 2020

terryjreedy added the 3.9 only security fixes label Jan 23, 2020

terryjreedy closed this as completed Jan 23, 2020

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

IDLE: Fix pyparse.find_good_parse_start #77170

IDLE: Fix pyparse.find_good_parse_start #77170

csabella commented Mar 3, 2018

csabella commented Mar 3, 2018

terryjreedy commented Mar 3, 2018

terryjreedy commented Mar 3, 2018

terryjreedy commented Mar 3, 2018

csabella commented Mar 4, 2018

terryjreedy commented Mar 4, 2018

csabella commented Mar 5, 2018

terryjreedy commented Jan 21, 2020

terryjreedy commented Jan 21, 2020

miss-islington commented Jan 21, 2020

miss-islington commented Jan 21, 2020

terryjreedy commented Jan 23, 2020

terryjreedy commented Jan 23, 2020

terryjreedy commented Jan 23, 2020

miss-islington commented Jan 23, 2020

miss-islington commented Jan 23, 2020

IDLE: Fix pyparse.find_good_parse_start #77170

IDLE: Fix pyparse.find_good_parse_start #77170

Comments

csabella commented Mar 3, 2018

csabella commented Mar 3, 2018

terryjreedy commented Mar 3, 2018

terryjreedy commented Mar 3, 2018

terryjreedy commented Mar 3, 2018

csabella commented Mar 4, 2018

terryjreedy commented Mar 4, 2018

csabella commented Mar 5, 2018

terryjreedy commented Jan 21, 2020

terryjreedy commented Jan 21, 2020

miss-islington commented Jan 21, 2020

miss-islington commented Jan 21, 2020

terryjreedy commented Jan 23, 2020

terryjreedy commented Jan 23, 2020

terryjreedy commented Jan 23, 2020

miss-islington commented Jan 23, 2020

miss-islington commented Jan 23, 2020