Title: IDLE fails to detect corresponding opening parenthesis
Type: behavior Stage: test needed
Components: IDLE Versions: Python 3.10, Python 3.9, Python 3.8
Status: open Resolution:
Dependencies: Superseder:
Assigned To: terry.reedy Nosy List: Alexey Burdin, Lewis Ball, taleinat, terry.reedy
Priority: normal Keywords:

Created on 2020-07-24 19:10 by Alexey Burdin, last changed 2020-07-26 23:35 by Lewis Ball.

Messages (9)
msg374205 - (view) Author: Alexey Burdin (Alexey Burdin) Date: 2020-07-24 19:10
    set(j for i in data['items'] for j in i),
    key=cmp_to_key(lambda x,y:(
        -1 if (x,y) in answer_order 
        else (0 if x==y else 1)))
when the cursor is placed in line 5 col 31 (between `)` and `))` ) hitting Ctrl-0 (Show surrounding parens) produces an error sound, though there's no error, the script works fine.

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 18.04.4 LTS
Release:	18.04
Codename:	bionic
$ uname -a
Linux odd-one 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ python3.8
Python 3.8.0 (default, Oct 28 2019, 16:14:01) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import importlib
>>> importlib.util.find_spec('idlelib.pyshell')
ModuleSpec(name='idlelib.pyshell', loader=<_frozen_importlib_external.SourceFileLoader object at 0x7f1ea8ee14c0>, origin='/usr/lib/python3.8/idlelib/')
>>> exit()
$ dpkg-query -S /usr/lib/python3.8/idlelib/
idle-python3.8: /usr/lib/python3.8/idlelib/
$ dpkg -l | grep idle-python3\.8
ii  idle-python3.8                             3.8.0-3~18.04                                    all          IDE for Python (v3.8) using Tkinter
msg374218 - (view) Author: Lewis Ball (Lewis Ball) * Date: 2020-07-25 00:37
A minimal example of the same issue seems to be:
(1 if x
 else 0)
In this case the parentheses matching also fails. It only ever seems to fail when `else` is the first word on a newline inside brackets, and adding the line break elsewhere in the above statement will not cause the matching to fail. 

I imagine it stops looking for the matching parentheses when it gets to the `else` statement, as an `else` when part of an `if/else` cannot be inside parentheses. Obviously the exception to this is the ternary operator.

Bizarrely, it only seems to fail the first time it encounters this issue. Any subsequent code with the same pattern seems to be matched fine. So in the following example, the second brackets get matched without an issue:

(1 if x
 else 0)

(1 if x
 else 0)

Hopefully someone more familiar with the source code can shed some light on this!
msg374224 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-07-25 02:59
(While I wrote the following, Lewis Ball posted some of the same conclusions.)

0. I verified the failure with 3.8.5 and current master.

1. I reduced the failing example to a minimum of 7 chars on 2 lines.

1a. Deleting '\n' or any letter of 'else' fixes the problem.
1b. Adding 'else' before '(' fixes the problem.
1a is true for the initial 'else' on line 4 of the original example.
1b is true for 'else' on a new line before the first.

2. I tried multiple variations
2a. Matching also fails when typing ')'.
2b. Replacing '()' with '[]' or '{}' makes no difference.

2c. Replacing 'else' with 'elif', 'while', 'def', or 'class' makes no difference.  This include the addition for 1b above.  Replacing 'else' with 'if', 'for', or 'with' fixes the bug.

Bingo!!!  Matching uses idlelib.hyperparser, which uses idlelib.pyparse, which has regex _synchre, which finds "what looks like the start of a popular statement".  It matches 'elif', etcetera, but not 'if', etcetera.  I noticed previously that these 'popular' line beginners were missing and have planned to add them.*  But we need to make matched words not disable fence matching.

* Keyword 'with' was likely added to python after this re was last revised.  Keywords 'if' and 'for' might have been removed when comprehensions were added. If so, 'else' should have been removed when conditional expressions were added and 'yield' when yield expressions were added.  These latter two are the only keywords matched by _synchre that can properly appear in the middle of a statement but at the beginning of a continuation line.

2d. In the original and minimal examples, moving the offending 'else' to the end of the previous line fixes the problem.  So I believe that 'else' at the beginning of the line is seen marking the beginning of the statememt, and matching does not look back further.  If 'else' and 'yield' were left in _synchre and 'if' and 'for' were added back, the matcher would have to determine whether the line is a continuation or not.  (And I believe that not having 'if' and 'for' degrades the use of pyparse at least for other uses.)

2e. The minimal failing example works in the shell.  Why are _synchre keywords not poisonous here?  Is it just because the current statement is known to begin just after the '>>> ' prompt, so that else beginning the second line is somehow not seen as beginning a new statement?  We need to check if and how fence matching in shell is different other than the presence of the marker where code input begins.

Alexey: in msg371076 of #21756 I claimed that there is no issue with typing closers or ^0 on the same line just before the closer.  Thank you for reporting this counter-example.  I imagine that you are not the first IDLE user to write a conditional expression on 2 or more lines with else at the beginning of a line.  Such expressions always need a closer.
msg374225 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-07-25 03:16
Without rereading hyperparser and pyparse, I don't understand why a previous _synchre match prevents subsequent failures.  It does suggest that when matching parentheses, a fix would be to temporarily prepend 'else\n' to the text, or otherwise simulating the effect of that.

But the previous 'else' might only work before the 'current' statement.  And we should determine whether any other uses of these modules have related bugs.  Also, removing 'else' from the _synchre regex did not fix the issue, so there is something else I am missing.
msg374226 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-07-25 03:19
Lewis, removing you was an accident.  Your post is correct.
msg374274 - (view) Author: Lewis Ball (Lewis Ball) * Date: 2020-07-25 18:30
So it looks like `pyparse.Parser.find_good_parse_start` is responsible for truncating the code to only look for the matching bracket in the current statement, which uses _synchre.

Testing it out, it sometimes will go to the most recent match of _synchre, but will sometimes go back to an even earlier match in the code, which is why something like
manages to match without an issue. The `find_good_parse_start` will truncate at the first `else` in this case, instead at the second one.

Removing `else` from the _synchre regex did solve this problem for me though, as `find_good_parse_start` will then try to truncate even earlier when looking for the start of the statement, which will be before the opener. Although, I haven't checked what else would be affected by this change. I am not sure why this worked for me and did not work for you Terry.

Also, even with this fix it seems like 'find_good_parse_start` goes back further than it needs to. Fixing that may not change much, but would offer a slight performance increase.
msg374298 - (view) Author: Lewis Ball (Lewis Ball) * Date: 2020-07-26 00:05
Okay, on further examination `find_good_parse_start` first looks for a line ending with ':\n', and then looks to match _synchre on that line. If it can't find any line ending in ':\n' then it will just look for the first occurrence of something in _synchre to truncate on. If it can't find anything at that point, it just doesn't truncate and the whole code is checked for a bracket match.

So the reason that 
works is because there is no `:` after the first else, and adding `:` will cause the matching to fail.

This seems reasonable, and I was probably too quick to say that the function goes back further than it needs to when looking to truncate.

It seems like then that `else` being removed from _synchre will fix this, and mean that the only time brackets aren't matched are when invalid code is typed, something like:

I can put in a PR for this tomorrow (probably removing `yield` at the same time) if this sounds like the right fix.

P.S. Here is an example of the similar `yield` error that Terry alluded to:
def f():
    for i in range(10):
        yield x)
msg374301 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-07-26 04:43
As I at least hinted above, I would rather add the missing line starts than delete more.  I am quite sure that their absence degrades overall performance in some sense.  A much bigger match problem that this one is that ^0 always half fails when the cursor is not on the line with the closer.  See #21756, which already has a patch, though not one I was completely happy with last time I looked.

I cannot help but think that a proper solution might fix both issues.

I once wrote a fence matcher in C, initially for C, based on a deterministic finite state machine but with a push-down stack so it could match indefinite nesting.  I am thinking that I should try to rewrite in python and see if it will solve both issues.  Find good parse start is not really the right tool, because what we want to do with ^0 is to move both back and forward, find the first unmatched fence in each direction, and flash them, and add a beep if unmatched.  BOF and EOF (begin/end of file) are the backup fences.

Another possible solution might be to use a depleted _synchre for matching and an augmented one for any other use where that is better.
msg374350 - (view) Author: Lewis Ball (Lewis Ball) * Date: 2020-07-26 23:35
Okay that makes sense. Removing things from _synchre would slow down the matching slightly, although the matching still seems plenty fast enough when done inside an if clause at the moment, so I'm not sure how noticeable the removal of `else` would be. One thing could be to search for `else:` instead of just `else` in _synchre, as only the former indicates the start of a new statement (although I guess it would actually have to check for `else[\s\\]*:` or something).

Like you say though, if there are other more pressing issues with the matching then maybe it is worth fixing them all at the same time. Happy to help with that if needed.
Date User Action Args
2020-07-26 23:35:02Lewis Ballsetmessages: + msg374350
2020-07-26 04:43:53terry.reedysetmessages: + msg374301
2020-07-26 00:05:23Lewis Ballsetmessages: + msg374298
2020-07-25 18:30:34Lewis Ballsetmessages: + msg374274
2020-07-25 03:19:10terry.reedysetmessages: + msg374226
2020-07-25 03:18:15terry.reedysetnosy: + Lewis Ball
2020-07-25 03:16:04terry.reedysetmessages: + msg374225
2020-07-25 02:59:21terry.reedysetnosy: + taleinat, - Lewis Ball
versions: + Python 3.9, Python 3.10
messages: + msg374224
stage: test needed
2020-07-25 00:37:10Lewis Ballsetnosy: + Lewis Ball
messages: + msg374218
2020-07-24 19:10:41Alexey Burdincreate