classification
Title: IDLE: highlight soft keywords
Type: behavior Stage: patch review
Components: IDLE Versions: Python 3.11, Python 3.10
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: terry.reedy Nosy List: epaine, kj, taleinat, terry.reedy
Priority: normal Keywords: patch

Created on 2021-05-02 15:18 by epaine, last changed 2021-05-03 13:33 by taleinat.

Pull Requests
URL Status Linked
PR 25851 open taleinat, 2021-05-03 13:32
Messages (11)
msg392705 - Author: E. Paine (epaine) * Date: 2021-05-02 15:18
As per PEP 634, structural pattern matching is now in Python. This introduces the `match` and `case` keywords, but IDLE does not highlight them.

The problem is that these are listed in `keyword.softkwlist` rather than `keyword.kwlist` (which is what IDLE uses). This confuses me, as this is not a __future__ feature and there is no discussion of it becoming one in #42128. There is also no discussion (that I could find) about which list it should be put in. The addition to softkwlist was done in PR-22917.

Do we change IDLE to use softkwlist, or move those keywords into kwlist?
msg392707 - Author: Ken Jin (kj) * (Python triager) Date: 2021-05-02 15:53
Hi, I'm no IDLE expert, but I think moving the new soft keywords into kwlist would be wrong:

Soft keywords were added in Python 3.9 when the PEG parser became the default. The keyword module was also updated accordingly: https://docs.python.org/3/library/keyword.html#keyword.softkwlist

This link explains how soft keywords differ from normal keywords: https://docs.python.org/3.10/reference/lexical_analysis.html#soft-keywords

Thanks
msg392708 - Author: E. Paine (epaine) * Date: 2021-05-02 16:08
Thanks for linking to the Lexical Analysis docs. Not quite sure how I missed this, given it is directly below the normal keywords section. Given the distinction described there, it may instead be best for IDLE to highlight soft keywords as their own category (i.e. not grouping them with the standard keywords).
msg392713 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2021-05-02 18:05
Soft keywords are a huge nuisance for syntax highlighting as they need special case regexes and tests.

Hard keywords are matched against complete words, regardless of whether the context is syntactically valid or not.  If 'for' and 'else' were the only keywords, the keyword part of the IDLE colorizer regex would be as follows.

>>> from idlelib import colorizer
>>> kw = r"\b" + colorizer.any("KEYWORD", ['for', 'else']) + r"\b"
>>> kw
'\\b(?P<KEYWORD>for|else)\\b'

Both words in 'for.else' are highlighted as the tokenizer will see them as keywords.  The parser will later see the combination as an error.
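The same behaviour can be reproduced with the `re` module alone. This sketch writes out the pattern shown above by hand rather than calling `colorizer.any()`, so it runs without `idlelib`; `keyword_hits` is an illustrative helper:

```python
import re

# Equivalent to r"\b" + colorizer.any("KEYWORD", ['for', 'else']) + r"\b"
prog = re.compile(r"\b(?P<KEYWORD>for|else)\b")


def keyword_hits(text):
    """Return (word, (start, end)) for every hard-keyword match in `text`."""
    return [(m.group("KEYWORD"), m.span("KEYWORD")) for m in prog.finditer(text)]


print(keyword_hits("for.else"))  # [('for', (0, 3)), ('else', (4, 8))]
print(keyword_hits("format"))    # [] ('for' inside a longer word is not matched)
```

The word boundaries are the only context the hard-keyword pattern checks, which is exactly why it is cheap and also why it cannot tell a valid statement from an invalid one.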

The tag name in a "(?P<name>...)" construct can only be used once in a regex. Since the word-boundary context is the same for all hard keywords, the alternation can be done within one such context and all (hard) keywords get the same match tag (dict key "KEYWORD"), making it easy to give them all the same highlight.
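Both halves of that point can be checked directly: a single named group around the alternation tags every hard keyword the same way, while repeating a group name is rejected by Python's `re` module. A minimal sketch (the `duplicate_names_allowed` helper is illustrative):

```python
import re

# One group name around the whole alternation: every keyword gets the same tag.
prog = re.compile(r"\b(?P<KEYWORD>if|else|pass)\b")
tags = [(m.lastgroup, m.group()) for m in prog.finditer("if x: pass\nelse: pass")]
print(tags)
# [('KEYWORD', 'if'), ('KEYWORD', 'pass'), ('KEYWORD', 'else'), ('KEYWORD', 'pass')]


def duplicate_names_allowed() -> bool:
    """Check whether re accepts the same group name twice in one pattern."""
    try:
        re.compile(r"(?P<KEYWORD>for)|(?P<KEYWORD>else)")
    except re.error:
        return False
    return True


print(duplicate_names_allowed())  # False
```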

Soft keywords need different contexts to avoid false positives. 'match' and 'case' must be the first non-blank on a line, with a ':' later on that line. '_' must follow 'case' and a space. I believe each context will have to have its own tag name, so multiple keyword tags must be mapped to 'keyword'.

skw1 = r"^[ \t]*(?P<SKEY1>match|case)\b(?=[^\n]*:)"
skw2 = r"case[ \t]+(?P<SKEY2>_)\b"

Add skw1 and skw2 to the prog definition, which should use "|".join(...).

In ColorDelegator.LoadTagDefs (which should be renamed), replace

            "KEYWORD": idleConf.GetHighlight(theme, "keyword"),

with
            "KEYWORD": keydef,
            "SKEY1": keydef,
            "SKEY2": keydef,

after first defining keydef with

        keydef = idleConf.GetHighlight(theme, "keyword")
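Putting the proposal together, a sketch of how a combined pattern and a shared highlight could fit. Everything here is illustrative rather than IDLE's actual code: `keydef` stands in for `idleConf.GetHighlight(theme, "keyword")` with made-up colours, `any_group` mirrors `idlelib.colorizer.any()`, and the soft-keyword patterns (including a third group name, `SKEY3`, for the `case` in `case _`) are assumptions. Two details matter once the alternatives are joined with "|": the pattern needs `re.MULTILINE` so that `^` matches at every line start, and because each match consumes its text, the `case _` context is tried before the general `match`/`case` rule.

```python
import keyword
import re


def any_group(name, alternates):
    """Named group around an alternation, mirroring idlelib.colorizer.any()."""
    return "(?P<%s>" % name + "|".join(alternates) + ")"


# Hard keywords: one group, matched on word boundaries anywhere.
kw = r"\b" + any_group("KEYWORD", keyword.kwlist) + r"\b"
# 'case _': both words are tagged, each through its own group name.
skw2 = r"^[ \t]*(?P<SKEY3>case)[ \t]+(?P<SKEY2>_)\b"
# 'match'/'case' first on a line, with a ':' later on that line (lookahead only).
skw1 = r"^[ \t]*(?P<SKEY1>match|case)\b(?=[^\n]*:)"

prog = re.compile("|".join([skw2, skw1, kw]), re.MULTILINE)

# Stand-in for idleConf.GetHighlight(theme, "keyword"); colours are made up.
keydef = {"foreground": "#ff7700", "background": "#ffffff"}
# Every group name maps to the same highlight definition.
tagdefs = {"KEYWORD": keydef, "SKEY1": keydef, "SKEY2": keydef, "SKEY3": keydef}


def highlighted(text):
    """Return (word, highlight) for every group that took part in a match."""
    result = []
    for m in prog.finditer(text):
        for name, value in m.groupdict().items():
            if value is not None:
                result.append((value, tagdefs[name]))
    return result


source = "match command:\n    case 'go':\n        pass\n    case _:\n        pass\n"
print([word for word, style in highlighted(source)])
# ['match', 'case', 'pass', 'case', '_', 'pass']
```

Because all group names resolve to one dict, the highlighting loop stays generic: it never needs to know which context produced a match.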

Some new tests will be needed.
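A possible shape for such tests, as a `unittest` sketch. The patterns are written in the spirit of skw1 and skw2 above; the exact regexes, the class name and the sample lines are assumptions:

```python
import re
import unittest

# Hypothetical soft-keyword patterns under test.
SKW1 = re.compile(r"^[ \t]*(?P<SKEY1>match|case)\b(?=[^\n]*:)", re.MULTILINE)
SKW2 = re.compile(r"case[ \t]+(?P<SKEY2>_)\b")


class SoftKeywordPatternTest(unittest.TestCase):

    def test_statement_heads_match(self):
        # Real match/case statement heads should be found.
        for line in ("match command:", "    case [x, y]:", "\tcase {'a': 1}:"):
            with self.subTest(line=line):
                self.assertIsNotNone(SKW1.search(line))

    def test_ordinary_names_do_not_match(self):
        # Uses of the same words as plain names must not be highlighted.
        for line in ("match = 5", "m = re.match(p, s)", "    case_count = 0"):
            with self.subTest(line=line):
                self.assertIsNone(SKW1.search(line))

    def test_wildcard(self):
        # '_' is tagged only as a standalone word after 'case'.
        self.assertEqual(SKW2.search("    case _:").group("SKEY2"), "_")
        self.assertIsNone(SKW2.search("    case _x:"))
```

Saved as a module, this runs with `python -m unittest`.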
msg392788 - Author: Tal Einat (taleinat) * (Python committer) Date: 2021-05-03 10:14
Terry, Elisha, does one of you intend to work on this? If not, I'd be willing to.
msg392791 - Author: E. Paine (epaine) * Date: 2021-05-03 10:25
I don't mind; would you like to, Tal? (I probably won't be able to dedicate any serious time to it until mid-June.) One thing I've been thinking about is whether it's worth highlighting soft keywords regardless of context. For example, you can use a builtin name as a variable (not that it's recommended), so we could just give soft keywords their own colour and (unofficially) recommend that people don't use such words as variable names.

I think this would be more future-proof as we wouldn't need to update the regexes for each new soft keyword added. However, we might not want to highlight every time the user has an '_' variable (as is fairly common).
msg392792 - Author: Tal Einat (taleinat) * (Python committer) Date: 2021-05-03 10:41
I think it is rather crucial to have this with the 3.10 release. I'll try to get this working ASAP.

I agree that a simple "good enough" solution could be a good start, but "_" will likely need special handling.
msg392793 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2021-05-03 10:42
My plan for the next day or two is to submit a followup issue for Shell and formally code what I wrote.

The only way to handle soft keywords correctly is with a custom re. I don't expect them to become common. They are different from builtins because they only have special meaning in (so far) definable situations. When a builtin is redefined, it may or may not be appropriate to keep the highlight. Examples when it is:

oldprint = print
def print(*args, **kwds):
    # Log the call here, then delegate to the saved builtin.
    oldprint(*args, **kwds)

def intsum(nums, int=int):  # Localize int for speed.
    # Code that calls int multiple times, for example:
    return sum(int(n) for n in nums)
msg392794 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2021-05-03 10:49
I agree with getting this in soon.

A related request is to syntax highlight field expressions in f-strings. I don't think there is an existing issue. Apparently, at least some alternatives to IDLE do this. I am not sure I would really want it, but we need at least some mockups. Tal, what do you think, and are you interested in trying to write a PR?
msg392796 - Author: Tal Einat (taleinat) * (Python committer) Date: 2021-05-03 10:51
> A related request is to syntax highlight field expressions in f-strings.

Related, but separate, and IMO not quite as urgent.

I can commit to working on this issue (soft keywords), but I'll have to see where things stand once this is finished before moving on to f-strings.
msg392807 - Author: Tal Einat (taleinat) * (Python committer) Date: 2021-05-03 13:33
I've created a PR (GH-25851) with a rather quick, working implementation.

This includes some tests but I haven't thoroughly tested it yet.

If anyone can take a look and give feedback on the approach, that would be great.
History
Date                 User          Action  Args
2021-05-03 13:33:25  taleinat      set     messages: + msg392807
2021-05-03 13:32:00  taleinat      set     keywords: + patch
                                           stage: test needed -> patch review
                                           pull_requests: + pull_request24534
2021-05-03 10:51:53  taleinat      set     messages: + msg392796
2021-05-03 10:49:55  terry.reedy   set     messages: + msg392794
2021-05-03 10:42:41  terry.reedy   set     messages: + msg392793
2021-05-03 10:41:08  taleinat      set     messages: + msg392792
2021-05-03 10:25:14  epaine        set     messages: + msg392791
2021-05-03 10:14:15  taleinat      set     messages: + msg392788
2021-05-02 18:05:45  terry.reedy   set     messages: + msg392713
                                           stage: test needed
2021-05-02 16:08:49  epaine        set     messages: + msg392708
                                           title: IDLE: highlight new `match` / `case` syntax -> IDLE: highlight soft keywords
2021-05-02 15:53:38  kj            set     nosy: + kj
                                           messages: + msg392707
2021-05-02 15:18:46  epaine        create