Idle: updata fixwordbreaks() for unicode identifiers #65673

Closed
terryjreedy opened this issue May 12, 2014 · 10 comments
Assignees
Labels
3.7 (EOL) end of life easy topic-IDLE type-feature A feature request or enhancement

Comments

@terryjreedy
Member

BPO 21474
Nosy @terryjreedy, @serhiy-storchaka, @miss-islington
PRs
  • bpo-21474: Update IDLE word/identifier definition from ascii to unicode. #6643
  • [3.7] bpo-21474: Update IDLE word/identifier definition from ascii to unicode. (GH-6643) #6648
  • [3.6] bpo-21474: Update IDLE word/identifier definition from ascii to unicode. (GH-6643) #6649
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = 'https://github.com/terryjreedy'
    closed_at = <Date 2018-04-30.07:45:57.821>
    created_at = <Date 2014-05-12.01:27:43.934>
    labels = ['easy', 'expert-IDLE', 'type-feature', '3.7']
    title = 'Idle: updata fixwordbreaks() for unicode identifiers'
    updated_at = <Date 2018-04-30.07:48:23.398>
    user = 'https://github.com/terryjreedy'

    bugs.python.org fields:

    activity = <Date 2018-04-30.07:48:23.398>
    actor = 'miss-islington'
    assignee = 'terry.reedy'
    closed = True
    closed_date = <Date 2018-04-30.07:45:57.821>
    closer = 'terry.reedy'
    components = ['IDLE']
    creation = <Date 2014-05-12.01:27:43.934>
    creator = 'terry.reedy'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 21474
    keywords = ['easy']
    message_count = 10.0
    messages = ['218307', '218316', '218980', '218992', '315928', '315929', '315930', '315932', '315935', '315936']
    nosy_count = 3.0
    nosy_names = ['terry.reedy', 'serhiy.storchaka', 'miss-islington']
    pr_nums = ['6643', '6648', '6649']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue21474'
    versions = ['Python 3.6', 'Python 3.7']

    @terryjreedy
    Member Author

    EditorWindow.py has this function, applied to editor and shell windows, which is obsolete for Python 3 and Unicode identifiers.

    def fixwordbreaks(root):
        # Make sure that Tk's double-click and next/previous word
        # operations use our definition of a word (i.e. an identifier)
        tk = root.tk
        tk.call('tcl_wordBreakAfter', 'a b', 0) # make sure word.tcl is loaded
        tk.call('set', 'tcl_wordchars', '[a-zA-Z0-9_]')
        tk.call('set', 'tcl_nonwordchars', '[^a-zA-Z0-9_]')

    Double clicking selects a contiguous sequence of 'word' or
    'nonword' characters.
    "Control-backspace deletes word left, Control-DEL deletes word right."
    "Control-left/right Arrow moves by words in a strange but useful way."

    It might be more useful if the REs were expanded.
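
    To illustrate why the ASCII classes are obsolete, here is a minimal sketch. It reuses the `[a-zA-Z0-9_]` class from fixwordbreaks in Python's own `re` module (not Tcl's regex engine, which is what Tk actually uses) and compares it with what Python 3 accepts as an identifier. The helper name `ascii_word` and the sample names are made up for this illustration.

```python
import re

# The character class that fixwordbreaks() hands to Tcl, reused here in
# Python's re module purely for illustration (Tcl's engine is separate).
ASCII_WORD = re.compile(r'[a-zA-Z0-9_]+')

def ascii_word(name):
    """Return True if every character of name is in the ASCII word class."""
    return ASCII_WORD.fullmatch(name) is not None

# Hypothetical sample names; all of them are valid Python 3 identifiers.
samples = ['spam_1', 'μ', 'größe']
for name in samples:
    # Print the name, whether Python accepts it, and whether the ASCII
    # class covers it completely.
    print(name, name.isidentifier(), ascii_word(name))
```

    Only the pure-ASCII name passes both checks; the others are legal identifiers that the ASCII class would split into pieces.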

    @terryjreedy terryjreedy added the type-feature A feature request or enhancement label May 12, 2014
    @serhiy-storchaka
    Member

    I think it is enough to get rid of this function.

    @terryjreedy
    Member Author

    Deleting the function and calls to it would be easy. But my memory, which could be off, is that I left fixwordbreaks(root) in the test function now called _editor_window (at the end of the file) because the test did not work right without it. I will have to recheck.

    Beyond that, my first experiments were aimed at discovering the functions affected and therefore what would have to be tested with any changes.

    To properly test this requires simulating keystrokes, like control-backspace, as opposed to inserting characters. Are there any tk/tkinter tests that do this, that I could use as a model?

    @serhiy-storchaka
    Member

    word.tcl in the Tcl library contains the following lines:

    if {$::tcl_platform(platform) eq "windows"} {
        # Windows style - any but a unicode space char
        set ::tcl_wordchars {\S}
        set ::tcl_nonwordchars {\s}
    } else {
        # Motif style - any unicode word char (number, letter, or underscore)
        set ::tcl_wordchars {\w}
        set ::tcl_nonwordchars {\W}
    }

    So by default everything works as expected in Motif style, but not in Windows style.

    If you want the same behavior in both styles, define the word chars as:

        tk.call('set', 'tcl_wordchars', r'\w')
        tk.call('set', 'tcl_nonwordchars', r'\W')
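
    Put together with the function quoted at the top of this issue, the suggestion could look roughly like the sketch below. It simply swaps the two ASCII classes for `\w` and `\W`. It is an illustration of the idea, not necessarily the patch that was eventually merged.

```python
def fixwordbreaks(root):
    # Sketch only: use the Motif-style word definition on every platform,
    # instead of Tcl's platform-dependent default.
    tk = root.tk
    # Calling any word.tcl command makes Tcl auto-load word.tcl first, so
    # the file's own defaults are in place before they are replaced below.
    tk.call('tcl_wordBreakAfter', 'a b', 0)
    tk.call('set', 'tcl_wordchars', r'\w')     # Unicode letters, digits, underscore
    tk.call('set', 'tcl_nonwordchars', r'\W')  # everything else
```

    Assuming `root` is a `tkinter.Tk` instance, this would be called once as `fixwordbreaks(root)` before any Text widget handles a double-click or a word-motion key.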

    GUI tests are not needed; it is enough to test the relevant Tcl commands: tcl_wordBreakAfter, tcl_wordBreakBefore, tcl_endOfWord, tcl_startOfNextWord, and tcl_startOfPreviousWord or TextSelectTo. Interestingly, there are no tests for these functions in the Tcl test suite.
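
    A GUI-free check along these lines might be sketched as follows. It assumes `tkinter.Tcl()`, which creates a bare Tcl interpreter without a Tk window, and a Tcl 8.x library whose word.tcl provides `tcl_wordBreakAfter`. The helper name `word_break_after` and the sample text are invented for the example.

```python
def word_break_after(text, start, wordchars, nonwordchars):
    """Ask Tcl for the first word boundary in text after index start."""
    # Imported here so the module still loads where tkinter is missing.
    import tkinter

    tcl = tkinter.Tcl()  # plain Tcl interpreter; no window is created
    tcl.call('tcl_wordBreakAfter', 'a b', 0)  # make sure word.tcl is loaded
    tcl.call('set', 'tcl_wordchars', wordchars)
    tcl.call('set', 'tcl_nonwordchars', nonwordchars)
    # Tcl returns -1 when there is no boundary after start.
    return int(tcl.call('tcl_wordBreakAfter', text, start))


if __name__ == '__main__':
    text = 'abcμμμdef xyz'
    # ASCII classes: the first boundary is expected at the first μ.
    print(word_break_after(text, 0, '[a-zA-Z0-9_]', '[^a-zA-Z0-9_]'))
    # Unicode-aware classes: the first boundary is expected at the space.
    print(word_break_after(text, 0, r'\w', r'\W'))
```

    The same pattern would extend to the other commands listed above by changing the command name in the final call.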

    @terryjreedy terryjreedy self-assigned this Jun 19, 2017
    @terryjreedy
    Member Author

    Now-closed duplicate bpo-33386 reported that μ, 0x3bc, is not selected as part of identifiers when double clicking. This prompted some research.

    The 'Windows' style imitates the behavior of Command Prompt, which I presume is a carryover from DOS days. PowerShell stuck with it, but Notepad, Notepad++, Microsoft Word, Firefox, Thunderbird, and ??? have not. I think Tcl should have switched long ago. In any case, I will go with whatever the tcl re engine defines as word chars, the 'Motif' style, rather than attempt to write a giant re, which would have to change as characters are added.

    Do we still need this line in fixwordbreaks?
    tk.call('tcl_wordBreakAfter', 'a b', 0) # make sure word.tcl is loaded
    I will leave it until you say we don't.

    After patching, 'abcμμμdef' ('0x3bc'*3) is selected as one word instead of word, nonword, word. 'abc+efg' is still selected in 3 pieces, instead of the 1 word seen by Command Prompt.
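
    The before/after selections described here can be mimicked with a small sketch. It uses Python's `re` module as a stand-in for the Tcl regex engine (the two are separate implementations, but they agree on these samples) and splits a string into maximal runs of 'word' or 'nonword' characters, which is roughly what a double click selects. The function name `runs` is made up for the example.

```python
import re

def runs(text, word, nonword):
    """Split text into maximal runs of word or nonword characters."""
    return re.findall(f'(?:{word})+|(?:{nonword})+', text)

text = 'abcμμμdef'
# Old ASCII classes: μ counts as a nonword character.
print(runs(text, '[a-zA-Z0-9_]', '[^a-zA-Z0-9_]'))  # ['abc', 'μμμ', 'def']
# Motif style: μ is a word character.
print(runs(text, r'\w', r'\W'))                     # ['abcμμμdef']
# Motif style still splits at the operator.
print(runs('abc+efg', r'\w', r'\W'))                # ['abc', '+', 'efg']
# Windows style: anything except whitespace is one word.
print(runs('abc+efg', r'\S', r'\s'))                # ['abc+efg']
```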

    @terryjreedy
    Member Author

    New changeset 5ff3a16 by Terry Jan Reedy in branch 'master':
    bpo-21474: Update IDLE word/identifier definition from ascii to unicode. (GH-6643)
    5ff3a16

    @serhiy-storchaka
    Member

    > Do we still need this line in fixwordbreaks?

    Yes. Loading word.tcl sets tcl_wordchars and tcl_nonwordchars. We should ensure that word.tcl is loaded and these variables are set before we change them.

    @miss-islington
    Contributor

    New changeset 887b5f8 by Miss Islington (bot) in branch '3.7':
    bpo-21474: Update IDLE word/identifier definition from ascii to unicode. (GH-6643)
    887b5f8

    @terryjreedy
    Member Author

    Thank you for the help. My immediate goal for IDLE is to fix unicode problems, to the extent allowed by tk, before 3.7.0.

    @miss-islington
    Contributor

    New changeset 3d11630 by Miss Islington (bot) in branch '3.6':
    bpo-21474: Update IDLE word/identifier definition from ascii to unicode. (GH-6643)
    3d11630

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022