classification
Title: Idle: update fixwordbreaks() for unicode identifiers
Type: enhancement Stage: resolved
Components: IDLE Versions: Python 3.7, Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: terry.reedy Nosy List: miss-islington, serhiy.storchaka, terry.reedy
Priority: normal Keywords: easy

Created on 2014-05-12 01:27 by terry.reedy, last changed 2018-04-30 07:48 by miss-islington. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 6643 merged terry.reedy, 2018-04-29 20:17
PR 6648 merged miss-islington, 2018-04-30 07:08
PR 6649 merged miss-islington, 2018-04-30 07:09
Messages (10)
msg218307 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-05-12 01:27
EditorWindow.py has this function, applied to editor and shell windows, which is obsolete for Python3 and unicode identifiers.

def fixwordbreaks(root):
    # Make sure that Tk's double-click and next/previous word
    # operations use our definition of a word (i.e. an identifier)
    tk = root.tk
    tk.call('tcl_wordBreakAfter', 'a b', 0) # make sure word.tcl is loaded
    tk.call('set', 'tcl_wordchars', '[a-zA-Z0-9_]')
    tk.call('set', 'tcl_nonwordchars', '[^a-zA-Z0-9_]')

Double clicking selects a contiguous sequence of 'word' or 
'nonword' characters.
"Control-backspace deletes word left, Control-DEL deletes word right."
"Control-left/right Arrow moves by words in a strange but useful way."

It might be more useful if the REs were expanded.
msg218316 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-05-12 07:03
I think it is enough to get rid of this function.
msg218980 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-05-23 17:35
Deleting the function and calls to it would be easy. But my memory, which could be off, is that I left fixwordbreaks(root) in the test function now called _editor_window (at the end of the file) because the test did not work right without it. I will have to recheck.

Beyond that, my first experiments were aimed at discovering the functions affected and therefore what would have to be tested with any changes.

To properly test this requires simulating keystrokes, like control-backspace, as opposed to inserting characters. Are there any tk/tkinter tests that do this, that I could use as a model?
msg218992 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-05-23 19:33
word.tcl in the Tcl library contains the following lines:

if {$::tcl_platform(platform) eq "windows"} {
    # Windows style - any but a unicode space char
    set ::tcl_wordchars {\S}
    set ::tcl_nonwordchars {\s}
} else {
    # Motif style - any unicode word char (number, letter, or underscore)
    set ::tcl_wordchars {\w}
    set ::tcl_nonwordchars {\W}
}

So by default all works as expected in Motif style, but not in Windows style.

If you want the same behavior in both styles, define the word chars as:

    tk.call('set', 'tcl_wordchars', r'\w')
    tk.call('set', 'tcl_nonwordchars', r'\W')

GUI tests are not needed; it is enough to test the relevant Tcl commands: tcl_wordBreakAfter, tcl_wordBreakBefore, tcl_endOfWord, tcl_startOfNextWord, and tcl_startOfPreviousWord or TextSelectTo. Interestingly, there are no tests for these functions in the Tcl test suite.
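
A minimal sketch of such a non-GUI test, assuming a Tk-less interpreter from tkinter.Tcl() can auto-load word.tcl (I believe it can, but this is not checked against every Tcl build). The class name WordBreakTest and the sample string are made up for illustration; only tcl_wordBreakAfter and the two variables come from the messages above. Results go through int() because tkinter may hand back either an int or a str.

```python
import unittest

try:
    import tkinter
except ImportError:  # tkinter is optional in some Python builds
    tkinter = None

SAMPLE = 'abc\u03bc\u03bc\u03bcdef+x'  # 'abcμμμdef+x'


@unittest.skipIf(tkinter is None, 'tkinter is not available')
class WordBreakTest(unittest.TestCase):  # hypothetical test case

    def setUp(self):
        try:
            self.tk = tkinter.Tcl()  # Tcl only; no Tk window or display needed
        except tkinter.TclError as exc:
            self.skipTest(f'cannot create Tcl interpreter: {exc}')
        # Load word.tcl first so its defaults are set before we override them.
        self.tk.call('tcl_wordBreakAfter', 'a b', 0)

    def set_wordchars(self, word, nonword):
        self.tk.call('set', 'tcl_wordchars', word)
        self.tk.call('set', 'tcl_nonwordchars', nonword)

    def break_after(self, text, start):
        return int(self.tk.call('tcl_wordBreakAfter', text, start))

    def test_ascii_definition_splits_at_mu(self):
        self.set_wordchars('[a-zA-Z0-9_]', '[^a-zA-Z0-9_]')
        # First boundary is between 'c' and the first mu.
        self.assertEqual(self.break_after(SAMPLE, 0), 3)

    def test_unicode_definition_keeps_mu_in_word(self):
        self.set_wordchars(r'\w', r'\W')
        # First boundary is between 'f' and '+'.
        self.assertEqual(self.break_after(SAMPLE, 0), 9)
```

Saved under some module name (say test_wordbreaks.py, a hypothetical name), this should run with python -m unittest test_wordbreaks. The other commands listed above could be covered the same way.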
msg315928 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2018-04-30 07:06
Now-closed duplicate #33386 reported that μ, 0x3bc, is not selected as part of identifiers when double clicking.  This prompted some research.

The 'Windows' style imitates the behavior of Command Prompt, which I presume is a carryover from DOS days.  PowerShell stuck with it, but Notepad, Notepad++, Microsoft Word, Firefox, Thunderbird, and ??? have not. I think Tcl should have switched long ago.  In any case, I will go with whatever the tcl re engine defines as word chars, the 'Motif' style, rather than attempt to write a giant re, which would have to change as characters are added.

Do we still need this line in fixwordbreaks?
    tk.call('tcl_wordBreakAfter', 'a b', 0) # make sure word.tcl is loaded
I will leave it until you say we don't.

After patching, 'abcμμμdef' ('0x3bc'*3) is selected as one word instead of word, nonword, word.  'abc+efg' is still selected in 3 pieces, instead of the 1 word seen by Command Prompt.
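
A rough model of the three selection results just described, using Python's re module as a stand-in for the Tcl re engine. That substitution is an assumption: the two engines agree on these particular characters but are not identical in general. The helper segments() is hypothetical and only groups adjacent characters by whether they match the word-character class.

```python
import re
from itertools import groupby

ASCII_WORD = r'[a-zA-Z0-9_]'  # old IDLE definition
MOTIF_WORD = r'\w'            # 'Motif' style, unicode-aware
WINDOWS_WORD = r'\S'          # 'Windows' style, anything but whitespace


def segments(text, wordchars):
    """Split text into maximal runs of word and nonword characters."""
    is_word = re.compile(wordchars).fullmatch
    runs = groupby(text, key=lambda ch: bool(is_word(ch)))
    return [''.join(run) for _, run in runs]


print(segments('abcμμμdef', ASCII_WORD))   # ['abc', 'μμμ', 'def']
print(segments('abcμμμdef', MOTIF_WORD))   # ['abcμμμdef']
print(segments('abc+efg', MOTIF_WORD))     # ['abc', '+', 'efg']
print(segments('abc+efg', WINDOWS_WORD))   # ['abc+efg']
```

Each list element corresponds to what one double click would select under that definition.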
msg315929 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2018-04-30 07:08
New changeset 5ff3a161c8a6b525c5e5b3e36e9c43f5a95bda60 by Terry Jan Reedy in branch 'master':
bpo-21474: Update IDLE word/identifier definition from ascii to unicode. (GH-6643)
https://github.com/python/cpython/commit/5ff3a161c8a6b525c5e5b3e36e9c43f5a95bda60
msg315930 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-04-30 07:15
> Do we still need this line in fixwordbreaks?

Yes. Loading word.tcl sets tcl_wordchars and tcl_nonwordchars. We should ensure that word.tcl is loaded and these variables are set before we change them.
msg315932 - (view) Author: miss-islington (miss-islington) Date: 2018-04-30 07:27
New changeset 887b5f8fc622267e1fd48862ea9d0dfd4a0abdc6 by Miss Islington (bot) in branch '3.7':
bpo-21474: Update IDLE word/identifier definition from ascii to unicode. (GH-6643)
https://github.com/python/cpython/commit/887b5f8fc622267e1fd48862ea9d0dfd4a0abdc6
msg315935 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2018-04-30 07:45
Thank you for the help.  My immediate goal for IDLE is to fix unicode problems, to the extent allowed by tk, before 3.7.0.
msg315936 - (view) Author: miss-islington (miss-islington) Date: 2018-04-30 07:48
New changeset 3d11630ff401cfcdf094cf039cb575332ecaea20 by Miss Islington (bot) in branch '3.6':
bpo-21474: Update IDLE word/identifier definition from ascii to unicode. (GH-6643)
https://github.com/python/cpython/commit/3d11630ff401cfcdf094cf039cb575332ecaea20
History
Date User Action Args
2018-04-30 07:48:23 miss-islington set nosy: + miss-islington
messages: + msg315936
2018-04-30 07:45:57 terry.reedy set status: open -> closed
keywords: - patch
nosy: - miss-islington
messages: + msg315935
resolution: fixed
stage: patch review -> resolved
2018-04-30 07:27:53 miss-islington set nosy: + miss-islington
messages: + msg315932
2018-04-30 07:15:46 serhiy.storchaka set messages: + msg315930
2018-04-30 07:09:13 miss-islington set pull_requests: + pull_request6345
2018-04-30 07:08:33 miss-islington set keywords: + patch
stage: commit review -> patch review
pull_requests: + pull_request6344
2018-04-30 07:08:04 terry.reedy set messages: + msg315929
2018-04-30 07:06:37 terry.reedy set keywords: - patch
messages: + msg315928
stage: patch review -> commit review
2018-04-29 20:17:32 terry.reedy set keywords: + patch
stage: needs patch -> patch review
pull_requests: + pull_request6341
2018-04-29 18:54:45 terry.reedy link issue33386 superseder
2017-06-19 23:36:44 terry.reedy set assignee: terry.reedy
components: + IDLE
versions: + Python 3.6, Python 3.7, - Python 3.4, Python 3.5
2014-05-23 19:33:22 serhiy.storchaka set messages: + msg218992
2014-05-23 17:35:51 terry.reedy set messages: + msg218980
2014-05-23 15:47:08 serhiy.storchaka set keywords: + easy
stage: test needed -> needs patch
2014-05-12 07:03:04 serhiy.storchaka set messages: + msg218316
2014-05-12 01:27:43 terry.reedy create