Idle: update fixwordbreaks() for unicode identifiers #65673

terryjreedy · 2014-05-12T01:27:44Z

BPO	21474
Nosy	@terryjreedy, @serhiy-storchaka, @miss-islington
PRs	#6643 bpo-21474: Update IDLE word/identifier definition from ascii to unicode.; #6648 [3.7] bpo-21474: Update IDLE word/identifier definition from ascii to unicode. (GH-6643); #6649 [3.6] bpo-21474: Update IDLE word/identifier definition from ascii to unicode. (GH-6643)

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = 'https://github.com/terryjreedy'
closed_at = <Date 2018-04-30.07:45:57.821>
created_at = <Date 2014-05-12.01:27:43.934>
labels = ['easy', 'expert-IDLE', 'type-feature', '3.7']
title = 'Idle: updata fixwordbreaks() for unicode identifiers'
updated_at = <Date 2018-04-30.07:48:23.398>
user = 'https://github.com/terryjreedy'

bugs.python.org fields:

activity = <Date 2018-04-30.07:48:23.398>
actor = 'miss-islington'
assignee = 'terry.reedy'
closed = True
closed_date = <Date 2018-04-30.07:45:57.821>
closer = 'terry.reedy'
components = ['IDLE']
creation = <Date 2014-05-12.01:27:43.934>
creator = 'terry.reedy'
dependencies = []
files = []
hgrepos = []
issue_num = 21474
keywords = ['easy']
message_count = 10.0
messages = ['218307', '218316', '218980', '218992', '315928', '315929', '315930', '315932', '315935', '315936']
nosy_count = 3.0
nosy_names = ['terry.reedy', 'serhiy.storchaka', 'miss-islington']
pr_nums = ['6643', '6648', '6649']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue21474'
versions = ['Python 3.6', 'Python 3.7']

terryjreedy · 2014-05-12T01:27:43Z

EditorWindow.py has this function, applied to editor and shell windows, which is obsolete for Python 3 and its unicode identifiers.

def fixwordbreaks(root):
    # Make sure that Tk's double-click and next/previous word
    # operations use our definition of a word (i.e. an identifier)
    tk = root.tk
    tk.call('tcl_wordBreakAfter', 'a b', 0) # make sure word.tcl is loaded
    tk.call('set', 'tcl_wordchars', '[a-zA-Z0-9_]')
    tk.call('set', 'tcl_nonwordchars', '[^a-zA-Z0-9_]')

Double-clicking selects a contiguous sequence of 'word' or 'nonword' characters.
"Control-backspace deletes word left, Control-DEL deletes word right."
"Control-left/right Arrow moves by words in a strange but useful way."

It might be more useful if the REs (regular expressions) were expanded beyond ASCII.
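The effect of the ASCII-only classes on a non-ASCII identifier can be sketched in pure Python. This is an illustration, not IDLE or Tk code: `select_run` is a hypothetical helper that imitates a double click, and Python's `re` module stands in for Tcl's regex engine (both treat `\w` as Unicode-aware, but their character tables may not agree on every character).

```python
import re

# The classes the current fixwordbreaks() installs (ASCII only).
ASCII_WORD = r'[a-zA-Z0-9_]'
# A Unicode-aware alternative: for str patterns, Python's \w matches
# Unicode letters, digits and the underscore.
UNICODE_WORD = r'\w'


def select_run(text, index, word_class):
    """Hypothetical emulation of a double click at `index`: return the
    (start, end) slice of the contiguous run of word characters, or of
    non-word characters, that contains text[index]."""
    def is_word(ch):
        return re.fullmatch(word_class, ch) is not None

    kind = is_word(text[index])
    start = index
    while start > 0 and is_word(text[start - 1]) == kind:
        start -= 1
    end = index + 1
    while end < len(text) and is_word(text[end]) == kind:
        end += 1
    return start, end


text = 'abc\u03bc\u03bc\u03bcdef'  # 'abcμμμdef', a valid Python 3 identifier
for name, cls in [('ascii', ASCII_WORD), ('unicode', UNICODE_WORD)]:
    start, end = select_run(text, 0, cls)
    print(name, repr(text[start:end]))
# ascii 'abc'
# unicode 'abcμμμdef'
```

With the ASCII classes the identifier falls apart into word, nonword, word; with `\w` it is one run.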

serhiy-storchaka · 2014-05-12T07:03:05Z

I think it is enough to get rid of this function.

terryjreedy · 2014-05-23T17:35:49Z

Deleting the function and calls to it would be easy. But my memory, which could be off, is that I left fixwordbreaks(root) in the test function now called _editor_window (at the end of the file) because the test did not work right without it. I will have to recheck.

Beyond that, my first experiments were aimed at discovering which functions are affected and therefore what would have to be tested after any change.

To properly test this requires simulating keystrokes, like control-backspace, as opposed to inserting characters. Are there any tk/tkinter tests that do this, that I could use as a model?

serhiy-storchaka · 2014-05-23T19:33:22Z

word.tcl in the Tcl library contains the following lines:

if {$::tcl_platform(platform) eq "windows"} {
    # Windows style - any but a unicode space char
    set ::tcl_wordchars {\S}
    set ::tcl_nonwordchars {\s}
} else {
    # Motif style - any unicode word char (number, letter, or underscore)
    set ::tcl_wordchars {\w}
    set ::tcl_nonwordchars {\W}
}

So by default all works as expected in Motif style, but not in Windows style.
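The practical difference between the two branches can be sketched in Python, again with `re` as a stand-in for Tcl's regex engine. `STYLES` and `runs` are illustrative names, not part of Tcl or IDLE; splitting the text into maximal runs only approximates the chunks that successive double clicks would select.

```python
import re

# Illustrative (wordchars, nonwordchars) pairs taken from the word.tcl
# excerpt above.
STYLES = {
    'windows': (r'\S', r'\s'),  # a "word" is anything but whitespace
    'motif': (r'\w', r'\W'),    # a "word" is letters, digits, underscore
}


def runs(text, style):
    """Split text into maximal runs of word or non-word characters."""
    word, nonword = STYLES[style]
    return re.findall(f'{word}+|{nonword}+', text)


for style in STYLES:
    print(style, runs('x = abc+efg', style))
# windows ['x', ' ', '=', ' ', 'abc+efg']
# motif ['x', ' = ', 'abc', '+', 'efg']
```

Windows style keeps `abc+efg` together as Command Prompt does, while Motif style splits on the operator, which matches how identifiers are delimited in Python code.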

If you want the same behavior in both styles, define the word chars as:

    tk.call('set', 'tcl_wordchars', r'\w')
    tk.call('set', 'tcl_nonwordchars', r'\W')
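Applied to the function quoted in the first message, the suggestion gives roughly the following. This is a sketch; the code actually committed in GH-6643 may differ in comments or layout. The function only needs an object with a `.tk.call` method, so it can be exercised with a stub instead of a real window.

```python
def fixwordbreaks(root):
    """Make Tk's double-click and word-motion operations use Unicode-aware
    'Motif style' word characters on every platform (sketch)."""
    tk = root.tk
    # Calling a word.tcl procedure forces that file to be loaded. Loading
    # it sets tcl_wordchars and tcl_nonwordchars to the platform defaults,
    # so our values must be assigned afterwards.
    tk.call('tcl_wordBreakAfter', 'a b', 0)
    # \w and \W are Tcl regex classes covering Unicode word characters.
    tk.call('set', 'tcl_wordchars', r'\w')
    tk.call('set', 'tcl_nonwordchars', r'\W')
```

In IDLE it would be called on the application's root window, as the original function is.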

GUI tests are not needed; it is enough to test the relevant Tcl commands: tcl_wordBreakAfter, tcl_wordBreakBefore, tcl_endOfWord, tcl_startOfNextWord, and tcl_startOfPreviousWord or TextSelectTo. Interestingly, there are no tests for these functions in the Tcl test suite.
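Because these procedures take a string and an index, their expected results can be worked out without key events. The Python below is a hypothetical emulation of `tcl_wordBreakAfter` only, assuming it returns the index of the second character of the first word/non-word pair found at or after `start`, or -1 if there is none. It is meant for reasoning about what a test should expect; a real test would call the Tcl procedure itself through an interpreter.

```python
import re


def word_break_after(text, start, wordchars=r'\w', nonwordchars=r'\W'):
    """Hypothetical emulation of Tcl's tcl_wordBreakAfter: find the first
    word/non-word (or non-word/word) pair at or after `start` and return
    the index of its second character, or -1 if there is none."""
    boundary = re.compile(f'{wordchars}{nonwordchars}|{nonwordchars}{wordchars}')
    match = boundary.search(text, start)
    return match.start() + 1 if match else -1


# Values a test of the real procedure might check.
print(word_break_after('a b', 0))        # 1
print(word_break_after('abcμμμdef', 0))  # -1: no break with Unicode classes
print(word_break_after('abcμμμdef', 0,
                       r'[a-zA-Z0-9_]', r'[^a-zA-Z0-9_]'))  # 3: ASCII classes
```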

terryjreedy · 2018-04-30T07:06:37Z

Now-closed duplicate bpo-33386 reported that μ, 0x3bc, is not selected as part of identifiers when double clicking. This prompted some research.

The 'Windows' style imitates the behavior of Command Prompt, which I presume is a carryover from DOS days. PowerShell stuck with it, but Notepad, Notepad++, Microsoft Word, Firefox, Thunderbird, and ??? have not. I think Tcl should have switched long ago. In any case, I will go with whatever the tcl re engine defines as word chars, the 'Motif' style, rather than attempt to write a giant re, which would have to change as characters are added.

Do we still need this line in fixwordbreaks?
tk.call('tcl_wordBreakAfter', 'a b', 0) # make sure word.tcl is loaded
I will leave it until you say we don't.

After patching, 'abcμμμdef' ('0x3bc'*3) is selected as one word instead of word, nonword, word. 'abc+efg' is still selected in 3 pieces, instead of the 1 word seen by Command Prompt.

terryjreedy · 2018-04-30T07:08:04Z

New changeset 5ff3a16 by Terry Jan Reedy in branch 'master':
bpo-21474: Update IDLE word/identifier definition from ascii to unicode. (GH-6643)
5ff3a16

serhiy-storchaka · 2018-04-30T07:15:47Z

> Do we still need this line in fixwordbreaks?

Yes. Loading word.tcl sets tcl_wordchars and tcl_nonwordchars. We should ensure that word.tcl is loaded and these variables are set before we change them.

miss-islington · 2018-04-30T07:27:53Z

New changeset 887b5f8 by Miss Islington (bot) in branch '3.7':
bpo-21474: Update IDLE word/identifier definition from ascii to unicode. (GH-6643)
887b5f8

terryjreedy · 2018-04-30T07:45:58Z

Thank you for the help. My immediate goal for IDLE is to fix unicode problems, to the extent allowed by tk, before 3.7.0.

miss-islington · 2018-04-30T07:48:23Z

New changeset 3d11630 by Miss Islington (bot) in branch '3.6':
bpo-21474: Update IDLE word/identifier definition from ascii to unicode. (GH-6643)
3d11630

terryjreedy added the type-feature label May 12, 2014

serhiy-storchaka added the easy label May 23, 2014

terryjreedy added the topic-IDLE and 3.7 labels Jun 19, 2017

terryjreedy self-assigned this Jun 19, 2017

terryjreedy closed this as completed Apr 30, 2018

ezio-melotti transferred this issue from another repository Apr 10, 2022