classification
Title: IDLE font settings: use multiple character sets in examples
Type: enhancement Stage: resolved
Components: IDLE Versions: Python 3.7, Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: terry.reedy Nosy List: Todd.Rovito, corona10, francismb, louielu, roger.serwy, serhiy.storchaka, terry.reedy
Priority: normal Keywords: patch

Created on 2012-01-16 23:22 by terry.reedy, last changed 2017-10-17 23:52 by terry.reedy. This issue is now closed.

Files
File name Uploaded Description Edit
issue13802.patch francismb, 2013-03-23 19:49 Some more example chars review
Pull Requests
URL Status Linked Edit
PR 2616 closed louielu, 2017-07-07 07:37
PR 3960 merged terry.reedy, 2017-10-12 04:54
PR 4027 merged python-dev, 2017-10-17 22:56
Messages (19)
msg151415 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-01-16 23:22
In the Fonts/Tabs tab of the IDLE Preference dialog, the large box for examples of the font selected shows a small square of ASCII chars. I think the box should also show 1 char for each of several alphabets so the consequence of choosing various fonts will be more evident. I am thinking of adding several lines with the format
Alphabet \uXXXX <char>
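As an illustration of the line format proposed above, here is a minimal sketch. The `sample_line` helper and the three example characters are hypothetical and are not taken from any patch attached to this issue.

```python
def sample_line(label, char):
    """Return one sample line in the form 'Label \\uXXXX <char>'."""
    if len(char) != 1 or ord(char) > 0xFFFF:
        raise ValueError("expected a single BMP character")
    return f"{label} \\u{ord(char):04x} {char}"


# Illustrative characters only (assumption): any letter from each script would do.
examples = [("Greek", "\u03b1"), ("Cyrillic", "\u0431"), ("Hebrew", "\u05d0")]
sample_text = "\n".join(sample_line(label, char) for label, char in examples)
```

The resulting `sample_text` could then be printed or placed in the sample box; how each character renders depends on the selected font.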
msg185080 - (view) Author: Francis MB (francismb) * Date: 2013-03-23 19:49
Hi Terry,
just take or put away some ... (they are in no particular order of preference, just ones that I could 'see' in the browser).
msg297868 - (view) Author: Louie Lu (louielu) * Date: 2017-07-07 08:12
Added a pangram and a Chinese string; the sample text now shows:

AaBbCcDdEe
FfGgHhIiJjK
1234567890
#:+=(){}[]

The quick brown fox jumps over the lazy dog. [1]

南去經三國,東來過五湖 [2]

----------------------


[1]: https://en.wikipedia.org/wiki/The_quick_brown_fox_jumps_over_the_lazy_dog

[2]: http://blog.justfont.com/2014/12/jfbook-example/
msg297869 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-07-07 08:20
There are many different alphabets in the world. Why Chinese but not Cyrillic or Devanagari? An example that includes all scripts would be too large (and likely most characters would not be rendered correctly with an arbitrary font).

I suggest just making the box for examples editable and saving the entered examples in the configuration files.
msg297870 - (view) Author: Louie Lu (louielu) * Date: 2017-07-07 08:32
Serhiy: Or could we detect the user's language environment and show an example sentence in the user's language? As I remember, the Windows font preview changes the sentence depending on the language.
msg297878 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-07-07 10:17
I don't think it is worth including examples for a hundred languages in the Python distribution. If there are system-wide collections of examples we can use them, but these would be platform specific and not always available.
msg297883 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-07-07 13:45
On the other hand, we can use the standard font chooser dialog. But it doesn't allow configuring the sample text at all (at least on X Window).

Terry, please try the following commands on Windows:

import tkinter
root = tkinter.Tk()
root.tk.call('tk', 'fontchooser', 'show')

How does it look?
msg298408 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2017-07-15 23:14
I made this issue a dependency of #24776, which is about redoing the whole font page.

The fontchooser has good points, points inappropriate for IDLE, and bad points.  I consider it an alternate mockup proposal for the font page.  For this issue: displaying each font name in the font is cute, but I am not sure I want to imitate it.  I address the sample selection below.

This issue is about making it evident that IDLE is a BMP unicode editor, not just an ASCII editor; and about showing the consequence of font choices on a particular OS and machine.  Expanding the static sample with examples of the top N scripts, with N about 10, will do this.

When I looked at Francis's patch, I thought it deficient in that people would not know what chars were being replaced by boxes.  Then I tried the patch and none of them were.  At least not on Windows.  Since I opened this, we added back the Help button.  Added help text for this tab can list the scripts represented in the sample.  If nothing else, I will use this patch, as it improves on the status quo.  I should have done this years ago.  An immediate improvement would be all chars from a script on one line and some (more?) hanji/kanji CJK chars.

#24776 suggests putting the sample beside the font selection box.  The font box only needs 75% of the width it has.  With the frame around the sample label removed, there may be more width available.  There will certainly be more lines.

Louie, I understand your PR to be a suggestion about pangrams, expressed in code, to be 'pulled' into my mind, and possibly into my clone, and not a request to merge as is with only Chinese.  Others have noted that submitting work-in-progress patches as PRs, rather than as diffs to the tracker, can be confusing.  Yet it makes review easy.

In considering the idea, I looked at https://en.wikipedia.org/wiki/Pangram and also found http://clagnut.com/blog/2380/, which lists pangrams in multiple languages that were once on wiki/List_of_Pangrams.

I am rejecting the idea as is for multiple related reasons.  The sample is about scripts, not languages.  Long phrases mean fewer scripts.  Many scripts are used for multiple dialects or even languages.  Which one to choose?  An innocent or poetic phrase in one language may be less innocent in another dialect/language, or if interpreted metaphorically, or if one considers possible alternate meanings of words.

One thing I would consider is script names written in the script.  This would replace an arbitrary sample from the same script and could be done on a script-by-script basis.  I believe 'devanagari' in Devanagari should be intelligible in all or most north Indian languages that use Devanagari or derivatives thereof, and offensive in none.  But I would enquire.

What do others think of this?

Louie, I presume 'hanji' is a single char.  How about using the stroke pangram 永 (all basic strokes) in a phrase such as 'hanji character 永'?  (I like the look of the char as well).

Serhiy, is there one non-controversial pan-slavic way to write 'Cyrillic' in Cyrillic?  Or controversy-engendering national variations?

A separate issue could add a button and dropdown multiple selection list to print to an output window all or a selection of the 256 blocks of 256 codepoints in the BMP. This should eliminate any need for a configuration option.
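A rough sketch of what listing one such block might involve follows. The `bmp_block` function and its placeholder choice are assumptions made for illustration; this is not code from IDLE or from any patch on this issue.

```python
import unicodedata


def bmp_block(block, placeholder="\ufffd"):
    """Return BMP block number `block` (0-255) as 16 rows of 16 characters."""
    if not 0 <= block <= 0xFF:
        raise ValueError("BMP block number must be in range 0..255")
    rows = []
    for row_start in range(block << 8, (block + 1) << 8, 16):
        cells = []
        for code in range(row_start, row_start + 16):
            char = chr(code)
            # Categories starting with 'C' are control, format, surrogate,
            # private-use, and unassigned code points; show a placeholder.
            if unicodedata.category(char).startswith("C"):
                char = placeholder
            cells.append(char)
        # Prefix each row with the code point of its first character.
        rows.append(f"{row_start:04x} " + "".join(cells))
    return "\n".join(rows)


cyrillic = bmp_block(0x04)  # code points U+0400..U+04FF
```

The returned string could be written to an output window or passed to `print`. Combining marks are left as they are in this sketch, so they may attach to the neighbouring character when displayed.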

> likely most characters would not be rendered correctly with an arbitrary font

A goal of this issue is to let people see such problems where they exist.

The fontchooser sample is awful: only a couple of ASCII and script chars and a small sample of script that changes with each script.  It has no knowledge of default characters.
msg298413 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-07-16 04:27
> Serhiy, is there one non-controversial pan-slavic way to write 'Cyrillic' in Cyrillic?

No. Even in very close east-slavic languages it is written differently: Ukrainian "Кирилиця", Russian "Кириллица", Belarusian "Кірыліца".
msg298762 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2017-07-21 02:36
I decided to rearrange page #24776 before changing sample #13802.
msg304205 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2017-10-12 05:07
I decided to do the needed rearrangement of frames in this issue, leaving revision of widgets other than the sample to #24776.

PR3960 initially puts the following in the sample box.  I believe it should cover most Python users.

<ASCII/Latin1>
AaBbCcDdEeFfGgHhIiJj
1234567890#:+=(){}[]
¡¢£¥§©«®¶½ÀÁÂÃÄÅÇÐØß

<IPA,Greek,Cyrillic>
ɐɕɘɞɟɤɫɮɰɷɻʁʃʆʎʞʢʫʭʯ
ΑαΒβΓγΔδΕεΖζΗηΘθΙιΚκ
БбДдЖжПпФфЧчЪъЭэѠѤѬӜ

<Hebrew, Arabic>
אבגדהוזחטיךכלםמןנסעף
ابجدهوزحطي٠١٢٣٤٥٦٧٨٩

<Devanagari>
०१२३४५६७८९अआइईउऊएऐओऔ
कगङचजञतदनपबमयरलवशसह॥

<East Asian>
〇一二三四五六七八九
汉字漢字人木火土金水
ᅡᅦᅩᆨᆫ가결걵곴극
あいうえおかさたなま

The new help message explains

Font sample: This shows what a selection of BMP unicode characters look
like for the current font selection.  If a font face does not define a
character, Tk attempts to find another font that does.  Substitute
glyphs will not necessarily have the same size as the font selected.
Hebrew and Arabic letters should display right to left, starting with
alef, \u05d0 and \u0627.  Arabic numerals display left to right.  The
East Asian samples are Chinese digits followed by Chinese Hanzi, Korean
Hangul, and Japanese Hiragana.
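The code points and display directions mentioned in the help text can be checked with the standard `unicodedata` module. This is only a small sketch; the `describe` helper is a hypothetical name introduced here.

```python
import unicodedata


def describe(char):
    """Return the Unicode name and bidirectional class of one character."""
    return unicodedata.name(char), unicodedata.bidirectional(char)


# The Hebrew and Arabic alefs named in the help text, plus the first
# Arabic-Indic digit from the Arabic sample line.
for char in "\u05d0\u0627\u0660":
    name, bidi = describe(char)
    print(f"U+{ord(char):04X} {name} {bidi}")
# U+05D0 HEBREW LETTER ALEF R
# U+0627 ARABIC LETTER ALEF AL
# U+0660 ARABIC-INDIC DIGIT ZERO AN
```

The classes `R` and `AL` mark right-to-left letters, while `AN` marks Arabic numbers, which is consistent with the letters running right to left and the digits left to right.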

Except for Chinese, I intend for the samples to be meaningless subsequences of the respective character set.  I will try to get comments from others for the Indian, Korean, and Japanese samples.

The Chinese reads 'hanzi hanzi person wood fire earth metal water' with 'hanzi' repeated in simplified and traditional character variations.  Louie, could this be controversial in any way?
msg304210 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-10-12 07:27
I think empty lines and headers are not needed. They just increase the size of the dialog window, which may become too large for low-resolution screens.

The standard font chooser on Windows uses the following sets for Unicode-aware fonts: Western, Cyrillic, Hebrew, Arabic, Greek, Turkish, Baltic (doesn't differ from Western), Central-European, Vietnamese. I think it is worth decreasing the Devanagari sample and adding a few Central-European (ÁáÔô), Turkish (ĞğŞş) and Vietnamese (ƠơƯư) characters. Maybe decrease the size of all non-ASCII samples? 2-3 letters (in upper and lower cases) should be enough.
msg304212 - (view) Author: Louie Lu (louielu) * Date: 2017-10-12 07:44
PR 3960 displays well on my Linux machine, and the combination of Chinese characters is not politically controversial.

But it lacks simplified characters: the only one is '汉'; the others are all traditional characters (or are the same in simplified and traditional).

For Japanese, there are katakana (カタカナ), hiragana (ひらがな) and kanji (漢字); the characters you provide contain only hiragana. Maybe a sentence that contains all kinds of characters would be better.

For Korean, the characters are compounded into one, but I'm not sure whether they would use the first four characters to test the font. CC'ing corona10.

For Chinese, Japanese, and Korean, I still prefer and recommend using a sentence to give the user a feel for the font.


For other character sets, maybe we can refer to Google Fonts:
https://fonts.google.com/
They have a list of Unicode-aware languages:

    Arabic, Bengali, Cyrillic, Cyrillic Extended, Devanagari, Greek,
    Greek Extended, Gujarati, Gurmukhi, Hebrew, Kannada, Khmer,
    Latin, Latin Extended, Malayalam, Myanmar, Oriya, Tamil, Telugu,
    Thai, Vietnamese.
msg304219 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-10-12 08:12
I suggest making the sample text editable:

    self.font_sample = Text(frame_sample, font=temp_font, width=20)
    self.font_sample.insert(END, sample)

This will allow a user to test fonts in any perspective.
msg304222 - (view) Author: Dong-hee Na (corona10) * Date: 2017-10-12 09:26
Louie Lu 

Thanks for the cc.
In Korea, 'ᅡᅦᅩᆨᆫ가결걵곴극' is not a recommended character sequence.
This sequence does not fully cover Korean characters.

FYI, 
MS Windows default font preview sequence is 
'다람쥐 헌 쳇바퀴에 타고파'

Google chrome preview sequence is
'정 참판 양반댁 규수 큰 교자 타고 혼례 치른 날'

Those two sequences are the Korean version of 'The quick brown fox jumps over the lazy dog'.

If you want to use a Korean version of 'AaBbCcDdEeFfGgHhIiJj',
I recommend '가나다라마바사아자차카파타하'
msg304223 - (view) Author: Dong-hee Na (corona10) * Date: 2017-10-12 09:40
'가냐더려모뵤수유즈치캐턔페혜' is better than '가나다라마바사아자차카파타하'
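To illustrate the earlier remark that Korean characters are compounded into one, here is a small sketch of how precomposed Hangul syllables relate to the individual jamo that appeared in the first sample. The helper names are hypothetical; the arithmetic is the standard Unicode Hangul syllable composition formula.

```python
import unicodedata

HANGUL_BASE = 0xAC00   # code point of the first precomposed syllable
LEAD_COUNT = 19        # leading consonants
VOWEL_COUNT = 21       # vowels
TAIL_COUNT = 28        # 27 trailing consonants plus "no trailing consonant"


def compose_syllable(lead, vowel, tail=0):
    """Compose one precomposed Hangul syllable from zero-based jamo indexes."""
    if not (0 <= lead < LEAD_COUNT and 0 <= vowel < VOWEL_COUNT
            and 0 <= tail < TAIL_COUNT):
        raise ValueError("jamo index out of range")
    return chr(HANGUL_BASE + (lead * VOWEL_COUNT + vowel) * TAIL_COUNT + tail)


def decompose_syllable(syllable):
    """Return the conjoining jamo that make up one precomposed syllable."""
    return unicodedata.normalize("NFD", syllable)


first = compose_syllable(0, 0)      # U+AC00, the first syllable of both suggestions
jamo = decompose_syllable(first)    # two conjoining jamo, U+1100 and U+1161
```

A font may cover the precomposed syllables without rendering isolated conjoining jamo well, which is one possible reason a sample made of whole syllables is more representative.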
msg304293 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2017-10-12 22:08
I pushed new commits to the PR that changed the Korean, Japanese, and Indian samples and the help message.

Korea: The line is now the first 10 chars of Dong-hee Na's suggestion.  Thank you for helping (and Louie for the cc).

Japan: The line is now the hiragana and katakana versions of the 5 vowels.  As far as I know, 'kanji' are a subset of CJK 'hanzi', with different pronunciations and possibly different meanings, so I see no need for any here.

India: The second line is now Tamil digits and vowels.

China: I found the chars, with translation, including the one simplified/traditional pair, on Wikipedia.  I did not know if more pairs would be a good idea or not, and I do not know of others.  Louie, if you have a better idea, please post it (with translation ;-).

Dialog size: The height is about 710 pixels, 10% larger than before the additions to the General page a month ago.  So it was previously too tall for 800 x 600 and still fits 1024 x 768.  Further expansion should mostly be in width, but there is some vertical room.

Spaces and labels: I initially had neither.  I could hardly stand to look.  The labels are needed for people who do not recognize the character sets, and I use them to make three general points in the help entry: tk uses whatever font it can to cover most of the BMP; fixed font sizes for Latin font only apply to Latin chars; and right-to-left is handled correctly.

Latin1: I consider the Windows fontchooser to be an anti-model for this issue in that it is limited to alphabetic characters and confuses 'language' with 'character set'.  Western, Central European, Baltic, Turkish, and Vietnamese languages all use latin characters.  The non-ascii, decorated versions are 'covered' by the non-ascii Latin1 line.  Á is already present. Â is intended to represent all circumflexed characters, including Ô. I intended the repeated use of A as base character to imply that.  I am not sure if using a different base for each decoration would be better.  Similarly, Ç covers Ş.  I am thinking about using fewer symbols and more alphabetic characters.  It might be a good idea to add something, such as Ğ, from the Extended block, beyond \u00ff.

User additions: This would be a separate issue, see #31777.
msg304543 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2017-10-17 22:56
New changeset e2e42274ee5db1acedf57b63943e1f536d7a25bc by Terry Jan Reedy in branch 'master':
bpo-13802: Use non-Latin characters in IDLE's Font settings sample. (#3960)
https://github.com/python/cpython/commit/e2e42274ee5db1acedf57b63943e1f536d7a25bc
msg304545 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2017-10-17 23:51
New changeset ecacbb4f22ae86d29a73a5f715bce07d091da10d by Terry Jan Reedy (Miss Islington (bot)) in branch '3.6':
[3.6] bpo-13802: Use non-Latin characters in IDLE's Font settings sample. (GH-3960) (#4027)
https://github.com/python/cpython/commit/ecacbb4f22ae86d29a73a5f715bce07d091da10d
History
Date User Action Args
2017-10-17 23:52:32 terry.reedy set status: open -> closed
resolution: fixed
stage: patch review -> resolved
2017-10-17 23:51:52 terry.reedy set messages: + msg304545
2017-10-17 22:56:27 python-dev set pull_requests: + pull_request4002
2017-10-17 22:56:20 terry.reedy set messages: + msg304543
2017-10-12 22:08:19 terry.reedy set messages: + msg304293
2017-10-12 09:40:07 corona10 set messages: + msg304223
2017-10-12 09:26:06 corona10 set messages: + msg304222
2017-10-12 08:12:58 serhiy.storchaka set messages: + msg304219
2017-10-12 07:44:23 louielu set nosy: + corona10
messages: + msg304212
2017-10-12 07:27:26 serhiy.storchaka set messages: + msg304210
2017-10-12 05:07:54 terry.reedy set dependencies: - IDLE: Improve config dialog font change user interface
messages: + msg304205
title: IDLE Prefernces/Fonts: use multiple alphabets in examples -> IDLE font settings: use multiple character sets in examples
2017-10-12 04:54:17 terry.reedy set stage: needs patch -> patch review
pull_requests: + pull_request3937
2017-07-21 02:37:00 terry.reedy unlink issue24776 dependencies
2017-07-21 02:36:45 terry.reedy set dependencies: + IDLE: Improve config dialog font change user interface
messages: + msg298762
2017-07-16 04:27:47 serhiy.storchaka set messages: + msg298413
2017-07-15 23:14:55 terry.reedy set messages: + msg298408
stage: patch review -> needs patch
2017-07-10 22:05:15 terry.reedy link issue24776 dependencies
2017-07-07 13:45:16 serhiy.storchaka set messages: + msg297883
2017-07-07 10:17:42 serhiy.storchaka set messages: + msg297878
2017-07-07 08:32:09 louielu set messages: + msg297870
2017-07-07 08:20:28 serhiy.storchaka set nosy: + serhiy.storchaka
messages: + msg297869
2017-07-07 08:12:17 louielu set nosy: + louielu
messages: + msg297868
2017-07-07 07:37:15 louielu set pull_requests: + pull_request2682
2017-06-30 01:30:39 terry.reedy set assignee: terry.reedy
stage: needs patch -> patch review
versions: + Python 3.6, Python 3.7, - Python 2.7, Python 3.3, Python 3.4
2013-06-15 18:50:45 terry.reedy set versions: + Python 3.4, - Python 3.2
2013-03-25 15:56:40 Todd.Rovito set nosy: + roger.serwy
2013-03-24 05:26:26 Todd.Rovito set nosy: + Todd.Rovito
2013-03-23 19:49:32 francismb set files: + issue13802.patch
nosy: + francismb
messages: + msg185080
keywords: + patch
2012-01-16 23:22:19 terry.reedy create