Author tchrist
Recipients Arfrever, ezio.melotti, lemburg, loewis, mrabarnett, tchrist, terry.reedy, vstinner
Date 2011-08-15.18:40:51
SpamBayes Score 9.7644115e-14
Marked as misclassified No
Message-id <12730.1313433631@chthon>
In-reply-to <1313431327.76.0.937601511804.issue12730@psf.upfronthosting.co.za>
Content
>Terry J. Reedy <tjreedy@udel.edu> added the comment:

> You are right, FF switched on me without notice. Bad FF. Thank you! What
> I now see makes much more sense.

>    [ "𐐼𐐯𐑅𐐨𐑉𐐯𐐻", "𐐼𐐯𐑅𐐨𐑉𐐯𐐻", "𐐔𐐯𐑅𐐨𐑉𐐯𐐻", "𐐔𐐇𐐝𐐀𐐑𐐇𐐓"  ],

> and I now know to check on other pages (although Tom's Unicode talk
> slides still have boxes even in utf-8, so that must be a font lack).

Do you have Symbola installed?  Here's Appendix I on Fonts, which lists things that
need to display correctly for the presentation to look right.

    * I recommend two free fonts from George Douros at users.teilar.gr/~g1951d/ known to
      work with this presentation: his Alfios font for regular text, and his Symbola font
      for fancy emoji. If any of these don’t look right to you, you probably need to
      supplement your system fonts:

            Ligatures: fi ffi ff ffl fl β ẞ ſt st
            Math letters: 𝒜 𝒟 𝔅 𝔎 𝔼 𝔽
            Gothic & Deseret: 𐌸𐌼𐌽𐍂, 𐐔𐐯𐑅𐐨𐑉𐐯𐐻
            Symbols: ✔ ✅ 🐪 📖 🛂 🐍
            Emoticons: 😇 😈 😉 😨 😭 😱
            Upside‐down: ¡pɐəɥ ɹnoʎ uo ƃuᴉpuɐʇs ʎq sᴉɥʇ pɐəᴚ
            Combining characters: ◌̂,◌̃,◌⃞,◌̲,◌︀,◌̵,◌̷

    * The last line with combining characters is especially hard to get to look right. 
      You may find that the shareware font Everson Mono works when all else fails.

You do need Unicode 5.1 support for the LATIN CAPITAL LETTER SHARP S, and
you need Unicode 6.0 support for most of the emoji (I think Snow Leopard
has colorized versions of these).  The Ligature line above looks good in Alfios.
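
To see which of those two requirements a given Python build meets, something like the following sketch may help. It assumes Python 3 and uses only the standard unicodedata module; the helper name supports() is made up for illustration. It only reports what the interpreter's character database knows, not whether your fonts can draw the glyphs.

```python
import unicodedata

def supports(name):
    """Return the character for a Unicode name, or None if this
    build's character database does not know the name."""
    try:
        return unicodedata.lookup(name)
    except KeyError:
        return None

# The database version decides which names can be resolved at all.
print("Unicode database version:", unicodedata.unidata_version)

for name in ("LATIN CAPITAL LETTER SHARP S",   # needs Unicode 5.1
             "SNAKE",                          # needs Unicode 6.0
             "FACE SCREAMING IN FEAR"):        # needs Unicode 6.0
    ch = supports(name)
    if ch is None:
        print("missing:", name)
    else:
        print("U+%04X %s" % (ord(ch), name))
```

If a name comes back as missing, the build's database predates that character, and no font will help on the Python side.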

It turns out it may not always be the font used with combining chars that matters so much
as whether and how well your browser supports true combining characters dynamically
generated, or whether it runs stuff through NFC and looks for substitution glyphs.  I am
not a GUI person, so am mostly just guessing.

But this I find interesting:  If you look at slide 33 of my first talk or slide 5 of my
second talk, which are duplicates entitled Canonical Conundra, the second column which is
labelled Glyphs explicitly uses Times New Roman because of this issue.  Even so you can
tell it is doing the NFC trick, because lines 1+2 have the same NFC of \x{F5} or õ, as do
3+4+5 with \x{22D} or ȭ, and 6+7 with ō̃.

The glyphs from the first group are both identical, and so are all three of those in the
second group, as both the first two groups have a single precomposed character available
for their NFC.  In contrast, there is no single precomposed glyph available for 6+7, and
you can tell that it's stacking it on the fly using slightly less tight grouping rules
than the font has in the precomposed versions above it.
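
The NFC behaviour described here can be reproduced from Python. This is only a sketch: the exact input sequences on the slide are not quoted in this message, so the three inputs below (o with a combining tilde and/or macron) are assumed stand-ins for the three groups, and nfc_points() is a hypothetical helper name. Python 3 is assumed.

```python
import unicodedata

def nfc_points(s):
    """Return the NFC form of s as a list of U+XXXX strings."""
    return ["U+%04X" % ord(c) for c in unicodedata.normalize("NFC", s)]

TILDE, MACRON = "\u0303", "\u0304"   # COMBINING TILDE, COMBINING MACRON

# Assumed inputs, one per group discussed above.
group1 = nfc_points("o" + TILDE)            # single precomposed U+00F5
group2 = nfc_points("o" + TILDE + MACRON)   # single precomposed U+022D
group3 = nfc_points("o" + MACRON + TILDE)   # U+014D plus a leftover tilde

for label, pts in (("tilde", group1),
                   ("tilde+macron", group2),
                   ("macron+tilde", group3)):
    print(label, pts)
```

The third case is the one a renderer has to stack on the fly: NFC leaves two code points because there is no precomposed character for it.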

I use Safari, but I am told Firefox looks ok, too.  Opera is my normal browser but it
does the copout I just described on combining chars without ever being able to
dynamically stack them if the copout fails, so I can't use it for this presentation.

--tom

  $ uniprops -a 'LATIN CAPITAL LETTER SHARP S' 'DESERET CAPITAL LETTER DEE' 'GOTHIC LETTER MANNA' 'SNAKE' 'FACE SCREAMING IN FEAR'

    U+1E9E <ẞ> \N{LATIN CAPITAL LETTER SHARP S}
        \w \pL \p{LC} \p{L_} \p{L&} \p{Lu}
        All Any Alnum Alpha Alphabetic Assigned InLatinExtendedAdditional Cased Cased_Letter LC Changes_When_Casefolded CWCF
           Changes_When_Casemapped CWCM Changes_When_Lowercased CWL Changes_When_NFKC_Casefolded CWKCF Lu L Gr_Base Grapheme_Base
           Graph GrBase ID_Continue IDC ID_Start IDS Letter L_ Latin Latn Latin_Extended_Additional Uppercase_Letter Print Upper
           Uppercase Word XID_Continue XIDC XID_Start XIDS X_POSIX_Alnum X_POSIX_Alpha X_POSIX_Graph X_POSIX_Print X_POSIX_Upper
           X_POSIX_Word
        Age=5.1 Bidi_Class=L Bidi_Class=Left_To_Right BC=L Block=Latin_Extended_Additional Canonical_Combining_Class=0
           Canonical_Combining_Class=Not_Reordered CCC=NR Canonical_Combining_Class=NR Decomposition_Type=None DT=None
           East_Asian_Width=Neutral Grapheme_Cluster_Break=Other GCB=XX Grapheme_Cluster_Break=XX Hangul_Syllable_Type=NA
           Hangul_Syllable_Type=Not_Applicable HST=NA Joining_Group=No_Joining_Group JG=NoJoiningGroup Joining_Type=Non_Joining
           JT=U Joining_Type=U Script=Latin Line_Break=AL Line_Break=Alphabetic LB=AL Numeric_Type=None NT=None Numeric_Value=NaN
           NV=NaN Present_In=5.1 IN=5.1 Present_In=5.2 IN=5.2 Present_In=6.0 IN=6.0 SC=Latn Script=Latn Sentence_Break=UP
           Sentence_Break=Upper SB=UP Word_Break=ALetter WB=LE Word_Break=LE _X_Begin

    U+10414 <𐐔> \N{DESERET CAPITAL LETTER DEE}
        \w \pL \p{LC} \p{L_} \p{L&} \p{Lu}
        All Any Alnum Alpha Alphabetic Assigned InDeseret Cased Cased_Letter LC Changes_When_Casefolded CWCF
           Changes_When_Casemapped CWCM Changes_When_Lowercased CWL Changes_When_NFKC_Casefolded CWKCF Deseret Dsrt Lu L Gr_Base
           Grapheme_Base Graph GrBase ID_Continue IDC ID_Start IDS Letter L_ Uppercase_Letter Print Upper Uppercase Word
           XID_Continue XIDC XID_Start XIDS X_POSIX_Alnum X_POSIX_Alpha X_POSIX_Graph X_POSIX_Print X_POSIX_Upper X_POSIX_Word
        Age=3.1 Bidi_Class=L Bidi_Class=Left_To_Right BC=L Block=Deseret Canonical_Combining_Class=0
           Canonical_Combining_Class=Not_Reordered CCC=NR Canonical_Combining_Class=NR Decomposition_Type=None DT=None
           Script=Deseret East_Asian_Width=Neutral Grapheme_Cluster_Break=Other GCB=XX Grapheme_Cluster_Break=XX
           Hangul_Syllable_Type=NA Hangul_Syllable_Type=Not_Applicable HST=NA Joining_Group=No_Joining_Group JG=NoJoiningGroup
           Joining_Type=Non_Joining JT=U Joining_Type=U Line_Break=AL Line_Break=Alphabetic LB=AL Numeric_Type=None NT=None
           Numeric_Value=NaN NV=NaN Present_In=3.1 IN=3.1 Present_In=3.2 IN=3.2 Present_In=4.0 IN=4.0 Present_In=4.1 IN=4.1
           Present_In=5.0 IN=5.0 Present_In=5.1 IN=5.1 Present_In=5.2 IN=5.2 Present_In=6.0 IN=6.0 SC=Dsrt Script=Dsrt
           Sentence_Break=UP Sentence_Break=Upper SB=UP Word_Break=ALetter WB=LE Word_Break=LE _X_Begin

    U+1033C <𐌼> \N{GOTHIC LETTER MANNA}
        \w \pL \p{L_} \p{Lo}
        All Any Alnum Alpha Alphabetic Assigned InGothic Gothic Is_Gothic L Lo Goth Gr_Base Grapheme_Base Graph GrBase
           ID_Continue IDC ID_Start IDS Letter L_ Other_Letter Print Word XID_Continue XIDC XID_Start XIDS X_POSIX_Alnum
           X_POSIX_Alpha X_POSIX_Graph X_POSIX_Print X_POSIX_Word
        Age=3.1 Bidi_Class=L Bidi_Class=Left_To_Right BC=L Block=Gothic Canonical_Combining_Class=0
           Canonical_Combining_Class=Not_Reordered CCC=NR Canonical_Combining_Class=NR Decomposition_Type=None DT=None
           East_Asian_Width=Neutral Script=Gothic Grapheme_Cluster_Break=Other GCB=XX Grapheme_Cluster_Break=XX
           Hangul_Syllable_Type=NA Hangul_Syllable_Type=Not_Applicable HST=NA Joining_Group=No_Joining_Group JG=NoJoiningGroup
           Joining_Type=Non_Joining JT=U Joining_Type=U Line_Break=AL Line_Break=Alphabetic LB=AL Numeric_Type=None NT=None
           Numeric_Value=NaN NV=NaN Present_In=3.1 IN=3.1 Present_In=3.2 IN=3.2 Present_In=4.0 IN=4.0 Present_In=4.1 IN=4.1
           Present_In=5.0 IN=5.0 Present_In=5.1 IN=5.1 Present_In=5.2 IN=5.2 Present_In=6.0 IN=6.0 Script=Goth SC=Goth
           Sentence_Break=LE Sentence_Break=OLetter SB=LE Word_Break=ALetter WB=LE Word_Break=LE _X_Begin

    U+1F40D <🐍> \N{SNAKE}
        \pS \p{So}
        All Any Assigned InMiscellaneousSymbolsAnd_Pictographs Common Zyyy So S Gr_Base Grapheme_Base Graph GrBase
           Miscellaneous_Symbols_And_Pictographs Other_Symbol Print Symbol X_POSIX_Graph X_POSIX_Print
        Age=6.0 Bidi_Class=ON Bidi_Class=Other_Neutral BC=ON Block=Miscellaneous_Symbols_And_Pictographs
           Canonical_Combining_Class=0 Canonical_Combining_Class=Not_Reordered CCC=NR Canonical_Combining_Class=NR Script=Common
           Decomposition_Type=None DT=None East_Asian_Width=Neutral Grapheme_Cluster_Break=Other GCB=XX Grapheme_Cluster_Break=XX
           Hangul_Syllable_Type=NA Hangul_Syllable_Type=Not_Applicable HST=NA Joining_Group=No_Joining_Group JG=NoJoiningGroup
           Joining_Type=Non_Joining JT=U Joining_Type=U Line_Break=AL Line_Break=Alphabetic LB=AL Numeric_Type=None NT=None
           Numeric_Value=NaN NV=NaN Present_In=6.0 IN=6.0 SC=Zyyy Script=Zyyy Sentence_Break=Other SB=XX Sentence_Break=XX
           Word_Break=Other WB=XX Word_Break=XX _X_Begin

    U+1F631 <😱> \N{FACE SCREAMING IN FEAR}
        \pS \p{So}
        All Any Assigned InEmoticons Common Zyyy Emoticons So S Gr_Base Grapheme_Base Graph GrBase Other_Symbol Print Symbol
           X_POSIX_Graph X_POSIX_Print
        Age=6.0 Bidi_Class=ON Bidi_Class=Other_Neutral BC=ON Block=Emoticons Canonical_Combining_Class=0
           Canonical_Combining_Class=Not_Reordered CCC=NR Canonical_Combining_Class=NR Script=Common Decomposition_Type=None
           DT=None East_Asian_Width=Neutral Grapheme_Cluster_Break=Other GCB=XX Grapheme_Cluster_Break=XX Hangul_Syllable_Type=NA
           Hangul_Syllable_Type=Not_Applicable HST=NA Joining_Group=No_Joining_Group JG=NoJoiningGroup Joining_Type=Non_Joining
           JT=U Joining_Type=U Line_Break=AL Line_Break=Alphabetic LB=AL Numeric_Type=None NT=None Numeric_Value=NaN NV=NaN
           Present_In=6.0 IN=6.0 SC=Zyyy Script=Zyyy Sentence_Break=Other SB=XX Sentence_Break=XX Word_Break=Other WB=XX
           Word_Break=XX _X_Begin
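
For comparison, a much smaller subset of this information is available from Python's standard unicodedata module. The sketch below is not a port of uniprops; describe() is a hypothetical helper, Python 3.3 or later is assumed (so that non-BMP characters are single string elements), and properties such as Script, Block, and Age are not exposed by the standard library at all.

```python
import unicodedata

def describe(ch):
    """Collect the few properties the standard library exposes."""
    return {
        "codepoint": "U+%04X" % ord(ch),
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),       # e.g. Lu, Lo, So
        "bidi": unicodedata.bidirectional(ch),      # e.g. L, ON
        "ccc": unicodedata.combining(ch),           # canonical combining class
        "decomp": unicodedata.decomposition(ch) or "None",
    }

# The same five characters passed to uniprops above.
for ch in "\u1e9e\U00010414\U0001033C\U0001F40D\U0001F631":
    info = describe(ch)
    print("{codepoint} {name}: gc={category} bc={bidi} "
          "ccc={ccc} decomp={decomp}".format(**info))
```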
History
Date                 User     Action  Args
2011-08-15 18:40:56  tchrist  set     recipients: + tchrist, lemburg, loewis, terry.reedy, vstinner, ezio.melotti, mrabarnett, Arfrever
2011-08-15 18:40:53  tchrist  link    issue12730 messages
2011-08-15 18:40:51  tchrist  create