Author terry.reedy
Recipients BTaskaya, Mark.Shannon, ammar2, aroberge, brandtbucher, miss-islington, nedbat, pablogsal, serhiy.storchaka, terry.reedy
Date 2021-07-19.21:03:09
Message-id <1626728590.5.0.25618739735.issue43950@roundup.psfhosted.org>
Content
The effort to match caret lines to general Unicode is similar to a previous issue that was closed as futile.  (But I could not find it.)  The effort also has a downside that should be considered.

The fundamental problem is that there is no fixed-pitch font for Unicode. (Let alone any font covering all of Unicode.) Nor is there a single/double-width definition, or font, for all of Unicode.  Some character sets are not amenable to such treatment.
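To make the width problem concrete, here is a minimal sketch of the usual single/double-width approximation, built on the standard library's `unicodedata.east_asian_width`. The helper name `approx_columns` is hypothetical, and the rule it applies (2 columns for 'W' and 'F', 1 otherwise, 0 for characters with a nonzero combining class) is an assumption of this sketch, not something a terminal or font is obliged to follow.

```python
import unicodedata


def approx_columns(text):
    """Estimate how many terminal columns `text` occupies.

    Hypothetical helper: this only approximates what a terminal does and
    says nothing about how a particular font actually renders the text.
    """
    columns = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Characters with a nonzero combining class are assumed to
            # add no width of their own.
            continue
        # 'W' (wide) and 'F' (fullwidth) are counted as two columns;
        # every other category is counted as one.
        columns += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return columns


print(approx_columns("abc"))
print(approx_columns("日本語"))
```

The estimate has no way to express a glyph that renders at 1.6 columns, or a variable-pitch script, which is the limitation described above.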

To see the problem more easily, open, for instance, IDLE's option/settings dialog, showing the fonts tab and a multi-script sample.  On Windows, select what I believe is the most 'fixed' font -- Courier New.  ASCII, Latin1, IPA, Greek, Cyrillic, Hebrew, and Arabic are all rendered in the same fixed pitch.  But the last 4 Cyrillic characters of "...ЪъЭэѠѤѬӜ" are extremely cramped and may be rendered differently from the rest.  The East Asian characters are in a different fixed pitch, about 1.6 times the pitch of ASCII and the other scripts just listed.  (So the double-wide 2 is 1.6 rounded up.  And with some fonts, the East Asian scripts are not all the same pitch.)  The South Asian scripts are variable pitch and, for the sample chars, average wider than 1 (their lines have 20 chars, like the ASCII, etc. lines).  Tamil, especially, has a wide range of widths, with the widest as wide as the East Asian chars.

On Windows, on my machine, pasting the sample text between quotes results in the Greek chars, the last 4 Cyrillic chars, and all Asian chars (including Hebrew and Arabic) being replaced by replacement chars.  (I thought that this was better once, but maybe I misremember.)  While one can get script-specific fonts, the fixed-pitch South Asian fonts I tried on Mac were hardly readable.  My conclusion is that people using certain scripts, and anyone wanting a wide variety of scripts, need to use a GUI-based editor and shell rather than a fixed-pitch terminal/console.

As long as the caret line has 1 char per code char, a GUI program can use it to mark code characters, and do so differently for '~' and '^'.  If some of these chars are doubled, exact character information is lost.  If you go ahead with this, please use a third character, such as '-', for additions.  GUI programs could then ignore these, given that they can otherwise get the start character information.
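A rough sketch of the '-' suggestion, to show why a distinct padding character keeps the per-character information recoverable. Everything here is hypothetical: `is_wide`, `padded_carets`, and `marks_only` are illustration names, the wide test is the same East Asian Width approximation, and none of this is taken from CPython's traceback code.

```python
import unicodedata


def is_wide(ch):
    # Assumption: 'W' and 'F' characters occupy two terminal columns.
    return unicodedata.east_asian_width(ch) in ("W", "F")


def padded_carets(line, start, marks):
    """Build a marker line for line[start:start + len(marks)].

    `marks` holds one '~' or '^' per marked source character.  Each wide
    character gets one extra '-' so the markers line up in a terminal
    without repeating '~' or '^'.
    """
    # Indent by the estimated display width of the unmarked prefix.
    indent = sum(2 if is_wide(ch) else 1 for ch in line[:start])
    out = [" " * indent]
    for ch, mark in zip(line[start:], marks):
        out.append(mark)
        if is_wide(ch):
            out.append("-")  # padding only; carries no character info
    return "".join(out)


def marks_only(marker_line):
    """Recover the one-marker-per-character string a GUI would want."""
    return marker_line.strip().replace("-", "")


source = "x = 日本 + 1"
marker_line = padded_carets(source, 4, "~^")
print(source)
print(marker_line)
print(marks_only(marker_line))
```

If the padding reused '~' or '^' instead, `marks_only` could not tell a doubled marker from two marked characters, which is the information loss described above. The start position is passed in separately here, matching the remark that a GUI can get it by other means.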
History
Date User Action Args
2021-07-19 21:03:10  terry.reedy  set  recipients: + terry.reedy, nedbat, aroberge, Mark.Shannon, serhiy.storchaka, ammar2, pablogsal, miss-islington, brandtbucher, BTaskaya
2021-07-19 21:03:10  terry.reedy  set  messageid: <1626728590.5.0.25618739735.issue43950@roundup.psfhosted.org>
2021-07-19 21:03:10  terry.reedy  link  issue43950 messages
2021-07-19 21:03:09  terry.reedy  create