Author ammar2
Recipients BTaskaya, Mark.Shannon, ammar2, aroberge, brandtbucher, miss-islington, nedbat, pablogsal, serhiy.storchaka, terry.reedy
Date 2021-07-19.18:24:00
Had some time to look into this. To summarize the problem: it concerns Unicode code points that are single characters but occupy more than the width of a single character when displayed, even in a monospace font [1].

In the examples above, the Chinese character counts as one character in a Python string. However, notice that it needs two carets to underline it:

>>> x = "该"
>>> print(x)
该
>>> len(x)
1
>>> print(x + '\n' + '^^')
该
^^

This issue is somewhat font dependent. In the case of the emoji, I know that Windows sometimes renders emojis as single-character-wide black-and-white glyphs and sometimes as colorful ones, depending on the program.

As Pablo alluded to, unicodedata.east_asian_width is probably the best solution we can implement. For these wide characters it provides:

>>> unicodedata.east_asian_width('💩')
'W'
>>> unicodedata.east_asian_width('该')
'W'

These return W, corresponding to Wide, whereas for regular-width characters:

>>> unicodedata.east_asian_width('b')
'Na'
>>> unicodedata.east_asian_width('=')
'Na'

we get Na, corresponding to Narrow. This can be used to count the "displayed width" of the characters and hence the number of carets. However, organizing the code is going to be a bit tricky, since we currently use _PyPegen_byte_offset_to_character_offset to get the offsets used for string slicing in the AST segment parsing code. We might need a separate function that computes the display width.
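
To make the idea concrete, here is a rough pure-Python sketch of what such a display-width function could look like. The names display_width and caret_line are hypothetical and are not existing CPython APIs, and the real change would have to live in the C traceback/parser code. It assumes Wide ('W') and Fullwidth ('F') characters take two columns, combining marks take none, and everything else takes one, which is only an approximation of what a terminal actually renders:

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate the number of terminal columns needed to show text."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Combining marks are drawn on top of the previous character.
            continue
        # Assumption: Wide and Fullwidth characters occupy two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def caret_line(line: str, start: int, end: int) -> str:
    """Build a caret underline for line[start:end] (character offsets)."""
    padding = " " * display_width(line[:start])
    carets = "^" * display_width(line[start:end])
    return padding + carets


source = 'x = "该" + 1'
print(source)
print(caret_line(source, 4, 7))
```

Here the three-character slice '"该"' gets four carets, because the padding and the caret count are both computed from display width rather than from len().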


[1] Way more details on this issue here: and an example of a Python library that tries to deal with this issue here:
Date                 User    Action  Args
2021-07-19 18:24:00  ammar2  set     recipients: + ammar2, terry.reedy, nedbat, aroberge, Mark.Shannon, serhiy.storchaka, pablogsal, miss-islington, brandtbucher, BTaskaya
2021-07-19 18:24:00  ammar2  set     messageid: <>
2021-07-19 18:24:00  ammar2  link    issue43950 messages
2021-07-19 18:24:00  ammar2  create