Message 401082 - Python tracker


Author	eryksun
Recipients	eryksun, ezio.melotti, maxbachmann, vstinner
Date	2021-09-05.12:53:10
Message-id	<1630846390.76.0.861212272304.issue45105@roundup.psfhosted.org>
In-reply-to

Content

AFAICT, there is no bug here. It's just confusing how Unicode right-to-left characters in the repr() can modify how it's displayed in the console/terminal. Use the ascii() representation to avoid the problem.
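
As a rough sketch of the difference (the string below is a hypothetical stand-in for the one in the report, assuming only that the Phoenician character sits at index 1), ascii() escapes every non-ASCII character, so the terminal has no right-to-left character to reorder:

```python
# Hypothetical example string: a Phoenician letter (U+10900) at index 1,
# surrounded by ASCII digits. Not necessarily the string from the report.
s = '0\U00010900' + '00'

# repr() keeps printable non-ASCII characters as-is, so a bidi-aware
# terminal may visually reorder the neighboring digits.
as_repr = repr(s)

# ascii() escapes non-ASCII characters with \x, \u or \U sequences,
# so the result contains only ASCII and is displayed in logical order.
as_ascii = ascii(s)

print(as_repr)
print(as_ascii)
```

The string itself is the same in both cases; only the displayed form differs.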

> The same behavior does not occur when directly using the unicode point
> ```
> >>> s='000\U00010900'

The original string has the Phoenician right-to-left character at index 1, not at index 3. The "0" digits in the original are European numbers (bidirectional class 'EN'), which have weak directionality, so a neighboring right-to-left character can change where they are displayed. You can see the reversal with a numeric sequence that's separated by spaces. For example:

>>> s = '123\U00010900456'
>>> print(*s, sep='\n')
1
2
3
𐤀
4
5
6
>>> print(*s)
1 2 3 𐤀 4 5 6

Latin letters have left-to-right directionality. For example:

>>> s = '123\U00010900abc'
>>> print(*s)
1 2 3 𐤀 a b c

You can check the bidirectional property [1] using the unicodedata module:

>>> import unicodedata as ud
>>> ud.bidirectional('\U00010900')
'R'
>>> ud.bidirectional('0')
'EN'
>>> ud.bidirectional('a')
'L'
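
To inspect a whole string at once, something like the following could work. It is only a sketch; bidi_classes is a hypothetical helper name, not part of unicodedata:

```python
import unicodedata as ud

def bidi_classes(text):
    """Return (code point, bidirectional class) pairs in logical order.

    Hypothetical helper for illustration; it only wraps
    unicodedata.bidirectional() for each character of the string.
    """
    return [(f'U+{ord(ch):04X}', ud.bidirectional(ch)) for ch in text]

# The indexes here are logical positions in the string, independent of
# how a terminal chooses to display the characters.
for index, (code_point, bidi) in enumerate(bidi_classes('0\U00010900a')):
    print(index, code_point, bidi)
```

This makes it easy to see which characters are strong ('L', 'R') and which are weak ('EN') without relying on how the terminal renders them.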

---

[1] https://en.wikipedia.org/wiki/Unicode_character_property#Bidirectional_writing

History
Date	User	Action	Args
2021-09-05 12:53:10	eryksun	set	recipients: + eryksun, vstinner, ezio.melotti, maxbachmann
2021-09-05 12:53:10	eryksun	set	messageid: <1630846390.76.0.861212272304.issue45105@roundup.psfhosted.org>
2021-09-05 12:53:10	eryksun	link	issue45105 messages
2021-09-05 12:53:10	eryksun	create