
Author steven.daprano
Recipients eryksun, ezio.melotti, maxbachmann, steven.daprano, vstinner
Date 2021-09-05.13:08:09
Message-id <1630847289.91.0.731347643307.issue45105@roundup.psfhosted.org>
Content
I'm afraid I cannot reproduce the problem.

>>> s = '000𐤀'  # \U00010900
>>> s
'000𐤀'
>>> s[0]
'0'
>>> s[1]
'0'
>>> s[2]
'0'
>>> s[3]
'𐤀'
>>> list(s)
['0', '0', '0', '𐤀']
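The same check can be run as a script instead of pasting the character into a terminal. Writing ALF as the escape \U00010900 keeps the source pure ASCII, so no terminal, browser or clipboard can reorder it. This is a minimal sketch, assuming Python 3.3 or later, where str always indexes by code point (older "narrow" builds stored non-BMP characters as two UTF-16 code units):

```python
import unicodedata

# Three ASCII digits followed by U+10900 PHOENICIAN LETTER ALF,
# written as an escape so the source file contains only ASCII.
s = "000\U00010900"

# Index each position individually and compare with list(s).
by_index = [s[i] for i in range(len(s))]
by_list = list(s)

print(len(s))                  # 4 code points
print(by_index == by_list)     # True
print(unicodedata.name(s[3]))  # PHOENICIAN LETTER ALF
```

If this prints 4, True and the character name, the interpreter is handling the string correctly and any difference seen interactively comes from how the text was displayed or entered.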


That was run with Python 3.9 in xfce4-terminal. Which terminal are you using?

I am very confident that this is a bug in some external software: possibly the terminal, possibly the browser or other application you originally copied the PHOENICIAN LETTER ALF character from. It looks related to mishandling of the right-to-left (RTL) character:

>>> import unicodedata
>>> unicodedata.bidirectional(s[3])
'R'
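Looking at every character, not only the last one, shows the mix more clearly: the digits have the weak bidirectional class 'EN' (European Number), while ALF is strong right-to-left ('R'). A renderer applying the Unicode Bidirectional Algorithm may display such a mixed run in a visual order that differs from the logical order stored in the string; the stored order itself is unaffected. A small sketch of the check:

```python
import unicodedata

s = "000\U00010900"

# Bidirectional class of each code point:
# 'EN' = European Number (weak), 'R' = strong right-to-left.
classes = [unicodedata.bidirectional(c) for c in s]
print(classes)  # ['EN', 'EN', 'EN', 'R']
```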


Using Firefox, when I attempt to select the text s = '000...' in Max's initial message with the mouse, the selection highlighting jumps around; see the attached screenshot (selection.png). Depending on how I copy the text, I sometimes get '000 ALF' and sometimes '0 ALF 00', which hints that something is being confused by the RTL character: possibly the browser, possibly the copy/paste clipboard, possibly the terminal. Regardless, I cannot replicate the behaviour you show, where list(s) differs from indexing the characters one by one.
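One way to tell whether a copy/paste actually reordered the characters, rather than the screen merely displaying them in a different order, is to print the text with escapes or as a list of code points, neither of which is subject to bidi reordering. The helper name code_points below is hypothetical, and the pasted value is a stand-in for the '0 ALF 00' case described above; ascii() and unicodedata.name() are standard library:

```python
import unicodedata

def code_points(text):
    """Hypothetical helper: one 'U+XXXX NAME' entry per code point,
    so the logical order is visible regardless of how text is rendered."""
    return [f"U+{ord(c):04X} {unicodedata.name(c, '<unnamed>')}" for c in text]

expected = "000\U00010900"
pasted = "0\U0001090000"  # stand-in for the '0 ALF 00' paste described above

print(ascii(expected))  # '000\U00010900'
print(ascii(pasted))    # '0\U0001090000'
for line in code_points(pasted):
    print(line)
print(pasted == expected)  # False: the code points are in a different order
```

If ascii() on the pasted text shows the expected order, the clipboard content is fine and only the display is confused.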

It is very common for applications to mishandle mixed RTL and LTR characters, and that can have all sorts of odd display and copy/paste issues.
History
Date                 User            Action  Args
2021-09-05 13:08:09  steven.daprano  set     recipients: + steven.daprano, vstinner, ezio.melotti, eryksun, maxbachmann
2021-09-05 13:08:09  steven.daprano  set     messageid: <1630847289.91.0.731347643307.issue45105@roundup.psfhosted.org>
2021-09-05 13:08:09  steven.daprano  link    issue45105 messages
2021-09-05 13:08:09  steven.daprano  create