
Author maxbachmann
Recipients ezio.melotti, maxbachmann, vstinner
Date 2021-09-05.11:12:09
Message-id <1630840329.53.0.683865786934.issue45105@roundup.psfhosted.org>
Content
I noticed strange behavior when using the Unicode character \U00010900 and inserting it directly as a character (rather than as an escape sequence).
Here is the result on the Python console for both 3.6 and 3.9:
```
>>> s = '0𐤀00'
>>> s
'0𐤀00'
>>> ls = list(s)
>>> ls
['0', '𐤀', '0', '0']
>>> s[0]
'0'
>>> s[1]
'𐤀'
>>> s[2]
'0'
>>> s[3]
'0'
>>> ls[0]
'0'
>>> ls[1]
'𐤀'
>>> ls[2]
'0'
>>> ls[3]
'0'
```

It appears that, for some reason, in this specific case the character is stored in a different position than the one shown when the complete string is printed. Note that the string already behaves strangely when selecting it in the console: selecting the special character immediately highlights the last three characters (probably because the console already treats this character as being in the second position).
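One way to check where each character is actually stored, independently of how the console renders the string, is to list its code points. The sketch below uses the standard `unicodedata` module; `describe` is a hypothetical helper name chosen for illustration. For context, U+10900 (PHOENICIAN LETTER ALF) has the bidirectional class `R` (right-to-left), so a terminal that applies the Unicode bidirectional algorithm may draw it in a different place relative to the neighboring digits than the index it occupies in the string.

```python
import unicodedata


def describe(s):
    """Return (index, code point, name, bidi class) for each character in s.

    The result is built from code points only, so it reflects the order in
    which the characters are stored, not how a terminal displays them.
    """
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"),
         unicodedata.bidirectional(ch))
        for i, ch in enumerate(s)
    ]


# Same sequence as in the first example (assuming the typed characters were
# '0', U+10900, '0', '0'), written with an escape so the source itself is
# not visually reordered.
for row in describe('0\U00010900' + '00'):
    print(row)
```

Each printed row is plain ASCII, so its order on screen matches the storage order.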

The same behavior does not occur when using the code point escape directly:
```
>>> s='000\U00010900'
>>> s
'000𐤀'
>>> s[0]
'0'
>>> s[1]
'0'
>>> s[2]
'0'
>>> s[3]
'𐤀'
```
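To compare the two cases without depending on how the terminal renders right-to-left text, the built-in `ascii()` can be used: it escapes every non-ASCII character, so its output is pure ASCII and cannot be reordered on screen. A minimal sketch, assuming the first example's string consisted of '0', U+10900, '0', '0' as its indexing output suggests:

```python
# First example's string, rebuilt with an escape instead of a typed character.
first = '0' + '\U00010900' + '00'
# Second example's string, exactly as written above.
second = '000\U00010900'

# ascii() escapes non-ASCII characters, giving an unambiguous representation.
print(ascii(first))
print(ascii(second))

# Index of U+10900 within each string.
print(first.index('\U00010900'))
print(second.index('\U00010900'))
```

This should print `'0\U0001090000'` and `'000\U00010900'`, followed by the indices 1 and 3, matching what `s[i]` returned in each session.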

This was tested using the following Python versions:
```
Python 3.6.0 (default, Dec 29 2020, 02:18:14) 
[GCC 10.2.1 20201125 (Red Hat 10.2.1-9)] on linux

Python 3.9.6 (default, Jul 16 2021, 00:00:00) 
[GCC 11.1.1 20210531 (Red Hat 11.1.1-3)] on linux
```
on Fedora 34