
Author vstinner
Recipients Neui, SilentGhost, eryksun, ezio.melotti, jberg, ncoghlan, vstinner
Date 2021-03-13.13:22:06
Message-id <1615641726.88.0.951938719375.issue35883@roundup.psfhosted.org>
Content
> Right, enabling explicitly the Python UTF-8 Mode works around the issue
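For reference, the UTF-8 Mode mentioned in the quote can be enabled explicitly with the `-X utf8` command-line option or the `PYTHONUTF8=1` environment variable (Python 3.7+). Below is a minimal sketch that asks a child interpreter for `sys.flags.utf8_mode`. The helper name `utf8_mode_flag` is hypothetical and not part of any API.

```python
import subprocess
import sys


def utf8_mode_flag(extra_args):
    """Return sys.flags.utf8_mode as reported by a child interpreter.

    `extra_args` are command-line options inserted before `-c`,
    e.g. ["-X", "utf8"] to force the UTF-8 Mode on.
    """
    completed = subprocess.run(
        [sys.executable, *extra_args, "-c", "import sys; print(sys.flags.utf8_mode)"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(completed.stdout.strip())


print(utf8_mode_flag(["-X", "utf8"]))    # UTF-8 Mode explicitly enabled
print(utf8_mode_flag(["-X", "utf8=0"]))  # UTF-8 Mode explicitly disabled
```

The `-X` option takes precedence over `PYTHONUTF8`, so the child's flag does not depend on the caller's environment.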

When the Python UTF-8 Mode is enabled, or on macOS and Android, Python uses its own UTF-8 decoder, which complies with RFC 3629: it rejects characters outside the [U+0000; U+10ffff] range.
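As an illustration (an approximation using the built-in codec, not the C decoder used at startup), Python's `utf-8` codec applies the same RFC 3629 limit. The 4-byte sequence that would encode U+110000, one past U+10FFFF, is rejected. With the `surrogateescape` error handler, which Python uses when decoding OS data such as command-line arguments, each undecodable byte becomes a lone surrogate instead. The constant and helper names are hypothetical.

```python
# Bytes that would encode U+110000 under the old, unbounded UTF-8 scheme.
OUT_OF_RANGE = b"\xf4\x90\x80\x80"


def decode_strict(data: bytes) -> str:
    """Decode with the built-in RFC 3629 UTF-8 codec; raise on invalid input."""
    return data.decode("utf-8")


def decode_escaped(data: bytes) -> str:
    """Decode the way OS data is decoded: undecodable bytes become lone surrogates."""
    return data.decode("utf-8", "surrogateescape")


try:
    decode_strict(OUT_OF_RANGE)
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)

print(ascii(decode_escaped(OUT_OF_RANGE)))
```

The largest valid code point, U+10FFFF, still round-trips normally through the same codec.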

Otherwise, Python relies on the libc mbstowcs() decoder, which may or may not produce characters outside the [U+0000; U+10ffff] range. As I understand it, this issue is mostly about the UTF-8 encoding; I don't think that other encodings can produce characters above the U+10ffff code point.
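To make the failure mode concrete, here is a hypothetical model (not libc code) of a permissive decoder that follows the obsolete pre-RFC 3629 rules. Because of that, it can return values above U+10FFFF. The model is followed by the range check a caller must apply before building a `str`. Whether a given `mbstowcs()` behaves like this depends on the libc. The function names are illustrative assumptions.

```python
from typing import List

MAX_UNICODE = 0x10FFFF


def lenient_utf8_to_codepoints(data: bytes) -> List[int]:
    """Decode 1- to 4-byte UTF-8 sequences WITHOUT the U+10FFFF upper bound.

    Hypothetical permissive decoder: it also skips overlong and surrogate
    checks, so it only models what a lax wide-char decoder *might* return.
    """
    result = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            length, value = 1, lead
        elif 0xC0 <= lead < 0xE0:
            length, value = 2, lead & 0x1F
        elif 0xE0 <= lead < 0xF0:
            length, value = 3, lead & 0x0F
        elif 0xF0 <= lead < 0xF8:
            length, value = 4, lead & 0x07  # 0xF4..0xF7 can exceed U+10FFFF
        else:
            raise ValueError(f"unsupported lead byte 0x{lead:02x} at {i}")
        if i + length > len(data):
            raise ValueError("truncated sequence")
        for byte in data[i + 1 : i + length]:
            if byte & 0xC0 != 0x80:
                raise ValueError("invalid continuation byte")
            value = (value << 6) | (byte & 0x3F)
        result.append(value)
        i += length
    return result


def to_str_checked(codepoints: List[int]) -> str:
    """Reject values outside [U+0000; U+10ffff] before building a str."""
    for cp in codepoints:
        if not 0 <= cp <= MAX_UNICODE:
            raise ValueError(f"character U+{cp:x} is not in range [U+0000; U+10ffff]")
    return "".join(map(chr, codepoints))
```

Here `lenient_utf8_to_codepoints(b"\xf4\x90\x80\x80")` yields 0x110000, which `to_str_checked()` then refuses, whereas Python's strict `utf-8` codec rejects the same bytes outright.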
History
Date                 User      Action  Args
2021-03-13 13:22:06  vstinner  set     recipients: + vstinner, ncoghlan, ezio.melotti, SilentGhost, eryksun, Neui, jberg
2021-03-13 13:22:06  vstinner  set     messageid: <1615641726.88.0.951938719375.issue35883@roundup.psfhosted.org>
2021-03-13 13:22:06  vstinner  link    issue35883 messages
2021-03-13 13:22:06  vstinner  create