
Author ArcRiley
Recipients ArcRiley, amaury.forgeotdarc, ezio.melotti, loewis
Date 2009-10-03.23:04:01
Message-id <1254611044.42.0.753938429452.issue7045@psf.upfronthosting.co.za>
In-reply-to
Content
Amaury, you are absolutely correct: \ud801 on its own is not a valid
Unicode character. However, I am not giving Python \ud801; I am giving
Python '𐑑' (== '\U00010451').
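
For readers following along, the relationship between the two values can be sketched as below. This is only an illustration, assuming a modern Python 3 (3.3 or later, where the narrow/wide build distinction no longer exists); it is not the attached u.py, and the variable names are made up for the example. It shows that '\U00010451' is a single code point whose UTF-16 form is a surrogate pair starting with 0xD801, while a lone '\ud801' cannot be encoded as UTF-8 at all.

```python
# Illustrative sketch only; assumes Python 3.3+ (not the 3.1.1 build from the report).
ch = '\U00010451'                      # the character from the message

code_point = ord(ch)                   # one code point, not two
utf8_bytes = ch.encode('utf-8')        # four bytes in UTF-8

# UTF-16 stores the same character as a surrogate pair (two 16-bit units).
utf16_be = ch.encode('utf-16-be')
high = int.from_bytes(utf16_be[:2], 'big')
low = int.from_bytes(utf16_be[2:], 'big')

print(len(ch), hex(code_point))        # 1 0x10451
print(utf8_bytes)                      # b'\xf0\x90\x91\x91'
print(hex(high), hex(low))             # 0xd801 0xdc51

# A lone high surrogate is not a character and is rejected by the UTF-8 codec.
try:
    '\ud801'.encode('utf-8')
    lone_ok = True
except UnicodeEncodeError:
    lone_ok = False
print(lone_ok)                         # False
```

The 0xD801 seen in the error is therefore the first half of the UTF-16 encoding of the character, not something present in the UTF-8 input.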

I am attaching a different short example, u.py, which demonstrates that
Python is mishandling UTF-8 both on the interactive terminal and in scripts.

The output should be the same, but on Python 3.1.1 compiled for wide
Unicode it reports two different values.  As someone on #python-dev
found, '𐑑'.encode('utf-16').decode('utf-16') outputs the correct value.
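
The round trip mentioned above can be written out as a small check. This is a sketch under the assumption of a modern Python 3, where both round trips are expected to give back the original character; it does not reproduce the 3.1.1 behaviour described in this message, and the variable names are illustrative.

```python
# Illustrative sketch only; assumes a modern Python 3, not the 3.1.1 build from the report.
ch = '\U00010451'

# Encode to bytes and decode again with each codec.
via_utf16 = ch.encode('utf-16').decode('utf-16')
via_utf8 = ch.encode('utf-8').decode('utf-8')

print(via_utf16 == ch, via_utf8 == ch)   # True True
print(ascii(via_utf16))                  # '\U00010451'
```

Using ascii() prints the escaped form, so the result does not depend on whether the terminal can display the character.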
History
Date                 User      Action  Args
2009-10-03 23:04:04  ArcRiley  set     recipients: + ArcRiley, loewis, amaury.forgeotdarc, ezio.melotti
2009-10-03 23:04:04  ArcRiley  set     messageid: <1254611044.42.0.753938429452.issue7045@psf.upfronthosting.co.za>
2009-10-03 23:04:02  ArcRiley  link    issue7045 messages
2009-10-03 23:04:02  ArcRiley  create