
Author: lemburg
Recipients: Rhamphoryncus, ezio.melotti, lemburg
Date: 2008-07-12 00:37:31
Message-id: <1215823054.36.0.882088993727.issue3297@psf.upfronthosting.co.za>
Just to clarify: Python can be built either as a UCS2 or as a UCS4 build (not as UTF-16
vs. UTF-32).
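Some background on the distinction: a UCS4 build stores every code point in one 32-bit unit, while a UCS2 build stores code points above U+FFFF as a pair of 16-bit surrogates. The sketch below shows the standard UTF-16 surrogate arithmetic and the `sys.maxunicode` check that tells the two builds apart. The helper names `to_surrogate_pair` and `from_surrogate_pair` are hypothetical, and on Python 3.3 or later `sys.maxunicode` is always 0x10FFFF, so the check only distinguishes builds on older interpreters.

```python
import sys


def to_surrogate_pair(cp):
    """Split a supplementary-plane code point into (high, low) UTF-16 surrogates.

    This is how a UCS2 build represents a character above U+FFFF.
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP: %#x" % cp)
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def from_surrogate_pair(high, low):
    """Combine a high/low surrogate pair back into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# sys.maxunicode is 0xFFFF on a UCS2 build and 0x10FFFF on a UCS4 build.
build = "UCS2" if sys.maxunicode == 0xFFFF else "UCS4"

if __name__ == "__main__":
    hi, lo = to_surrogate_pair(0x1D11E)  # U+1D11E MUSICAL SYMBOL G CLEF
    print(build, hex(hi), hex(lo))
```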

The conversions from the escaped literal representation to the
internal format are done by the unicode-escape and raw-unicode-escape
codecs.
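A small illustration of what those codecs do with the escapes, assuming Python 3.3 or later rather than the Python 2.5 builds discussed here. A single `\U` escape decodes to one code point, while the same character spelled as two `\u` surrogate escapes decodes to two separate surrogate code points; the variable names are only for illustration.

```python
import codecs

# Source text as the tokenizer sees it: backslash escapes, not yet decoded.
escaped_wide = b"\\U0001d11e"     # one \U escape for U+1D11E
escaped_pair = b"\\ud834\\udd1e"  # the same character as two \u surrogate escapes

wide = codecs.decode(escaped_wide, "unicode_escape")
pair = codecs.decode(escaped_pair, "unicode_escape")
# raw-unicode-escape also understands \uXXXX and \UXXXXXXXX escapes.
raw = codecs.decode(escaped_wide, "raw_unicode_escape")

print(len(wide), len(pair), raw == wide)
```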

PYC files are written using the marshal module, which uses UTF-8 as the
encoding for Unicode objects.
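To see the UTF-8 payload directly, here is a sketch using CPython 3's `marshal` (its format is CPython-specific and version-dependent, so treat the byte layout as an implementation detail). It checks that the dump embeds the UTF-8 bytes of a non-BMP string, and that both the single code point and the two-surrogate form survive a round trip. CPython 3's marshal encodes with the "surrogatepass" error handler, which is why lone surrogates can be written at all.

```python
import marshal

clef = "\U0001d11e"    # U+1D11E MUSICAL SYMBOL G CLEF, outside the BMP
pair = "\ud834\udd1e"  # the same character as two lone surrogate code points

clef_bytes = marshal.dumps(clef)
pair_bytes = marshal.dumps(pair)

# The serialized form embeds the UTF-8 encoding of the string.
embeds_utf8 = clef.encode("utf-8") in clef_bytes

# Lone surrogates are not valid strict UTF-8; marshal writes them with
# "surrogatepass", so both forms still round-trip unchanged.
clef_back = marshal.loads(clef_bytes)
pair_back = marshal.loads(pair_bytes)
print(embeds_utf8, len(clef_back), len(pair_back))
```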

All of these codecs know about surrogates, so there must be a bug
somewhere in the Python tokenizer or compiler.

I checked on Linux using a UCS2 and a UCS4 build of Python 2.5: the
problem only shows up with the UCS4 build.
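One way to narrow this down is to compare literal values from a freshly compiled module with those from the same code object after a marshal round trip, which is what loading from a .pyc amounts to. The sketch below assumes Python 3 (a Python 2.5 port would need `u""` literals and the print statement); `SOURCE`, `literal_values` and `compare_source_and_pyc` are hypothetical names. It only flags literals whose value changes across the round trip and does not by itself locate the bug.

```python
import marshal

# Hypothetical test module: escaped literals plus one raw non-BMP character.
SOURCE = (
    's_wide = "\\U0001d11e"\n'
    's_pair = "\\ud834\\udd1e"\n'
    's_raw = "\U0001d11e"\n'
)


def literal_values(code):
    """Execute a module code object and return its s_* string globals."""
    namespace = {}
    exec(code, namespace)
    return {k: v for k, v in namespace.items() if k.startswith("s_")}


def compare_source_and_pyc(source):
    """Return literals from a direct compile and from a marshal-reloaded copy."""
    fresh = compile(source, "<issue3297>", "exec")
    reloaded = marshal.loads(marshal.dumps(fresh))
    return literal_values(fresh), literal_values(reloaded)


if __name__ == "__main__":
    before, after = compare_source_and_pyc(SOURCE)
    for name in sorted(before):
        codepoints = [hex(ord(c)) for c in before[name]]
        print(name, codepoints, before[name] == after[name])
```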