Author lemburg
Recipients Rhamphoryncus, ezio.melotti, jwilk, lemburg, loewis, pitrou
Date 2009-04-29.16:54:25
Message-id <1241024067.44.0.155140775503.issue3672@psf.upfronthosting.co.za>
Content
While it's probably OK to fix the codecs, there's one issue that makes
this difficult, at least for the UTF-8 codec:

The marshal module uses UTF-8 to write Unicode objects, and these must
be able to store the full range of supported UCS2/UCS4 code points,
including lone surrogates.

If the UTF-8 codec were changed to raise an error for lone surrogates,
marshal would no longer be able to write or read every Unicode object.
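
To make the conflict concrete, here is a minimal sketch. It assumes a current CPython 3 interpreter, which is newer than this message, so it shows how things behave today rather than at the time of writing. The variable names are only illustrative.

```python
import marshal

# A lone high surrogate: a code point with no matching low surrogate.
lone = "\ud800"

# marshal has to round-trip any str object, including this one.
round_tripped = marshal.loads(marshal.dumps(lone))

# The strict UTF-8 codec, by contrast, refuses to encode it.
try:
    lone.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

print(round_tripped == lone, strict_ok)
```

If marshal relied on the strict codec alone, the first step would fail in the same way the second one does.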

It is likely that other existing Python code (outside the std lib) also
relies on this ability.

Changing this would only be possible in 3.1.

The marshal module would then also have to be changed to use a different
encoding, one that does support lone surrogates.
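
One possible shape for such an encoding, sketched under the assumption of Python 3.1 or later: the "surrogatepass" error handler lets the UTF-8 codec pass surrogates through in both directions. Whether this is the approach meant above is an assumption on my part; it is shown only as an example of an encoding variant that keeps lone surrogates.

```python
# A lone high surrogate that strict UTF-8 rejects.
lone = "\ud800"

# With "surrogatepass" the surrogate is encoded as an ordinary
# three-byte UTF-8 sequence instead of raising an error.
encoded = lone.encode("utf-8", "surrogatepass")

# Decoding needs the same handler; a strict decode rejects these bytes.
decoded = encoded.decode("utf-8", "surrogatepass")

try:
    encoded.decode("utf-8")
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False

print(encoded, decoded == lone, strict_decode_ok)
```

Both sides have to opt in to the handler, so data written this way is not valid UTF-8 for a strict reader.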

See issue 3297 for a discussion of UTF-8/16 vs. UCS2/4, the
implications, motivations, etc.