
Author ezio.melotti
Recipients benjamin.peterson, ezio.melotti, lemburg, loewis, pitrou, vstinner, ysj.ray
Date 2010-04-19.08:55:47
Message-id <1271667348.61.0.505710926212.issue8438@psf.upfronthosting.co.za>
Content
> I consider this an important missing backport for 2.7, since
> without this handler, the UTF-8 codecs in 2.7 and 3.x are
> incompatible and there's no other way to work around this
> other than to make use of the errorhandler conditionally
> depend on the Python version.

FWIW I tried to update the UTF-8 codec on trunk from RFC 2279 to RFC 3629 while working on #8271, and found this difference in the handling of surrogates (they are invalid only on 3.x).
I didn't change the behavior of the codec in the patch I attached to #8271 because it was outside the scope of that issue, but I consider the fact that surrogates can be encoded in Python 2.x a bug, because it doesn't follow RFC 3629.
IMHO Python 2.x should provide an RFC-3629-compliant UTF-8 codec; however, I haven't yet had time to investigate how Python 3 handles this and what the best solution is (e.g. adding another codec or changing the default behavior).
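
To make the difference concrete, here is a minimal sketch of the version-dependent workaround mentioned in the quoted text. It assumes the handler under discussion is 'surrogatepass' (the quoted text does not name it), and the helper names are hypothetical, made up for illustration only.

```python
import sys

# A lone high surrogate: a code point that RFC 3629 forbids in UTF-8.
LONE_SURROGATE = u"\ud800"


def encode_utf8_allowing_surrogates(text):
    """Hypothetical helper: pick the error handler based on the Python version."""
    if sys.version_info[0] >= 3:
        # Assumption: 'surrogatepass' is the handler in question. On 3.x the
        # strict UTF-8 codec rejects surrogates, so it has to be requested.
        return text.encode("utf-8", "surrogatepass")
    # On 2.x the codec encodes surrogates without an error handler,
    # which is the behavior described above as a bug.
    return text.encode("utf-8")


def strict_utf8_rejects(text):
    """Return True if the default (strict) UTF-8 encoder refuses the text."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return True
    return False
```

On 3.x, strict_utf8_rejects(LONE_SURROGATE) is True, while the helper returns the three bytes ED A0 80. This is the kind of conditional code the quoted text would like to avoid.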
History
Date | User | Action | Args
2010-04-19 08:55:48 | ezio.melotti | set | recipients: + ezio.melotti, lemburg, loewis, pitrou, vstinner, benjamin.peterson, ysj.ray
2010-04-19 08:55:48 | ezio.melotti | set | messageid: <1271667348.61.0.505710926212.issue8438@psf.upfronthosting.co.za>
2010-04-19 08:55:47 | ezio.melotti | link | issue8438 messages
2010-04-19 08:55:47 | ezio.melotti | create |