Message103562
> I consider this an important missing backport for 2.7, since
> without this handler, the UTF-8 codecs in 2.7 and 3.x are
> incompatible and there's no other way to work around this
> other than to make use of the errorhandler conditionally
> depend on the Python version.
FWIW I tried to update the UTF-8 codec on trunk from RFC 2279 to RFC 3629 while working on #8271, and found this difference in the handling of surrogates (they are invalid only on 3.x).
I didn't change the behavior of the codec in the patch I attached to #8271 because it was out of the scope of that issue, but I consider it a bug that surrogates can be encoded in Python 2.x, because that doesn't follow RFC 3629.
IMHO Python 2.x should provide an RFC-3629-compliant UTF-8 codec; however, I haven't had time yet to investigate how Python 3 handles this and what the best solution is (e.g. adding another codec or changing the default behavior).
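To illustrate the difference described above, here is a minimal sketch for Python 3 only. It assumes that the handler mentioned in the quoted text behaves like Python 3's `surrogatepass` error handler; that mapping is my assumption and is not stated in this message. The helper name `encodes_strictly` is made up for the example.

```python
# Python 3 sketch: the strict UTF-8 codec rejects lone surrogates,
# as RFC 3629 requires, while the "surrogatepass" error handler
# lets them through.
# Assumption: "surrogatepass" is the handler discussed in the quote.

lone_surrogate = "\ud800"  # a high surrogate with no low surrogate after it


def encodes_strictly(text):
    """Return True if text can be encoded with the strict UTF-8 codec."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


# Strict mode refuses the surrogate on Python 3.
strict_ok = encodes_strictly(lone_surrogate)

# With "surrogatepass" the surrogate is written as a 3-byte sequence.
passed_through = lone_surrogate.encode("utf-8", "surrogatepass")

# Decoding needs the same handler to get the surrogate back.
round_trip = passed_through.decode("utf-8", "surrogatepass")
```

On Python 2.x the plain `utf-8` codec accepts the same code point without any handler, which is the incompatibility being discussed.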
Date | User | Action | Args
2010-04-19 08:55:48 | ezio.melotti | set | recipients: + ezio.melotti, lemburg, loewis, pitrou, vstinner, benjamin.peterson, ysj.ray
2010-04-19 08:55:48 | ezio.melotti | set | messageid: <1271667348.61.0.505710926212.issue8438@psf.upfronthosting.co.za>
2010-04-19 08:55:47 | ezio.melotti | link | issue8438 messages
2010-04-19 08:55:47 | ezio.melotti | create |