Author vstinner
Recipients ezio.melotti, gvanrossum, lemburg, loewis, tchrist, vstinner
Date 2011-11-29.20:42:29
Message-id <1322599350.13.0.163750536411.issue12892@psf.upfronthosting.co.za>
In-reply-to
Content
Python 3.3 has a strange behaviour:

>>> '\uDBFF\uDFFF'.encode('utf-16-le').decode('utf-16-le')
'\U0010ffff'
>>> '\U0010ffff'.encode('utf-16-le').decode('utf-16-le')
'\U0010ffff'

I would expect text.encode(encoding).decode(encoding) == text, or else an encode or decode error.

So I agree that the encoder should reject lone surrogates.
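
The round-trip expectation above can be sketched as a small check. This is only an illustration: check_roundtrip is a hypothetical helper name, not part of Python, and the result for the surrogate string depends on the interpreter version (a "mismatch" on a build that behaves like the session above, an "error" on a build whose encoder rejects surrogates).

```python
def check_roundtrip(text, encoding):
    """Classify what encode-then-decode does to text (hypothetical helper)."""
    try:
        decoded = text.encode(encoding).decode(encoding)
    except UnicodeError:
        # UnicodeError covers both UnicodeEncodeError and UnicodeDecodeError.
        return "error"
    # A silent change of the text is the case the message objects to.
    return "ok" if decoded == text else "mismatch"


for sample in ('\U0010ffff', '\uDBFF\uDFFF'):
    # ascii() keeps the output printable even for surrogate code points.
    print(ascii(sample), check_roundtrip(sample, 'utf-16-le'))
```

Under the expectation stated above, the only acceptable results are "ok" and "error"; "mismatch" means the codec silently altered the text.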