Message 148603 - Python tracker



Author	vstinner
Recipients	ezio.melotti, gvanrossum, lemburg, loewis, tchrist, vstinner
Date	2011-11-29.20:42:29
Message-id	<1322599350.13.0.163750536411.issue12892@psf.upfronthosting.co.za>

Content
Python 3.3 has a strange behaviour:

>>> '\uDBFF\uDFFF'.encode('utf-16-le').decode('utf-16-le')
'\U0010ffff'
>>> '\U0010ffff'.encode('utf-16-le').decode('utf-16-le')
'\U0010ffff'

I would expect text.encode(encoding).decode(encoding) == text, or else an encode or decode error.

So I agree that the encoder should reject lone surrogates.
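
The round-trip expectation can be checked with a small helper. This is only a sketch: round_trips is a hypothetical name, not part of the codecs API, and the use of the 'surrogatepass' error handler with UTF-16 assumes a later CPython (3.4 or newer, as far as I know) in which the strict encoder already rejects surrogates.

```python
def round_trips(text, encoding, errors="strict"):
    """Hypothetical helper: True if encode-then-decode gives back `text`,
    False if it gives back something else, None if the codec raises."""
    try:
        return text.encode(encoding, errors).decode(encoding, errors) == text
    except UnicodeError:
        return None


pair = '\uDBFF\uDFFF'    # two surrogate code points stored in a str
astral = '\U0010ffff'    # the single code point that pair spells in UTF-16

# 'surrogatepass' makes the encoder emit the surrogates as plain code units,
# which reproduces the bytes seen in the session above.
raw = pair.encode('utf-16-le', 'surrogatepass')
same_bytes = raw == astral.encode('utf-16-le')

print(same_bytes)                                        # both strings give the same bytes
print(round_trips(pair, 'utf-16-le', 'surrogatepass'))   # the pair does not survive the trip
print(round_trips(astral, 'utf-16-le'))                  # the astral character does
print(round_trips(pair, 'utf-16-le'))                    # strict mode: error instead
```

Two different strings mapping to the same bytes is exactly why the round trip cannot hold for the surrogate pair, so raising an error in strict mode is the only way to keep the expectation above.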

History
Date	User	Action	Args
2011-11-29 20:42:30	vstinner	set	recipients: + vstinner, lemburg, gvanrossum, loewis, ezio.melotti, tchrist
2011-11-29 20:42:30	vstinner	set	messageid: <1322599350.13.0.163750536411.issue12892@psf.upfronthosting.co.za>
2011-11-29 20:42:29	vstinner	link	issue12892 messages
2011-11-29 20:42:29	vstinner	create