Message 102265 - Python tracker

Author	vstinner
Recipients	dangra, ezio.melotti, lemburg, sjmachin, vstinner
Date	2010-04-03.14:43:21
Message-id	<1270305802.77.0.0073241346145.issue8271@psf.upfronthosting.co.za>

Content
> I also found out that, according to RFC 3629, surrogates 
> are considered invalid and they can't be encoded/decoded, 
> but the UTF-8 codec actually does it.

Python 2 does, but Python 3 raises an error.

Python 2.7a4+ (trunk:79675, Apr  3 2010, 16:11:36)
>>> u"\uDC80".encode("utf8")
'\xed\xb2\x80'

Python 3.2a0 (py3k:79441, Mar 26 2010, 13:04:55)
>>> "\uDC80".encode("utf8")
UnicodeEncodeError: 'utf-8' codec can't encode character '\udc80' in position 0: surrogates not allowed

Denying the encoding of surrogates (in UTF-8) causes a lot of crashes in Python 3, because most callers assume that _PyUnicode_AsString() never fails: see #6687 (and #8195 and many other crashes). It's not a good idea to change this in Python 2.7, because it would require a huge amount of work and we are close to the first beta of 2.7.
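
For readers on a current Python 3, the strict behaviour shown above can be reproduced with the small sketch below. It also uses the "surrogatepass" error handler, which is not mentioned in this message and is brought in here only as an illustration of how to get the Python 2 style bytes back. The helper name encode_lone_surrogate is hypothetical.

```python
# Minimal sketch, assuming a Python 3 interpreter (3.1 or later for the
# "surrogatepass" error handler). `encode_lone_surrogate` is a hypothetical
# helper used only for this illustration.
def encode_lone_surrogate(text, errors="strict"):
    """Encode text as UTF-8; return the exception instead of raising it."""
    try:
        return text.encode("utf-8", errors)
    except UnicodeEncodeError as exc:
        return exc


# Strict mode refuses the lone surrogate, as in the Python 3.2a0 session.
strict_result = encode_lone_surrogate("\udc80")

# "surrogatepass" yields the same bytes as the Python 2.7 session.
lenient_result = encode_lone_surrogate("\udc80", "surrogatepass")

print(type(strict_result).__name__)  # UnicodeEncodeError
print(lenient_result)                # b'\xed\xb2\x80'
```

The helper returns the exception rather than raising it only so that both outcomes can be compared side by side. Code that cannot tolerate a failure at this point would have to handle the error or choose a non-strict handler explicitly.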

History
Date	User	Action	Args
2010-04-03 14:43:22	vstinner	set	recipients: + vstinner, lemburg, sjmachin, ezio.melotti, dangra
2010-04-03 14:43:22	vstinner	set	messageid: <1270305802.77.0.0073241346145.issue8271@psf.upfronthosting.co.za>
2010-04-03 14:43:21	vstinner	link	issue8271 messages
2010-04-03 14:43:21	vstinner	create