Message 188442 - Python tracker


Author	serhiy.storchaka
Recipients	ezio.melotti, pitrou, rhettinger, serhiy.storchaka
Date	2013-05-05.13:10:30
Message-id	<1367759430.76.0.988181674037.issue17909@psf.upfronthosting.co.za>
In-reply-to

Content
RFC 4627 specifies a method for determining the encoding (one of UTF-8, UTF-16(BE|LE) or UTF-32(BE|LE)) of encoded JSON text. The proposed preliminary patch (it doesn't include the documentation yet) allows the load() and loads() functions to accept bytes data when it is encoded with a standard Unicode encoding. Data with a BOM is also accepted (this isn't specified in RFC 4627, but is widely used).
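For illustration, here is a minimal sketch of the kind of detection RFC 4627 (section 3) describes, extended with a BOM check. It is not the code from the patch: the helper name guess_json_encoding is hypothetical, and inputs shorter than four bytes are simply treated as UTF-8.

```python
import codecs
import json


def guess_json_encoding(data: bytes) -> str:
    """Guess the Unicode encoding of JSON bytes (hypothetical helper)."""
    # Check BOMs first. UTF-32 must come before UTF-16 because the
    # UTF-32LE BOM begins with the same two bytes as the UTF-16LE BOM.
    if data.startswith((codecs.BOM_UTF32_BE, codecs.BOM_UTF32_LE)):
        return "utf-32"
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"

    # RFC 4627: the first two characters of a JSON text are ASCII, so the
    # pattern of null bytes in the first four octets reveals the encoding.
    if len(data) >= 4:
        b0, b1, b2, b3 = data[:4]
        if b0 == 0 and b1 == 0 and b2 == 0 and b3 != 0:
            return "utf-32-be"  # 00 00 00 xx
        if b0 == 0 and b1 != 0 and b2 == 0 and b3 != 0:
            return "utf-16-be"  # 00 xx 00 xx
        if b0 != 0 and b1 == 0 and b2 == 0 and b3 == 0:
            return "utf-32-le"  # xx 00 00 00
        if b0 != 0 and b1 == 0 and b2 != 0 and b3 == 0:
            return "utf-16-le"  # xx 00 xx 00
    return "utf-8"              # xx xx xx xx, or too short to tell


# Usage: decode with the guessed encoding, then parse as usual.
raw = '{"k": [1, "é"]}'.encode("utf-16-be")
parsed = json.loads(raw.decode(guess_json_encoding(raw)))
```

Returning "utf-32", "utf-16" or "utf-8-sig" in the BOM cases lets the codec itself consume the BOM during decoding.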

There is only one case where the method can misfire. A serialized string "\x00..." encoded in UTF-16LE may be erroneously detected as UTF-32LE. This case violates two rules of RFC 4627: a string was serialized instead of an object or an array, and the control character U+0000 was not escaped. Standard-conforming encoded JSON is always detected correctly.
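The misfire can be seen directly in the bytes. The following sketch only inspects the first four octets; the helper names first_four and matches_utf32le_pattern are hypothetical and do not come from the patch.

```python
import json


def first_four(text: str) -> bytes:
    """Return the first four octets of text encoded as UTF-16LE."""
    return text.encode("utf-16-le")[:4]


def matches_utf32le_pattern(prefix: bytes) -> bool:
    """True if the prefix looks like 'xx 00 00 00', the UTF-32LE pattern."""
    return len(prefix) == 4 and prefix[0] != 0 and prefix[1:] == b"\x00\x00\x00"


# A bare string containing a raw U+0000: not valid JSON per RFC 4627.
unescaped = '"\x00"'
# The standard serializer escapes the control character instead.
escaped = json.dumps("\x00")

# '"' followed by U+0000 in UTF-16LE is 22 00 00 00, which is
# indistinguishable from '"' encoded in UTF-32LE.
misdetected = matches_utf32le_pattern(first_four(unescaped))

# With the escape, the second character is a backslash (22 00 5c 00),
# so the UTF-16LE pattern is recognisable.
detected_ok = not matches_utf32le_pattern(first_four(escaped))
```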

This patch requires the "surrogatepass" error handler for utf-16/32 (see issue12892 and issue13916).
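As a rough illustration of what that dependency provides, the sketch below round-trips a lone surrogate through UTF-16 and UTF-32. It assumes a Python version in which "surrogatepass" is supported for these codecs; the helper name roundtrips_lone_surrogate is hypothetical, and the message does not say exactly where the patch uses the handler.

```python
def roundtrips_lone_surrogate(codec: str) -> bool:
    """Check that a lone surrogate survives encode/decode with surrogatepass."""
    lone = "\ud800"  # an unpaired high surrogate

    # With the default strict handler, encoding a lone surrogate fails.
    try:
        lone.encode(codec)
        strict_rejects = False
    except UnicodeEncodeError:
        strict_rejects = True

    # With surrogatepass, the code point is written out as-is and can be
    # read back unchanged.
    encoded = lone.encode(codec, "surrogatepass")
    decoded = encoded.decode(codec, "surrogatepass")
    return strict_rejects and decoded == lone


results = {codec: roundtrips_lone_surrogate(codec)
           for codec in ("utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be")}
```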

History
Date	User	Action	Args
2013-05-05 13:10:30	serhiy.storchaka	set	recipients: + serhiy.storchaka, rhettinger, pitrou, ezio.melotti
2013-05-05 13:10:30	serhiy.storchaka	set	messageid: <1367759430.76.0.988181674037.issue17909@psf.upfronthosting.co.za>
2013-05-05 13:10:30	serhiy.storchaka	link	issue17909 messages
2013-05-05 13:10:30	serhiy.storchaka	create